Credit risk classification: an integrated predictive accuracy algorithm using artificial and deep neural networks

  • Original Research
  • Annals of Operations Research

Abstract

This study uses classification models to build a robust algorithm for imbalanced data in which the minority class is the class of interest, here default payments. To develop an integrated predictive accuracy algorithm, the study applies four machine learning classifiers: deep neural networks (DNN), support vector machines (SVM), k-nearest neighbors (KNN), and artificial neural networks (ANN). The proposed algorithm uses an imbalanced dataset of 30,000 observations and aims to improve the accuracy of default payment prediction through oversampling and undersampling strategies: the synthetic minority oversampling technique (SMOTE), SVM SMOTE, random undersampling, and ALL-KNN. The results indicate that the SVM under the ALL-KNN sampling technique achieves an accuracy of 98.6%, with the lowest cross-entropy loss of 0.028. By examining the configuration of the neural networks and the neurons used in the proposed algorithm, this paper offers better insight into how neural networks behave when combined with the resampling techniques. With the methodology and algorithm presented in this study, credit risk can be predicted more accurately in practical applications where most clients are categorized as non-default.
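To make the resampling idea concrete, the following is a minimal sketch and not the authors' implementation. It shows random undersampling, a simplified SMOTE-style interpolation between minority neighbours, and the mean binary cross-entropy used as the loss measure. The function names (`random_undersample`, `smote_like_oversample`, `cross_entropy`), the toy data, and the choice of `k` are all assumptions made for illustration; SVM SMOTE and ALL-KNN are not reproduced here, and the abstract does not state which software was used.

```python
import math
import random


def random_undersample(X, y, majority=0, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like_oversample(X, y, minority=1, k=2, seed=0):
    """Add synthetic minority rows on segments between minority neighbours."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = len(y) - 2 * len(pts)  # rows missing to balance the classes
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(pts)
        others = [p for p in pts if p is not base]
        # up to k nearest minority neighbours of the chosen base row
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        X_out.append([b + u * (n - b) for b, n in zip(base, nb)])
        y_out.append(minority)
    return X_out, y_out


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy (log loss) of predicted default probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical toy data: six non-default rows (0) and two default rows (1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.1], [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_under, y_under = random_undersample(X, y)
X_over, y_over = smote_like_oversample(X, y)
print(sum(y_under), len(y_under))  # minority count, total after undersampling
print(sum(y_over), len(y_over))    # minority count, total after oversampling
print(round(cross_entropy([1, 0], [0.9, 0.2]), 4))
```

In practice, resampling would normally be applied to the training split only, so that the test data keeps its original class balance. Whether the study did so is not stated in the abstract.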

Author information

Corresponding author

Correspondence to Mohammad Mahbobi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: Descriptive statistics of the data set

| Variable | Mean | SD | Min | Max | Q1 | Median | Q3 | Range | IQR | Mode | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LIMIT_BAL | 167,484 | 129,748 | 10,000 | 1,000,000 | 50,000 | 140,000 | 240,000 | 990,000 | 190,000 | 50,000 | 0.99 | 0.54 |
| SEX | 1.6037 | 0.4891 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 2 | −0.42 | −1.82 |
| EDUCATION | 1.8531 | 0.7903 | 0 | 6 | 1 | 2 | 2 | 6 | 1 | 2 | 0.97 | 2.08 |
| MARRIAGE | 1.5519 | 0.522 | 0 | 3 | 1 | 2 | 2 | 3 | 1 | 2 | −0.02 | −1.36 |
| AGE | 35.486 | 9.218 | 21 | 79 | 28 | 34 | 41 | 58 | 13 | 29 | 0.73 | 0.04 |
| PAY_0 | −0.0167 | 1.1238 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 0.73 | 2.72 |
| PAY_2 | −0.13377 | 1.19719 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 0.79 | 1.57 |
| PAY_3 | −0.1662 | 1.19687 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 0.84 | 2.08 |
| PAY_4 | −0.22067 | 1.16914 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 1 | 3.5 |
| PAY_5 | −0.2662 | 1.13319 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 1.01 | 3.99 |
| PAY_6 | −0.2911 | 1.14999 | −2 | 8 | −1 | 0 | 0 | 10 | 1 | 0 | 0.95 | 3.43 |
| BILL_AMT1 | 51,223 | 73,636 | −165,580 | 964,511 | 3558 | 22,382 | 67,093 | 1,130,091 | 63,535 | 0 | 2.66 | 9.81 |
| BILL_AMT2 | 49,179 | 71,174 | −69,777 | 983,931 | 2984 | 21,200 | 64,011 | 1,053,708 | 61,027 | 0 | 2.71 | 10.3 |
| BILL_AMT3 | 47,013 | 69,349 | −157,264 | 1,664,089 | 2665 | 20,089 | 60,166 | 1,821,353 | 57,502 | 0 | 3.09 | 19.78 |
| BILL_AMT4 | 43,263 | 64,333 | −170,000 | 891,586 | 2326 | 19,052 | 54,512 | 1,061,586 | 52,186 | 0 | 2.82 | 11.31 |
| BILL_AMT5 | 40,311 | 60,797 | −81,334 | 927,171 | 1763 | 18,105 | 50,202 | 1,008,505 | 48,439 | 0 | 2.88 | 12.31 |
| BILL_AMT6 | 38,872 | 59,554 | −339,603 | 961,664 | 1256 | 17,071 | 49,203 | 1,301,267 | 47,947 | 0 | 2.85 | 12.27 |
| PAY_AMT1 | 5664 | 16,563 | 0 | 873,552 | 1000 | 2100 | 5006 | 873,552 | 4006 | 0 | 14.67 | 415.25 |
| PAY_AMT2 | 5921 | 23,041 | 0 | 1,684,259 | 833 | 2009 | 5000 | 1,684,259 | 4167 | 0 | 30.45 | 1641.63 |
| PAY_AMT3 | 5226 | 17,607 | 0 | 896,040 | 390 | 1800 | 4505 | 896,040 | 4115 | 0 | 17.22 | 564.31 |
| PAY_AMT4 | 4826 | 15,666 | 0 | 621,000 | 296 | 1500 | 4014 | 621,000 | 3718 | 0 | 12.9 | 277.33 |
| PAY_AMT5 | 4799 | 15,278 | 0 | 426,529 | 252 | 1500 | 4033 | 426,529 | 3781 | 0 | 11.13 | 180.06 |
| PAY_AMT6 | 5216 | 17,777 | 0 | 528,666 | 117 | 1500 | 4000 | 528,666 | 3883 | 0 | 10.64 | 167.16 |
| Default payment next month | 0.2212 | 0.41506 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1.34 | −0.2 |
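
Two relationships in the appendix can be checked by simple arithmetic: Range is Max minus Min, IQR is Q3 minus Q1, and the mean of the 0/1 default indicator is the share of default cases. The short sketch below works through the BILL_AMT1 row and the default indicator. The variable names are illustrative, the dataset size of 30,000 is taken from the abstract, and the default count is approximate because the mean is reported to four decimals.

```python
# Values copied from the BILL_AMT1 row of Appendix 1.
bill_amt1 = {"min": -165_580, "max": 964_511, "q1": 3_558, "q3": 67_093}

value_range = bill_amt1["max"] - bill_amt1["min"]  # Range = Max - Min
iqr = bill_amt1["q3"] - bill_amt1["q1"]            # IQR = Q3 - Q1

# The mean of a 0/1 indicator equals the proportion of ones (defaults).
default_share = 0.2212  # mean of "Default payment next month"
n_obs = 30_000          # dataset size stated in the abstract
n_default = round(default_share * n_obs)
imbalance_ratio = (n_obs - n_default) / n_default  # non-defaults per default

print(value_range, iqr)                      # 1130091 63535
print(n_default, round(imbalance_ratio, 2))  # 6636 3.52
```

The resulting ratio of roughly 3.5 non-default clients per default client is the class imbalance that the resampling strategies in the abstract are meant to address.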

Cite this article

Mahbobi, M., Kimiagari, S. & Vasudevan, M. Credit risk classification: an integrated predictive accuracy algorithm using artificial and deep neural networks. Ann Oper Res 330, 609–637 (2023). https://doi.org/10.1007/s10479-021-04114-z

  • DOI: https://doi.org/10.1007/s10479-021-04114-z
