Credit Default Risk Analysis Using Machine Learning Algorithms with Hyperparameter Optimization

Inga, Juan; Sacoto-Cabrera, Erwin

doi:10.1007/978-3-031-24327-1_8

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 607)

Included in the following conference series:

International Conference on Science, Technology and Innovation for Society

Abstract

Machine learning models give financial companies a systematic way to identify potential debtors early and to predict which clients are more likely to default on their debts, improving the accuracy of credit risk assessment. This study analyzes the performance of three gradient boosting algorithms (CatBoost, LightGBM, and XGBoost) in predicting customer default risk, and the ability of the RandomUnderSampler sampling technique to address imbalanced credit risk classes. The workflow consisted of an exploratory analysis of the data set, data preprocessing, and training with hyperparameter tuning using the GridSearchCV method, with the aim of identifying the largest number of clients with credit risk. The models were evaluated on a consumer credit data set using sensitivity, specificity, and precision. Among the algorithms considered, XGBoost outperformed LightGBM and CatBoost. The experimental results confirmed that the XGBoost model performs better for credit risk prediction with historical data.
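To make the resampling and evaluation steps named in the abstract concrete, the following is a minimal standard-library sketch of the two ideas: random undersampling of the majority class, and sensitivity, specificity, and precision computed from confusion-matrix counts. It is not the authors' code and does not use imbalanced-learn or scikit-learn. The function names (`random_undersample`, `confusion_counts`, `evaluate`), the toy labels, and the convention that label 1 means "default" are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples so every class has as many samples as the smallest class.

    Hypothetical helper illustrating the idea behind random undersampling;
    it is not the imbalanced-learn RandomUnderSampler implementation.
    """
    rng = random.Random(seed)
    indices_by_class = {}
    for idx, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(idx)
    target = min(len(idxs) for idxs in indices_by_class.values())
    kept = []
    for idxs in indices_by_class.values():
        kept.extend(rng.sample(idxs, target))  # sample without replacement
    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem; `positive` marks a default (assumed)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Compute the three metrics named in the abstract from confusion-matrix counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # defaulters caught
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # non-defaulters cleared
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,    # flagged clients who default
    }


# Toy data (made up): 8 non-defaulting clients (0) and 2 defaulting clients (1).
toy_rows = [[i] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels, seed=42)
print(Counter(balanced_labels))

# Toy predictions (made up) to exercise the metrics.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(evaluate(y_true, y_pred))
```

Sensitivity is the metric most closely tied to the stated goal of identifying the largest number of clients with credit risk, since it measures the share of actual defaulters that the model flags. Undersampling trades away majority-class data to keep the classifier from ignoring the minority class.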


Author information

Authors and Affiliations

Universidad Politécnica Salesiana, Cuenca, Ecuador
Juan Inga & Erwin Sacoto-Cabrera

Corresponding author

Correspondence to Juan Inga.

Editor information

Editors and Affiliations

Campus El Vecino, Universidad Politécnica Salesiana, Cuenca, Ecuador
Vladimir Robles-Bykbaev
Research Centre on Production Management and Engineering - CIGIP, Universitat Politècnica de València, Alcoy, Spain
Josefa Mula
Rua Imaculada Conceiçao, Pontifícia Universidade Católica do Paraná – PUCPR, Curitiba, Brazil
Gilberto Reynoso-Meza

About this paper

Cite this paper

Inga, J., Sacoto-Cabrera, E. (2023). Credit Default Risk Analysis Using Machine Learning Algorithms with Hyperparameter Optimization. In: Robles-Bykbaev, V., Mula, J., Reynoso-Meza, G. (eds) Intelligent Technologies: Design and Applications for Society. CITIS 2022. Lecture Notes in Networks and Systems, vol 607. Springer, Cham. https://doi.org/10.1007/978-3-031-24327-1_8

DOI: https://doi.org/10.1007/978-3-031-24327-1_8
Published: 01 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24326-4
Online ISBN: 978-3-031-24327-1
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
