Abstract
Machine learning models give financial companies a systematic way to identify potential debtors early and to predict which clients are more likely to default on their debts, improving the accuracy of credit risk assessment. The purpose of this study was to analyze the performance of three gradient boosting algorithms (CatBoost, LightGBM, and XGBoost) in predicting customer default risk, and the ability of the RandomUnderSampler sampling technique to address class imbalance in credit risk data. An exploratory analysis of the data set was carried out first, followed by data preprocessing and, finally, model training with hyperparameter tuning using the GridSearchCV method, with the aim of identifying the largest number of clients with credit risk. The models were evaluated on a consumer credit data set using sensitivity, specificity, and precision. Among the algorithms considered, XGBoost outperformed the LightGBM and CatBoost models. The experimental results confirmed that XGBoost performs better for credit risk prediction with historical data.
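To make two ideas from the abstract concrete, the following minimal sketch shows random undersampling of the majority class and the three evaluation metrics computed from a confusion matrix. It is an illustration under stated assumptions, not the authors' implementation: it uses only the Python standard library instead of the RandomUnderSampler class the study refers to, the function names `random_undersample` and `binary_metrics` are hypothetical, and the toy labels and predictions are invented for the example.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest class.

    Hypothetical stand-in for a library undersampler; not the authors' code.
    """
    rng = random.Random(seed)
    indices_by_class = {}
    for idx, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(idx)
    # Size of the smallest class sets the target size for all classes.
    target = min(len(ids) for ids in indices_by_class.values())
    kept = []
    for ids in indices_by_class.values():
        kept.extend(rng.sample(ids, target))
    kept.sort()  # preserve the original row order among kept rows
    return [rows[i] for i in kept], [labels[i] for i in kept]


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and precision from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of defaulters detected
        "specificity": ratio(tn, tn + fp),  # share of non-defaulters cleared
        "precision": ratio(tp, tp + fp),    # share of flagged clients who default
    }


# Toy data (invented): 8 non-defaulters (0) and 2 defaulters (1).
toy_rows = [[i] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels, seed=42)
print(Counter(balanced_labels))  # two rows per class after undersampling

# Toy predictions (invented) to exercise the metric function.
metrics = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(metrics)
```

In a workflow like the one the abstract describes, undersampling would be applied only to the training split, so that sensitivity, specificity, and precision are measured on data with the original class distribution.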
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Inga, J., Sacoto-Cabrera, E. (2023). Credit Default Risk Analysis Using Machine Learning Algorithms with Hyperparameter Optimization. In: Robles-Bykbaev, V., Mula, J., Reynoso-Meza, G. (eds) Intelligent Technologies: Design and Applications for Society. CITIS 2022. Lecture Notes in Networks and Systems, vol 607. Springer, Cham. https://doi.org/10.1007/978-3-031-24327-1_8
DOI: https://doi.org/10.1007/978-3-031-24327-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24326-4
Online ISBN: 978-3-031-24327-1
eBook Packages: Intelligent Technologies and Robotics (R0)