Abstract
With the growth of data storage capacity, feature selection has become increasingly important in many fields of study, including credit risk classification. Filter-based feature screening has long predominated as a way to improve model robustness, but it can become trapped in a local optimum. One proposed remedy is to integrate feature selection directly into the training phase. We compare two of the most widely used methods that do so: a parametric one, logistic regression with L1-norm (lasso) regularization, and a non-parametric one, random forests with a wrapper-based selection strategy. Using the German credit dataset, we applied preprocessing steps such as class merging, data standardization, and dummy coding. On classification-based measures computed over a 70:30 train-test split, logistic regression performed best, with Accuracy = 0.75, Sensitivity (Recall) = 0.9825, Precision = 0.742, F1-score = 0.845, AUC = 0.8, and PR-AUC = 0.877.
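The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn APIs and uses synthetic data as a stand-in for the preprocessed (standardized, dummy-coded) German credit dataset; hyperparameters such as `C`, `n_estimators`, and `n_features_to_select` are illustrative choices.

```python
# Sketch of the two embedded/wrapper feature-selection pipelines compared
# in the paper, on synthetic data (assumption: sklearn, not the real dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the preprocessed German credit data (1000 rows, 70:30 good/bad)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.3, 0.7], random_state=0)

# 70:30 split, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# (1) Parametric: L1-regularized (lasso) logistic regression.
# Selection happens during training -- irrelevant coefficients shrink to zero.
lasso_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0))
lasso_lr.fit(X_tr, y_tr)

# (2) Non-parametric: random forest with a wrapper strategy, here recursive
# feature elimination ranked by the forest's variable importances.
rf_rfe = RFE(RandomForestClassifier(n_estimators=200, random_state=0),
             n_features_to_select=10)
rf_rfe.fit(X_tr, y_tr)

# Evaluate both models with the metrics reported in the abstract
for name, model in [("lasso-LR", lasso_lr), ("RF-RFE", rf_rfe)]:
    proba = model.predict_proba(X_te)[:, 1]
    pred = model.predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"recall={recall_score(y_te, pred):.3f} "
          f"precision={precision_score(y_te, pred):.3f} "
          f"F1={f1_score(y_te, pred):.3f} "
          f"AUC={roc_auc_score(y_te, proba):.3f} "
          f"PR-AUC={average_precision_score(y_te, proba):.3f}")
```

The design difference the paper exploits is visible here: the lasso embeds selection in the loss function itself, while the random-forest wrapper repeatedly refits the model on shrinking feature subsets.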
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Atif, D., Salmi, M. (2022). Feature Selection for Credit Risk Classification. In: Bennour, A., Ensari, T., Kessentini, Y., Eom, S. (eds) Intelligent Systems and Pattern Recognition. ISPR 2022. Communications in Computer and Information Science, vol 1589. Springer, Cham. https://doi.org/10.1007/978-3-031-08277-1_14
Print ISBN: 978-3-031-08276-4
Online ISBN: 978-3-031-08277-1