Feature Selection for Credit Risk Classification

  • Conference paper

Intelligent Systems and Pattern Recognition (ISPR 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1589)

Abstract

With the advancement of data storage methods, feature selection has become increasingly important in many fields of study, including credit risk classification. Feature screening has been the predominant approach to improving model robustness, but it is prone to becoming trapped in local optima. One of the strategies proposed to address this issue is to integrate feature selection into the training phase. We compare two of the most commonly used methods in this field, both with feature selection embedded in the training process: a parametric one, logistic regression with L1-norm regularization, and a non-parametric one, random forests with a wrapper-based strategy. We used the German credit dataset and applied preprocessing steps such as class merging, data standardization, and dummy coding. Results on classification-based measures, computed over a 70:30 train-test split, show that logistic regression performed best, with Accuracy = 0.75, Sensitivity (Recall) = 0.9825, Precision = 0.742, F1-score = 0.845, AUC = 0.8, and PR-AUC = 0.877.
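The two embedded feature-selection strategies compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses scikit-learn, a synthetic stand-in for the preprocessed (standardized, dummy-coded) German credit data, and assumed hyperparameters (regularization strength `C`, number of trees, number of features retained).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the preprocessed credit data.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
# 70:30 train-test split, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# 1) Parametric: L1-penalized (lasso) logistic regression. Selection is
#    embedded in training: coefficients shrunk to exactly zero are dropped.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(X_train_s, y_train)
selected_lr = np.flatnonzero(lasso_lr.coef_[0])

# 2) Non-parametric: random forest inside a wrapper (recursive feature
#    elimination), discarding the least important feature each round.
rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=8, step=1)
rfe.fit(X_train, y_train)
selected_rf = np.flatnonzero(rfe.support_)

acc = accuracy_score(y_test, lasso_lr.predict(X_test_s))
print("lasso kept:", len(selected_lr), "| RFE kept:", len(selected_rf),
      "| lasso-LR accuracy:", round(acc, 2))
```

The key contrast is where the selection happens: the lasso performs it implicitly while fitting a single model, whereas the wrapper repeatedly retrains the forest on shrinking feature subsets, trading extra compute for a model-agnostic search.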



Author information

Correspondence to Dalia Atif.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Atif, D., Salmi, M. (2022). Feature Selection for Credit Risk Classification. In: Bennour, A., Ensari, T., Kessentini, Y., Eom, S. (eds) Intelligent Systems and Pattern Recognition. ISPR 2022. Communications in Computer and Information Science, vol 1589. Springer, Cham. https://doi.org/10.1007/978-3-031-08277-1_14

  • DOI: https://doi.org/10.1007/978-3-031-08277-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08276-4

  • Online ISBN: 978-3-031-08277-1

  • eBook Packages: Computer Science, Computer Science (R0)
