Abstract
Credit scoring is concerned with emerging empirical models that assist financial institutions in their financial decision-making. Credit risk analysis plays a vital role in this process, and statistical and machine learning approaches are used to estimate the risk associated with a credit applicant. Improving the performance of a credit scoring model, particularly on the non-trustworthy (or non-creditworthy) group, can have a substantial effect for a financial institution. However, credit scoring data may contain redundant and irrelevant records and features, which degrade model performance. Selecting the important features (that is, removing irrelevant and redundant ones) can therefore play a key role in improving the effectiveness and reducing the complexity of the model. This study presents an experimental analysis of various combinations of feature selection and classification approaches, and of the impact of feature selection on the results. For the analysis, nine feature selection approaches and sixteen state-of-the-art classification approaches have been applied to seven benchmark credit scoring datasets.
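To make the experimental design concrete, the following is a minimal sketch of how a grid of feature selection and classification combinations might be evaluated. It is not the study's code: the synthetic data, the two selectors (`select_all`, `select_mean_gap`), the two classifiers (`nearest_centroid`, `one_nn`) and the simple holdout split are all illustrative assumptions, chosen only so that the example runs with the Python standard library.

```python
import random
from statistics import mean, pstdev

def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for a credit dataset (an assumption, not real data):
    features 0-1 carry class signal, features 2-4 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def select_all(X, y, k):
    """Baseline: no feature selection."""
    return list(range(len(X[0])))

def select_mean_gap(X, y, k):
    """Simple filter: keep the k features whose class means differ most,
    measured in units of the feature's standard deviation."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        m1 = mean(v for v, t in zip(col, y) if t == 1)
        m0 = mean(v for v, t in zip(col, y) if t == 0)
        scores.append((abs(m1 - m0) / (pstdev(col) or 1.0), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest_centroid(X_tr, y_tr, X_te):
    """Predict the class whose training centroid is closest."""
    centroids = {}
    for c in set(y_tr):
        rows = [r for r, t in zip(X_tr, y_tr) if t == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    return [min(centroids, key=lambda c: dist2(r, centroids[c])) for r in X_te]

def one_nn(X_tr, y_tr, X_te):
    """Predict the label of the single nearest training sample."""
    return [y_tr[min(range(len(X_tr)), key=lambda i: dist2(r, X_tr[i]))]
            for r in X_te]

SELECTORS = {"all": select_all, "mean_gap": select_mean_gap}
CLASSIFIERS = {"nearest_centroid": nearest_centroid, "one_nn": one_nn}

def evaluate_grid(X, y, k=2, train_frac=0.7):
    """Return holdout accuracy for every (selector, classifier) pair."""
    cut = int(len(X) * train_frac)
    X_tr, y_tr, X_te, y_te = X[:cut], y[:cut], X[cut:], y[cut:]
    results = {}
    for s_name, selector in SELECTORS.items():
        cols = selector(X_tr, y_tr, k)  # fit the selector on training data only
        tr = [[r[j] for j in cols] for r in X_tr]
        te = [[r[j] for j in cols] for r in X_te]
        for c_name, clf in CLASSIFIERS.items():
            pred = clf(tr, y_tr, te)
            hits = sum(p == t for p, t in zip(pred, y_te))
            results[(s_name, c_name)] = hits / len(y_te)
    return results

if __name__ == "__main__":
    X, y = make_toy_data()
    for combo, acc in sorted(evaluate_grid(X, y).items()):
        print(combo, round(acc, 3))
```

The selector is fitted on the training split only, so the held-out samples do not influence which features are kept. A study on this scale would more plausibly use repeated cross-validation and several performance measures rather than a single holdout accuracy.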
Tripathi, D., Edla, D.R., Bablani, A. et al. Experimental analysis of machine learning methods for credit score classification. Prog Artif Intell 10, 217–243 (2021). https://doi.org/10.1007/s13748-021-00238-2
DOI: https://doi.org/10.1007/s13748-021-00238-2