Abstract
Credit scoring on class-imbalanced data, where defaulters are under-represented relative to non-defaulters, is an important but challenging task. In this paper, we propose an imbalanced generative adversarial fusion network (IGAFN) for class-imbalanced credit scoring based on multi-source heterogeneous credit data. Concretely, we design a fusion module that integrates heterogeneous credit data from multiple sources into a unified latent feature space. We then design a balance module, based on a generative adversarial network (GAN), that generates latent representations of new minority-class samples to rebalance the dataset. We compare IGAFN against multiple conventional machine learning and deep learning algorithms. Extensive experiments on two real-world datasets show that IGAFN performs significantly better than the compared methods.
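The abstract names two components: a fusion module that maps multi-source data into one latent space, and a GAN-based balance module that synthesises minority-class latent representations. The sketch below illustrates that pipeline shape only and is not IGAFN itself. Everything in it is a hypothetical assumption: the two toy sources, the random untrained projections in `fuse`, the affine generator and logistic discriminator in `train_latent_gan`, and all sizes and learning rates. It uses NumPy with hand-derived gradients so it stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(sources, projections):
    """Hypothetical fusion module: project each source with its own
    (here random, untrained) matrix, squash with tanh, then concatenate
    into one shared latent vector per applicant."""
    parts = [np.tanh(x @ w) for x, w in zip(sources, projections)]
    return np.concatenate(parts, axis=1)

def train_latent_gan(real, noise_dim=4, steps=500, lr=0.05, batch=32):
    """Toy GAN in latent space: affine generator vs. logistic discriminator,
    trained with hand-derived gradients (non-saturating generator loss)."""
    dim = real.shape[1]
    g_w = rng.normal(scale=0.1, size=(noise_dim, dim))
    g_b = np.zeros(dim)
    d_w = rng.normal(scale=0.1, size=dim)
    d_b = 0.0
    for _ in range(steps):
        r = real[rng.integers(0, len(real), batch)]
        z = rng.normal(size=(batch, noise_dim))
        f = z @ g_w + g_b
        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        d_r, d_f = sigmoid(r @ d_w + d_b), sigmoid(f @ d_w + d_b)
        d_w -= lr * ((d_r - 1) @ r + d_f @ f) / batch
        d_b -= lr * (np.sum(d_r - 1) + np.sum(d_f)) / batch
        # Generator step: minimise -log D(G(z)) against the updated D.
        d_f = sigmoid(f @ d_w + d_b)
        grad_f = (d_f - 1)[:, None] * d_w[None, :]  # dL_G / d(fake sample)
        g_w -= lr * z.T @ grad_f / batch
        g_b -= lr * grad_f.mean(axis=0)
    # Return a sampler that maps fresh noise through the trained generator.
    return lambda n: rng.normal(size=(n, noise_dim)) @ g_w + g_b

# Hypothetical toy data: 200 applicants, 20 of them defaulters (label 1).
n, n_min = 200, 20
numeric = rng.normal(size=(n, 5))                  # numeric source
categorical = np.eye(3)[rng.integers(0, 3, n)]     # one-hot categorical source
y = np.r_[np.ones(n_min), np.zeros(n - n_min)].astype(int)

latent = fuse([numeric, categorical],
              [rng.normal(size=(5, 4)), rng.normal(size=(3, 4))])

# Train only on minority latents, then generate enough to match the majority.
generate = train_latent_gan(latent[y == 1])
n_new = int((y == 0).sum() - (y == 1).sum())
latent_bal = np.vstack([latent, generate(n_new)])
y_bal = np.r_[y, np.ones(n_new, dtype=int)]
print(latent_bal.shape, np.bincount(y_bal))  # (360, 8) [180 180]
```

The key design point the sketch mirrors is that oversampling happens after fusion, in the shared latent space, rather than on the raw heterogeneous features; a downstream classifier would then be trained on `latent_bal` and `y_bal`.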
Acknowledgements
This work was financially supported by the Shenzhen Project (ZDSYS201802051831427), the National Natural Science Foundation of China (No. 61602013), and the Shenzhen Fundamental Research Project (No. JCYJ20170818091546869). Min Yang was sponsored by the CCF-Tencent Open Research Fund.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lei, K., Xie, Y., Zhong, S. et al. Generative adversarial fusion network for class imbalance credit scoring. Neural Comput & Applic 32, 8451–8462 (2020). https://doi.org/10.1007/s00521-019-04335-1
DOI: https://doi.org/10.1007/s00521-019-04335-1