Advances in Data Analysis and Classification, Volume 11, Issue 1, pp 121–138

Advances in credit scoring: combining performance and interpretation in kernel discriminant analysis

  • Caterina Liberati
  • Furio Camillo
  • Gilbert Saporta
Regular Article

Abstract

In the wake of the recent financial turmoil, the banking sector has been debating how to achieve long-term success and how to pursue a thorough and powerful strategy in credit scoring. Recent theoretical advances in machine learning algorithms have encouraged the application of kernel-based classifiers, which produce very effective results. Unfortunately, such tools cannot provide an explanation, or comprehensible justification, for the solutions they supply. In this paper, we propose a new strategy for modelling credit scoring data that indirectly brings the classification power of kernel machines into operational use. The kernel classifier is reconstructed via linear regression if all predictors are numerical, or via a general linear model if some or all predictors are categorical. The loss of performance due to this approximation is offset by better interpretability for the end user, who is able to order, understand and rank the influence of each category of the variables in the prediction. A case study from an Italian bank is presented and discussed; empirical results reveal promising performance of the proposed strategy.
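
The reconstruction step described in the abstract can be illustrated with a minimal sketch for the all-numerical case. Everything below is an assumption made for illustration and not the authors' implementation: a regularised kernel least-squares fit on ±1 labels stands in for kernel discriminant analysis, the data are synthetic, and the names `rbf_kernel`, `fit_kernel_scores` and `linear_surrogate` are hypothetical. The kernel scores are regressed on the original predictors by ordinary least squares, so the fitted coefficients give an interpretable approximation of the kernel classifier.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def fit_kernel_scores(X, y, gamma=0.5, ridge=1.0):
    """Training scores of a regularised kernel least-squares fit on +/-1 labels.

    Assumption: this is a simple stand-in for a kernel discriminant score.
    """
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X)), y)
    return K @ alpha


def linear_surrogate(X, scores):
    """Ordinary least squares of the kernel scores on the original predictors.

    Returns the coefficients (intercept first) and the R^2 of the approximation.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Z, scores, rcond=None)
    fitted = Z @ beta
    ss_res = ((scores - fitted) ** 2).sum()
    ss_tot = ((scores - scores.mean()) ** 2).sum()
    return beta, 1.0 - ss_res / ss_tot


# Synthetic two-class data with a mildly nonlinear boundary (illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
signal = X[:, 0] + 0.5 * X[:, 1] ** 2 - 0.5 + 0.3 * rng.normal(size=n)
y = np.where(signal > 0, 1.0, -1.0)

scores = fit_kernel_scores(X, y)
beta, r2 = linear_surrogate(X, scores)

# Training accuracy of the kernel scores and of their linear reconstruction.
Z = np.column_stack([np.ones(n), X])
kernel_acc = float(np.mean(np.sign(scores) == y))
surrogate_acc = float(np.mean(np.sign(Z @ beta) == y))

print("coefficients (intercept first):", np.round(beta, 3))
print("R^2 of the linear reconstruction:", round(float(r2), 3))
print("kernel accuracy:", kernel_acc, "surrogate accuracy:", surrogate_acc)
```

Comparing `kernel_acc` with `surrogate_acc` gives a rough view of the performance given up in exchange for readable coefficients. For categorical predictors, the same idea would apply with indicator-coded categories in the design matrix, which this sketch does not cover.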

Keywords

Credit scoring; Kernel discriminant analysis; DISQUAL; Small and medium enterprises

Mathematics Subject Classification

62H30 62P20 

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Caterina Liberati (1)
  • Furio Camillo (2)
  • Gilbert Saporta (3)

  1. Department of Economics Management and Statistics (DEMS), Università degli Studi di Milano-Bicocca, Milan, Italy
  2. Department of Statistical Sciences, Università di Bologna, Bologna, Italy
  3. CEDRIC-CNAM, Paris, France
