Introducing a Vector Space Model to Perform a Proactive Credit Scoring

  • Roberto Saia
  • Salvatore Carta
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 914)


Many authoritative studies report that consumer credit has grown year on year in recent years, making it necessary to develop instruments that assist financial operators in some crucial tasks. The most important of these is classifying loan applications as reliable or unreliable on the basis of the customer information at their disposal. Such credit scoring instruments allow operators to reduce financial losses, and for this reason they play a very important role. However, designing effective credit scoring models is not easy, since it must face several problems, first among them data imbalance in model training. This problem arises because default cases are usually far fewer than non-default ones, and this kind of distribution reduces the effectiveness of the state-of-the-art approaches used to define these models. This paper proposes a novel Linear Dependence Based (LDB) approach that builds a credit scoring model using only past non-default cases, overcoming both the imbalanced class distribution and the cold-start issues. It relies on the concept of linear dependence between the vector representations of past and new loan applications, evaluated in the context of a matrix. Experiments on two real-world datasets with strongly unbalanced class distributions show that the proposed approach achieves performance close to or better than that of random forests, one of the best state-of-the-art approaches to credit scoring, even though it uses only past non-default cases.
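
The linear dependence idea can be illustrated with a minimal sketch. It is not the authors' LDB formulation, which the abstract does not specify. It assumes each loan application is already encoded as a numeric vector, measures pairwise dependence through the determinant of the 2x2 Gram matrix of two vectors (zero exactly when they are linearly dependent), and averages that score over the past non-default cases. The function names, the averaging step, and the threshold are hypothetical choices made for illustration.

```python
# Illustrative sketch only: all names, the averaging step and the
# threshold are assumptions, not the paper's actual LDB procedure.

def gram_determinant(a, b):
    """Determinant of the 2x2 Gram matrix [[a.a, a.b], [a.b, b.b]].

    It equals zero exactly when a and b are linearly dependent.
    """
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))
    return aa * bb - ab * ab


def dependence(a, b):
    """Score in [0, 1]; 1.0 means a and b are linearly dependent."""
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    if aa == 0 or bb == 0:
        return 1.0  # the zero vector is dependent on any vector
    score = 1.0 - gram_determinant(a, b) / (aa * bb)
    return max(0.0, min(1.0, score))  # clamp floating-point drift


def mean_dependence(candidate, past_non_default):
    """Average dependence between a new application and past non-default ones."""
    scores = [dependence(candidate, p) for p in past_non_default]
    return sum(scores) / len(scores)


def classify(candidate, past_non_default, threshold):
    """Label a new application using only past non-default cases.

    The threshold is a hypothetical tuning parameter.
    """
    score = mean_dependence(candidate, past_non_default)
    return "reliable" if score >= threshold else "unreliable"


if __name__ == "__main__":
    past = [[1, 2, 3], [3, 6, 9]]            # toy non-default cases
    print(classify([2, 4, 6], past, 0.9))    # same direction as past cases
    print(classify([3, 0, -1], past, 0.9))   # orthogonal to past cases
```

Because the score is computed only against non-default cases, a sketch of this kind needs no default examples at all, which is the property the abstract credits for sidestepping both class imbalance and the cold-start problem.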


Business intelligence, Decision support system, Credit scoring, Data mining, Algorithms, Metrics



This research is partially funded by Regione Sardegna under project “Next generation Open Mobile Apps Development” (NOMAD), “Pacchetti Integrati di Agevolazione” (PIA) - Industria Artigianato e Servizi - Annualità 2013.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Dipartimento di Matematica e Informatica, Università di Cagliari, Cagliari, Italy
