A Semi-supervised Approach for Reject Inference in Credit Scoring Using SVMs

  • Sebastián Maldonado
  • Gonzalo Paredes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6171)


This paper presents a novel semi-supervised approach that determines a linear predictor using Support Vector Machines (SVMs) and incorporates information on rejected loans, assuming that the labeled data (accepted applicants) and the unlabeled data (rejected applicants) are not drawn from the same distribution. We use a self-training algorithm to estimate how likely a rejected applicant would have been to repay had credit been granted. A modification to the self-training algorithm based on Platt’s probabilistic output for SVMs is introduced. Experiments with two toy data sets, one well-known benchmark credit scoring data set, and one project performed for a Chilean financial institution demonstrate that our approach achieves the best classification performance when compared with well-known reject inference alternatives and with another state-of-the-art semi-supervised method for SVMs (Transductive SVM).
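The general idea of self-training with a probabilistic SVM output can be illustrated with a minimal sketch. It is not the authors' algorithm, whose modification is not specified in this abstract, and it does not model the distribution shift between accepted and rejected applicants. All function names, the confidence threshold, and the toy data are assumptions made for illustration. The linear SVM is fitted by plain subgradient descent and the sigmoid P(y = +1 | f) = 1 / (1 + exp(A f + B)) by plain gradient descent on the training scores, both chosen for brevity rather than fidelity to Platt's original fitting procedure.

```python
import math

def decision(w, b, x):
    """Signed SVM score f(x) = w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via batch subgradient descent on the regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # margin violated
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def platt_prob(A, B, f):
    """Platt's sigmoid: P(y = +1 | f) = 1 / (1 + exp(A*f + B))."""
    z = max(-30.0, min(30.0, A * f + B))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(scores, y, lr=0.1, steps=500):
    """Fit A, B by gradient descent on cross-entropy with smoothed targets."""
    n_pos = sum(1 for yi in y if yi == 1)
    n_neg = len(y) - n_pos
    t_pos, t_neg = (n_pos + 1) / (n_pos + 2), 1 / (n_neg + 2)
    A, B = -1.0, 0.0
    for _ in range(steps):
        gA = gB = 0.0
        for f, yi in zip(scores, y):
            t = t_pos if yi == 1 else t_neg
            p = platt_prob(A, B, f)
            gA += (t - p) * f / len(y)
            gB += (t - p) / len(y)
        A, B = A - lr * gA, B - lr * gB
    return A, B

def self_train(X_l, y_l, X_u, threshold=0.9, max_rounds=10):
    """Pseudo-label unlabeled points whose estimated probability is confident."""
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    for _ in range(max_rounds):
        w, b = train_linear_svm(X_l, y_l)
        A, B = fit_platt([decision(w, b, x) for x in X_l], y_l)
        remaining, added = [], 0
        for x in X_u:
            p = platt_prob(A, B, decision(w, b, x))
            if p >= threshold:            # confidently "good" (+1)
                X_l.append(x); y_l.append(1); added += 1
            elif p <= 1 - threshold:      # confidently "bad" (-1)
                X_l.append(x); y_l.append(-1); added += 1
            else:
                remaining.append(x)       # stays unlabeled this round
        X_u = remaining
        if added == 0 or not X_u:
            break
    w, b = train_linear_svm(X_l, y_l)     # final model on augmented set
    return w, b, X_l, y_l, X_u

# Hypothetical toy data: +1 = repaid, -1 = defaulted; X_u plays the rejects.
X_l = [[2, 2], [3, 3], [-2, -2], [-3, -3]]
y_l = [1, 1, -1, -1]
X_u = [[4, 4], [-4, -4], [0.1, -0.1]]
w, b, X_aug, y_aug, X_left = self_train(X_l, y_l, X_u, threshold=0.8)
```

Thresholding on an estimated probability rather than on the raw margin makes the confidence cut-off interpretable, which is one reason a probabilistic output is attractive in a self-training loop.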


Keywords: Semi-supervised learning, Credit scoring, Support vector machines, Reject inference




Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Sebastián Maldonado (1)
  • Gonzalo Paredes (1)
  1. Department of Industrial Engineering, University of Chile
