On the Discriminative Power of Credit Scoring Systems Trained on Independent Samples

  • Conference paper

Abstract

The aim of this work is to assess the importance of the independence assumption in behavioral scoring models built using logistic regression. We develop four sampling methods that control which of the observations associated with each client are included in the training set, avoiding a functional dependence between observations of the same client. We then calibrate logistic regressions with variable selection on the samples created by each method, plus one using all the data in the training set (the biased base method), and validate the models on an independent data set. We find that the regression built using all the observations shows the highest area under the ROC curve and Kolmogorov–Smirnov statistic, while the regression that uses the fewest observations shows the lowest performance and the highest variance of these indicators. Nevertheless, the fourth selection algorithm presented shows almost the same performance as the base method while using just 14% of the dataset and 14 fewer variables. We conclude that violating the independence assumption does not strongly affect the results and, furthermore, that trying to enforce it by using less data can harm the performance of the calibrated models, although a better sampling method does lead to equivalent results with a far smaller dataset.
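As an illustration of the quantities named in the abstract, the following minimal Python sketch shows one way to keep a single observation per client and to compute the area under the ROC curve and the Kolmogorov–Smirnov statistic for a set of scores. It is not the authors' method: the abstract does not describe the four sampling algorithms, so the random one-per-client rule, the `(client_id, features, label)` row layout, and all function names here are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def sample_one_per_client(rows, seed=0):
    """Keep one randomly chosen observation per client.

    rows: iterable of (client_id, features, label) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    by_client = defaultdict(list)
    for row in rows:
        by_client[row[0]].append(row)
    # Sort client ids so the result is reproducible for a given seed.
    return [rng.choice(by_client[cid]) for cid in sorted(by_client)]


def split_by_label(scores, labels):
    """Separate the scores of positives (label 1) from negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a positive
    outscores a negative (ties count one half)."""
    pos, neg = split_by_label(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Kolmogorov-Smirnov statistic: the largest gap between the empirical
    score distributions of the two classes."""
    pos, neg = split_by_label(scores, labels)
    gaps = []
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)
```

In a setting like the one described, the scores would come from a logistic regression fitted on the sampled rows and evaluated on a separate validation set; the pairwise AUC loop is quadratic and is meant only for small examples.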

Acknowledgements

The work reported in this paper has been partially funded by the Finance Center, DII, Universidad de Chile, with the support of bank Bci.

Author information

Correspondence to Miguel Biron.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Biron, M., Bravo, C. (2014). On the Discriminative Power of Credit Scoring Systems Trained on Independent Samples. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_27
