Journal of the Operational Research Society, Volume 65, Issue 3, pp 408–415

Selection bias in credit scorecard evaluation

Special Issue Paper


Abstract

Selection bias is a perennial problem when constructing and evaluating scorecards. It is familiar in the context of reject inference, but it arises in many other situations as well. In this paper, we examine how accepting or rejecting customers on the basis of one scorecard leads to biased comparisons of performance between that scorecard and others. This has important implications for organisations seeking to improve or replace scorecards.
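The mechanism the abstract describes can be illustrated with a small simulation. Everything below is an invented sketch, not taken from the paper: two hypothetical scorecards A and B score correlated risk factors, applicants are accepted only when scorecard A rates them well, and discrimination (AUC) is then measured both on the full applicant population and on the accepted subset that a lender would actually observe.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Two correlated risk factors; scorecard A uses x1, scorecard B uses x2.
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + np.sqrt(1 - 0.7**2) * rng.normal(size=n)

# Hypothetical "true" default model depending on both factors.
p_default = 1.0 / (1.0 + np.exp(2.0 - 1.5 * x1 - 0.5 * x2))
good = rng.random(n) >= p_default           # True = loan repaid

score_a = -x1                               # higher score = lower predicted risk
score_b = -x2

def auc(score, good):
    """Mann-Whitney estimate of P(score of a random good > score of a random bad)."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_g, n_b = good.sum(), (~good).sum()
    return (ranks[good].sum() - n_g * (n_g + 1) / 2) / (n_g * n_b)

# The lender accepts the better-scoring half of applicants according to scorecard A,
# so repayment outcomes are only ever observed for this selected subset.
accept = score_a > np.median(score_a)

print("full population:", auc(score_a, good), auc(score_b, good))
print("accepted only  :", auc(score_a[accept], good[accept]),
      auc(score_b[accept], good[accept]))
```

Because acceptance truncates the population directly on scorecard A's score, A's measured AUC on the accepted sample is depressed relative to the full population, and the A-versus-B comparison on accepts differs from the comparison a lender would see without selection. The specific numbers depend on the invented risk model; only the qualitative restriction-of-range effect is the point.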


Keywords: credit scoring; Kolmogorov–Smirnov statistic; area under the ROC curve; selection bias



Acknowledgements

We are grateful to the UK bank that provided the UPL data for the real example, and to the anonymous referees for deep and helpful comments.



Copyright information

© Operational Research Society Ltd. 2013

Authors and Affiliations

  1. Department of Mathematics, Imperial College London, London, UK
  2. Heilbronn Institute for Mathematical Research, University of Bristol, Bristol, UK
