On Class Imbalance Correction for Classification Algorithms in Credit Scoring

Conference paper
Part of the Operations Research Proceedings book series (ORP)


Credit scoring is often modeled as a binary classification task in which defaults are rare, so the classes are generally highly imbalanced. Although many algorithms have recently been proposed to mitigate this problem, class imbalance remains underrepresented in research despite its great relevance to many business applications. Within the “Machine Learning in R” (mlr) framework, methods for imbalance correction are readily available and can be integrated into a systematic classifier optimization process. Different strategies are discussed, extended, and compared.
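
The abstract does not say which correction strategies are compared, so the following is only an illustrative sketch of one common option, random oversampling of the minority class. It is written in Python rather than R, and the function name `random_oversample`, the `rate` parameter, and the toy data are assumptions made for this example, not taken from the paper or from mlr.

```python
import random
from collections import Counter


def random_oversample(rows, labels, minority_label, rate, seed=0):
    """Duplicate randomly chosen minority observations.

    `rate` is the factor by which the minority class grows
    (rate=3.0 triples it). Hypothetical helper for illustration only.
    """
    if rate < 1.0:
        raise ValueError("rate must be >= 1.0")
    rng = random.Random(seed)
    # Positions of all minority-class observations.
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    # Number of additional minority copies to draw (with replacement).
    n_extra = round(len(minority_idx) * (rate - 1.0))
    extra_idx = [rng.choice(minority_idx) for _ in range(n_extra)] if minority_idx else []
    new_rows = list(rows) + [rows[i] for i in extra_idx]
    new_labels = list(labels) + [labels[i] for i in extra_idx]
    return new_rows, new_labels


# Toy data (made up): 2 defaults among 10 applicants.
rows = [[float(i)] for i in range(10)]
labels = ["default", "default"] + ["good"] * 8

bal_rows, bal_labels = random_oversample(rows, labels, "default", rate=4.0)
counts = Counter(bal_labels)
print(counts)
```

When such a step is part of a tuning or benchmarking loop, it would normally be applied to the training portion of each resampling split only, so that the test data keep the original class proportions.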


Keywords (machine-generated, not supplied by the authors): Random Forest, Minority Class, Class Imbalance, Candidate Configuration, Gower Distance



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. LMU München, Munich, Germany
  2. Stralsund University of Applied Sciences, Stralsund, Germany
