On Class Imbalance Correction for Classification Algorithms in Credit Scoring

  • Conference paper
Operations Research Proceedings 2014

Part of the book series: Operations Research Proceedings (ORP)

Abstract

Credit scoring is often modeled as a binary classification task in which defaults occur rarely, so the classes are generally highly unbalanced. Although many algorithms have been proposed in recent years to mitigate this specific problem, class imbalance remains underrepresented in research despite its great relevance for many business applications. Within the “Machine Learning in R” (mlr) framework, methods for imbalance correction are readily available and can be integrated into a systematic classifier optimization process. Different strategies are discussed, extended and compared.

The opinions expressed in this paper are those of the authors and do not reflect the views of any organization or employer.

Notes

  1. http://www.cs.gsu.edu/~zding/research/benchmark-data.php

  2. http://www.kaggle.com/c/GiveMeSomeCredit

Author information

Corresponding authors

Correspondence to Bernd Bischl, Tobias Kühn or Gero Szepannek.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Bischl, B., Kühn, T., Szepannek, G. (2016). On Class Imbalance Correction for Classification Algorithms in Credit Scoring. In: Lübbecke, M., Koster, A., Letmathe, P., Madlener, R., Peis, B., Walther, G. (eds) Operations Research Proceedings 2014. Operations Research Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-319-28697-6_6