On Class Imbalance Correction for Classification Algorithms in Credit Scoring

Bischl, Bernd; Kühn, Tobias; Szepannek, Gero

doi:10.1007/978-3-319-28697-6_6

Part of the book series: Operations Research Proceedings (ORP)

Abstract

Credit scoring is often modeled as a binary classification task in which defaults occur rarely, so the classes are generally highly imbalanced. Although many new algorithms have been proposed in recent years to mitigate this problem, class imbalance remains underrepresented in research despite its great relevance for many business applications. Within the “Machine Learning in R” (mlr) framework, methods for imbalance correction are readily available and can be integrated into a systematic classifier optimization process. Different strategies are discussed, extended, and compared.
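
The abstract does not name the individual strategies that are compared. As a rough illustration of what an imbalance correction can look like, the following minimal sketch shows random oversampling of the minority class in plain Python. It is not the mlr implementation and not necessarily the authors' method; the function `oversample_minority`, its `rate` parameter, and the toy data are hypothetical and exist only for this example.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, minority_label, rate, seed=0):
    """Randomly duplicate minority-class rows (illustrative sketch only).

    After the call, the minority class has roughly `rate` times its
    original number of observations; the majority class is unchanged.
    `rate` is an assumed parameter name, not taken from the paper.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    rng = random.Random(seed)
    # Indices of all observations that belong to the minority class.
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    # Number of additional copies to draw (with replacement).
    n_extra = round(len(minority_idx) * (rate - 1))
    extra_idx = [rng.choice(minority_idx) for _ in range(n_extra)] if minority_idx else []
    # Keep every original observation and append the sampled duplicates.
    idx = list(range(len(labels))) + extra_idx
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy data (made up): 95 non-defaults (label 0) and 5 defaults (label 1).
rows = [[float(i)] for i in range(100)]
labels = [0] * 95 + [1] * 5

new_rows, new_labels = oversample_minority(rows, labels, minority_label=1, rate=4)
counts = Counter(new_labels)
print(counts[0], counts[1])
```

In a model evaluation, a correction of this kind would normally be applied to the training data only, so that the test data keep the original class distribution.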

The opinions expressed in this paper are those of the authors and do not reflect views of any organization or employer.

Author information

Authors and Affiliations

LMU München, Munich, Germany
Bernd Bischl & Tobias Kühn
Stralsund University of Applied Sciences, Stralsund, Germany
Gero Szepannek

Corresponding authors

Correspondence to Bernd Bischl, Tobias Kühn or Gero Szepannek.

Editor information

Editors and Affiliations

Operations Research, RWTH Aachen University, Aachen, Germany
Marco Lübbecke
Lehrstuhl II für Mathematik, RWTH Aachen University, Aachen, Germany
Arie Koster
RWTH Aachen University, Aachen, Nordrhein-Westfalen, Germany
Peter Letmathe
E.ON Energy Research Center, RWTH Aachen University, Aachen, Nordrhein-Westfalen, Germany
Reinhard Madlener
Management Science, RWTH Aachen University, Aachen, Nordrhein-Westfalen, Germany
Britta Peis
Chair of Operations Management, RWTH Aachen University, Aachen, Germany
Grit Walther

About this paper

Cite this paper

Bischl, B., Kühn, T., Szepannek, G. (2016). On Class Imbalance Correction for Classification Algorithms in Credit Scoring. In: Lübbecke, M., Koster, A., Letmathe, P., Madlener, R., Peis, B., Walther, G. (eds) Operations Research Proceedings 2014. Operations Research Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-319-28697-6_6

DOI: https://doi.org/10.1007/978-3-319-28697-6_6
Published: 21 February 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28695-2
Online ISBN: 978-3-319-28697-6
eBook Packages: Business and Management, Business and Management (R0)
