Abstract
Credit scoring is often modeled as a binary classification task in which defaults occur rarely, so the classes are generally highly imbalanced. Although many algorithms have been proposed in recent years to mitigate this problem, class imbalance remains underrepresented in research despite its relevance for many business applications. Within the “Machine Learning in R” (mlr) framework, methods for imbalance correction are readily available and can be integrated into a systematic classifier optimization process. Different strategies are discussed, extended, and compared.
The opinions expressed in this paper are those of the authors and do not reflect views of any organization or employer.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Bischl, B., Kühn, T., Szepannek, G. (2016). On Class Imbalance Correction for Classification Algorithms in Credit Scoring. In: Lübbecke, M., Koster, A., Letmathe, P., Madlener, R., Peis, B., Walther, G. (eds) Operations Research Proceedings 2014. Operations Research Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-319-28697-6_6
DOI: https://doi.org/10.1007/978-3-319-28697-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28695-2
Online ISBN: 978-3-319-28697-6
eBook Packages: Business and Management (R0)