Beta-Boosted Ensemble for Big Credit Scoring Data

  • Maciej Zieba
  • Wolfgang Karl HärdleEmail author
Part of the Springer Handbooks of Computational Statistics book series (SHCS)


In this work we present a novel ensemble model for the credit scoring problem. The main idea of the approach is to incorporate a separate beta-binomial distribution for each class to generate balanced datasets, which are then used to construct the base learners that make up the final ensemble model. Sampling is performed on two separate ranking lists, one per class, where examples are ranked by the probability of observing the positive class. Two strategies are considered in the study: the first focuses on mining easy examples, and the second forces correct classification of hard cases. The proposed solutions are tested on two big datasets from the credit scoring domain.
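The sampling idea summarized above can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic-regression base learner, the fixed Beta parameters (1, 4) for the "easy" strategy and (4, 1) for the "hard" strategy, the uniform first round, and the averaging of base-learner probabilities are all hypothetical choices. Each class gets its own ranking list, ordered from easiest to hardest example according to the current ensemble's estimate of P(y = 1 | x), and a balanced training set is drawn by sampling rank positions from a beta-binomial distribution, i.e. p ~ Beta(a, b) followed by k ~ Binomial(n - 1, p).

```python
import numpy as np


def beta_binomial_ranks(n_items, a, b, size, rng):
    """Sample rank positions in {0, ..., n_items - 1} from BetaBinomial(n_items - 1, a, b)."""
    p = rng.beta(a, b, size=size)        # latent success probability per draw
    return rng.binomial(n_items - 1, p)  # rank position per draw


def fit_logreg(X, y, lr=0.1, epochs=200):
    """Hypothetical base learner: logistic regression fitted by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def ensemble_proba(learners, X):
    """Assumed combination rule: average the base learners' probabilities."""
    return np.mean([predict_proba(w, X) for w in learners], axis=0)


def beta_boosted_ensemble(X, y, n_learners=10, strategy="easy", sample_per_class=None, seed=0):
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    m = sample_per_class or min(len(pos), len(neg))
    # Assumed Beta parameters: with lists ordered easiest-first, Beta(1, 4)
    # favours easy examples and Beta(4, 1) favours hard ones.
    a, b = (1.0, 4.0) if strategy == "easy" else (4.0, 1.0)
    learners = []
    scores = np.full(len(y), 0.5)  # current ensemble estimate of P(y = 1 | x)
    for t in range(n_learners):
        # One ranking list per class, ordered from easiest to hardest example:
        # positives by decreasing score, negatives by increasing score.
        pos_ranked = pos[np.argsort(-scores[pos], kind="stable")]
        neg_ranked = neg[np.argsort(scores[neg], kind="stable")]
        # First round: Beta(1, 1) makes the rank draw uniform, so ties do not matter.
        a_t, b_t = (1.0, 1.0) if t == 0 else (a, b)
        idx = np.concatenate([
            pos_ranked[beta_binomial_ranks(len(pos), a_t, b_t, m, rng)],
            neg_ranked[beta_binomial_ranks(len(neg), a_t, b_t, m, rng)],
        ])  # balanced sample, drawn with replacement
        learners.append(fit_logreg(X[idx], y[idx]))
        scores = ensemble_proba(learners, X)
    return learners


# Usage on synthetic, imbalanced data (900 negatives, 100 positives).
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 1, (900, 2)), data_rng.normal(2, 1, (100, 2))])
y = np.concatenate([np.zeros(900), np.ones(100)])
learners = beta_boosted_ensemble(X, y, n_learners=5, strategy="easy")
proba = ensemble_proba(learners, X)
```

Choosing a < b concentrates the draws near the top of each list (easy cases), while a > b pushes them towards the bottom (hard cases); Beta(1, 1) reduces the beta-binomial to a uniform draw over ranks, which is why the sketch uses it before any ranking exists.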


Keywords: Credit scoring; Ensemble model; Beta distribution; Beta boost; Big data

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Wroclaw University of Science and Technology, Wroclaw, Poland
  2. Humboldt-Universität zu Berlin, Berlin, Germany
  3. School of Business, Singapore Management University, Singapore, Singapore