Beta-Boosted Ensemble for Big Credit Scoring Data

  • Chapter
  • First Online:
Handbook of Big Data Analytics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

In this work we present a novel ensemble model for the credit scoring problem. The main idea of the approach is to use a separate beta-binomial distribution for each class to generate balanced datasets, which are then used to train the base learners that make up the final ensemble. Sampling is performed on two separate ranking lists, one per class, in which examples are ranked by their probability of belonging to the positive class. Two strategies are considered: one favors mining easy examples, and the other forces correct classification of hard cases. The proposed solutions are evaluated on two big datasets from the credit scoring domain.
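The sampling idea described in the abstract can be sketched in Python. This is a minimal illustration under our own assumptions, not the chapter's exact algorithm. The scores are assumed to be given (for example, predicted probabilities of the positive class from some preliminary model). Each class is ranked so that position 0 holds its most confidently classified (easiest) example. The beta-binomial parameters `a` and `b` (hypothetical values below) then control whether draws concentrate on easy examples (`a < b`) or hard examples (`a > b`). The function names `beta_binomial_pmf`, `sample_class` and `balanced_sample` are hypothetical.

```python
import math
import random


def beta_binomial_pmf(n: int, a: float, b: float) -> list[float]:
    """P(K = k) for k = 0..n under BetaBinomial(n, a, b), computed in log space."""
    def log_beta(x: float, y: float) -> float:
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    log_norm = log_beta(a, b)
    pmf = []
    for k in range(n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        pmf.append(math.exp(log_comb + log_beta(k + a, n - k + b) - log_norm))
    return pmf


def sample_class(indices: list[int], scores: list[float], size: int,
                 a: float, b: float, descending: bool,
                 rng: random.Random) -> list[int]:
    """Draw `size` examples (with replacement) from one class's ranking list."""
    if not indices or size == 0:
        return []
    # Rank this class's examples by their score, i.e. P(positive class).
    ranked = sorted(indices, key=lambda i: scores[i], reverse=descending)
    # Beta-binomial weights over rank positions 0..len(ranked)-1.
    weights = beta_binomial_pmf(len(ranked) - 1, a, b)
    positions = rng.choices(range(len(ranked)), weights=weights, k=size)
    return [ranked[p] for p in positions]


def balanced_sample(labels: list[int], scores: list[float], a: float, b: float,
                    rng: random.Random) -> list[int]:
    """Return indices of a balanced dataset: equal draws from each class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    size = min(len(pos), len(neg))
    # Assumption: positives ranked by descending score and negatives by
    # ascending score, so rank 0 is the easiest example in both lists.
    pos_draw = sample_class(pos, scores, size, a, b, descending=True, rng=rng)
    neg_draw = sample_class(neg, scores, size, a, b, descending=False, rng=rng)
    return pos_draw + neg_draw


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1] * 20 + [0] * 200
    scores = [rng.random() for _ in labels]  # placeholder scores for illustration
    easy = balanced_sample(labels, scores, a=1.0, b=5.0, rng=rng)  # "easy" strategy
    hard = balanced_sample(labels, scores, a=5.0, b=1.0, rng=rng)  # "hard" strategy
    print(len(easy), len(hard))  # prints "40 40"
```

Each call returns one balanced set of indices, with equal numbers drawn from both classes. In an ensemble of this kind, each such set would be used to fit one base learner. The pmf is computed with `math.lgamma` in log space, so it stays finite for classes with many thousands of examples, where exact binomial coefficients would overflow a float.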

Author information

Correspondence to Wolfgang Karl Härdle.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Zieba, M., Härdle, W.K. (2018). Beta-Boosted Ensemble for Big Credit Scoring Data. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_21