Beta-Boosted Ensemble for Big Credit Scoring Data

Zieba, Maciej; Härdle, Wolfgang Karl

doi:10.1007/978-3-319-18284-1_21

Maciej Zieba & Wolfgang Karl Härdle

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)


Abstract

In this work we present a novel ensemble model for the credit scoring problem. The main idea of the approach is to use a separate beta-binomial distribution for each class to generate balanced datasets, which are then used to construct the base learners that make up the final ensemble model. Sampling is performed on two separate ranking lists, one per class, in which examples are ranked by the probability of observing the positive class. Two strategies are considered: the first focuses on mining easy examples, and the second enforces correct classification of hard cases. The proposed solutions are tested on two big datasets from the credit scoring domain.
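The class-wise sampling step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names (`beta_binomial_pmf`, `sample_ranked`, `balanced_sample`), the default shape parameters `a = 1` and `b = 5`, sampling with replacement, and the mapping of the "easy" and "hard" strategies to mirrored beta-binomial shapes are all hypothetical choices. The scores are assumed to be estimates of the probability of the positive class (for example, from the ensemble built so far). Each class is ranked by that score, rank k in a list of n + 1 examples is drawn with the beta-binomial probability C(n, k) B(k + a, n - k + b) / B(a, b), and the same number of examples is drawn from each class so that the resulting dataset is balanced.

```python
# Hypothetical sketch: names, defaults and the easy/hard mapping are assumptions,
# not the chapter's code. Uses only the Python standard library.
import math
import random
from typing import List, Sequence


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(n: int, a: float, b: float) -> List[float]:
    """Beta-binomial(n, a, b) probabilities for ranks k = 0..n."""
    log_norm = log_beta(a, b)
    pmf = []
    for k in range(n + 1):
        # log C(n, k) computed via log-gamma to stay stable for large n
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        pmf.append(math.exp(log_comb + log_beta(k + a, n - k + b) - log_norm))
    return pmf


def sample_ranked(ranked: Sequence[int], m: int, a: float, b: float,
                  rng: random.Random) -> List[int]:
    """Hypothetical helper: draw m indices (with replacement) from a ranking
    list, weighting the example at rank k by the beta-binomial pmf."""
    if not ranked:
        raise ValueError("cannot sample from an empty ranking list")
    weights = beta_binomial_pmf(len(ranked) - 1, a, b)
    return rng.choices(list(ranked), weights=weights, k=m)


def balanced_sample(scores: Sequence[float], labels: Sequence[int], m: int,
                    strategy: str = "easy", a: float = 1.0, b: float = 5.0,
                    seed: int = 0) -> List[int]:
    """Hypothetical helper: return m positive then m negative indices.

    scores[i] is assumed to estimate P(y = 1 | x_i). With a < b the
    beta-binomial puts most of its mass on the first ranks of a list.
    """
    rng = random.Random(seed)
    # Both lists are ranked by P(positive), highest first.
    pos = sorted((i for i, y in enumerate(labels) if y == 1), key=lambda i: -scores[i])
    neg = sorted((i for i, y in enumerate(labels) if y == 0), key=lambda i: -scores[i])
    if strategy == "easy":
        # Confident positives sit at the top of pos, confident negatives at the bottom of neg.
        pos_ab, neg_ab = (a, b), (b, a)
    elif strategy == "hard":
        # Mirror both shapes so low-scored positives and high-scored negatives dominate.
        pos_ab, neg_ab = (b, a), (a, b)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return sample_ranked(pos, m, *pos_ab, rng) + sample_ranked(neg, m, *neg_ab, rng)
```

Calling `balanced_sample` repeatedly with different seeds (and, in a boosting-style loop, with scores refreshed from the current ensemble) would yield one balanced training set per base learner; how the chapter actually updates the scores and combines the learners is not stated in the abstract. With a < b the weights decay from the top of a list, so the easy strategy favors high-scored positives and low-scored negatives, while the hard strategy reverses both. Setting a = b = 1 reduces the scheme to uniform resampling within each class.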



Author information

Authors and Affiliations

Wroclaw University of Science and Technology, Wroclaw, Poland
Maciej Zieba
Humboldt-Universität zu Berlin, Berlin, Germany
Wolfgang Karl Härdle
School of Business, Singapore Management University, Singapore, Singapore
Wolfgang Karl Härdle


Corresponding author

Correspondence to Wolfgang Karl Härdle.

Editor information

Editors and Affiliations

Ladislaus von Bortkiewicz Chair of Statistics, C.A.S.E. Center for Applied Statistics & Economics, Humboldt-Universität zu Berlin, Berlin, Germany
Wolfgang Karl Härdle
Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
Henry Horng-Shing Lu
School of Statistics, University of Minnesota, Minneapolis, USA
Xiaotong Shen


About this chapter

Cite this chapter

Zieba, M., Härdle, W.K. (2018). Beta-Boosted Ensemble for Big Credit Scoring Data. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_21


DOI: https://doi.org/10.1007/978-3-319-18284-1_21
Published: 18 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18283-4
Online ISBN: 978-3-319-18284-1
eBook Packages: Mathematics and Statistics (R0)
