Heterogeneous ensemble learning with feature engineering for default prediction in peer-to-peer lending in China

Abstract

In recent years, peer-to-peer (P2P) lending, a new form of unsecured financing conducted over the Internet, has boomed in China, and with it have come unavoidable credit risk problems. A key challenge for P2P lending platforms is to accurately predict the default probability of the borrower of each loan; an effective default prediction model helps the platform avoid credit risk. Traditional default prediction models based on machine learning and statistical learning do not meet the needs of P2P lending platforms because credit data in data-driven P2P lending contain many missing values, are high-dimensional and are class-imbalanced, which makes the default risk prediction model difficult to train effectively. To address these problems, this paper proposes a new default risk prediction model based on heterogeneous ensemble learning. Three individual classifiers, extreme gradient boosting (XGBoost), a deep neural network (DNN) and logistic regression (LR), are combined using a linear weighted ensemble strategy. In particular, the model is able to process missing values: after generating discrete and rank features, it adds the missing values to the model for self-training. Then, the hyperparameters are optimized by the XGBoost model to improve the performance of the prediction model. Finally, compared with the benchmark model, the proposed method significantly improves the accuracy of the prediction results. In conclusion, the prediction method proposed in this paper solves the class-imbalanced problem.
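The abstract names two mechanisms without giving formulas: rank and discrete features that tolerate missing values, and a linear weighted combination of three classifiers. The following is a minimal sketch of one plausible reading, not the authors' implementation. The function names, the `MISSING_BIN` code, the equal-width binning and the example weights are all assumptions made for illustration; the per-model probabilities are made-up numbers standing in for XGBoost, DNN and LR outputs.

```python
from typing import List, Optional, Sequence

MISSING_BIN = -1  # hypothetical code reserved for missing values


def rank_feature(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Map each observed value to a normalized rank in (0, 1]; keep None as None."""
    observed = sorted(v for v in values if v is not None)
    n = len(observed)
    # Tied values share the rank of their last position in sorted order.
    rank_of = {v: (i + 1) / n for i, v in enumerate(observed)}
    return [None if v is None else rank_of[v] for v in values]


def discrete_feature(ranks: Sequence[Optional[float]], n_bins: int = 4) -> List[int]:
    """Bucket normalized ranks into equal-width bins; missing values get their own bin."""
    bins = []
    for r in ranks:
        if r is None:
            bins.append(MISSING_BIN)
        else:
            bins.append(min(int(r * n_bins), n_bins - 1))
    return bins


def linear_weighted_ensemble(
    probabilities: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-model default probabilities as a normalized weighted sum."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight is required per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probabilities[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n_samples)
    ]


# Illustrative inputs only: one feature with a missing entry, and made-up
# probabilities for two loans from three models (e.g. XGBoost, DNN, LR).
ranks = rank_feature([3.0, None, 1.0, 2.0])
bins = discrete_feature(ranks)
blended = linear_weighted_ensemble(
    [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]], weights=[0.5, 0.3, 0.2]
)
```

Keeping missing entries as a separate category, rather than imputing them, lets a downstream classifier treat "missing" as a signal in its own right, which is one way to read the abstract's statement that missing values are added to the model. In practice the ensemble weights would be tuned on validation data rather than fixed by hand.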

Funding

This work was funded by the National Natural Science Foundation of China under Grant Nos. 91846107 and 71571058 and by the Anhui Provincial Science and Technology Major Project under Grant Nos. 16030801121 and 17030801001.

Author information

Correspondence to Shuai Ding or Shanlin Yang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, W., Ding, S., Wang, H. et al. Heterogeneous ensemble learning with feature engineering for default prediction in peer-to-peer lending in China. World Wide Web 23, 23–45 (2020). https://doi.org/10.1007/s11280-019-00676-y

Keywords

  • Heterogeneous ensemble learning
  • Default prediction
  • Feature engineering
  • Imbalanced data
  • Hyperparameter optimization