
Heterogeneous ensemble learning with feature engineering for default prediction in peer-to-peer lending in China


In recent years, peer-to-peer (P2P) lending in China, a new form of unsecured financing conducted over the Internet, has boomed, but it brings inevitable credit risk problems. A key challenge facing P2P lending platforms is accurately predicting the default probability of the borrower of each loan, which helps the platform avoid credit risk. Traditional default prediction models based on machine learning and statistical learning do not meet the needs of P2P lending platforms, because in data-driven P2P lending the credit data contain a large number of missing values, are high-dimensional and are class-imbalanced, which makes it difficult to train an effective default risk prediction model. To solve these problems, this paper proposes a new default risk prediction model based on heterogeneous ensemble learning. Three individual classifiers, extreme gradient boosting (XGBoost), a deep neural network (DNN) and logistic regression (LR), are combined with a linear weighted ensemble strategy. In particular, the model is able to process missing values: after generating discrete and rank features, it feeds samples with missing values into the model for self-training. The hyperparameters of the XGBoost model are then optimized to improve predictive performance. Finally, compared with benchmark models, the proposed method significantly improves prediction accuracy and addresses the class-imbalanced problem.
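The linear weighted ensemble strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses scikit-learn stand-ins (GradientBoostingClassifier for XGBoost, MLPClassifier for the DNN), synthetic class-imbalanced data, and illustrative weights chosen arbitrarily rather than the learned weights of the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic class-imbalanced "loan" data (roughly 10% defaults).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Three heterogeneous base classifiers (GBT here stands in for XGBoost).
classifiers = {
    "gbt": GradientBoostingClassifier(random_state=0),
    "dnn": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32, 16),
                                       max_iter=500, random_state=0)),
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Each base classifier produces a default probability on the test set.
probs = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    probs[name] = clf.predict_proba(X_te)[:, 1]

# Linear weighted combination of the three probability estimates.
# These weights are illustrative; the paper learns/tunes its own.
weights = {"gbt": 0.5, "dnn": 0.3, "lr": 0.2}
ensemble = sum(w * probs[name] for name, w in weights.items())
print("ensemble AUC:", round(roc_auc_score(y_te, ensemble), 3))
```

Because the base classifiers make different kinds of errors (tree ensemble, neural network, linear model), a simple convex combination of their probability estimates often beats any single member, which is the intuition behind the heterogeneous design.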






This work was funded by the National Natural Science Foundation of China under Grant Nos. 91846107 and 71571058, and by the Anhui Provincial Science and Technology Major Project under Grant Nos. 16030801121 and 17030801001.

Author information

Correspondence to Shuai Ding or Shanlin Yang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, W., Ding, S., Wang, H. et al. Heterogeneous ensemble learning with feature engineering for default prediction in peer-to-peer lending in China. World Wide Web 23, 23–45 (2020).



Keywords

  • Heterogeneous ensemble learning
  • Default prediction
  • Feature engineering
  • Imbalanced data
  • Hyperparameter optimization