Loan default prediction by combining soft information extracted from descriptive text in online peer-to-peer lending
Predicting whether a borrower will default on a loan is a central concern for platforms and investors in online peer-to-peer (P2P) lending. Loan default prediction faces new challenges in P2P lending because the data these platforms collect are complex and include unstructured information, such as text, that is difficult to quantify and analyze. To address this, we propose a default prediction method for P2P lending that incorporates soft information extracted from the descriptive text of loans. We use a topic model to extract valuable features from loan descriptions and construct four default prediction models to evaluate how well these features predict default. Moreover, we design a two-stage method to select an effective feature set containing both soft and hard information. An empirical analysis using real-world data from a major P2P lending platform in China shows that the proposed method can improve loan default prediction performance compared with existing methods based only on hard information.
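The general pipeline described in the abstract (topic-model features extracted from loan descriptions, combined with hard features in a default classifier) can be illustrated with a minimal sketch. It assumes LDA fitted by collapsed Gibbs sampling and a plain logistic-regression classifier trained by gradient descent. The toy descriptions, the two hard features, and the function names `lda_gibbs`, `train_logistic`, and `predict_proba` are hypothetical; they are not the authors' data, models, or code, and the two-stage feature selection is not reproduced here.

```python
import math
import random

# Hypothetical toy data (not from the paper): tokenized loan descriptions,
# two normalized hard features per loan, and default labels (1 = default).
DESCRIPTIONS = [
    "stable job salary repay on time".split(),
    "business expansion stable income".split(),
    "urgent need money pay other debt".split(),
    "lost job urgent cash needed".split(),
    "salary stable repay monthly".split(),
    "pay credit card debt urgent".split(),
]
HARD = [[0.2, 0.3], [0.3, 0.2], [0.8, 0.7], [0.9, 0.8], [0.1, 0.2], [0.7, 0.9]]
LABELS = [0, 0, 1, 1, 0, 1]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # tokens assigned to each topic
    z = []                                     # current topic of every token
    for d, doc in enumerate(docs):             # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], vocab[w]
                # Remove this token, then resample its topic from the full conditional.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed estimate of each document's topic mixture (each row sums to 1).
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the mean log-loss."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    """Predicted default probability for one loan's feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    topics = lda_gibbs(DESCRIPTIONS, n_topics=2)   # soft features from text
    X = [h + t for h, t in zip(HARD, topics)]      # hard + soft feature vectors
    w, b = train_logistic(X, LABELS)
    for x, label in zip(X, LABELS):
        print(label, round(predict_proba(w, b, x), 3))
```

In this toy setting, comparing a model trained on `HARD` alone with one trained on the concatenated hard and topic features is one simple way to probe whether the text-derived soft information adds predictive value; a real study would use held-out data and proper evaluation metrics.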
Keywords: P2P lending · Default prediction · Soft information · Topic model
The authors gratefully acknowledge the constructive comments of the anonymous referees, which considerably improved the quality and clarity of the paper. This work was funded primarily by the National Natural Science Foundation of China (Grant Nos. 71571059, 71331002, and 71731005) and the Humanities and Social Sciences Fund Projects of the Ministry of Education (Grant Nos. 13YJA630037 and 15YJA630010).