Quantum Optimized Cost Based Feature Selection and Credit Scoring for Mobile Micro-financing

Computational Economics

Abstract

Mobile e-commerce has grown rapidly in the last decade thanks to the development of mobile network services, computing capabilities and big data applications. Financial institutions have been undergoing a fundamental transformation in credit risk management, particularly of traditional credit policy, which is now inadequate for evaluating an individual's credit risk profile accurately and in a timely manner. We design a large-scale dataset that captures the detailed mobile usage of 450,722 anonymous iOS and Android users together with their 28-month loan history, and show that it can add value to credit scoring, in terms of better accuracy and lower feature acquisition cost, by introducing a cost-based quantum-inspired evolutionary algorithm (QIEA) for feature selection. The QIEA adopts a quantum-based individual representation and a quantum rotation gate operator to improve on the feature exploration capability of the conventional genetic algorithm (GA). The expected feature yield fitness function introduced in the QIEA is able to identify cost-effective feature subsets. Experimental results show that the quantum-based method achieves good predictive performance with only 70–80% of the number of features selected by GAs, and hence achieves lower feature acquisition costs under budget constraints. Additionally, computational time is reduced by 30–60% compared with GAs, depending on the feature set size.
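The abstract only names the two ingredients that distinguish a QIEA from a GA: a quantum-based (Q-bit) individual representation and a quantum rotation gate. The Python sketch below illustrates them under stated assumptions, and is not the authors' implementation. Each Q-bit is parameterized by an angle, observation samples a 0/1 feature mask, and a simplified rotation gate turns each Q-bit by a fixed angle `delta` toward the best subset found so far (the original QIEA formulation uses a lookup table of rotation angles instead). `budget_fitness` is a hypothetical knapsack-style fitness standing in for the expected feature yield fitness, which this section does not define; in a real wrapper setting the fitness would score a classifier trained on the selected features. All function and parameter names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def budget_fitness(values: Sequence[float], costs: Sequence[float],
                   budget: float) -> Callable[[List[int]], float]:
    """Hypothetical cost-based fitness: total feature value under a cost budget.

    Stand-in for the paper's expected feature yield fitness (not given here).
    Infeasible subsets score minus their cost, so any feasible subset
    (score >= 0 when values are non-negative) beats any infeasible one.
    """
    def fitness(mask: List[int]) -> float:
        cost = sum(c for c, m in zip(costs, mask) if m)
        if cost > budget:
            return -cost
        return sum(v for v, m in zip(values, mask) if m)
    return fitness


def qiea_feature_selection(n_features: int,
                           fitness: Callable[[List[int]], float],
                           pop_size: int = 10, generations: int = 100,
                           delta: float = 0.01 * math.pi, eps: float = 0.01,
                           seed: int = 0) -> Tuple[List[int], float]:
    """Simplified QIEA: Q-bit individuals, observation, rotation toward the best."""
    rng = random.Random(seed)
    # Each Q-bit is stored as an angle theta with alpha = cos(theta) and
    # beta = sin(theta), so |alpha|^2 + |beta|^2 = 1 holds automatically.
    # P(feature selected) = beta^2; theta = pi/4 is the uniform superposition.
    population = [[math.pi / 4] * n_features for _ in range(pop_size)]
    best_mask = [0] * n_features  # start from the empty subset
    best_fit = fitness(best_mask)
    for _ in range(generations):
        for thetas in population:
            # Observation: collapse each Q-bit into a classical 0/1 bit.
            mask = [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]
            fit = fitness(mask)
            if fit > best_fit:
                best_mask, best_fit = mask, fit
            # Rotation gate: where this observation disagrees with the best
            # subset, rotate theta by delta toward the best subset's bit.
            for j, t in enumerate(thetas):
                if mask[j] != best_mask[j]:
                    step = delta if best_mask[j] == 1 else -delta
                    # Clamp to [eps, pi/2 - eps] so every bit keeps a small
                    # nonzero chance of flipping.
                    thetas[j] = min(max(t + step, eps), math.pi / 2 - eps)
    return best_mask, best_fit


if __name__ == "__main__":
    values = [5, 4, 3, 2, 1]  # hypothetical predictive value per feature
    costs = [4, 3, 2, 2, 1]   # hypothetical acquisition cost per feature
    mask, fit = qiea_feature_selection(5, budget_fitness(values, costs, budget=6))
    print(mask, fit)
```

Unlike a GA, this sketch has no crossover or mutation: variation comes from the probabilistic observation step, and the Q-bits of the population gradually concentrate around the best subset found so far.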


Figs. 1–10



Funding

The work described in this paper was partially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU C1143-20G), the CityU Strategic Research Grant (Project No. 7005430), the National Natural Science Foundation of China (grant number 72271089), and the Hunan Provincial Natural Science Foundation of China (grant number 2022JJ30401).

Author information

Corresponding author

Correspondence to Chi Ming Chen.

Ethics declarations

Conflict of interest

We declare that there are no direct financial or personal interests that could influence the research reported in this paper. Our research data were processed in fully anonymized form, which places them outside the scope of data regulatory requirements.

Appendix

See Table 11.

Table 11 Data fields extracted from three sources of mobile device logs

1.1 Feature Variable Generation Measuring Credit Risk

This subsection describes how features reflecting composite measurements of credit risk are generated from the Chinese credit dataset for credit scoring. Mobile user behavior is characterized by the following four types of indicator measurements:

  1. Diversity:

    $$D_{i}=\frac{-\sum_{j=1}^{N} f_{ij}\log f_{ij}}{\log P}$$

    The value lies between 0 and 1, with larger values meaning higher usage diversity. For example, a user with very high diversity spreads usage almost evenly across different time periods. Normalizing by log P keeps the focus on the relative spread across bins and gives users with different levels of overall usage an equal chance to score high on this measure.

  2. Interest concentration:

    $$C_{i}=\frac{h_{i}}{\sum_{j=1}^{N} f_{ij}}$$

    The value also lies between 0 and 1, with larger values indicating higher interest concentration. For example, a very high interest concentration indicates that most of an individual's usage occurs within their top three interest areas.

  3. Consistency:

    $$CI_{i}=1-\frac{\sqrt{\left(D_{i}^{1}-D_{i}^{T}\right)^{2}+\left(C_{i}^{1}-C_{i}^{T}\right)^{2}}}{\sqrt{2}}$$

    where $D_i^1$ and $D_i^T$ denote the diversity in the first three months and over the entire period, respectively, and $C_i^1$ and $C_i^T$ denote the interest concentration over the same time frames. The consistency values $CI_i$ lie between 0 and 1, with 1 indicating perfect consistency (the diversity and interest concentration in the first three months are identical to those over the entire period) and 0 indicating perfect inconsistency.

  4. Overspending:

    $$O_{i}=cc_{i}/I_{i}$$

    For a user who spends more than their earned income, this spending-to-income ratio exceeds 1; the higher the ratio, the more the user overspends and hence the more financially risky they are.
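The four indicators translate directly into code. The following Python sketch computes them for a single user and is illustrative only. Because this section does not define every symbol, it assumes that $f_{ij}$ is user $i$'s share of usage in bin $j$ (raw counts are normalized inside `diversity`), that $P$ equals the number of bins $N$, that $h_i$ is the usage in the user's three largest interest areas, and that $cc_i$ and $I_i$ are the user's spending and earned income over the same period. The function names and the conventions for zero usage or a single bin are hypothetical.

```python
import math
from typing import Sequence


def diversity(usage: Sequence[float]) -> float:
    """D_i: entropy of usage fractions f_ij divided by log P.

    Assumes f_ij = usage_j / total usage and P = N = number of bins.
    """
    total = sum(usage)
    p = len(usage)
    if total <= 0 or p < 2:
        return 0.0  # hypothetical convention for degenerate inputs
    fractions = [u / total for u in usage]
    # Terms with f = 0 are skipped, i.e. 0 * log 0 is treated as 0.
    entropy = -sum(f * math.log(f) for f in fractions if f > 0)
    return entropy / math.log(p)


def interest_concentration(usage_by_area: Sequence[float], top_k: int = 3) -> float:
    """C_i: usage in the top-k interest areas (h_i) divided by total usage."""
    total = sum(usage_by_area)
    if total <= 0:
        return 0.0  # hypothetical convention for a user with no usage
    h = sum(sorted(usage_by_area, reverse=True)[:top_k])
    return h / total


def consistency(d_first: float, d_total: float,
                c_first: float, c_total: float) -> float:
    """CI_i: 1 minus the (D, C) distance between the first three months and
    the entire period, scaled by sqrt(2) so the result lies in [0, 1]."""
    dist = math.sqrt((d_first - d_total) ** 2 + (c_first - c_total) ** 2)
    return 1.0 - dist / math.sqrt(2)


def overspending(spending: float, income: float) -> float:
    """O_i = cc_i / I_i; values above 1 mean spending exceeds income."""
    if income <= 0:
        raise ValueError("income must be positive")
    return spending / income
```

For instance, computing `diversity` and `interest_concentration` once on the first three months of usage and once on the whole period, then passing the four results to `consistency`, yields $CI_i$ as defined above.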

About this article

Cite this article

Chen, C.M., Tso, G.K.F. & He, K. Quantum Optimized Cost Based Feature Selection and Credit Scoring for Mobile Micro-financing. Comput Econ 63, 919–950 (2024). https://doi.org/10.1007/s10614-023-10365-8

  • DOI: https://doi.org/10.1007/s10614-023-10365-8
