Classifying the Variety of Customers’ Online Engagement for Churn Prediction with a Mixed-Penalty Logistic Regression

Computational Economics

Abstract

Using big data to analyze consumer behavior can provide effective decision-making tools for preventing customer attrition (churn) in customer relationship management (CRM). Focusing on a CRM dataset with several different categories of factors that impact customer heterogeneity (i.e., usage of self-care service channels, service duration, and responsiveness to marketing actions), this research provides new predictive analytics of customer churn rate based on a machine learning method that enhances the classification of logistic regression by adding a mixed penalty term. The proposed penalized logistic regression prevents overfitting when dealing with big data and minimizes the loss function when balancing the cost from the median (absolute value) and mean (squared value) regularization. We show the analytical properties of the proposed method and its computational advantage in this research. In addition, we investigate the performance of the proposed method with a CRM dataset (that has a large number of features) under different settings by efficiently eliminating the disturbance of (1) least important features and (2) sensitivity from the minority (churn) class. Our empirical results confirm the expected performance of the proposed method in full compliance with the common classification criteria (i.e., accuracy, precision, and recall) for evaluating machine learning methods.
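The penalty described above, which balances an absolute-value term against a squared-value term, resembles an elastic-net penalty. The following is a minimal sketch under that assumption, not the authors' implementation: it uses scikit-learn's elastic-net logistic regression on synthetic data standing in for the CRM dataset. The mixing weight `l1_ratio`, the strength `C`, the use of `class_weight="balanced"` for the minority (churn) class, and all dataset dimensions are illustrative choices that do not come from the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a CRM table (assumption): 50 features, only a few
# informative, and roughly 10% of customers in the churn class (label 1).
X, y = make_classification(
    n_samples=2000,
    n_features=50,
    n_informative=8,
    n_redundant=5,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Elastic-net logistic regression: l1_ratio mixes the absolute-value (L1)
# and squared-value (L2) penalties; 0.5 weights them equally (assumption).
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="elasticnet",
        solver="saga",            # the scikit-learn solver that supports elastic net
        l1_ratio=0.5,
        C=0.1,                    # inverse regularization strength (illustrative)
        class_weight="balanced",  # reweight the minority churn class (assumption)
        max_iter=5000,
        random_state=0,
    ),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# The three classification criteria named in the abstract.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
}

# The L1 part of the penalty can set coefficients of unimportant features to zero.
coef = model.named_steps["logisticregression"].coef_.ravel()
n_zero = int(np.sum(coef == 0.0))

print(metrics)
print(f"{n_zero} of {coef.size} coefficients are exactly zero")
```

In practice `l1_ratio` and `C` would be tuned by cross-validation rather than fixed as they are here.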

Notes

  1. http://www.hattricksg.com/.

Funding

The authors have not disclosed any funding.

Author information

Corresponding author

Correspondence to Edward W. Sun.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Šimović, P.P., Chen, C.Y.T. & Sun, E.W. Classifying the Variety of Customers’ Online Engagement for Churn Prediction with a Mixed-Penalty Logistic Regression. Comput Econ 61, 451–485 (2023). https://doi.org/10.1007/s10614-022-10275-1

  • DOI: https://doi.org/10.1007/s10614-022-10275-1
