Abstract
Using big data to analyze consumer behavior can provide effective decision-making tools for preventing customer attrition (churn) in customer relationship management (CRM). Focusing on a CRM dataset with several categories of factors that drive customer heterogeneity (i.e., usage of self-care service channels, service duration, and responsiveness to marketing actions), this research provides new predictive analytics of customer churn based on a machine learning method that enhances logistic regression classification by adding a mixed penalty term. The proposed penalized logistic regression prevents overfitting when dealing with big data and minimizes the loss function while balancing the cost between median (absolute value) and mean (squared value) regularization. We show the analytical properties of the proposed method and its computational advantage. In addition, we investigate the performance of the proposed method on a CRM dataset with a large number of features under different settings, efficiently eliminating the disturbance from (1) the least important features and (2) sensitivity arising from the minority (churn) class. Our empirical results confirm the expected performance of the proposed method under the common classification criteria (i.e., accuracy, precision, and recall) for evaluating machine learning methods.
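To make the idea of a mixed penalty concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the exact objective, so the code assumes an elastic-net-style form: mean log-loss plus `lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)`. It is fitted here with plain proximal gradient descent, a solver chosen only for brevity. The function names, the parameters `lam`, `alpha`, and `step`, and the synthetic data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; it can set a weight exactly to zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_mixed_penalty_logit(X, y, lam=0.05, alpha=0.5, step=0.5, n_iter=300):
    """Proximal gradient descent for the ASSUMED objective
    mean log-loss + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    The intercept b is left unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= step * grad_b / n
        for j in range(p):
            # Gradient step on the smooth part (log-loss + squared penalty) ...
            v = w[j] - step * (grad_w[j] / n + lam * (1 - alpha) * w[j])
            # ... followed by the proximal step for the absolute-value penalty.
            w[j] = soft_threshold(v, step * lam * alpha)
    return w, b


def predict(X, w, b):
    """Label a row as churn (1) when the fitted probability is at least 0.5."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0
            for xi in X]


def scores(y_true, y_pred):
    """Accuracy, precision, and recall for the positive (churn) class."""
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    correct = sum(1 for t, q in zip(y_true, y_pred) if t == q)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Synthetic stand-in data (NOT the paper's CRM dataset): two informative
# features, four noise features, and a minority positive (churn) class.
random.seed(0)
true_w = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in true_w] for _ in range(300)]
y = [1 if random.random() < sigmoid(-1.5 + sum(c * x for c, x in zip(true_w, xi))) else 0
     for xi in X]

w, b = fit_mixed_penalty_logit(X, y)
print("weights:", [round(v, 3) for v in w])
print("in-sample scores:", scores(y, predict(X, w, b)))
```

In this sketch, raising `alpha` puts more weight on the absolute-value term, which tends to push the weights of unimportant features to exactly zero. Lowering it favors the squared term, which shrinks weights without eliminating them. The scores are computed in-sample only to keep the example short; a held-out split would be needed for a fair evaluation.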
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-022-10275-1/MediaObjects/10614_2022_10275_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-022-10275-1/MediaObjects/10614_2022_10275_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-022-10275-1/MediaObjects/10614_2022_10275_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-022-10275-1/MediaObjects/10614_2022_10275_Fig4_HTML.png)
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Šimović, P.P., Chen, C.Y.T. & Sun, E.W. Classifying the Variety of Customers’ Online Engagement for Churn Prediction with a Mixed-Penalty Logistic Regression. Comput Econ 61, 451–485 (2023). https://doi.org/10.1007/s10614-022-10275-1
DOI: https://doi.org/10.1007/s10614-022-10275-1