Classifying the Variety of Customers’ Online Engagement for Churn Prediction with a Mixed-Penalty Logistic Regression

Šimović, Petra P.; Chen, Claire Y. T.; Sun, Edward W.

doi:10.1007/s10614-022-10275-1


Published: 10 June 2022

Computational Economics, Volume 61, pages 451–485 (2023)



Abstract

Using big data to analyze consumer behavior can provide effective decision-making tools for preventing customer attrition (churn) in customer relationship management (CRM). Focusing on a CRM dataset with several categories of factors that impact customer heterogeneity (i.e., usage of self-care service channels, service duration, and responsiveness to marketing actions), this research provides new predictive analytics of the customer churn rate based on a machine learning method that enhances logistic regression classification by adding a mixed penalty term. The proposed penalized logistic regression prevents overfitting when dealing with big data and minimizes a loss function that balances the costs of median-type (absolute-value) and mean-type (squared-value) regularization. We establish the analytical properties of the proposed method and demonstrate its computational advantage. In addition, we investigate the performance of the proposed method under different settings on a CRM dataset with a large number of features, efficiently eliminating the disturbance from (1) the least important features and (2) the sensitivity to the minority (churn) class. Our empirical results confirm the expected performance of the proposed method under the common classification criteria for evaluating machine learning methods (i.e., accuracy, precision, and recall).
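The abstract describes the mixed penalty only in words. As a hedged illustration, suppose it takes the familiar elastic-net form (Zou & Hastie, 2005) applied to the logistic log-loss. Under that assumption, the objective is to minimize over (b, β) the quantity (1/n) Σ_i w_i · logloss(y_i, b + x_i·β) + λ[α‖β‖₁ + (1 − α)/2 · ‖β‖₂²]. In this expression, the absolute-value term zeroes out weak features and the squared term stabilizes the rest. The preview does not show the authors' exact parametrization or solver.

The plain-Python sketch below fits this assumed objective by proximal gradient descent (ISTA). ISTA is swapped in here for illustration and is not claimed to be the paper's algorithm. The function names, the `balanced` class-weight option, the step-size rule and the toy data are all hypothetical. The sketch then scores predictions with accuracy, plus precision and recall for the churn class.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, tau):
    # Proximal operator of tau * |v|: shrink toward zero, exactly zero on [-tau, tau].
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def fit_mixed_penalty_logreg(X, y, lam=0.1, alpha=0.5, balanced=False, n_iter=500):
    """Hypothetical ISTA solver for elastic-net (L1 + L2) penalized logistic regression.

    Minimizes (1/n) * sum_i w_i * logloss_i
              + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)
    with an unpenalized intercept b. Labels y must be 0/1 and contain both classes.
    """
    n, p = len(X), len(X[0])
    n_pos = sum(y)
    if balanced:
        # Weight each class by n / (2 * class_count) so the minority (churn) class counts more.
        w = [n / (2.0 * n_pos) if yi == 1 else n / (2.0 * (n - n_pos)) for yi in y]
    else:
        w = [1.0] * n
    # Upper bound on the Lipschitz constant of the smooth part's gradient gives a safe step 1/L.
    L = 0.25 * sum(wi * (sum(v * v for v in xi) + 1.0) for wi, xi in zip(w, X)) / n
    L += lam * (1.0 - alpha)
    step = 1.0 / L
    beta, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Weighted residuals w_i * (sigma(b + x_i . beta) - y_i).
        r = [wi * (sigmoid(b + sum(bj * xj for bj, xj in zip(beta, xi))) - yi)
             for wi, xi, yi in zip(w, X, y)]
        # Gradient of the smooth part: log-loss plus the squared (L2) penalty.
        grad = [sum(ri * xi[j] for ri, xi in zip(r, X)) / n + lam * (1.0 - alpha) * beta[j]
                for j in range(p)]
        b -= step * sum(r) / n
        # Gradient step, then soft-thresholding handles the absolute-value (L1) penalty.
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam * alpha) for j in range(p)]
    return beta, b


def predict(X, beta, b, threshold=0.5):
    # Label 1 (churn) when the fitted probability reaches the threshold.
    return [1 if sigmoid(b + sum(bj * xj for bj, xj in zip(beta, xi))) >= threshold else 0
            for xi in X]


def scores(y_true, y_pred):
    # Accuracy, plus precision and recall for the positive (churn) class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data: feature 0 separates the classes; feature 1 takes the same values in both.
X = [[-2.0, 0.3], [-1.5, -0.2], [-1.0, 0.1], [-0.5, -0.4],
     [0.5, -0.4], [1.0, 0.1], [1.5, -0.2], [2.0, 0.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
beta, b = fit_mixed_penalty_logreg(X, y, lam=0.1, alpha=0.9)
report = scores(y, predict(X, beta, b))
print(beta, b, report)
```

On this toy set, feature 1 carries no information about the label, so the absolute-value part of the penalty drives its coefficient exactly to zero. This mirrors, in miniature, the abstract's point about discarding the least important features. The `balanced` weights use the n / (n_classes · class_count) heuristic that scikit-learn applies for `class_weight='balanced'`. They are only one assumed way to counter the minority-class imbalance; the paper's own treatment is not visible in this preview.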


Notes

http://www.hattricksg.com/.


Funding

The authors have not disclosed any funding.

Author information

Authors and Affiliations

Faculty of Agriculture, University of Zagreb, Zagreb, Croatia
Petra P. Šimović
Montpellier Business School, Montpellier, France
Claire Y. T. Chen
KEDGE Business School, Talence, France
Edward W. Sun


Corresponding author

Correspondence to Edward W. Sun.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.



About this article

Cite this article

Šimović, P.P., Chen, C.Y.T. & Sun, E.W. Classifying the Variety of Customers’ Online Engagement for Churn Prediction with a Mixed-Penalty Logistic Regression. Comput Econ 61, 451–485 (2023). https://doi.org/10.1007/s10614-022-10275-1


Accepted: 01 May 2022
Published: 10 June 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10614-022-10275-1
