Abstract
Machine learning is now key to solving problems across many fields, and practitioners should understand its relevance to their own domain. One of its major applications is predictive analytics. In today's saturated market, churn prediction is a key step toward customer retention [31], so any toolkit that offers insight into churn can be highly beneficial to service-providing companies. A further difficulty that business analysts face in this process is deciding which classifier to select: in the continuously evolving field of machine learning, where developers constantly propose new algorithms, it is hard for analysts to stay informed about all the available options. In this work, we analyze and compare the performance of over 100 classifiers, drawn from well-known families, for churn prediction at a telecom company. This study can serve as a first step for any data scientist who wants to build a churn prediction system, and we also seek to identify the algorithms that yield the best results. Churn prediction is a mildly imbalanced classification problem, and class imbalance degrades classifier performance. The Regularized Random Forest classifier achieves the highest accuracy; since the problem is imbalanced, we also consider the area under the Receiver Operating Characteristic (ROC) curve, on which a bagged Random Forest produces the best result.
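As an illustration of the evaluation protocol described above (not the authors' actual toolchain or dataset), the sketch below compares a random forest and a bagged ensemble on a synthetic, mildly imbalanced binary task, reporting both accuracy and ROC AUC, since accuracy alone can be misleading when one class dominates. The dataset, model settings, and class ratio are assumptions for demonstration only.

```python
# Illustrative sketch: comparing classifiers on a mildly imbalanced binary
# task using both accuracy and ROC AUC (scikit-learn, synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a telecom churn dataset: ~20% churners (minority class).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "bagged_trees": BaggingClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    scores[name] = {
        # Accuracy: fraction of correct predictions (inflated by the majority class).
        "accuracy": accuracy_score(y_te, clf.predict(X_te)),
        # ROC AUC: threshold-independent ranking quality, more informative here.
        "roc_auc": roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]),
    }
    print(name, scores[name])
```

On imbalanced data the two metrics can rank classifiers differently, which is why the paper reports both.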
References
Abbott, Dean (2014). Applied predictive analytics: principles and techniques for the professional data analyst. John Wiley & Sons
Aha D, Kibler D (1991) Instance-based learning algorithms. Mach Learn 6:37–66
Arun Kumar M, Gopal M (2009) Least squares twin support vector machines for pattern classification. Expert Syst Appl 36:7535–7543
Borah, Parashjyoti, and Deepak Gupta (2019). “Functional iterative approaches for solving support vector classification problems based on generalized Huber loss.” Neural Comput & Applic: 1–21
Bouckaert, Remco “class BayesNet” https://weka.sourceforge.io/doc.dev/weka/classifiers/bayes/BayesNet.html Accessed October 2019
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Castro EG, Tsuzuki MSG (2015) Churn prediction in online games using players’ login records: a frequency analysis approach. IEEE Transactions on Computational Intelligence and AI in Games 7(3):255–265
Chih-Chung Chang, Chih-Jen Lin (2001). LIBSVM - a library for support vector machines. URL http://www.csie.ntu.edu.tw/~cjlin/libsvm/
William W Cohen (1995). Fast effective rule induction. In: Twelfth International Conference on Machine Learning, 115-123
Dalvi, Preeti K, et al (2016). “Analysis of customer churn prediction in telecom industry using decision trees and logistic regression.” 2016 Symposium on Colossal Data Analysis and Networking (CDAN). IEEE
Diez JJR et al (2006) Rotation Forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28:1619–1630
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin (2008). LIBLINEAR - a library for large linear classification. URL http://www.csie.ntu.edu.tw/~cjlin/liblinear/
Farquad MAH, Ravi V, Bapi Raju S (2012) Analytical CRM in banking and finance using SVM: a modified active learning-based rule extraction approach. International Journal of Electronic Customer Relationship Management 6(1):48–73
Fernández A, del Jesus MJ, Herrera F (2010) On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets. Inf Sci 180(8):1268–1291
Fernández-Delgado M et al (2014) Do we need hundreds of classifiers to solve real world classification problems? The journal of machine learning research 15(1):3133–3181
Frank, Eibe (2014). “Fully supervised training of Gaussian radial basis function networks in WEKA.” : 1–5
Eibe Frank, Mark Hall (2001). A Simple Approach to Ordinal Classification. In: 12th European Conference on Machine Learning, 145–156
Eibe Frank, Mark Hall, Bernhard Pfahringer (2003). Locally Weighted Naive Bayes. In: 19th Conference in Uncertainty in Artificial Intelligence, 249–256
Frank E, Wang Y, Inglis S, Holmes G, Witten IH (1998) Using model trees for classification. Mach Learn 32(1):63–76
Eibe Frank, Ian H. Witten (1998). Generating accurate rule sets without global optimization. In: Fifteenth International Conference on Machine Learning, 144-151
Holte RC (1993) Very simple classification rules perform well on most commonly used datasets. Mach Learn 11:63–91
Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew (2004). “Extreme learning machine: a new learning scheme of feedforward neural networks.” 2004 IEEE international joint conference on neural networks (IEEE Cat. No. 04CH37541). Vol. 2. IEEE
Idris, Adnan, Asifullah Khan, and Yeon Soo Lee (2012). “Genetic programming and adaboosting based churn prediction for telecom.” 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE
Ismail MR, Awang MK, Rahman MNA, Makhtar M (2015) A multi-layer perceptron approach for customer churn prediction. International Journal of Multimedia and Ubiquitous Engineering 10(7):213–222
John G Cleary, Leonard E Trigg (1995). K*: An Instance-based Learner Using an Entropic Distance Measure. In: 12th International Conference on Machine Learning, 108–114
George H John, Pat Langley (1995). Estimating continuous distributions in Bayesian classifiers. In: Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, 338-345
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
Ron Kohavi (1995). The Power of Decision Tables. In: 8th European Conference on Machine Learning, 174–189
R Kohavi (1995). Wrappers for performance enhancement and oblivious decision graphs. Department of Computer Science, Stanford University
Ron Kohavi (1996). Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In: Second International Conference on Knowledge Discovery and Data Mining, 202-207
Kumar, Krishan (2013). “Customer retention strategies of telecom service providers.”
Ludmila I Kuncheva (2004). Combining pattern classifiers: methods and algorithms. John Wiley and Sons, Inc
le Cessie S, van Houwelingen JC (1992) Ridge estimators in logistic regression. Appl Stat 41(1):191–201
Lin Dong, Eibe Frank, Stefan Kramer (2005). Ensembles of balanced nested dichotomies for multi-class problems. In: PKDD, 84-95
MATLAB. (2018). (R2018b). Natick, Massachusetts: the MathWorks Inc
P Melville, RJ Mooney (2003). Constructing diverse classifier ensembles using artificial training examples. In: Eighteenth International Joint Conference on Artificial Intelligence, 505-510
Mozer, Michael, Richard Wolniewicz, Eric Johnson and Howard Kaushansky. (1999). Churn reduction in the wireless industry, proceedings of the neural information processing systems conference, San Diego, CA
Nasiri JA, Charkari NM, Jalili S (2015) Least squares twin multi-class classification support vector machine. Pattern Recogn 48(3):984–992
Pamina. (2018). Telecom churn, Teradata center for customer relationship management at Duke University. Version 2. Retrieved 2019 September
Pao Y-H, Takefuji Y (1992) Functional-link net computing: theory, system architecture, and functionalities. Computer 25(5):76–79
Peng XJ, Xu D, Kong LY, Chen DJ (2016) L1-norm loss based twin support vector machine for data recognition. Inf Sci 340–341:86–103
J Platt (1998). Fast Training of Support Vector Machines using Sequential Minimal Optimization. In B. Schoelkopf and C. Burges and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning
Quinlan, John R (1992). “Learning with continuous classes.” 5th Australian joint conference on artificial intelligence. Vol. 92
Quinlan R (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA
R Core Team (2013). R: a language and environment for statistical computing. R Foundation for statistical computing, Vienna, Austria. URL http://www.R-project.org/
Richeldi, Marco, and Alessandro Perrucci (2002). “Churn analysis case study.” Deliverable D17 2
Richhariya B, Sharma A, Tanveer M (2018) “Improved universum twin support vector machine.” 2018 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE
Richhariya B, Tanveer M (2018) EEG signal classification using universum support vector machine. Expert Syst Appl 106:169–182
Richhariya B, Tanveer M (2020) “A reduced universum twin support vector machine for class imbalance learning.” Pattern Recogn: 107150
RStudio Team (2015). RStudio: integrated development for R. RStudio, Inc., Boston, MA URL http://www.rstudio.com/
Savitha R, Suresh S, Sundararajan N (2012) Fast learning circular complex-valued extreme learning machine (CC-ELM) for real-valued classification problems. Inf Sci 187:277–290
Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Seni G, Elder JF (2010) Ensemble methods in data mining: improving accuracy through combining predictions. Synthesis lectures on data mining and knowledge discovery 2(1):1–126
Shankar, K, et al (2018). “Optimal feature-based multi-kernel SVM approach for thyroid disease classification.” J Supercomput: 1–16
Shao Y-H et al (2011) Improvements on twin support vector machines. IEEE Trans Neural Netw 22(6):962–968
Sharma, Sweta, Reshma Rastogi, and Suresh Chandra (2019). “Large-scale twin parametric support vector machine using pinball loss function.” IEEE Transactions on Systems, Man, and Cybernetics: Systems
Marc Sumner, Eibe Frank, Mark Hall (2005). Speeding up Logistic Model Tree Induction. In: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, 675–683
Suresh S, Sundararajan N, Saratchandran P (2008) Risk-sensitive loss functions for sparse multi-category classification problems. Inf Sci 178(12):2621–2638
Tanveer M, Gautam C, Suganthan PN (2019) Comprehensive evaluation of twin SVM based classifiers on UCI datasets. Appl Soft Comput 83:105617
Tanveer M, Khan MA, Ho S-S (2016) Robust energy-based least squares twin support vector machines. Appl Intell 45(1):174–186
Tanveer M, Tiwari A, Choudhary R, Jalan S (2019) Sparse pinball twin support vector machines. Appl Soft Comput 78:164–175
Tanveer, M, et al (2020). “Machine learning techniques for the diagnosis of Alzheimer’s disease: A review.” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 16.1s : 1–35
Ting, KM, Witten, IH (1997). Stacking Bagged and Dagged Models. In: Fourteenth international Conference on Machine Learning, San Francisco, CA, 367-375
Vafeiadis T, Diamantaras KI, Sarigiannidis G, Chatzisavvas KC (2015) A comparison of machine learning techniques for customer churn prediction. Simul Model Pract Theory 55:1–9
Vapnik VN (1998) Statistical learning theory. John Wiley & Sons, New York
Geoffrey I. Webb (2000). MultiBoosting: A Technique for Combining Boosting and Wagging. Machine Learning. Vol.40 (No.2)
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Xie Y, Li X, Ngai EWT, Ying W (2009) Customer churn prediction using improved balanced random forests. Expert Syst Appl 36(3):5445–5449
Zhao Y, Li B, Li X, Liu W, Ren S (2005) Customer Churn Prediction Using Improved One-Class Support Vector Machine. In: Li X, Wang S, Dong ZY (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture notes in computer science, vol 3584. Springer, Berlin, Heidelberg
Ethics declarations
Conflict of interest
None.
Cite this article
Adhikary, D.D., Gupta, D. Applying over 100 classifiers for churn prediction in telecom companies. Multimed Tools Appl 80, 35123–35144 (2021). https://doi.org/10.1007/s11042-020-09658-z