Exploring nested ensemble learners using overproduction and choose approach for churn prediction in telecom industry

  • Mahreen Ahmed
  • Hammad Afzal
  • Imran Siddiqi
  • Muhammad Faisal Amjad
  • Khawar Khurshid
Original Article


Combining multiple classifiers to create hybrid learners (ensembles) has gained popularity in recent years. Ensembles attract growing interest in data mining because they have reportedly produced better predictions than individual classifiers, which has led to experimentation with new ways of creating them. This paper presents a study of novel hybrid ways of combining multiple ensemble models using the ‘overproduction and choose’ approach. In contrast to the original concept of ensembles, which combine various learners, the proposed ensemble models are combinations of other ensembles. In particular, we compose learners out of other learners, thus producing nested learners. Two such models, named Boosted-Stacked learners and Bagged-Stacked learners, are proposed and shown to outperform traditional ensembles. Experiments are performed in the churn prediction domain using a benchmark customer churn dataset (available in the UCI repository) and a newly created dataset from a South Asian wireless telecom operator (named SATO). The SATO dataset is balanced, having equal numbers of churners and non-churners. The novel Boosted-Stacked and Bagged-Stacked learners achieved accuracies of 98.4% and 97.2%, respectively, on the UCI Churn dataset, outperforming existing state-of-the-art techniques. Furthermore, high accuracy on the balanced SATO dataset, alongside the results on the imbalanced UCI dataset, supports the effectiveness of the proposed models on both balanced and imbalanced data.
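The abstract does not specify how the nested models are wired internally, so the following is only a hedged, pure-Python illustration under explicit assumptions. It assumes that a Bagged-Stacked learner means bagging whose base model is itself a stacked learner (a meta-learner trained on out-of-fold predictions of several base learners), and that ‘overproduction and choose’ means fitting several candidate models and keeping the one with the best validation accuracy. The base learners (a decision stump and a nearest-centroid classifier), the stump meta-learner, every function name, and the toy data are hypothetical choices for the example; they are not the authors' implementation, hyperparameters, or datasets.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Decision stump: best single-feature threshold rule for 0/1 labels."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            for polarity in (0, 1):
                correct = sum((polarity if row[j] <= t else 1 - polarity) == label
                              for row, label in zip(X, y))
                if best is None or correct > best[0]:
                    best = (correct, j, t, polarity)
    _, j, t, polarity = best

    def predict(row):
        return polarity if row[j] <= t else 1 - polarity
    return predict


def fit_centroid(X, y):
    """Nearest-centroid classifier using squared Euclidean distance."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
    return predict


BASE_FITTERS = (fit_stump, fit_centroid)  # hypothetical base learners


def fit_stacked(X, y, k=2):
    """Stacking: a stump meta-learner trained on k-fold out-of-fold base predictions."""
    n = len(X)
    meta_X = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in BASE_FITTERS]
        for i in range(fold, n, k):  # rows held out in this fold
            meta_X[i] = [m(X[i]) for m in models]
    meta = fit_stump(meta_X, y)  # meta-learner only sees out-of-fold predictions
    bases = [fit(X, y) for fit in BASE_FITTERS]  # refit base learners on all rows
    return lambda row: meta([m(row) for m in bases])


def fit_bagged_stacked(X, y, n_estimators=7, seed=0):
    """Bagging whose base model is a stacked learner; majority vote at prediction time."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        members.append(fit_stacked([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: Counter(m(row) for m in members).most_common(1)[0][0]


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def overproduce_and_choose(candidates, X_tr, y_tr, X_val, y_val):
    """Fit every candidate (overproduction), keep the best on validation data (choice)."""
    best = None
    for name, fit in candidates.items():
        model = fit(X_tr, y_tr)
        acc = accuracy(model, X_val, y_val)
        if best is None or acc > best[0]:
            best = (acc, name, model)
    return best


# Toy data (not churn data): label 1 when the two features sum to at least 6.
X = [[a, b] for a in range(6) for b in range(6)]
y = [1 if a + b >= 6 else 0 for a, b in X]
X_tr, y_tr, X_val, y_val = X[0::2], y[0::2], X[1::2], y[1::2]
candidates = {
    "stacked": fit_stacked,
    "bagged_stacked": fit_bagged_stacked,
    "bagged_stacked_15": lambda X_, y_: fit_bagged_stacked(X_, y_, n_estimators=15, seed=1),
}
best_acc, best_name, best_model = overproduce_and_choose(candidates, X_tr, y_tr, X_val, y_val)
print(best_name, best_acc)
```

Under the same assumptions, a Boosted-Stacked variant could replace the bootstrap-and-vote loop in `fit_bagged_stacked` with a boosting loop (for example AdaBoost-style reweighting); that variant is not shown here.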


Keywords: Data mining; Churn prediction; Classification; Ensembles; Telecommunication industry


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© The Natural Computing Applications Forum 2018

Authors and Affiliations

  1. National University of Sciences and Technology, Islamabad, Pakistan
  2. Bahria University, Islamabad, Pakistan
