A Non-sequential Representation of Sequential Data for Churn Prediction

  • Mark Eastwood
  • Bogdan Gabrys
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5711)


We investigate the event-sequence length that gives the best predictions when using a continuous hidden Markov model (HMM) approach to churn prediction from sequential data. Motivated by observations that predictions based on only the few most recent events seem to be the most accurate, a non-sequential dataset is constructed from customer event histories by averaging the features of the last few events. A simple K-nearest neighbour algorithm on this dataset is found to give significantly improved performance. It is intuitive that most people react only to events in the fairly recent past: telecommunications events occurring months or years ago are unlikely to have a large impact on a customer's future behaviour, and these results bear this out. Methods that deal with sequential data also tend to be much more complex than those dealing with simple non-temporal data, giving an added benefit to expressing the recent information in a non-sequential manner.
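The non-sequential representation described above can be sketched as follows: each customer's event history (a list of per-event feature vectors) is collapsed into a single vector by averaging its last k events, and a plain nearest-neighbour rule then predicts churn from these vectors. The helper names, the choice of k, and the toy data below are illustrative assumptions, not the authors' actual features or parameters.

```python
# Minimal sketch: average the last k events per customer, then classify
# with a simple nearest-neighbour vote. Toy data only; not the paper's
# real feature set or tuned parameters.
from math import dist

def last_k_average(history, k=3):
    """Average the feature vectors of the k most recent events."""
    recent = history[-k:]
    n = len(recent)
    return tuple(sum(f[i] for f in recent) / n for i in range(len(recent[0])))

def knn_predict(train, query, k=1):
    """Majority vote among the k nearest training vectors."""
    neighbours = sorted(train, key=lambda xy: dist(xy[0], query))[:k]
    votes = [label for _, label in neighbours]
    return max(set(votes), key=votes.count)

# Toy event histories: one feature vector per event, most recent last
# (e.g. call-failure rate and usage level per billing event).
histories = {
    "churned":  [(0.1, 5.0), (0.2, 4.0), (0.9, 1.0), (0.8, 0.5)],
    "retained": [(0.1, 5.0), (0.1, 6.0), (0.2, 5.5), (0.1, 6.5)],
}
train = [(last_k_average(h), label == "churned")
         for label, h in histories.items()]

# A new customer whose recent events resemble the churner's pattern.
query = last_k_average([(0.7, 1.5), (0.9, 0.8), (0.8, 0.6)])
print(knn_predict(train, query))  # → True (predicted to churn)
```

Collapsing the history this way discards ordering within the window, which is exactly the paper's point: if only the last few events matter, a cheap non-temporal learner can compete with a full sequential model.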


Keywords: Hidden Markov Model · Event History · Recent Event · Combination Method · Hidden State
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Bilmes, J.: A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models (1997)
  2. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
  3. Chen, Y.-S., Hung, Y.-P., Yen, T.-F., Fuh, C.-S.: Fast and versatile algorithm for nearest neighbor search based on a lower bound tree. Pattern Recogn. 40(2), 360–375 (2007)
  4. Dietterich, T.G.: Machine learning for sequential data: A review. In: Caelli, T.M., Amin, A., Duin, R.P.W., Kamel, M.S., de Ridder, D. (eds.) SPR 2002 and SSPR 2002. LNCS, vol. 2396, pp. 15–30. Springer, Heidelberg (2002)
  5. Duda, R., Hart, P., Stork, D.: Pattern Classification. John Wiley and Sons, Chichester (2001)
  6. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proceedings of the 13th International Conference on Machine Learning, pp. 148–156. Morgan Kaufmann, San Francisco (1996)
  7. Haddon, J., Tiwari, A., Roy, R., Ruta, D.: Churn prediction: Does technology matter (2006)
  8. Lemmens, A., Croux, C.: Bagging and boosting classification trees to predict churn. Journal of Marketing Research XLIII, 276–286 (2006)
  9. Murphy, K.: An HMM toolbox for Matlab
  10. Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
  11. Ruta, D., Nauck, D., Azvine, B.: K nearest sequence method and its application to churn prediction. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds.) IDEAL 2006. LNCS, vol. 4224, pp. 207–215. Springer, Heidelberg (2006)
  12. Wei, C.-P., Chiu, I.-T.: Turning telecommunications call details to churn prediction: a data mining approach. Expert Systems with Applications 23, 103–112 (2002)
  13. Yan, L., Miller, D.J., Mozer, M.C., Wolniewicz, R.: Improving prediction of customer behaviour in non-stationary environments. In: Proc. of Int. Joint Conf. on Neural Networks, pp. 2258–2263 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Mark Eastwood 1
  • Bogdan Gabrys 1
  1. Computational Intelligence Research Group, School of Design, Engineering and Computing, Bournemouth University, UK
