Mining Rare Events Data for Assessing Customer Attrition Risk

  • Tom Au
  • Meei-Ling Ivy Chin
  • Guangqin Ma
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 31)


Customer attrition refers to a customer leaving a service provider. As competition intensifies, preventing customers from leaving is a major challenge for many businesses, such as telecom service providers. Research has shown that retaining existing customers is more profitable than acquiring new ones, primarily because of savings on acquisition costs, a higher volume of service consumption, and customer referrals. For a large enterprise whose customer base consists of tens of millions of service subscribers, events such as switching to competitors or canceling services are often large in absolute number but rare in percentage terms, far less than 5%. Based on a simple random sample, popular statistical procedures, such as logistic regression, tree-based methods, and neural networks, can sharply underestimate the probability of rare events and often result in a null model (no significant predictors). To improve the efficiency and accuracy of event probability estimation, a case-based data collection technique is considered. A case-based sample is formed by taking all available events and a small but representative fraction of nonevents from a dataset of interest. In this article we show a consistent prior correction method for event probability estimation and demonstrate the performance of the above data collection technique in predicting customer attrition with actual telecommunications data.
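
The sampling and correction steps described in the abstract can be illustrated with a minimal sketch. It assumes the prior correction takes the standard form given by King and Zeng for logistic regression, in which the log-odds fitted on the case-based sample are shifted by ln[((1 - tau)/tau) * (ybar/(1 - ybar))], where tau is the event rate in the full population and ybar is the event rate in the sample. The abstract does not state the authors' exact formula, so this form, along with the function names `case_based_sample` and `prior_corrected_probability`, is an assumption made for illustration only.

```python
import math
import random


def case_based_sample(labels, nonevent_fraction, seed=0):
    """Return indices of ALL events plus a random fraction of nonevents.

    labels: sequence of 0/1 values (1 = event, e.g. a customer who left).
    nonevent_fraction: share of nonevents to keep (hypothetical parameter).
    """
    rng = random.Random(seed)
    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(nonevents), max(1, round(nonevent_fraction * len(nonevents))))
    return events + rng.sample(nonevents, k)


def prior_corrected_probability(sample_prob, sample_rate, population_rate):
    """Map a probability estimated on the case-based sample back to the
    population scale by shifting its log-odds (assumed King-Zeng form).

    sample_rate (ybar): event share in the case-based sample.
    population_rate (tau): event share in the full population.
    """
    offset = math.log(
        ((1 - population_rate) / population_rate)
        * (sample_rate / (1 - sample_rate))
    )
    logit = math.log(sample_prob / (1 - sample_prob)) - offset
    return 1.0 / (1.0 + math.exp(-logit))


# Synthetic illustration: 10,000 customers, 2% of whom are events.
labels = [1] * 200 + [0] * 9800
idx = case_based_sample(labels, nonevent_fraction=0.10)

tau = sum(labels) / len(labels)                 # population event rate
ybar = sum(labels[i] for i in idx) / len(idx)   # sample event rate

# An intercept-only model fitted on the sample would predict ybar for
# every customer; the correction should map that back to tau.
corrected = prior_corrected_probability(ybar, ybar, tau)
```

In this synthetic setup the sample keeps all 200 events and 980 nonevents, so events make up roughly 17% of the sample rather than 2%. Only the intercept (the constant shift in log-odds) is affected by this kind of sampling, which is why a single offset is enough in the assumed form; slope coefficients fitted on the sample are left unchanged.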


Rare Events Data, Case-Based Sampling, ROC Curves





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Tom Au (1)
  • Meei-Ling Ivy Chin (1)
  • Guangqin Ma (1)
  1. AT&T Labs, Inc.-Research, USA
