Abstract
To remain competitive, companies are adopting customer-centered strategies, and customer relationship management is consequently gaining importance. In this context, customer retention deserves particular attention. This paper proposes a model for partial churn detection in the retail grocery sector that includes, as a predictor, the similarity of the sequence of first product purchases to churner and non-churner sequences. The sequence of first purchase events is modeled using Markov for discrimination. Two classification techniques are used in the empirical study: logistic regression and random forests. A real sample of approximately 95,000 new customers, taken from the data warehouse of a European retailing company, is analyzed. The empirical results reveal the relevance of including a product sequence likelihood in partial churn prediction models, as well as the superiority of logistic regression over random forests.
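The sequence predictor described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes first-order Markov chains, add-one smoothing, and invented product categories and toy sequences. One transition matrix is estimated from churner sequences and one from non-churner sequences; a new customer's sequence is then scored by the sum of log ratios of the two transition probabilities, and that score could be passed to a classifier as an additional predictor.

```python
import math

def fit_transitions(sequences, states, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing.

    alpha is an assumed smoothing constant so that unseen transitions
    never get probability zero.
    """
    counts = {a: {b: alpha for b in states} for a in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in states}
    return probs

def log_odds(seq, p_churn, p_stay):
    """Sum of log ratios of transition probabilities.

    A positive value means the sequence is more likely under the
    churner chain than under the non-churner chain.
    """
    return sum(
        math.log(p_churn[a][b] / p_stay[a][b]) for a, b in zip(seq, seq[1:])
    )

# Hypothetical toy data: category names and sequences are invented.
states = ["bakery", "dairy", "produce"]
churner_seqs = [["bakery", "dairy", "bakery"], ["bakery", "dairy"]]
stayer_seqs = [["produce", "dairy", "produce"], ["produce", "produce", "dairy"]]

p_churn = fit_transitions(churner_seqs, states)
p_stay = fit_transitions(stayer_seqs, states)

# Scores for two new customers' first purchase sequences.
score_churn_like = log_odds(["bakery", "dairy"], p_churn, p_stay)
score_stay_like = log_odds(["produce", "dairy"], p_churn, p_stay)
print(score_churn_like, score_stay_like)
```

In this toy setting the first score is positive and the second negative, because each sequence resembles the transitions seen in one group only. With real data the score would be one predictor among others, and the chain order and smoothing would need to be chosen empirically.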
Acknowledgments
The funding of this research through the scholarship SFRH/BD/60970/2009 from the Portuguese Foundation of Science and Technology (FCT) is gratefully acknowledged by the first author.
Cite this article
Miguéis, V.L., Van den Poel, D., Camanho, A.S. et al. Predicting partial customer churn using Markov for discrimination for modeling first purchase sequences. Adv Data Anal Classif 6, 337–353 (2012). https://doi.org/10.1007/s11634-012-0121-3
DOI: https://doi.org/10.1007/s11634-012-0121-3
Keywords
- Customer relationship management
- Churn analysis
- Retailing
- Classification
- Logistic regression
- Random forests