Next-Purchase Prediction Using Projections of Discounted Purchasing Sequences

  • Research Paper
  • Business & Information Systems Engineering

Abstract

A primary task of customer relationship management (CRM) is to transform customer data into business value related to customer retention and development, for instance by offering additional products that meet customers’ needs. A customer’s purchasing history (or sequence) is a promising feature for anticipating customer needs, such as the next purchase intention. To operationalize this feature, sequences need to be aggregated before supervised prediction is applied, because many unique sequences may exist, each with little support (number of observations), which discourages inference from past observations at the level of individual sequences. In this paper, the authors propose mechanisms that aggregate sequences into generalized purchasing types. The mechanisms group sequences according to their similarity while allowing higher weights to be given to more recent purchases. The observed conversion rate per purchasing type can then be used to predict a customer’s probability of a next purchase and to target the customers most prone to purchasing a particular product. The bias–variance trade-off that arises when the models are applied to target customers is discussed with respect to the lift criterion. The mechanisms are tested on empirical data from cross-selling campaigns. The results show that the expected bias–variance behavior predicts the lift achieved with the mechanisms well. They also show that the proposed methods outperform commonly used segmentation-based approaches, different similarity measures, and popular class predictors. While the authors tested the approaches for CRM campaigns, their parameterization can be adjusted to operationalize high-cardinality sequential features in other domains or business functions as well.
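The targeting idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes every customer has already been assigned a purchasing type, estimates the conversion rate per type on training data, scores new customers with that rate, and computes the lift in the top-scored fraction. The function names `conversion_rates` and `lift_at`, the fallback for unseen types, and the toy data are all hypothetical.

```python
from collections import defaultdict


def conversion_rates(types, bought):
    """Observed conversion rate per purchasing type (training data)."""
    counts = defaultdict(lambda: [0, 0])  # type -> [buyers, customers]
    for t, b in zip(types, bought):
        counts[t][0] += int(b)
        counts[t][1] += 1
    return {t: buyers / total for t, (buyers, total) in counts.items()}


def lift_at(scores, bought, fraction):
    """Conversion rate in the top-scored fraction divided by the overall rate.

    Assumes at least one buyer overall, otherwise the base rate is zero.
    """
    n_top = max(1, int(round(fraction * len(scores))))
    ranked = sorted(zip(scores, bought), key=lambda p: p[0], reverse=True)
    top_rate = sum(b for _, b in ranked[:n_top]) / n_top
    base_rate = sum(bought) / len(bought)
    return top_rate / base_rate


# Toy data (hypothetical): purchasing type per customer and purchase outcome.
train_types = ["A", "A", "A", "A", "B", "B", "B", "B"]
train_bought = [1, 1, 1, 0, 0, 0, 1, 0]
rates = conversion_rates(train_types, train_bought)

test_types = ["A", "B", "A", "B"]
test_bought = [1, 0, 1, 0]
default = sum(train_bought) / len(train_bought)  # assumed fallback for unseen types
scores = [rates.get(t, default) for t in test_types]
print(lift_at(scores, test_bought, fraction=0.5))
```

In this toy setting, coarser purchasing types give each rate estimate more support (lower variance) but blur differences between sequences (higher bias), which is the trade-off the abstract refers to.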

Notes

  1. The company operates in the US and European telecommunication services markets and achieved annual revenues in the double-digit billion euro range. Its product portfolio ranges from basic starter products such as Internet domains, through various hosting solutions, up to professional server solutions for large-scale businesses, and also includes mobile telephony and access products such as digital subscriber lines.

  2. Such approaches are often used within a broader category of methods such as the Sequence Alignment Method (SAM; Kruskal 1983).

  3. Geometrically descending weights are a widely used technique for modeling the discounted importance of observations, for example in time series forecasting (Brown 2004).

  4. This is very different from tasks such as class prediction, where a classifier is typically assessed by its total accuracy or its (potentially weighted) confusion matrix computed over all test data instances. The discrepancy between the business-oriented objective of lift and traditional accuracy measures, as well as its implications, is discussed extensively in Baumann et al. (2015).

  5. Evaluations with higher and lower parameter values delivered clearly worse results and are not further considered in this article.

  6. This results in a 48-dimensional binary vector encoding 9 + 9 potential products for the first two purchases, and 10 + 10 + 10 potential products, including \(P_0\), for the three prior purchases.

  7. We apply \(\lambda_{\text{Box-Cox}} = 0.26\), because with this value we observe approximately white-noise error structures in our dataset.

  8. We used the Wilcoxon test as the more conservative approach; a t-test was also conducted on Box–Cox-transformed values and likewise confirmed the significance of the difference in lift.

  9. Note that the loss in lift and the normalized out-of-sample lift sum to 1.
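Notes 3 and 6 can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' encoding: products are indexed 1 to 9, \(P_0\) is taken to mean "no purchase" and is indexed 0, and the block of the k-th most recent purchase is scaled by a geometric weight `delta**k`. How the discount is actually combined with the binary encoding is not specified in this excerpt, and the names `encode`, `one_hot`, and `delta` are hypothetical.

```python
N_PRODUCTS = 9  # products indexed 1..9; index 0 stands for P_0 (assumed: "no purchase")


def one_hot(index, size):
    """Block of length `size` with a single 1.0 at position `index`."""
    block = [0.0] * size
    block[index] = 1.0
    return block


def encode(first_two, last_three, delta=1.0):
    """Encode a purchase history as a 48-dimensional vector.

    first_two:  the first two purchases, products 1..9 (9 + 9 dimensions).
    last_three: the three prior purchases, most recent first,
                products 1..9 or 0 for P_0 (10 + 10 + 10 dimensions).
    delta:      geometric discount in (0, 1]; the block of the k-th most
                recent purchase is scaled by delta**k (assumed scheme).
                With delta=1.0 the vector is purely binary.
    """
    vector = []
    for product in first_two:
        vector += one_hot(product - 1, N_PRODUCTS)
    for k, product in enumerate(last_three):
        weight = delta ** k
        vector += [weight * x for x in one_hot(product, N_PRODUCTS + 1)]
    return vector


v = encode(first_two=[3, 5], last_three=[7, 2, 0], delta=0.5)
print(len(v))  # 9 + 9 + 10 + 10 + 10 = 48
```

Vectors of this kind can be compared with an ordinary distance measure, so that sequences differing only in older purchases end up closer together than sequences differing in the most recent purchase.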

References

  • Back B, Holmbom A, Eklund T (2011) Customer portfolio analysis using the SOM. Int J Bus Inf Syst 8(4):396–412

  • Baumann A, Lessmann S, Coussement K, De Bock KW (2015) Maximize what matters: predicting customer churn with decision-centric ensemble selection. In: ECIS 2015 completed research papers. http://aisel.aisnet.org/ecis2015_cr/15/. Accessed 25 June 2017

  • Bicego M, Murino V, Figueiredo MA (2003) Similarity-based clustering of sequences using hidden Markov models. Machine learning and data mining in pattern recognition. Springer, Heidelberg, pp 86–95

  • Bose I, Chen X (2009) Quantitative models for direct marketing: a review from systems perspective. Eur J Oper Res 195(1):1–16

  • Brown RG (2004) Smoothing, forecasting and prediction of discrete time series. Courier Dover Publications, Mineola, NY

  • Chan CCH (2008) Intelligent value-based customer segmentation method for campaign management: a case study of automobile retailer. Expert Syst Appl 34(4):2754–2762

  • Cho YB, Cho YH, Kim SH (2005) Mining changes in customer buying behavior for collaborative recommendations. Expert Syst Appl 28(2):359–369

  • Daoud RA, Amine A, Bouikhalene B, Lbibb R (2015) Combining RFM model and clustering techniques for customer value analysis of a company selling online. In: Computer systems and applications (AICCSA), 2015 IEEE/ACS 12th international conference, IEEE, pp 1–6

  • Domingos P (2000) A unified bias-variance decomposition. In: Proceedings of 17th international conference on machine learning. Morgan Kaufmann, Stanford, CA, pp 231–238

  • Dunlavy DM, Kolda TG, Acar E (2011) Temporal link prediction using matrix and tensor factorizations. ACM Trans Knowl Discov Data TKDD 5(2):10

  • Han SH, Lu SX, Leung SC (2012) Segmentation of telecom customers based on customer value by decision tree model. Expert Syst Appl 39(4):3964–3973

  • Hsu MW, Lessmann S, Sung MC, Ma T, Johnson JE (2016) Bridging the divide in financial market forecasting: machine learners vs. financial economists. Expert Syst Appl 61:215–234

  • James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning, vol 6. Springer, Heidelberg

  • Joh CH, Timmermans HJ, Popkowski-Leszczyc PT (2003) Identifying purchase-history sensitive shopper segments using scanner panel data and sequence alignment methods. J Retail Consum Serv 10(3):135–144

  • Kaski S, Nikkilä J, Kohonen T (1998) Methods for interpreting a self-organized map in data analysis. In: Proceedings of the 6th European symposium on artificial neural networks (ESANN98). D-Facto, Bruges, Citeseer

  • Khajvand M, Tarokh MJ (2011) Estimating customer future value of different customer segments based on adapted RFM model in retail banking context. Proced Comput Sci 3:1327–1332

  • Kohonen T (2001) Self-organizing maps. Springer, Heidelberg

  • Kruskal JB (1983) An overview of sequence comparison: time warps, string edits, and macromolecules. SIAM Rev 25(2):201–237

  • Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions and reversals. Cybern Control Theory 10:845–848

  • Li S, Sun B, Wilcox RT (2005) Cross-selling sequentially ordered products: an application to consumer banking services. J Mark Res 42(2):233–239

  • MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations, vol 1. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, California, pp 281–297

  • Miguéis V, Van den Poel D, Camanho A, Cunha J (2012) Predicting partial customer churn using Markov for discrimination for modeling first purchase sequences. Adv Data Anal Classif 6(4):337–353

  • Moeyersoms J, Martens D (2015) Including high-cardinality attributes in predictive models: A case study in churn prediction in the energy sector. Decis Support Syst 72:72–81

  • Moon S, Russell GJ (2008) Predicting product purchase from inferred customer similarity: an autologistic model approach. Manag Sci 54(1):71–82

  • Mooney CH, Roddick JF (2013) Sequential pattern mining—approaches and algorithms. ACM Comput Surv 45(2):19:1–19:39

  • Netzer O, Lattin JM, Srinivasan V (2008) A hidden Markov model of customer relationship dynamics. Mark Sci 27(2):185–204

  • Ngai E, Xiu L, Chau D (2009) Application of data mining techniques in customer relationship management: a literature review and classification. Expert Syst Appl 36(2):2592–2602

  • Park DH, Kim HK, Choi IY, Kim JK (2012) A literature review and classification of recommender systems research. Expert Syst Appl 39(11):10059–10072

  • Piatetsky-Shapiro G, Masand B (1999) Estimating campaign benefits and modeling lift. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, KDD ’99, pp 185–193. doi:10.1145/312129.312225

  • Prinzie A, Van den Poel D (2007) Predicting home-appliance acquisition sequences: Markov/Markov for discrimination and survival analysis for modeling sequential information in NPTB models. Decis Support Syst 44(1):28–45

  • Sahoo N, Singh PV, Mukhopadhyay T (2012) A hidden Markov model for collaborative filtering. MIS Q 36(4):1329–1356

  • Schweidel DA, Bradlow ET, Fader PS (2011) Portfolio dynamics for customers of a multiservice provider. Manag Sci 57(3):471–486

  • Shirley KE, Small DS, Lynch KG, Maisto SA, Oslin DW (2010) Hidden Markov models for alcoholism treatment trial data. Ann Appl Stat 4:366–395

  • Steinmann S, Silberer G (2010) Clustering customer contact sequences—results of a customer survey in retailing. European Retail Research. Gabler, Wiesbaden, pp 97–120

  • Van den Poel D, Buckinx W (2005) Predicting online-purchasing behaviour. Eur J Oper Res 166(2):557–575

  • Wong KW, Zhou S, Yang Q, Yeung JMS (2005) Mining customer value: from association rules to direct marketing. Data Min Knowl Discov 11(1):57–79

Author information

Correspondence to Katerina Shapoval.

Additional information

Accepted after two revisions by Prof. Dr. Suhl.

Cite this article

Shapoval, K., Setzer, T. Next-Purchase Prediction Using Projections of Discounted Purchasing Sequences. Bus Inf Syst Eng 60, 151–166 (2018). https://doi.org/10.1007/s12599-017-0485-1
