Survey data and Bayesian analysis: a cost-efficient way to estimate customer equity

Quantitative Marketing and Economics

Abstract

We present a Bayesian framework for estimating customer lifetime value (CLV) and customer equity (CE) from the purchasing behavior reported in market surveys. The proposed framework systematically addresses the challenges faced when the future value of customers is estimated from survey data. The scarcity of survey data and the sampling variance are countered by utilizing prior information and by quantifying the uncertainty of the CE and CLV estimates with posterior distributions. Furthermore, information on the purchasing behavior of competitors' customers, available in the survey data, is integrated into the framework. The approach is directly applicable in domains where a customer relationship can be regarded as monogamous. As an example of the use of the framework, we analyze a consumer survey on mobile phones carried out in Finland in February 2013. The survey data contain consumer-reported information on the current and previous brand of phone and the times of the last two purchases.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

References

  • Abe, M. (2009). Counting your customers one by one: A hierarchical Bayes extension to the Pareto/NBD model. Marketing Science, 28(3), 541–553.

  • Allison, P.D. (1985). Survival analysis of backward recurrence times. Journal of the American Statistical Association, 80(390), 315–322.

  • Bauer, H., Hammerschmidt, M., Braehler, M. (2003). Customer lifetime value concept and its contribution to corporate valuation. Yearbook of Marketing and Consumer Research, 1(1), 49–67.

  • Bejou, D., Keiningham, T., Aksoy, L. (Eds.) (2006). Customer lifetime value – reshaping the way we manage to maximize profit. Haworth Press.

  • Blattberg, R.C., Kim, B.-D., Neslin, S.A. (Eds.) (2008). Database marketing: Analyzing and managing customers. Springer.

  • Borle, S., Singh, S.S., Jain, D.C. (2008). Customer lifetime value measurement. Management Science, 54(1), 100–112.

  • Brooks, S.P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434–455.

  • Fader, P., & Hardie, B. (2010). Customer-base valuation in a contractual setting: The perils of ignoring heterogeneity. Marketing Science, 29(1).

  • Fader, P., Hardie, B., Lee, K.L. (2005a). Counting your customers the easy way: An alternative to the Pareto/NBD model. Marketing Science, 24(2), 275–284.

  • Fader, P., Hardie, B., Lee, K.L. (2005b). RFM and CLV: Using iso-value curves for customer base analysis. Journal of Marketing Research, 42(November), 415–430.

  • Finnish Communications Regulatory Authority (FICORA) (2013). Toimialakatsaus 2012 (In Finnish). https://www.viestintavirasto.fi/attachments/Toimialakatsaus2012.pdf.

  • Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–533.

  • Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B. (2013). Bayesian data analysis, 3rd edn. Boca Raton, FL: Chapman & Hall/CRC.

  • Gupta, S., & Lehmann, D. (2005). Managing customers as investments: The strategic value of customers in the long run. Upper Saddle River, NJ: Wharton School Publishing.

  • Herniter, J. (1971). A probabilistic market model of purchase timing and brand selection. Management Science, 18(4, Part II), P-102.

  • Hubbard, D.W. (2010). How to measure anything: Finding the value of intangibles in business, 2nd edn. Hoboken, NJ: Wiley.

  • Jen, L., Chou, C.-H., Allenby, G.M. (2009). The importance of modeling temporal dependence of timing and quantity in direct marketing. Journal of Marketing Research, 46(4), 482–493.

  • Kumar, V., & George, M. (2007). Measuring and maximizing customer equity: A critical analysis. Journal of the Academy of Marketing Science, 35(4), 157–171.

  • Kumar, V., & Petersen, J.A. (2005). Using a customer-level marketing strategy to enhance firm’s performance. Journal of the Academy of Marketing Science, 33(4), 505–519.

  • Kumar, V., Venkatesan, R., Bohling, T., Beckmann, D. (2008). The power of CLV: Managing customer lifetime value at IBM. Marketing Science, 27(4), 585–599.

  • Lunn, D., Spiegelhalter, D., Thomas, A., Best, N. (2009). The BUGS project: Evolution, critique and future directions (with discussion). Statistics in Medicine, 28, 3049–3082.

  • Nagano, S., Ichikawa, Y., Takaya, N., Uchiyama, T., Abe, M. (2013). Nonparametric hierarchal Bayesian modeling in non-contractual heterogeneous survival data. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 668–676). ACM.

  • Pfeifer, P. (2011). On estimating current-customer equity using company summary data. Journal of Interactive Marketing, 25(1), 1–14.

  • R Core Team (2012). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.

  • Rossi, P.E., Allenby, G.M., McCulloch, R. (2005). Bayesian statistics and marketing. Hoboken, NJ: Wiley.

  • Rust, R.T., Zeithaml, V.A., Lemon, K.N. (2001). Driving customer equity: How customer lifetime value is reshaping corporate strategy. New York: Simon and Schuster.

  • Schmittlein, D., Morrison, D., Colombo, R. (1987). Counting your customers: Who are they and what will they do next? Management Science, 33(1), 1–24.

  • Schmittlein, D.C., Bemmaor, A.C., Morrison, D.G. (1985). Technical note – why does the NBD model work? Robustness in representing product purchases, brand purchases and imperfectly recorded purchases. Marketing Science, 4(3), 255–266.

  • Singh, S.S., Borle, S., Jain, D.C. (2009). A generalized framework for estimating customer lifetime value when customer lifetimes are not observed. Quantitative Marketing and Economics, 7(2), 181–205.

  • Statistics Finland (2012). Statistical Yearbook of Finland 2012.

  • Sturtz, S., Ligges, U., Gelman, A. (2005). R2WinBUGS: A package for running WinBUGS from R. Journal of Statistical Software, 12(3), 1–16.

  • Venkatesan, R., & Kumar, V. (2004). A customer lifetime value framework for customer selection and resource allocation strategy. Journal of Marketing, 68(10), 106–125.

  • Vilcassim, N.J., & Jain, D.C. (1991). Modeling purchase-timing and brand-switching behavior incorporating explanatory variables and unobserved heterogeneity. Journal of Marketing Research, 28(1), 29–41.

Author information

Correspondence to Juha Karvanen.

Appendices

Appendix A: Model for the simulation example

The estimation method is demonstrated with simulated data from a heterogeneous semi-Markov brand-switching model, defined as follows:

  1. The number of transactions made by an individual i follows a Poisson process with transaction rate \(\lambda_{i}\).

  2. The transaction rate \(\lambda_{i}\) follows a Gamma distribution with the probability density function

    $$f(\lambda_{i} \mid \gamma, \delta) = \frac{\delta^{\gamma}}{\Gamma(\gamma)} \lambda_{i}^{\gamma-1} e^{-\delta \lambda_{i}}, \quad \lambda_{i}>0,$$

    where γ > 0 is the shape parameter, δ > 0 is the rate parameter and Γ stands for the Gamma function.

  3. After any transaction, an individual may change the brand with a probability that depends on the current brand. For the focal company, the probability of repurchase for individual i is \(p_{i}\). The probability of acquisition, i.e. a change from any competitor to the focal company, is \(1-q_{i}\) for individual i.

  4. The repurchase probability \(p_{i}\) follows a Beta(\(\alpha_{p}, \beta_{p}\)) distribution and the competitor repurchase probability \(q_{i}\) follows a Beta(\(\alpha_{q}, \beta_{q}\)) distribution.

The model belongs to a long modeling tradition in the marketing literature (Herniter 1971; Schmittlein et al. 1985; Vilcassim and Jain 1991; Fader et al. 2005a). An individual is either in a state where she has made the last transaction with the focal company, or in a ‘competitor state’, where she has made the last transaction with one of the competitors of the focal company. The transaction rate is assumed to be the same in both states and independent of the transition probabilities.

Full purchase histories are generated for the population from which small survey samples are drawn. The CE estimated from each sample is compared with the CE of the population. The procedure is repeated for a number of survey samples to obtain information on the sampling variation.

The survey data collected for the individuals i = 1,2,…,n are the current state \(S^{(0)}_{i}\) (1 for the focal company and 0 for the competitors), the previous state \(S^{(-1)}_{i}\), the time between the last two purchases \(T_{i}\) and the time from the latest transaction \(T^{*}_{i}\). As the transactions follow a Poisson process, the time between purchases \(T_{i}\) follows an exponential distribution with rate \(\lambda_{i}\). The time from the latest purchase to the day of the survey, \(T^{*}_{i}\), is an observation from the same exponential distribution because the Poisson process is memoryless. For the current state it holds

$$P(S^{(0)}_{i}=1)=p_{i} S^{(-1)}_{i} + (1-q_{i}) (1-S^{(-1)}_{i}). \tag{1}$$

For the previous state it holds

$$P(S^{(-1)}_{i}=1)=p_{i} S^{(-2)}_{i} + (1-q_{i}) (1-S^{(-2)}_{i}), \tag{2}$$

where \(S^{(-2)}_{i}\) is the state before the previous one. There are no observations of \(S^{(-2)}_{i}\), but the formula

$$P(S^{(-2)}_{i}=1)=\frac{1-q_{i}}{2-q_{i}-p_{i}} \tag{3}$$

follows from the equilibrium (stationary) distribution of the two-state Markov chain that characterizes brand switching.
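Eq. (3) can be checked numerically. The Python sketch below is illustrative only (the function names are hypothetical and not part of the original analysis): using exact rational arithmetic, it confirms that passing the equilibrium probability of Eq. (3) through one switching step of the kind described by Eqs. (1) and (2) returns the same probability, and it assembles the implied probability of an observed pair \((S^{(-1)}_{i}, S^{(0)}_{i})\). It assumes \(p_{i} + q_{i} < 2\), so that the denominator of Eq. (3) is positive.

```python
from fractions import Fraction


def stationary_focal_prob(p, q):
    """Eq. (3): long-run probability that the last purchase was from the focal company."""
    return (1 - q) / (2 - q - p)


def step_focal_prob(prev_focal_prob, p, q):
    """One brand-switching step, averaging over an uncertain previous state (cf. Eqs. (1)-(2))."""
    return p * prev_focal_prob + (1 - q) * (1 - prev_focal_prob)


def survey_state_prob(s_prev, s_curr, p, q):
    """Probability of observing (S^(-1), S^(0)) = (s_prev, s_curr) when S^(-2) is at equilibrium."""
    # P(S^(-1) = 1) after one step from the equilibrium state S^(-2)
    pi = step_focal_prob(stationary_focal_prob(p, q), p, q)
    prob_prev = pi if s_prev == 1 else 1 - pi
    # Eq. (1) for a known previous state
    focal_next = p if s_prev == 1 else 1 - q
    prob_curr = focal_next if s_curr == 1 else 1 - focal_next
    return prob_prev * prob_curr
```

For example, with p = 2/5 and q = 3/5, Eq. (3) gives 2/5, and one switching step maps 2/5 back to 2/5, so \(S^{(-1)}_{i}\) has the same marginal distribution as \(S^{(-2)}_{i}\).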

A.1 Simulation setup

We first simulate the complete purchase histories of 100,000 individuals who are divided between the focal company and the competitors according to the market shares. Then, using the proposed approach, we estimate the average CLV and the CE from a small ‘survey sample’ of the simulated data and compare the results with the true CLV and CE. The parameters used in the simulation are \(\gamma = 3\), \(\delta = 10\), \(\alpha_{p} = 4\), \(\beta_{p} = 6\), \(\alpha_{q} = 4\), \(\beta_{q} = 6\), and the intensity is defined as the number of transactions per year. The value of a purchase is assumed to be 100 euros. The purchase histories are generated 40 years forward and 30 years backward from the time of the survey. The true CLVs and CEs are calculated using the whole population and the generated purchase histories for the forthcoming 40 years. With an annual discount rate of 10 %, this leads to a CE of 10.0 million euros for the population of 100,000 individuals, i.e. an average CLV of 100 euros. For the current customers of the focal company, the average CLV equals 120 euros, and for the current customers of competitors, it equals 91 euros. These numbers are compared with the estimates from small survey samples drawn from the same population. For each individual selected for the sample, only the variables \(S^{(0)}_{i}\), \(S^{(-1)}_{i}\), \(T_{i}\) and \(T^{*}_{i}\) are recorded at the time of the survey, so the sample contains very little data compared with the full future purchase histories of the 100,000 individuals. To illustrate the effect of the sample size on the accuracy of the estimates, the sample size is varied from 100 to 1000.
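The simulation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the brand state before the first simulated purchase is drawn from Eq. (3), and future purchases are discounted by annual compounding at fractional purchase times, which may differ from the convention used to obtain the figures quoted above. Each simulated individual yields the four survey variables and the discounted value of their future purchases from the focal company.

```python
import random


def simulate_individual(lam, p, q, rng, years_back=30.0, years_forward=40.0,
                        value=100.0, discount_rate=0.10):
    """Simulate one purchase history around a survey held at time 0.

    Returns (record, clv): record holds S0, S_1, T and T_star (None if fewer than
    two purchases precede the survey); clv is the discounted value of the focal
    company's purchases during the next `years_forward` years.
    """
    # Brand state before the first simulated purchase, drawn from Eq. (3).
    state = 1 if rng.random() < (1 - q) / (2 - q - p) else 0
    t = -years_back
    past = []  # (time, state) of purchases made before the survey
    clv = 0.0
    while True:
        t += rng.expovariate(lam)  # Poisson process: exponential gaps
        if t > years_forward:
            break
        # Markov brand switching: stay focal w.p. p, switch to focal w.p. 1 - q.
        focal_prob = p if state == 1 else 1 - q
        state = 1 if rng.random() < focal_prob else 0
        if t <= 0:
            past.append((t, state))
        elif state == 1:
            # Assumed convention: annual compounding at fractional times t (years).
            clv += value / (1 + discount_rate) ** t
    if len(past) < 2:
        return None, clv
    (t_prev, s_prev), (t_last, s_last) = past[-2], past[-1]
    record = {"S0": s_last, "S_1": s_prev, "T": t_last - t_prev, "T_star": -t_last}
    return record, clv


def simulate_population(n, seed=1):
    """Draw n individuals with the A.1 parameter values; return survey records and mean CLV."""
    rng = random.Random(seed)
    records, total = [], 0.0
    for _ in range(n):
        lam = rng.gammavariate(3.0, 1.0 / 10.0)  # shape 3, rate 10 (Python takes scale)
        p = rng.betavariate(4.0, 6.0)
        q = rng.betavariate(4.0, 6.0)
        record, clv = simulate_individual(lam, p, q, rng)
        total += clv
        if record is not None:
            records.append(record)
    return records, total / n
```

Averaging the survey records of many individuals who share a common rate λ also illustrates the memorylessness argument: both T and T_star should have sample means close to 1/λ.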

The model and the chosen prior distributions can be written as follows:

$$\begin{array}{ll} & T_{i},T_{i}^{*} \sim \text{Exp}(\lambda_{i}), \\ & \lambda_{i} \sim \text{Gamma}(\gamma,\delta), \\ & \gamma = m_{\lambda}^{2}/v_{\lambda}, \quad \delta = m_{\lambda}/v_{\lambda}, \\ & m_{\lambda},v_{\lambda} \sim \text{Gamma}(2,1),\\ & S^{(-2)}_{i} \sim \text{Bernoulli}\left( \frac{1-q_{i}}{2-q_{i}-p_{i}} \right), \\ & S^{(-1)}_{i} \sim \text{Bernoulli}\left( p_{i} S^{(-2)}_{i}+(1-q_{i})(1-S^{(-2)}_{i}) \right), \\ & S^{(0)}_{i} \sim \text{Bernoulli}\left( p_{i} S^{(-1)}_{i}+(1-q_{i})(1-S^{(-1)}_{i}) \right), \\ & p_{i} \sim \text{Beta}(\alpha_{p},\beta_{p}), \quad q_{i} \sim \text{Beta}(\alpha_{q},\beta_{q}), \\ & \alpha_{p} = k_{p} m_{p}, \quad \beta_{p} = k_{p} (1-m_{p}),\\ & \alpha_{q} = k_{q} m_{q}, \quad \beta_{q} = k_{q} (1-m_{q}),\\ & m_{p} \sim \text{Uniform}(0,1), \quad m_{q} \sim \text{Uniform}(0,1), \\ & k_{p} \sim \text{Gamma}(10,1), \quad k_{q} \sim \text{Gamma}(10,1). \\ \end{array}$$

For the parameters γ, δ, \(\alpha_{p}\), \(\beta_{p}\), \(\alpha_{q}\) and \(\beta_{q}\) we use weakly informative prior distributions (Gelman 2006). Parameters γ and δ are the shape and rate of the Gamma distribution from which the intensities \(\lambda_{i}\) are drawn. We define \(\gamma = m_{\lambda}^{2}/v_{\lambda}\) and \(\delta = m_{\lambda}/v_{\lambda}\), where \(m_{\lambda} \sim \text{Gamma}(2,1)\) is the mean and \(v_{\lambda} \sim \text{Gamma}(2,1)\) is the variance of the intensity distribution. In other words, the prior expectation of the mean intensity is 2 transactions per year and the prior expectation of the variance is 2, corresponding to a standard deviation of about 1.4 transactions per year, but there is considerable uncertainty about the intensity distribution. With these priors, the 95 % Bayes interval for the purchase intervals in the population is \((10^{-2}, 8 \times 10^{11})\), indicating that the priors are rather uninformative.
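The prior predictive interval for the purchase intervals can be approximated by Monte Carlo. The Python sketch below (hypothetical function names; standard-library `random` only) draws \(m_{\lambda}\) and \(v_{\lambda}\), maps them to γ and δ, draws an intensity and then an exponential purchase interval. Python's `gammavariate` takes a scale argument, so the rate δ enters as 1/δ. Intervals are in years because the intensity is measured per year. The endpoints obtained depend on the seed and the number of draws and are not guaranteed to match the published values.

```python
import random


def prior_predictive_intervals(n_draws, seed=2013):
    """Draw purchase intervals T from the prior predictive distribution of A.1.

    m_lambda, v_lambda ~ Gamma(2, 1); gamma = m^2 / v; delta = m / v;
    lambda ~ Gamma(shape=gamma, rate=delta); T ~ Exp(lambda).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        m = rng.gammavariate(2.0, 1.0)  # mean of the intensity distribution
        v = rng.gammavariate(2.0, 1.0)  # variance of the intensity distribution
        shape, rate = m * m / v, m / v
        lam = rng.gammavariate(shape, 1.0 / rate)  # Python uses scale = 1 / rate
        # Very small shapes can underflow lambda to 0, i.e. an infinite interval.
        draws.append(rng.expovariate(lam) if lam > 0 else float("inf"))
    return draws


def empirical_interval(draws, level=0.95):
    """Equal-tailed interval from simple order-statistic quantiles of the draws."""
    s = sorted(draws)
    lo_idx = int((1 - level) / 2 * (len(s) - 1))
    hi_idx = int((1 + level) / 2 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]
```

The extreme upper endpoint reflects draws with a small shape γ, where the Gamma distribution puts substantial mass on intensities close to zero.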

Parameters \(\alpha_{p}\) and \(\beta_{p}\) define the Beta distribution from which the individual repurchase probabilities are drawn, and parameters \(\alpha_{q}\) and \(\beta_{q}\) define the Beta distribution from which the individual competitor repurchase probabilities are drawn. We define \(\alpha_{p} = k_{p} m_{p}\) and \(\beta_{p} = k_{p} (1-m_{p})\), where \(m_{p} \sim \text{Uniform}(0,1)\) is the expected average repurchase probability and \(k_{p} \sim \text{Gamma}(10,1)\) controls the variation of the repurchase probabilities in the population. Similarly, we define \(\alpha_{q} = k_{q} m_{q}\) and \(\beta_{q} = k_{q} (1-m_{q})\), where \(m_{q} \sim \text{Uniform}(0,1)\) and \(k_{q} \sim \text{Gamma}(10,1)\). The priors for the expected average repurchase probabilities are uninformative, but the Gamma(10,1) priors for \(k_{p}\) and \(k_{q}\) ensure a reasonable amount of variation in the repurchase probabilities in the population. With these priors, the 95 % Bayes interval for the repurchase probability p in the population is (0.001, 0.999), indicating that the prior is otherwise flat but has peaks near 0 and 1.
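The same kind of prior predictive check can be run for the repurchase probability. In the Python sketch below (hypothetical function name, standard library only), each draw samples \(m_{p}\) and \(k_{p}\) and then an individual p from the resulting Beta distribution. Because the prior is symmetric in \(m_{p}\), the draws should average about 0.5; their exact quantiles depend on the seed and number of draws and are not claimed to reproduce the interval quoted above.

```python
import random


def prior_predictive_repurchase(n_draws, seed=2013):
    """Draw repurchase probabilities p from the prior predictive distribution of A.1.

    m_p ~ Uniform(0, 1); k_p ~ Gamma(10, 1);
    p ~ Beta(alpha_p = k_p * m_p, beta_p = k_p * (1 - m_p)).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        m = rng.random()
        while m == 0.0:  # random() lies in [0, 1); exclude 0 so that alpha_p > 0
            m = rng.random()
        k = rng.gammavariate(10.0, 1.0)  # concentration: larger k means less variation
        draws.append(rng.betavariate(k * m, k * (1.0 - m)))
    return draws
```

The same function applies to the competitor repurchase probability q, whose prior has the same form.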

A.2 BUGS code

The analysis is carried out using OpenBUGS 3.2.2 (Lunn et al. 2009), R (R Core Team 2012) and the R2OpenBUGS R package (Sturtz et al. 2005). The BUGS code for the model is as follows:

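model {
  for(i in 1:N)
  {
    # individual purchase rate and brand-switching probabilities
    lambda[i] ~ dgamma(gammal,deltal)
    p[i] ~ dbeta(alphap,betap)
    q[i] ~ dbeta(alphaq,betaq)
    # time between the last two purchases and time since the latest purchase
    tau[i] ~ dexp(lambda[i])
    taustar[i] ~ dexp(lambda[i])
    # unobserved state before the previous one, from the equilibrium, Eq. (3)
    m0[i] <- (1-q[i])/(2-q[i]-p[i])
    S2[i] ~ dbern(m0[i])
    # previous and current states, Eqs. (2) and (1)
    S1prob[i] <- p[i]*S2[i]+(1-q[i])*(1-S2[i])
    S1[i] ~ dbern(S1prob[i])
    S0prob[i] <- p[i]*S1[i]+(1-q[i])*(1-S1[i])
    S0[i] ~ dbern(S0prob[i])
  }
  # mean and variance of the intensity distribution
  ml ~ dgamma(2,1)
  vl ~ dgamma(2,1)
  # the small constant guards against division by zero
  gammal <- ml*ml/(vl+0.00001)
  deltal <- ml/(vl+0.00001)
  # mean and concentration of the repurchase probability distributions
  mp ~ dunif(0,1)
  mq ~ dunif(0,1)
  kp ~ dgamma(10,1)
  kq ~ dgamma(10,1)
  alphap <- kp*mp
  betap <- kp*(1-mp)
  alphaq <- kq*mq
  betaq <- kq*(1-mq)
}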

A.3 Simulation results

The simulation results are shown in Fig. 5 and in Table 6. Fig. 5 shows that the estimated posterior distributions are concentrated around the true value of the CE and that the systematic bias is small or non-existent. As expected, the variance is smaller for larger sample sizes. Sample sizes of 800 or more seem to give sufficient estimation accuracy.

Fig. 5

The accuracy of the Bayesian CE estimation as a function of the sample size of the survey. The circles show the mean of the estimated CE posterior distribution and the vertical lines show the posterior range from the 1st decile to the 9th decile. The horizontal line shows the true CE of the population
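The summaries plotted in Fig. 5 can be computed from posterior draws as in the Python sketch below. The function name is hypothetical, and it assumes the CE draws are available as a list of at least two numbers, for example exported from the MCMC output. The 1st and 9th deciles bound the central 80 % of the posterior, and the function also reports whether the true CE falls inside that range.

```python
import statistics


def summarize_posterior(draws, true_value):
    """Posterior mean and 1st-9th decile range, plus whether the range covers the truth."""
    # n=10 returns the 9 cut points at the 10th, 20th, ..., 90th percentiles
    deciles = statistics.quantiles(draws, n=10)
    d1, d9 = deciles[0], deciles[-1]
    return {
        "mean": statistics.fmean(draws),
        "d1": d1,
        "d9": d9,
        "covers_truth": d1 <= true_value <= d9,
    }
```

Repeating this over many simulated survey samples gives the empirical coverage of the 80 % range, one way to quantify the sampling variation discussed in A.1.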

Table 6 Estimated CLV distributions for the customers of the focal company and the customers of a competitor in the simulation example

The CLV posterior distributions for the customers of the focal company and the customers of a competitor are presented in Table 6. It can be seen that the posteriors estimated from a sample of size 1000 are very similar to the true CLV distribution of the population.

Appendix B: BUGS code for the mobile phone data

The BUGS code for the mobile phone survey is as follows:

model {
  # retained customers
  for(i in 1:Nretained)
  {
    # interval-censored time between purchases with a log-linear purchase rate
    interval[i] ~ dgamma(kappa,lambda[i])C(it.min[i],it.max[i])
    log(lambda[i]) <- betaconst+betaprev[prevbrand[i]]+
              betaagegr[agegr[i]]+
              betagender[gender[i]]+
              betaincomegr[incomegr[i]]+
              betaarea[area[i]]
    # logistic regression for the repurchase probability
    repurchase[i] ~ dbern(p[i])
    logit(p[i]) <- alphaconst+alphaprev[prevbrand[i]]+
            alphaagegr[agegr[i]]+
            alphagender[gender[i]]+
            alphaincomegr[incomegr[i]]+
            alphaarea[area[i]]
  }
  # churned customers: as above, plus the choice of the new brand
  for(i in (Nretained+1):(Nretained+Nchurned))
  {
    interval[i] ~ dgamma(kappa,lambda[i])C(it.min[i],it.max[i])
    log(lambda[i]) <- betaconst+betaprev[prevbrand[i]]+
              betaagegr[agegr[i]]+
              betagender[gender[i]]+
              betaincomegr[incomegr[i]]+
              betaarea[area[i]]
    repurchase[i] ~ dbern(p[i])
    logit(p[i]) <- alphaconst+alphaprev[prevbrand[i]]+
            alphaagegr[agegr[i]]+
            alphagender[gender[i]]+
            alphaincomegr[incomegr[i]]+
            alphaarea[area[i]]
    # brand weights renormalized over all brands except the previous one
    for(brand in 1:Nbrands)
    {
      q[i,brand] <- (1-equals(prevbrand[i],brand))*qq[brand,agegr[i]]/
        (sum(qq[,agegr[i]])-qq[prevbrand[i],agegr[i]])
    }
    aquisition[i] ~ dbern(q[i,newbrand[i]])
  }
  # remaining respondents: purchase-interval model only
  for(i in (Nretained+Nchurned+1):N)
  {
    interval[i] ~ dgamma(kappa,lambda[i])C(it.min[i],it.max[i])
    log(lambda[i]) <- betaconst+betaprev[prevbrand[i]]+
              betaagegr[agegr[i]]+
              betagender[gender[i]]+
              betaincomegr[incomegr[i]]+
              betaarea[area[i]]
  }
  # priors; the first level of each categorical covariate is the reference (coefficient 0)
  kappa ~ dgamma(1,1)
  betaconst ~ dnorm(0,0.001)
  alphaconst ~ dnorm(0,0.001)
  for(h in 2:Nbrands)
  {
    betaprev[h] ~ dnorm(0,0.001)
    alphaprev[h] ~ dnorm(0,0.001)
  }
  betaprev[1] <- 0
  alphaprev[1] <- 0
  for(h in 2:Nagegr)
  {
    betaagegr[h] ~ dnorm(0,0.001)
    alphaagegr[h] ~ dnorm(0,0.001)
  }
  betaagegr[1] <- 0
  for(h in 2:Nincomegr)
  {
    betaincomegr[h] ~ dnorm(0,0.001)
  }
  betaincomegr[1] <- 0
  for(h in 2:Narea)
  {
    betaarea[h] ~ dnorm(0,0.001)
  }
  betaarea[1] <- 0
  betagender[2] ~ dnorm(0,0.001)
  betagender[1] <- 0
  alphaagegr[1] <- 0
  for(h in 2:Nincomegr)
  {
    alphaincomegr[h] ~ dnorm(0,0.001)
  }
  alphaincomegr[1] <- 0
  for(h in 2:Narea)
  {
    alphaarea[h] ~ dnorm(0,0.001)
  }
  alphaarea[1] <- 0
  alphagender[2] ~ dnorm(0,0.001)
  alphagender[1] <- 0
  # brand weights by age group for the acquisition model
  for(h1 in 1:Nbrands)
  {
    for(h2 in 1:Nagegr)
    {
      qq[h1,h2] ~ dbeta(2,2)
    }
  }
}
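In the loop for churned customers, the acquisition probability q[i,brand] can be read as the weight qq[brand,agegr[i]] divided by the total weight of all brands other than the previous one, so that the previous brand receives probability zero. A minimal Python sketch of that normalization follows; it is our reading of the BUGS code rather than part of it, the function name is hypothetical, and indices are 0-based whereas BUGS indices start at 1.

```python
def acquisition_probs(qq, prev_brand, age_group):
    """Brand-choice probabilities for a churned customer, mirroring q[i, brand].

    qq[brand][age_group] are positive brand weights (Beta(2, 2) priors in the model).
    The previous brand is excluded and the remaining weights are renormalized.
    """
    column = [row[age_group] for row in qq]
    denom = sum(column) - column[prev_brand]
    return [0.0 if b == prev_brand else column[b] / denom for b in range(len(qq))]
```

Because the weights need not sum to one, only their ratios within an age group matter for the acquisition probabilities.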

About this article

Cite this article

Karvanen, J., Rantanen, A. & Luoma, L. Survey data and Bayesian analysis: a cost-efficient way to estimate customer equity. Quant Mark Econ 12, 305–329 (2014). https://doi.org/10.1007/s11129-014-9148-4

  • DOI: https://doi.org/10.1007/s11129-014-9148-4
