Survey data and Bayesian analysis: a cost-efficient way to estimate customer equity

Karvanen, Juha; Rantanen, Ari; Luoma, Lasse

doi:10.1007/s11129-014-9148-4

Published: 08 July 2014

Volume 12, pages 305–329, (2014)

Quantitative Marketing and Economics


Abstract

We present a Bayesian framework for estimating customer lifetime value (CLV) and customer equity (CE) from the purchasing behavior that can be deduced from market surveys. The proposed framework systematically addresses the challenges faced when the future value of customers is estimated from survey data. The scarcity of the survey data and the sampling variance are countered by utilizing prior information and by quantifying the uncertainty of the CE and CLV estimates with posterior distributions. Furthermore, information on the purchase behavior of competitors' customers, which is available in the survey data, is integrated into the framework. The approach is directly applicable in domains where a customer relationship can be regarded as monogamous. As an example of the use of the framework, we analyze a consumer survey on mobile phones carried out in Finland in February 2013. The survey data contain consumer-reported information on the current and previous brand of the phone and the times of the last two purchases.

References

Abe, M. (2009). Counting your customers one by one: A hierarchical Bayes extension to the Pareto/NBD model. Marketing Science, 28(3), 541–553.
Allison, P.D. (1985). Survival analysis of backward recurrence times. Journal of the American Statistical Association, 80(390), 315–322.
Bauer, H., Hammerschmidt, M., Braehler, M. (2003). Customer lifetime value concept and its contribution to corporate valuation. Yearbook of Marketing and Consumer Research, 1(1), 49–67.
Bejou, D., Keiningham, T., Aksoy, L. (Eds.) (2006). Customer lifetime value – reshaping the way we manage to maximize profit: Haworth Press.
Blattberg, R.C., Byung-Do, K., Neslin, S.A. (Eds.) (2008). Database marketing: analyzing and managing customers: Springer.
Borle, S., Singh, S.S., Jain, D.C. (2008). Customer lifetime value measurement. Management Science, 54(1), 100–112.
Brooks, S.P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434–455.
Fader, P., & Hardie, B. (2010). Customer-base valuation in a contractual setting: The perils of ignoring heterogeneity. Marketing Science, 29(1).
Fader, P., Hardie, B., Lee, K.L. (2005a). Counting Your Customers the easy way: An alternative to the Pareto/NBD model. Marketing Science, 24(2), 275–284.
Fader, P., Hardie, B., Lee, K.L. (2005b). RFM and CLV: Using iso-value curves for customer base analysis. Journal of Marketing Research, XLII(November), 415–430.
Finnish Communications Regulatory Authority (FICORA) (2013). Toimialakatsaus 2012 (In Finnish). https://www.viestintavirasto.fi/attachments/Toimialakatsaus2012.pdf.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–533.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B. (2013). Bayesian data analysis, 3rd edn. Boca Raton, FL: Chapman & Hall/CRC.
Gupta, S., & Lehmann, D. (2005). Managing customers as investments: the strategic value of customers in the long run. Upper Saddle River, NJ: Wharton School Publishing.
Herniter, J. (1971). A probabilistic market model of purchase timing and brand selection. Management Science, 18(4-Part-II), P–102.
Hubbard, D.W. (2010). How to measure anything: finding the value of intangibles in business, 2nd edn. Hoboken, NJ: Wiley.
Jen, L., Chou, C.-H., Allenby, G.M. (2009). The importance of modeling temporal dependence of timing and quantity in direct marketing. Journal of Marketing Research, 46(4), 482–493.
Kumar, V., & George, M. (2007). Measuring and maximizing customer equity: a critical analysis. Journal of the Academy of Marketing Science, 35(4), 157–171.
Kumar, V., & Petersen, J.A. (2005). Using a customer-level marketing strategy to enhance firm’s performance. Journal of the Academy of Marketing Science, 33(4), 505–519.
Kumar, V., Venkatesan, R., Bohling, T., Beckmann, D. (2008). The power of CLV: Managing customer lifetime value at IBM. Marketing Science, 27(4), 585–599.
Lunn, D., Spiegelhalter, D., Thomas, A., Best, N. (2009). The BUGS project: Evolution, critique and future directions (with discussion). Statistics in Medicine, 28, 3049–3082.
Nagano, S., Ichikawa, Y., Takaya, N., Uchiyama, T., Abe, M. (2013). Nonparametric hierarchal bayesian modeling in non-contractual heterogeneous survival data. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 668–676). ACM.
Pfeifer, P. (2011). On estimating current-customer equity using company summary data. Journal of Interactive Marketing, 25(1), 1–14.
R Core Team (2012). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
Rossi, P.E., Allenby, G.M., McCulloch, R. (2005). Bayesian statistics and marketing. Hoboken, NJ: Wiley.
Rust, R.T., Zeithaml, V.A., Lemon, K.N. (2001). Driving customer equity: how customer lifetime value is reshaping corporate strategy. New York: Simon and Schuster.
Schmittlein, D., Morrison, D., Colombo, R. (1987). Counting your customers: Who are they and what will they do next? Management Science, 33(1), 1–24.
Schmittlein, D.C., Bemmaor, A.C., Morrison, D.G. (1985). Technical note – why does the NBD model work? Robustness in representing product purchases, brand purchases and imperfectly recorded purchases. Marketing Science, 4(3), 255–266.
Singh, S.S., Borle, S., Jain, D.C. (2009). A generalized framework for estimating customer lifetime value when customer lifetimes are not observed. Quantitative Marketing and Economics, 7(2), 181–205.
Statistics Finland (2012). Statistical Yearbook of Finland 2012.
Sturtz, S., Ligges, U., Gelman, A. (2005). R2WinBUGS: A package for running WinBUGS from R. Journal of Statistical Software, 12(3), 1–16.
Venkatesan, R., & Kumar, V. (2004). A customer lifetime value framework for customer selection and resource allocation strategy. Journal of Marketing, 68(10), 106–125.
Vilcassim, N.J., & Jain, D.C. (1991). Modeling purchase-timing and brand-switching behavior incorporating explanatory variables and unobserved heterogeneity. Journal of Marketing Research, 29–41.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland
Juha Karvanen
Sanoma Media Finland, Helsinki, Finland
Ari Rantanen
Tietoykkönen Oy, Jyväskylä, Finland
Lasse Luoma

Corresponding author

Correspondence to Juha Karvanen.

Appendices

Appendix A: Model for the simulation example

The estimation method is demonstrated with simulated data from a heterogeneous semi-Markov brand switching model which is defined as follows:

1. The number of transactions made by an individual $i$ follows a Poisson process with the transaction rate $\lambda_{i}$.
2. The transaction rate $\lambda_{i}$ follows a Gamma distribution with the probability density function
$$f(\lambda_{i} \mid \gamma, \delta) = \frac{\delta^{\gamma}}{\Gamma(\gamma)} \lambda_{i}^{\gamma-1} e^{-\delta \lambda_{i}}, \quad \lambda_{i}>0,$$
where γ > 0 is the shape parameter, δ > 0 is the rate parameter and Γ stands for the Gamma function.
3. After any transaction, an individual may change the brand with a probability that depends on the current brand. For the focal company, the probability of repurchase for individual $i$ is $p_{i}$. The probability of acquisition, i.e. change from any competitor to the focal company, is $1-q_{i}$ for individual $i$.
4. The repurchase probability $p_{i}$ follows a Beta($\alpha_{p}$, $\beta_{p}$) distribution and the competitor repurchase probability $q_{i}$ follows a Beta($\alpha_{q}$, $\beta_{q}$) distribution.
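The four assumptions above can be sketched as a small simulation of one individual's purchase history. This is a minimal illustration and not the code used in the paper: the function name `simulate_history`, its arguments, and the choice of starting state are assumptions made for this example. Python's `random.gammavariate` takes a scale parameter, so the rate δ enters as 1/δ.

```python
import random

def simulate_history(gamma, delta, alpha_p, beta_p, alpha_q, beta_q,
                     horizon_years, start_state, rng):
    """Return (times, states) for one individual.

    times  : purchase times in years, measured from time 0
    states : brand after each purchase (1 = focal company, 0 = competitor)
    start_state is the (assumed known) brand held before the first purchase.
    """
    # Individual-level parameters drawn from the population distributions.
    lam = rng.gammavariate(gamma, 1.0 / delta)  # scale = 1 / rate
    p = rng.betavariate(alpha_p, beta_p)        # repurchase prob., focal company
    q = rng.betavariate(alpha_q, beta_q)        # repurchase prob., competitors

    times, states = [], []
    t, state = 0.0, start_state
    while True:
        # Poisson process: exponential waiting times with rate lam.
        t += rng.expovariate(lam)
        if t > horizon_years:
            break
        # Focal customers stay with probability p;
        # competitor customers are acquired with probability 1 - q.
        prob_focal = p if state == 1 else 1.0 - q
        state = 1 if rng.random() < prob_focal else 0
        times.append(t)
        states.append(state)
    return times, states

rng = random.Random(1)
times, states = simulate_history(3, 10, 4, 6, 4, 6,
                                 horizon_years=40, start_state=1, rng=rng)
```

Repeating the call for many individuals gives a synthetic population from which survey samples could be drawn.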

The model is related to the long modeling tradition in the marketing literature (Herniter 1971; Schmittlein et al. 1985; Vilcassim and Jain 1991; Fader et al. 2005a). An individual is either in a state where she has made the last transaction with the focal company, or in a ‘competitor state’, where she has made the last transaction with one of the competitors of the focal company. The transaction rate is assumed to be the same in both states and independent of the transition probabilities.

Full purchase histories are generated for the population, from which small survey samples are drawn. The CE estimated from a sample is compared with the CE of the population. The procedure is repeated for a number of survey samples to obtain information on the sampling variation.

The survey data collected for the individuals i = 1,2,…,n are the current state $S^{(0)}_{i}$ (1 for the focal company and 0 for the competitors), the previous state $S^{(-1)}_{i}$, the time between the last two purchases $T_{i}$ and the time from the latest transaction $T^{*}_{i}$. As the transactions follow a Poisson process, the time between the purchases $T_{i}$ follows an exponential distribution with rate $\lambda_{i}$. The time from the latest purchase to the day of the survey $T^{*}_{i}$ is an observation from the same exponential distribution because the Poisson process is memoryless. For the current state, it holds that

$$P(S^{(0)}_{i}=1)=p_{i} S^{(-1)}_{i} + (1-q_{i}) (1-S^{(-1)}_{i}). \tag{1}$$

For the previous state it holds

$$P(S^{(-1)}_{i}=1)=p_{i} S^{(-2)}_{i} + (1-q_{i}) (1-S^{(-2)}_{i}), \tag{2}$$

where $S^{(-2)}_{i}$ is the state before the previous. For the state $S^{(-2)}_{i}$ there are no observations but the formula

$$P(S^{(-2)}_{i}=1)=\frac{1-q_{i}}{2-q_{i}-p_{i}} \tag{3}$$

follows from the equilibrium state of the Markov chain characterizing the brand switching.
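The equilibrium claim behind formula (3) can be checked numerically: iterating the one-step update of formula (1) on the probability of being with the focal company should converge to $(1-q_{i})/(2-q_{i}-p_{i})$. The sketch below uses illustrative values p = 0.4 and q = 0.7, and the function names are assumptions made for this example.

```python
def stationary_focal_share(p, q):
    """Equilibrium probability of the focal-company state, formula (3)."""
    return (1.0 - q) / (2.0 - q - p)

def iterate_chain(p, q, start=1.0, steps=200):
    """Apply the one-step update of formula (1) repeatedly to a probability."""
    share = start
    for _ in range(steps):
        # P(next = focal) = p * P(focal) + (1 - q) * P(competitor)
        share = p * share + (1.0 - q) * (1.0 - share)
    return share

p, q = 0.4, 0.7
limit = iterate_chain(p, q)
closed_form = stationary_focal_share(p, q)
print(limit, closed_form)  # both are approximately 0.3333
```

The update is a contraction with factor p + q − 1, so the limit does not depend on the starting value whenever 0 < p, q < 1.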

A.1 Simulation setup

We first simulate the complete purchase histories of 100,000 individuals who are divided between the focal company and the competitors according to the market shares. Then, using the proposed approach, we estimate the average CLV and the CE from a small ‘survey sample’ of the simulated data and compare the result to the true CLV and CE. The parameters used in the simulation are γ = 3, δ = 10, $\alpha_{p}=4$, $\beta_{p}=6$, $\alpha_{q}=4$ and $\beta_{q}=6$, and the intensity is defined as the number of transactions per year. The value of a purchase is assumed to be 100 euros. The purchase histories are generated for 40 years forward and 30 years backward from the time of the survey. The true CLVs and CEs are calculated using the whole population and the generated purchase histories for the forthcoming 40 years. With an annual discount rate of 10 %, this leads to a CE of 10.0 million euros for the population of 100,000 individuals and an average CLV of 100 euros. For the current customers of the focal company, the average CLV is 120 euros, and for the current customers of the competitors it is 91 euros. These numbers are compared to the estimates from small survey samples drawn from the same population. For each individual selected to the sample, only the variables $S^{(0)}_{i}$, $S^{(-1)}_{i}$, $T_{i}$ and $T^{*}_{i}$ are recorded at the time of the survey, which means that the amount of data in the sample is very small compared to the full future purchase histories of 100,000 individuals. To illustrate the effect of the sample size on the accuracy of the estimates, the sample sizes are varied from 100 to 1000.
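As an illustration of how a true CLV could be computed from a generated purchase history, the sketch below discounts each future purchase with the focal company back to the survey date and sums the CLVs into a CE. The discount factor $(1+r)^{-t}$ with t in years is an assumption of this example, since the exact discounting convention is not spelled out here, and the function names are hypothetical.

```python
def discounted_clv(times, states, purchase_value=100.0, annual_rate=0.10):
    """Discounted value of future purchases made with the focal company.

    times  : purchase times in years after the survey
    states : 1 if the purchase goes to the focal company, 0 otherwise
    Assumes the discount factor (1 + annual_rate) ** (-t).
    """
    return sum(purchase_value / (1.0 + annual_rate) ** t
               for t, s in zip(times, states) if s == 1)

def customer_equity(histories, **kwargs):
    """CE as the sum of the individual CLVs over a population."""
    return sum(discounted_clv(times, states, **kwargs)
               for times, states in histories)

# Two toy individuals: only purchases with state 1 contribute.
histories = [
    ([0.0, 1.0, 2.0], [1, 0, 1]),   # 100 + 100 / 1.1**2
    ([1.0], [1]),                   # 100 / 1.1
]
ce = customer_equity(histories)
```

The average CLV is then the CE divided by the number of individuals, which matches the relation between 10.0 million euros and 100 euros for 100,000 individuals stated above.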

The model and the chosen prior distributions can be written as follows:

$$\begin{array}{ll} & T_{i},T_{i}^{*} \sim \text{Exp}(\lambda_{i}), \\ & \lambda_{i} \sim \text{Gamma}(\gamma,\delta), \\ & \gamma = m_{\lambda}^{2}/v_{\lambda}, \quad \delta = m_{\lambda}/v_{\lambda}, \\ & m_{\lambda},v_{\lambda} \sim \text{Gamma}(2,1),\\ & S^{(-2)}_{i} \sim \text{Bernoulli}\left( \frac{1-q_{i}}{2-q_{i}-p_{i}} \right), \\ & S^{(-1)}_{i} \sim \text{Bernoulli}\left( p_{i} S^{(-2)}_{i}+(1-q_{i})(1-S^{(-2)}_{i}) \right), \\ & S^{(0)}_{i} \sim \text{Bernoulli}\left( p_{i} S^{(-1)}_{i}+(1-q_{i})(1-S^{(-1)}_{i}) \right), \\ & p_{i} \sim \text{Beta}(\alpha_{p},\beta_{p}), \quad q_{i} \sim \text{Beta}(\alpha_{q},\beta_{q}), \\ & \alpha_{p} = k_{p} m_{p}, \quad \beta_{p} = k_{p} (1-m_{p}),\\ & \alpha_{q} = k_{q} m_{q}, \quad \beta_{q} = k_{q} (1-m_{q}),\\ & m_{p} \sim \text{Uniform}(0,1), \quad m_{q} \sim \text{Uniform}(0,1), \\ & k_{p} \sim \text{Gamma}(10,1), \quad k_{q} \sim \text{Gamma}(10,1). \\ \end{array}$$

For parameters γ, δ, $\alpha_{p}$, $\beta_{p}$, $\alpha_{q}$ and $\beta_{q}$ we use weakly informative prior distributions (Gelman 2006). Parameters γ and δ are the shape and rate of the Gamma distribution from which the values of the intensity $\lambda_{i}$ are drawn. We define $\gamma = m_{\lambda}^{2}/v_{\lambda}$ and $\delta = m_{\lambda}/v_{\lambda}$, where $m_{\lambda} \sim \text{Gamma}(2,1)$ is the mean of the intensity distribution and $v_{\lambda} \sim \text{Gamma}(2,1)$ is the variance of the intensity distribution. In other words, the prior expectation of the mean of the intensity distribution is 2 transactions per year and the prior expectation of its variance is 2, which corresponds to a standard deviation of about 1.4 transactions per year, but there is considerable uncertainty about the intensity distribution. With these priors, the 95 % Bayes interval for the purchase intervals in the population is (10⁻²,8×10¹¹), indicating that the priors are rather uninformative.
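The mapping from a mean and a variance to the shape and rate of a Gamma distribution is a moment-matching step: a Gamma(γ, δ) distribution has mean γ/δ and variance γ/δ². A minimal sketch, with a hypothetical function name, recovers the simulation values γ = 3 and δ = 10 from the implied mean 0.3 and variance 0.03.

```python
def gamma_shape_rate(mean, variance):
    """Shape and rate of the Gamma distribution with the given mean and variance."""
    shape = mean ** 2 / variance
    rate = mean / variance
    return shape, rate

# gamma = 3 and delta = 10 imply mean 3/10 = 0.3 and variance 3/100 = 0.03.
shape, rate = gamma_shape_rate(0.3, 0.03)

# Round trip: the moments implied by (shape, rate) match the inputs.
implied_mean = shape / rate
implied_variance = shape / rate ** 2
```

Placing priors on the mean and variance instead of on γ and δ directly makes the prior statements easier to interpret in units of transactions per year.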

Parameters $\alpha_{p}$ and $\beta_{p}$ describe the Beta distribution from which the individual repurchase probabilities are drawn, and parameters $\alpha_{q}$ and $\beta_{q}$ describe the Beta distribution from which the individual competitor repurchase probabilities are drawn. We define $\alpha_{p}=k_{p} m_{p}$ and $\beta_{p}=k_{p}(1-m_{p})$, where $m_{p} \sim \text{Uniform}(0,1)$ is the expected average repurchase probability and $k_{p} \sim \text{Gamma}(10,1)$ controls the variation of the repurchase probabilities in the population. Similarly, we define $\alpha_{q}=k_{q} m_{q}$ and $\beta_{q}=k_{q}(1-m_{q})$, where $m_{q} \sim \text{Uniform}(0,1)$ and $k_{q} \sim \text{Gamma}(10,1)$. These priors for the expected average repurchase probabilities are uninformative, but the Gamma(10,1) prior for $k_{p}$ and $k_{q}$ ensures that there is a reasonable variation of repurchase probabilities in the population. With these priors, the 95 % Bayes interval for the repurchase probability p in the population is (0.001,0.999), indicating that the prior is otherwise flat but there are peaks near 0 and 1.
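The role of k as a concentration parameter can be made concrete with the standard Beta moments: with α = k m and β = k (1 − m), the mean is m and the variance is m (1 − m)/(k + 1). The sketch below uses hypothetical function names and the simulation values m = 0.4 and k = 10, which give α = 4 and β = 6.

```python
def beta_from_mean_concentration(mean, concentration):
    """Return (alpha, beta) for a Beta distribution with the given mean and alpha + beta."""
    return concentration * mean, concentration * (1.0 - mean)

def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))

alpha_p, beta_p = beta_from_mean_concentration(0.4, 10.0)
variance_p = beta_variance(alpha_p, beta_p)   # equals 0.4 * 0.6 / 11

# A larger concentration k gives less variation between individuals.
variance_tight = beta_variance(*beta_from_mean_concentration(0.4, 100.0))
```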

A.2 BUGS code

The analysis is carried out using OpenBUGS 3.2.2 (Lunn et al. 2009), R (R Core Team 2012) and R2OpenBUGS R package (Sturtz et al. 2005). The BUGS code for the model is given as:

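    model {
      for (i in 1:N) {
        # Individual-level transaction rate and repurchase probabilities
        lambda[i] ~ dgamma(gammal, deltal)
        p[i] ~ dbeta(alphap, betap)
        q[i] ~ dbeta(alphaq, betaq)
        # Time between the last two purchases and time since the latest purchase
        tau[i] ~ dexp(lambda[i])
        taustar[i] ~ dexp(lambda[i])
        # Unobserved state before the previous one, Eq. (3)
        m0[i] <- (1-q[i])/(2-q[i]-p[i])
        S2[i] ~ dbern(m0[i])
        # Previous state, Eq. (2)
        S1prob[i] <- p[i]*S2[i]+(1-q[i])*(1-S2[i])
        S1[i] ~ dbern(S1prob[i])
        # Current state, Eq. (1)
        S0prob[i] <- p[i]*S1[i]+(1-q[i])*(1-S1[i])
        S0[i] ~ dbern(S0prob[i])
      }
      # Mean and variance of the intensity distribution
      ml ~ dgamma(2,1)
      vl ~ dgamma(2,1)
      gammal <- ml*ml/(vl+0.00001)
      deltal <- ml/(vl+0.00001)
      # Means and concentrations of the Beta distributions
      mp ~ dunif(0,1)
      mq ~ dunif(0,1)
      kp ~ dgamma(10,1)
      kq ~ dgamma(10,1)
      alphap <- kp*mp
      betap <- kp*(1-mp)
      alphaq <- kq*mq
      betaq <- kq*(1-mq)
    }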
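To make the data-generating side of this model concrete, the following sketch forward-samples one survey record for given individual-level parameters, mirroring the order of the nodes in the BUGS code. It is an illustration only: the function name and the dictionary keys are assumptions, and the parameter values in the usage lines are arbitrary.

```python
import random

def draw_survey_record(lam, p, q, rng):
    """Draw (S0, S1, tau, taustar) for one respondent given lam, p and q."""
    # Unobserved state before the previous one, from the equilibrium distribution.
    m0 = (1.0 - q) / (2.0 - q - p)
    s2 = 1 if rng.random() < m0 else 0
    # Previous state given S2, then current state given S1.
    s1_prob = p * s2 + (1.0 - q) * (1 - s2)
    s1 = 1 if rng.random() < s1_prob else 0
    s0_prob = p * s1 + (1.0 - q) * (1 - s1)
    s0 = 1 if rng.random() < s0_prob else 0
    # Purchase interval and time since the latest purchase share the same rate.
    tau = rng.expovariate(lam)
    taustar = rng.expovariate(lam)
    return {"S0": s0, "S1": s1, "tau": tau, "taustar": taustar}

rng = random.Random(7)
records = [draw_survey_record(0.3, 0.4, 0.7, rng) for _ in range(20000)]
share_focal = sum(r["S0"] for r in records) / len(records)
```

Because S2 starts from the equilibrium distribution, the share of respondents whose current state is the focal company stays close to (1 − q)/(2 − q − p), which is 1/3 for these values.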

A.3 Simulation results

The simulation results are shown in Fig. 5 and in Table 6. From Fig. 5, it can be seen that the estimated posterior distributions are concentrated around the true value of the CE and that the systematic bias is small or nonexistent. As expected, the variance is smaller for the larger sample sizes. Sample sizes of 800 or more seem to give sufficient accuracy of estimation.

Table 6 Estimated CLV distributions for the customers of the focal company and the customers of a competitor in the simulation example

The CLV posterior distributions for the customers of the focal company and the customers of a competitor are presented in Table 6. It can be seen that the posteriors estimated from a sample of size 1000 are very similar to the true CLV distribution of the population.

Appendix B: BUGS code for the mobile phone data

The BUGS code for the mobile phone survey is given as:

    model {
      # Respondents 1..Nretained: purchase interval and repurchase indicator
      for (i in 1:Nretained) {
        # Purchase interval, censored between it_min[i] and it_max[i]
        interval[i] ~ dgamma(kappa, lambda[i])C(it_min[i], it_max[i])
        log(lambda[i]) <- betaconst + betaprev[prevbrand[i]] +
                          betaagegr[agegr[i]] +
                          betagender[gender[i]] +
                          betaincomegr[incomegr[i]] +
                          betaarea[area[i]]
        repurchase[i] ~ dbern(p[i])
        logit(p[i]) <- alphaconst + alphaprev[prevbrand[i]] +
                       alphaagegr[agegr[i]] +
                       alphagender[gender[i]] +
                       alphaincomegr[incomegr[i]] +
                       alphaarea[area[i]]
      }
      # Respondents Nretained+1..Nretained+Nchurned: also the choice of the new brand
      for (i in (Nretained+1):(Nretained+Nchurned)) {
        interval[i] ~ dgamma(kappa, lambda[i])C(it_min[i], it_max[i])
        log(lambda[i]) <- betaconst + betaprev[prevbrand[i]] +
                          betaagegr[agegr[i]] +
                          betagender[gender[i]] +
                          betaincomegr[incomegr[i]] +
                          betaarea[area[i]]
        repurchase[i] ~ dbern(p[i])
        logit(p[i]) <- alphaconst + alphaprev[prevbrand[i]] +
                       alphaagegr[agegr[i]] +
                       alphagender[gender[i]] +
                       alphaincomegr[incomegr[i]] +
                       alphaarea[area[i]]
        # Probability of each brand other than the previous one,
        # normalised over the brands other than the previous one
        for (brand in 1:Nbrands) {
          q[i,brand] <- (1-equals(prevbrand[i],brand)) *
                        qq[brand,agegr[i]] /
                        (sum(qq[,agegr[i]]) - qq[prevbrand[i],agegr[i]])
        }
        aquisition[i] ~ dbern(q[i,newbrand[i]])
      }
      # Remaining respondents contribute only to the purchase-interval model
      for (i in (Nretained+Nchurned+1):N) {
        interval[i] ~ dgamma(kappa, lambda[i])C(it_min[i], it_max[i])
        log(lambda[i]) <- betaconst + betaprev[prevbrand[i]] +
                          betaagegr[agegr[i]] +
                          betagender[gender[i]] +
                          betaincomegr[incomegr[i]] +
                          betaarea[area[i]]
      }
      # Priors; dnorm takes a precision, so 0.001 means variance 1000.
      # The first level of each factor is the reference level (coefficient 0).
      kappa ~ dgamma(1,1)
      betaconst ~ dnorm(0,0.001)
      alphaconst ~ dnorm(0,0.001)
      for (h in 2:Nbrands) {
        betaprev[h] ~ dnorm(0,0.001)
        alphaprev[h] ~ dnorm(0,0.001)
      }
      betaprev[1] <- 0
      alphaprev[1] <- 0
      for (h in 2:Nagegr) {
        betaagegr[h] ~ dnorm(0,0.001)
        alphaagegr[h] ~ dnorm(0,0.001)
      }
      betaagegr[1] <- 0
      for (h in 2:Nincomegr) {
        betaincomegr[h] ~ dnorm(0,0.001)
      }
      betaincomegr[1] <- 0
      for (h in 2:Narea) {
        betaarea[h] ~ dnorm(0,0.001)
      }
      betaarea[1] <- 0
      betagender[2] ~ dnorm(0,0.001)
      betagender[1] <- 0
      alphaagegr[1] <- 0
      for (h in 2:Nincomegr) {
        alphaincomegr[h] ~ dnorm(0,0.001)
      }
      alphaincomegr[1] <- 0
      for (h in 2:Narea) {
        alphaarea[h] ~ dnorm(0,0.001)
      }
      alphaarea[1] <- 0
      alphagender[2] ~ dnorm(0,0.001)
      alphagender[1] <- 0
      # Brand weights by age group used in the acquisition probabilities
      for (h1 in 1:Nbrands) {
        for (h2 in 1:Nagegr) {
          qq[h1,h2] ~ dbeta(2,2)
        }
      }
    }
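The brand-choice step for churned respondents can be illustrated outside BUGS. The sketch below assumes that the weight of each candidate brand is divided by the sum of the weights of all brands other than the previous one, so that the probabilities over the new brands sum to one. The function name is hypothetical and Python's 0-based indices replace the 1-based brand indices of the BUGS code.

```python
def acquisition_probs(prev_brand, weights):
    """Probability of switching to each brand given the previous brand.

    weights[b] plays the role of qq[b, agegr] for the respondent's age group.
    The previous brand gets probability 0, like the (1 - equals(...)) factor.
    """
    denom = sum(weights) - weights[prev_brand]
    return [0.0 if b == prev_brand else w / denom
            for b, w in enumerate(weights)]

# Three brands; the respondent previously owned brand 1 (0-based).
probs = acquisition_probs(1, [0.2, 0.5, 0.3])
# probs is approximately [0.4, 0.0, 0.6]
```

Since each weight has a Beta(2,2) prior in the model, the normalised values act as age-group-specific shares among the competing brands.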

About this article

Cite this article

Karvanen, J., Rantanen, A. & Luoma, L. Survey data and Bayesian analysis: a cost-efficient way to estimate customer equity. Quant Mark Econ 12, 305–329 (2014). https://doi.org/10.1007/s11129-014-9148-4

Received: 27 May 2013
Accepted: 24 June 2014
Published: 08 July 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s11129-014-9148-4
