1 Introduction

Customer lifetime value (CLV) has become an important metric in marketing and specifically in customer relationship management (Rust et al. 2000). The increase in attention for CLV fits well in the emerging literature on customer behavior and customer profitability (Hogan et al. 2002). One of the key issues when firms use the CLV-metric is whether the firm can provide an adequate prediction of the CLV of each customer in their database (Malthouse and Blattberg 2005; Venkatesan and Kumar 2004). Based on these predictions, firms can decide on their investments in (segments of) customers (Zeithaml et al. 2001). Also, a summation of CLV predictions of all customers results in a valuation of the total customer base, which can be important for firm valuation purposes (Gupta et al. 2004).

A customer’s CLV is typically assessed using customer behavior data from the company’s database to predict customer behavior and profitability. The traditional direct marketing literature (Berger and Nasr 1998; Jain and Singh 2002) has proposed rather simple models, mainly using aggregated data (e.g., aggregate retention rates) to predict CLV. In the general marketing literature, researchers have proposed behavior-based models (e.g. logit-models, multivariate Probit models) to predict customer behavior, including cross-buying and retention (Kamakura et al. 1991, 2003, 2004; Knott et al. 2002). Finally, there are several studies using duration models to quantify purchase incidence (Jain and Vilcassim 1991), which in combination with a purchase quantity model, enable assessment of CLV at the individual level (Fader and Hardie 2001; Schmittlein et al. 1987).

So far, there are no studies that provide an overview and an empirical comparison of a large set of models—varying in realism and complexity—to predict CLV (Kamakura et al. 2005). Only Malthouse and Blattberg (2005) provide a comparison of two different models for CLV prediction. At the same time, research on modeling CLV is one of the MSI research priorities (MSI 2004). In this study we consider the prediction of CLV in multi-service industries, in particular in the insurance industry, where purchase behavior can be especially complex. In multi-service industries, customer behavior is multi-dimensional: not only customer retention, but also cross-buying and service usage, are important components of CLV (Bolton et al. 2004). Accounting for these behaviors at the individual level results in more complex, but also more realistic models. The question is whether increasing model complexity also yields a better predictive performance. If not, managers can feel comfortable relying on rather simple models.

In our modeling approach, we consider both relationship-level and service-level models. The relationship-level models focus on customer retention and profits aggregated across services. The service-level models take a disaggregated perspective, and can also account for cross-buying of services. Within the class of service-level models, we consider models that assume independent purchase decisions, as well as models with unobserved heterogeneity that might result in dependencies across the decisions for the various services (Kamakura et al. 1991; Li et al. 2005; Manchanda et al. 1999). The dependencies are incorporated through a latent factor structure (Kamakura et al. 2003) allowing a parsimonious specification of this complex phenomenon.

Our models in fact do not pertain to CLV prediction but to the modeling and prediction of next period’s customer behavior. Subsequently, CLV is derived from the predicted behavior. For the simplest models, prediction of CLV is straightforward. For the service-level models with explanatory variables, we use a Markov model to predict more than a single period ahead (Pfeifer and Carraway 2000; Rust et al. 2004). CLV predictions by means of Markov models have not received much attention in the CLV literature. Hence, we provide additional details on this approach and the associated complications.

After estimating the models and computing the corresponding CLV predictions, we compare the predictive performance of these models using a longitudinal dataset of customer behavior for an insurance company. Comparisons of the predictive performance are made in three domains: (1) predicting the level of individual CLV, (2) predicting the ordering of customers based on CLV, and (3) valuing the total customer base. These different types of predictions relate to different tasks marketing managers have to perform, including budgeting, segmenting, and firm valuation. To better understand the results, we also compare the predictive performance of these models after 1 year and after 4 years.

One of the most prominent findings of our comparison is that simple models perform relatively well. These simple models, however, need to account for both customer retention and cross-buying. The models that only account for customer retention fail to capture an important component of a customer’s value and consequently predict poorly. Although these findings are industry specific, they might resonate beyond the insurance industry due to the nascent nature of CLV research.

2 Modeling CLV

The basic formula for calculating CLV for customer i at time t for a finite time horizon (T) (Berger and Nasr 1998) is

$${\text{CLV}}_{{i,t}} = {\sum\limits_{\tau = 0}^T {\frac{{{\text{Profit}}_{{i{\text{,}}t + \tau }} }}{{{\left( {1 + d} \right)}^{\tau } }}} },$$
(1)

where d is a pre-determined discount rate. In multi-service industries, $\text{Profit}_{i,t}$ can be defined as

$$\text{Profit}_{i,t} = \sum\limits_{j = 1}^J {\text{Serv}_{ij\text{,}t} \text{*Usage}_{ij\text{,}t} \text{*Margin}_{j\text{,}t} } $$
(2)

Here J is the number of different services sold, $\text{Serv}_{ij,t}$ is a dummy indicating whether customer i purchases service j at time t, $\text{Usage}_{ij,t}$ is the amount of that service purchased, and $\text{Margin}_{j,t}$ is the average profit margin for service j. In line with the business practice of the company in our application, we use margins that are not customer specific. According to the company, obtaining accurate estimates of individual-level profit margins is not possible for the insurances they sell as this requires an accurate assessment of the individual-level risks that are insured.
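As an illustration of Eqs. 1 and 2, the following minimal sketch computes a customer's annual profit and the discounted sum over a finite horizon. The function names and all numbers are hypothetical and are not taken from the application.

```python
import numpy as np

def profit(serv, usage, margin):
    """Eq. 2: sum over services of Serv * Usage * Margin for one customer-year."""
    serv = np.asarray(serv, dtype=float)
    usage = np.asarray(usage, dtype=float)
    margin = np.asarray(margin, dtype=float)
    return float(np.sum(serv * usage * margin))

def clv(profits, d):
    """Eq. 1: discounted sum of profits for tau = 0, ..., T."""
    profits = np.asarray(profits, dtype=float)
    tau = np.arange(len(profits))
    return float(np.sum(profits / (1.0 + d) ** tau))

# Hypothetical example: three services, the customer holds the first two.
margin = [50.0, 30.0, 20.0]
p = profit(serv=[1, 1, 0], usage=[1, 1, 1], margin=margin)
value = clv([p, p, p], d=0.10)  # constant profit over tau = 0, 1, 2
```

With annual contracts, as in the application, usage equals one whenever the service is purchased, so the profit reduces to the sum of the margins of the services held.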

In our application we consider models at two levels: (1) relationship-level models and (2) service-level models. The relationship-level models focus on relationship length and total profits, building directly on Eq. 1. The service-level models disaggregate a customer’s profit into the contribution per service. CLV predictions are then obtained by predicting purchase behavior at the service level and combining Eqs. 1 and 2 to forecast CLV. When analyzing service level purchases, the dependent variable will be a zero–one variable indicating whether the service was purchased in a given year. This is in line with the annual contracts used in the Dutch insurance market. More important, these contracts cannot be terminated before the end of the contract year, so purchasing an insurance policy means that the customer pays premium for the full 12 months. In other settings, usage might also take on values below one as contracts can be ended during the year. In such instances, this obviously should be modeled appropriately.

2.1 Overview of models for customer behavior

We consider both simple and more complex customer behavior models. At least two views can be taken on the decision process underlying the purchase behavior for services. The first view assumes that customers evaluate their particular service contract(s) periodically, resulting in a discrete choice model for purchases in that period. The second view stems from critical-incident thinking, which is prevalent in the service literature (Gremler 2004; Keaveney 1995). It is assumed that customers re-evaluate their service contract(s) only after some critical incident takes place, for example, the insurance company denies a claim. The corresponding model is a duration model (Fader and Hardie 2001; Reinartz and Kumar 2000), which models the time until a critical incident occurs that results in a non-renewal of service(s). Note that critical incidents typically refer to events that are under control of the company, e.g. inappropriate service deliveries. In our analysis of customer retention, we will model the arrival of relationship terminating events, which could be critical incidents in the service process but also an offer from a competitor that is accepted by a customer.

In this section we introduce the customer-behavior models, classified into relationship-level and service-level models. Table 1 provides an overview of the models and their mathematical formulation. Technical details on model specification and estimation are deferred to an appendix. The section ends with a discussion of the common set of explanatory variables we include.

Table 1 Overview of rival models

2.1.1 Relationship-level models

The relationship-level models do not explicitly model buying behavior of individual services, but rather consider retention and total profits aggregated across services. We consider six types of CLV models in the class of relationship-level models: a status quo model (1.1), which assumes that profits simply remain constant over time; a regression model (1.2), which aims at predicting a customer’s annual profit contribution (Malthouse and Blattberg 2005); customer retention models (1.3) using either aggregate or segment-level retention rates, where we consider segmentation along the familiar Recency, Frequency and Monetary value dimensions (Berger and Nasr 1998; Rossi et al. 1996); models for customer-specific retention probabilities, either obtained through a Probit model (1.4) (Bolton et al. 2000), or through a bagging approach (1.5) (Lemmens and Croux 2006); duration models for a customer’s relationship duration (1.6) and finally a Tobit II model (1.7).

For all models that focus on customer retention or relationship duration only, the assumption is that profits, conditional on retention, remain unchanged. The expected profit in period t + 1 can thus be defined as

$$ {\text{E}}_{t} {\left\{ {{\text{Profit}}_{{i,t + 1}} } \right\}} = P_{t} {\left( {{\textit{R}}{\textit{et}}_{{i,t + 1}} } \right)}{\text{Profit}}_{{i,t}} $$
(3)

The customer relationship duration model (1.6) requires a bit more exposition given the traditional use of duration models in non-contractual rather than contractual settings. In the CLV literature there is a large stream of research based on the Pareto/NBD framework. As Schmittlein et al. (1987) already indicated (p.16) “the Pareto/NBD is inappropriate for examining renewable service contracts,..., since the opportunities for transactions occur at regular, observed intervals.” Moreover, in our application purchase quantity is always one, rendering the Pareto/NBD framework inappropriate for our application. As said, however, we do use a duration model, inspired by the frequent application of the Pareto/NBD framework in the CLV literature.

The major difference between a contractual setting and the typical application for the Pareto/NBD framework is that the end of a relationship is observed in a contractual setting and need not be inferred indirectly from customer behavior. Depending on the type of service contracts, the contract can be cancelled either at the end of the contractual period or at any moment in time. In the latter case, the time until the end of the relationship can be modeled and estimated straightforwardly with a hazard rate model as is done in the Pareto/NBD framework. In our application we have renewable annual service contracts that cannot be ended before the end of the year. As a result, we can only observe whether an event that is detrimental to the relationship has occurred within a given year, not the exact timing of that event within that period. Details on how the estimation procedure for duration models is adjusted to account for this can be found in Appendix A, but see also Meyer (1990).

The rate at which relationship-terminating events occur—the hazard rate—is likely to depend on covariates such as types of services an individual purchases. Therefore, we use a proportional hazard model (Cox 1972; Fader et al. 2004) to include explanatory variables in the hazard rate. The major advantage of the duration model approach is that it permits modeling of the phenomenon that customers with a long relationship are more loyal (Bolton 1998). As a result, not only the explanatory variables in the proportional hazard model, but also the customer’s relationship duration are central ingredients of the predicted retention probabilities.

For the shape of the hazard rate we consider two alternatives. As a flexible parametric specification for the baseline hazard we use the quadratic Box–Cox formulation proposed by Helsen and Schmittlein (1993). We also consider a fully non-parametric baseline hazard rate (see footnote 1), as proposed by Prentice and Gloeckler (1978); see also Meyer (1990).
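To make the link between a proportional hazard and annually observed retention concrete, the sketch below uses the standard grouped-data form of the proportional hazard model, in which the probability of surviving a year is exp(-exp(x'β + γ_a)). This form is an assumption here; the exact specification used in the paper is given in its Appendix A. The function name and the symbols `xb` and `gamma_a` (the log of the baseline hazard integrated over the year at relationship age a) are hypothetical.

```python
import math

def annual_retention_prob(xb, gamma_a):
    """Probability that no relationship-terminating event occurs during the year,
    given that the customer is active at its start (grouped-data proportional hazard).

    xb      : linear index x'beta of the covariates (assumed notation)
    gamma_a : log integrated baseline hazard for relationship age a (assumed notation)
    """
    return math.exp(-math.exp(xb + gamma_a))

# A larger index means a higher hazard and therefore a lower retention probability.
p_low_hazard = annual_retention_prob(-3.0, 0.0)
p_high_hazard = annual_retention_prob(-2.0, 0.0)
```

In this form a positive coefficient raises the hazard and lowers retention, which is why hazard coefficients carry the opposite sign of those in a retention choice model.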

The Tobit II model has also not been used very often; see Hansotia and Wang (1997) for an exception. The potential strength of this model is that profits are only considered in case of retention, separating out the effect of customer defection on profitability.

2.1.2 Service-level models

In this section, we introduce the models that concern service-level purchase behavior. Such a disaggregated approach might be preferable to relationship-level approaches. Since margins per service vary across the services and not across customers, the only variable of interest at the service level is whether the service is purchased or not. We consider two types of models: binary Probit models and duration models (see footnote 2). Although not obvious at first sight, there is a major difference between the choice model approach and the duration model approach to service purchase behavior. The choice model approach has as dependent variable the decision to purchase a service or not. The duration models focus on the duration of an existing relationship, so the decision to continue purchasing a service is modeled (see footnote 3). So, in contrast with the choice models, the duration models only model the ending of the period in which a service was purchased, not the starting of a new one.

For both choice and duration models, we consider independent and dependent service-level equations. This leads to a model with J independent Probit models (Model 2.1), a model with J independent duration models (Model 2.2), a multivariate Probit model (Model 2.3), and a multivariate duration model (Model 2.4). For dependent equations, we can incorporate unobserved heterogeneity, affecting the purchase behavior for multiple services. Simultaneous modeling of the J purchase decisions permits estimation of customer-level heterogeneity, because there are multiple observations per customer (cf. Li et al. 2005). Both firm and customer behaviors can drive the dependencies in purchase decisions across services. For instance, firms offer package deals, making it attractive for customers to buy a certain combination of services. From the customer perspective interdependencies might arise from customer satisfaction, perceived switching costs, etc.

We take a general approach to incorporate dependencies across equations building on the ideas in Kamakura et al. (2003). More specifically, the dependencies across the decisions on the J services result from an unobserved error component $\varepsilon_{i,t}$ (of dimension J) that is added to the systematic component, xβ, for both the duration models and the Probit models. This unobserved component is characterized by a factor structure: $(\varepsilon_{i1,t},\ldots,\varepsilon_{iJ,t})^T = \Lambda f_{i,t}$. Here, $f_{i,t}$ denotes the vector of unobserved factor scores and $\Lambda$ the matrix with factor loadings, which together allow for dependencies between the purchase decisions. This results in a parsimonious representation of all kinds of possible dependencies. More details are provided in Appendix A.
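The following sketch illustrates why a factor structure induces dependencies across services. It assumes uncorrelated factors with unit variance, so that the covariance of the error component is ΛΛ^T; this normalization and the loading values are assumptions for illustration only, and any idiosyncratic error terms of the actual models are left out.

```python
import numpy as np

# Hypothetical loadings for J = 3 services on a single latent factor.
Lambda = np.array([[0.8],
                   [0.5],
                   [-0.3]])

# With Var(f) = I (assumed), Cov(Lambda f) = Lambda Lambda^T.
cov_eps = Lambda @ Lambda.T

# The off-diagonal elements are non-zero, so the J decisions are dependent,
# although only J loadings (rather than a full covariance matrix) are needed.
n_loadings = Lambda.size
rank = np.linalg.matrix_rank(cov_eps)
```

With J services and one factor, J loadings describe all J(J - 1)/2 pairwise dependencies, which is the sense in which the representation is parsimonious.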

We only allow for unobserved heterogeneity in the service-level models, as one needs very strong and untestable assumptions to identify unobserved heterogeneity from a single observation per observational unit (customer in our case). A solution at first sight would be to use 2 years of data for model estimation instead of only one. However, this does not solve the identification problem for the models that focus on customer retention, or equivalently customer relationship length. With these models, one cannot distinguish between unobserved heterogeneity and duration dependence (see footnote 4). Intuitively this is clear from the fact that only a single observation on the relationship duration is available; adding a single year only increases the informational content of this single observation. For the models that include the prediction of profits, using an additional year of data does result in identification of unobserved heterogeneity. This is, however, unlikely to be worthwhile as almost all variation is captured by the level of past profits and adding unobserved heterogeneity is unlikely to improve the predictive performance.

2.1.3 Predictors of customer behavior

The models discussed above all aim at predicting (part of) customer behavior. In line with the traditional CLV-literature (e.g. Berger and Nasr 1998), we only include past behavioral data available from the customer database to predict customer behavior. Recency, frequency, and monetary value of a customer have proven to be powerful predictors (Rossi et al. 1996). We include dummies for ownership of each insurance type in the previous year, which represent purchase frequency, but also profit (monetary value), as profit is the sum of the purchase dummies for each type of insurance times the insurance specific margins. For recency we include two dummy variables; one dummy indicating whether the customer purchased a new service the last year (purchase recency) and one dummy indicating whether the customer cancelled a service last year (cancellation recency). We also include relationship age as predictor of customer behavior. In the regression-type models we include two dummies for the first and second year of the relationship. The duration models include relationship length only through the hazard rate. Finally, loyalty program membership is included as a predictor in the models (Bolton et al. 2000).
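A minimal sketch of how such a predictor set could be assembled for one customer is given below. It derives the recency dummies from the change in ownership between two consecutive years, which is a simplifying assumption (the database records the timing of the last purchase and cancellation directly). The function name, argument names, and the coding of relationship age as a year counter starting at 1 are hypothetical.

```python
def build_predictors(own_prev, own_now, relationship_year, loyalty_member):
    """Build the predictor set for next period's behavior of one customer.

    own_prev, own_now : 0/1 ownership per insurance type, one year apart
    relationship_year : 1 for the first year of the relationship, 2 for the second, ...
    loyalty_member    : True if the customer is in the loyalty program
    """
    bought = any(now == 1 and prev == 0 for prev, now in zip(own_prev, own_now))
    cancelled = any(now == 0 and prev == 1 for prev, now in zip(own_prev, own_now))
    return {
        "ownership": list(own_now),              # frequency and monetary value
        "purchase_recency": int(bought),         # new service in the last year
        "cancellation_recency": int(cancelled),  # cancelled service in the last year
        "first_year": int(relationship_year == 1),
        "second_year": int(relationship_year == 2),
        "loyalty": int(loyalty_member),
    }

x = build_predictors(own_prev=[1, 0, 1], own_now=[1, 1, 0],
                     relationship_year=2, loyalty_member=True)
```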

2.2 Predicting CLV

For the relationship-level models, it is intuitively clear how to extend profit predictions further into the future. For example, for the retention models the likelihood of a customer being retained 2 years in a row is the retention probability squared. Formally, this intuition mimics a first order Markov chain for the prediction of profitability at time t + 2, given the predictions for t + 1 (Pfeifer and Carraway 2000; Rust et al. 2004). To predict more than one period ahead with the relationship-level models, however, we need to assume that the explanatory variables in the relationship remain constant. Alternatively, one could model the dynamics of the explanatory variables, as is done in the service-level models.
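For a retention model with a constant retention probability and constant profits conditional on retention, the multi-period extension can be sketched as follows. The function name and numbers are hypothetical, and the sum starts at τ = 1, so the current period's (already observed) profit is excluded.

```python
def retention_clv(profit_now, retention, d, horizon):
    """Discounted expected profits for tau = 1, ..., horizon, where the
    probability of still being a customer after tau years is retention ** tau."""
    return sum(
        profit_now * retention ** tau / (1.0 + d) ** tau
        for tau in range(1, horizon + 1)
    )

# Hypothetical customer: profit 100 per year, 95% retention, 10% discount rate.
value = retention_clv(100.0, 0.95, 0.10, horizon=4)
```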

For the service-level models, we have discussed how to predict customer behavior, but still need to establish the link between customer behavior and profits. As the marginal probabilities of purchasing an insurance follow directly from the model, predicting profits in the first year is straightforward. For subsequent years, we can use the information on the predicted portfolio of services being purchased to update the explanatory variables, potentially improving the predicted purchase probabilities for all services. In doing so, we do not only need the marginal probabilities of purchasing a service but also the probabilities of owning a given portfolio of services, which we denote by $y_{i,t} = (y_{i1,t},\ldots,y_{iJ,t})^T$. Let Y denote the set of all possible portfolios, then the corresponding expected profits of a customer are given by

$$\text{E}_t \left\{ {\text{Profit}_{i,t + 1} } \right\} = \sum\limits_{y \in Y} {P_t \left\{ {\text{portfolio}_{i,t + 1} = y} \right\}\text{*Margin portfolio y}} $$
(4)
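Eq. 4 can be sketched for the special case of independent purchase decisions (as in the independent Probit models), where the probability of a portfolio is the product of the service-level probabilities. The function name and the probabilities and margins below are hypothetical.

```python
from itertools import product

def expected_profit(purchase_probs, margins):
    """Eq. 4 under independence: sum over all 2^J portfolios of
    P(portfolio) times the margin of that portfolio."""
    total = 0.0
    for y in product([0, 1], repeat=len(purchase_probs)):
        prob = 1.0
        for y_j, p_j in zip(y, purchase_probs):
            prob *= p_j if y_j else (1.0 - p_j)
        portfolio_margin = sum(m for y_j, m in zip(y, margins) if y_j)
        total += prob * portfolio_margin
    return total

# Hypothetical example with J = 2 services.
ep = expected_profit([0.9, 0.2], [50.0, 30.0])
```

For first-year profits the result equals the sum of the marginal probabilities times the margins; the portfolio probabilities themselves matter when the predicted portfolio is used to update the explanatory variables for later years.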

The insurance-level models are able to generate a complete picture of the customers’ behavior over time. These models thus enable the prediction of all explanatory variables, including insurance ownership and the recency variables. In this setting with the changes in the explanatory variables explicitly modeled, a Markov chain can also be used. This, however, requires using an extensive state space that contains all the variables in the model.

We now discuss in more detail how the extensive Markov chain for the insurance-level models can be constructed. In total, there are J services, so there exist $2^J$ possible portfolios of services purchased. Adding two recency dummies, a dummy for membership of the loyalty program, and relationship age truncated at age L, the total number of states equals $2^{J+3} \cdot L$, which is the number of portfolios times the four possible combinations for the recency dummies times the two levels of the loyalty program times the number of possible relationship age levels. For other applications, other state spaces have to be constructed, but the methodology extends the work by Pfeifer and Carraway (2000) and is generally applicable. Note that we do not consider the Markov chain methodology a separate CLV model. In our view, Markov chains provide a tool to compute CLV for models that describe behavior only for one period ahead, such as the retention models and the insurance-level models. However, the Markov chain methodology plays a crucial role in CLV prediction, warranting further discussion.

We illustrate the working of the Markov chain with a simplified setting. Consider the situation with ownership of only two service types, relationship duration and the two recency dummies as the variables describing the state space. Considering only relationships of 1 period or 2 periods, the number of states is $2^2$ possible ownership combinations times 2 relationship durations times $2^2$ possible combinations for the recency dummies, a total of 32 possible states.
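The size of the state space can be checked by enumerating it. The sketch below is an illustration with a hypothetical function name; it reproduces the 32 states of the simplified setting (no loyalty dummy) and the general count of 2^(J+3) times L when the loyalty dummy is included.

```python
from itertools import product

def enumerate_states(n_services, max_age, with_loyalty=True):
    """List all states as (portfolio, (recency_buy, recency_quit), loyalty, age)."""
    portfolios = product([0, 1], repeat=n_services)
    recency = product([0, 1], repeat=2)
    loyalty = [0, 1] if with_loyalty else [None]
    ages = range(1, max_age + 1)
    return list(product(portfolios, recency, loyalty, ages))

simplified = enumerate_states(n_services=2, max_age=2, with_loyalty=False)
full = enumerate_states(n_services=2, max_age=2, with_loyalty=True)
```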

The number of states that can be reached from a certain state, however, is, in general, much smaller. For example, one cannot go from a state with relationship duration j to one with relationship duration j − 1 or j + 2. Many other restrictions are more subtle and depend, for example, on the value of the recency dummies. As a simplified example, consider the states a customer can reach in period 2, when in period 1 s/he purchases service 1 and not service 2. Note that the values of the recency dummies do not matter here, as they do not affect whether a state can be reached. Table 2 presents this current state and the 16 states that are possible in period 2, as states that have t = 1 are not reachable. As is clear from this table, only four states can be reached. The actions the customer has to take to reach each of these four possible states are:

  1. State 2:

    Customer starts purchasing service 2, so recency buy equals 1. He also continues buying service 1.

  2. State 8:

    The customer purchases the same services. Recency dummies all equal zero.

  3. State 9:

    The customer stopped purchasing service 1, so recency quit equals 1, and started purchasing service 2, so recency buy also equals 1.

  4. State 15:

    The customer stopped buying services from the company. Recency quit equals 1.
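The four transitions listed above can be reproduced with a short enumeration. The sketch assumes that the recency dummies in the next period are fully determined by the change in the portfolio (any newly purchased service sets recency buy, any cancelled service sets recency quit); the function name and state encoding are hypothetical, and the state numbers of Table 2 are not reproduced.

```python
from itertools import product

def reachable_states(current_portfolio, current_duration):
    """Return the states reachable in one period as
    (portfolio, recency_buy, recency_quit, duration)."""
    states = []
    for nxt in product([0, 1], repeat=len(current_portfolio)):
        buy = int(any(n > c for c, n in zip(current_portfolio, nxt)))
        quit_ = int(any(n < c for c, n in zip(current_portfolio, nxt)))
        states.append((nxt, buy, quit_, current_duration + 1))
    return states

# Customer purchases service 1 and not service 2 in period 1.
next_states = reachable_states((1, 0), current_duration=1)

# Number of candidate states with duration 2: portfolios times recency combinations.
n_candidates = 2 ** 2 * 2 ** 2
```

Only 4 of the 16 candidate states can be reached, because each next-period portfolio is compatible with exactly one combination of the recency dummies.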

Table 2 Possible transitions to future states

The state space for the purchase decisions on a larger set of services is much larger (in our application there are more than 90,000 states), but similar restrictions hold on the possible transitions from one state to another. The number of transitions that occur with positive probability is therefore large but manageable, and the transition probabilities can be obtained from the econometric models that are estimated. These probabilities result from the predicted purchase probabilities for each service with only one exception: once a customer has stopped purchasing services from the company, it is assumed he is lost for good, so he remains in the inactive state with probability one.

Once an estimate of the probabilities in the transition matrix is obtained, calculation of CLV over a finite or infinite period is straightforward (Pfeifer and Carraway 2000). Let the Markov chain transition matrix P, which contains the probabilities $p_{ij}$ with which a customer goes from state $s_i$ to $s_j$ in one period, be given by

$$\text{State at time t}\;\begin{array}{*{20}c} {} \\ {\text{S}_1 } \\ \cdot \\ \cdot \\ {\text{S}_K } \\ \end{array} \begin{array}{*{20}c} {\text{State at time t} + 1} \\ {\left( {\begin{array}{*{20}c} {\text{p}_{\text{11}} } & \cdot & \cdot & {\text{p}_{\text{1K}} } \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ {\text{p}_{\text{K1}} } & \cdot & \cdot & {\text{p}_{\text{KK}} } \\ \end{array} } \right) = P} \\ \end{array} $$
(5)

In particular, for a customer who is in state k at time t, the probability distribution for the subsequent period’s states is given by the kth row of the transition matrix P, which can be written as $e_k P$, with $e_k$ a vector of zeroes with a 1 at the kth place. Now $e_k P$ is a vector with the state probabilities in period t + 1. These can be used to compute the state probabilities in period t + 2, which are given by $e_k P P = e_k P^2$. More generally, one can show that the state probabilities in year t + τ are given by $e_k P^{\tau}$.
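The propagation of state probabilities can be illustrated with a toy chain. The three states and all transition probabilities below are hypothetical; the last state plays the role of the absorbing inactive state.

```python
import numpy as np

# Hypothetical transition matrix: two active states and one absorbing inactive state.
P = np.array([[0.90, 0.06, 0.04],
              [0.10, 0.85, 0.05],
              [0.00, 0.00, 1.00]])

e_k = np.array([1.0, 0.0, 0.0])  # customer currently in the first state

probs_t1 = e_k @ P                              # k-th row of P
probs_t2 = e_k @ np.linalg.matrix_power(P, 2)   # state probabilities two periods ahead
```

Each row of P sums to one, so the propagated vectors remain probability distributions, and the mass in the inactive state can only grow over time.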

Let Statemargins denote the vector stacking the annual customer profitabilities of each state. As only the services purchased are relevant for profits, the statemargins are the margins that correspond to the portfolio of services that is purchased in each state (see footnote 5). Finally, we use $s_{i,t}$ to denote the state vector for customer i at time t, which is a vector of zeros with a 1 at the position of the state for customer i at time t. With this notation, expected customer profitability in period t + τ is given by

$$ {\text{E}}_{t} {\left\{ {{\text{Profit}}_{{i,t + \tau }} } \right\}} = s_{{i,t}} P^{\tau } {\textit{Statemargins}} $$
(6)

Predicted CLV levels are now easily obtained by summing up the appropriately discounted predicted future profit levels.
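Combining Eq. 6 with the discounting in Eq. 1 gives a simple recursion, sketched below. The function name, the two-state chain, and the margins are hypothetical, and the sum is taken over τ = 1, ..., horizon, so the current period's profit is not included.

```python
import numpy as np

def markov_clv(state, P, statemargins, d, horizon):
    """Sum of discounted expected profits s P^tau Statemargins for tau = 1..horizon."""
    probs = np.asarray(state, dtype=float)
    P = np.asarray(P, dtype=float)
    statemargins = np.asarray(statemargins, dtype=float)
    total = 0.0
    for tau in range(1, horizon + 1):
        probs = probs @ P  # state probabilities tau periods ahead
        total += float(probs @ statemargins) / (1.0 + d) ** tau
    return total

# Hypothetical two-state chain: active (margin 100) and inactive (margin 0, absorbing).
P_toy = [[0.9, 0.1],
         [0.0, 1.0]]
clv_4y = markov_clv([1, 0], P_toy, [100.0, 0.0], d=0.10, horizon=4)
```

Updating the probability vector one period at a time avoids forming powers of P explicitly, which matters when the state space is large and the transition matrix is sparse.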

3 Data

Our application concerns the insurance industry. We have 6 years of behavioral data for 30,000 customers of a large Dutch direct writer. This direct writer sells many different types of insurance, some of them with very low ownership rates. We decided to include insurance types with ownership rates above 1% (see Table 3). The ownership of the other types of insurance is grouped in a category “other.” This is done because model estimation with ownership rates below 1% and stable behavior is troublesome, even on a sample of 30,000 customers (Donkers et al. 2003).

Table 3 Descriptive statistics (n = 24,055, only customers active 1/1/1998)

The time period considered in this database begins January 1, 1998 and ends January 1, 2003. All customers are active on January 1, 1998 and/or on January 1, 1999. Each year, the following information is observed: types of insurance purchased, start-date of the relationship, timing of last additional insurance purchase, timing of last cancellation of purchase of insurance, and membership of a loyalty program. Based on these data and information on the margins per insurance type, provided to us by the actuarial department of the insurance company, we can derive the following set of RFM variables: a purchase recency dummy, a cancellation recency dummy, ownership dummies for each service, customer profitability, and relationship age. In Table 3, we report descriptive statistics on the type of insurance purchased and on the other explanatory variables for the beginning of our time period and the end of the period.

Due to customer defection, one would expect purchase rates of the insurances to decrease over time for the current customer base. However, the firm is relatively successful at cross-selling, and, in fact, average purchase incidence increased for many of the insurance types we consider. For the retained customers only, purchase incidence increased for all services. This is expected, since customers who purchase more services are more likely to stay with the firm. Average profits for all customers grew to 109.7 (1998 = 100), while the average profit for retained customers in 2003 equals 129.4, an increase of 29.4%. The retention rate in the year used for model estimation is 95.7%. The retention rate in the four subsequent years is 95.7, 96.2, 97.5, and 97.8%. In developing our model we have been concerned with dependencies across the purchase decisions. The existence of such dependencies becomes clear from Table 4, where we report Pearson correlation coefficients between ownership at 1998 and ownership at 2003. There are high correlations on the diagonal, so having a certain insurance type at 1998 is highly correlated with having that insurance type in 2003. More important, there are also substantial correlations across different insurance types.

Table 4 Correlation matrix of insurance ownership in 1998 and 2003

4 Analysis

4.1 Time horizon of prediction

We use the first year of our data to estimate our models and to determine the aggregate and segment-level retention rates. These models are estimated for all customers active on January 1, 1998, where we model their behavior on January 1, 1999. The results are used to predict CLV for all customers active on January 1, 1999. In these predictions, we do not include customers defecting in the first year of data (1998–1999), while we do include customers acquired during that first year. We use the data available in the database on January 1, 1999 as predictors of future profitability. CLV is then calculated from the purchases in the subsequent 4 years (January 2000 till January 2003), with profits discounted at an annual rate of 10%. Although, theoretically, CLV predictions concern an infinite time horizon, most firms use finite time horizons in practice. The time horizon of 4 years for CLV prediction is in line with recommendations of Rust et al. (2000). Note that in contrast to Eq. 1, we do not include profits for 1999 in the CLV calculation, as these profits are already observable. Including these profits would only inflate the predictive performance of all models.
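Under these choices, and assuming that the profit observed τ years after January 1, 1999 is discounted by τ periods, the CLV measure used in the comparison can be written as a special case of Eq. 1:

$$\text{CLV}_{i,1999} = \sum\limits_{\tau = 1}^{4} \frac{\text{Profit}_{i,1999 + \tau}}{\left( 1 + 0.10 \right)^{\tau}}$$

The sum starts at τ = 1 rather than τ = 0 because the 1999 profits are excluded.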

4.2 Predictive performance measures

The out-of-sample predictive performance of the models is assessed with respect to three different tasks: (1) predicting the level of individual CLV, (2) predicting the ordering of customers based on CLV, and (3) valuing the total customer base. Correct predictions for the level of CLV are relevant when a company wants to target customers with a CLV above a certain level, for example, because for customers with a lower CLV the marketing action will not be profitable. More often, however, companies will be interested in targeting their most profitable customers, without being interested in the precise level of CLV. To know how well a model selects the most profitable customers, an ordering-based measure of predictive performance should be used. Given current marketing practice, the ordering-based measure will be more relevant, since CLV is more often used as a segmentation device than as a device to manage profitability of marketing activities at the individual level (Zeithaml et al. 2001). Finally, correct prediction of the total value of the customer base is of interest for firm valuation purposes (Gupta et al. 2004).

To analyze how well each model predicts the level of CLV for each individual, we use the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) criteria (Leeflang et al. 2000). To facilitate their interpretation, we divided both measures by the average CLV and multiplied them by 100. In this way, they can be read as MAE/RMSE as percentage of average CLV. We also measure the predictive performance using a hit-rate criterion. For this hit-rate, we categorize all customers based on their true CLV value into four equal-sized groups with increasing levels of CLV. The hit-rate is then computed as the percentage of customers whose predicted CLV falls into the same category as their actual CLV. Suppose, for example, that the 25% most profitable customers (with the highest CLV) have a CLV of more than 200. The hit rate then measures how many of these customers also have a predicted CLV of more than 200.
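The level-based measures can be sketched as follows. The sketch assumes that "average CLV" refers to the average of the actual CLV values and that the group boundaries are the quartiles of actual CLV; the function name and the data are hypothetical.

```python
import numpy as np

def level_metrics(actual, predicted):
    """Return (MAE, RMSE) as a percentage of average actual CLV and the
    level-based hit rate over four equal-sized groups of actual CLV."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    mean_clv = actual.mean()
    mae_pct = 100.0 * np.mean(np.abs(err)) / mean_clv
    rmse_pct = 100.0 * np.sqrt(np.mean(err ** 2)) / mean_clv
    # Group boundaries come from the actual CLV levels and are applied to both series.
    cutoffs = np.quantile(actual, [0.25, 0.50, 0.75])
    actual_group = np.searchsorted(cutoffs, actual, side="right")
    predicted_group = np.searchsorted(cutoffs, predicted, side="right")
    hit_rate = 100.0 * np.mean(actual_group == predicted_group)
    return float(mae_pct), float(rmse_pct), float(hit_rate)

mae_pct, rmse_pct, hit = level_metrics([100, 200, 300, 400], [110, 210, 310, 410])
```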

The predictive performance with respect to the ordering of the customers is also evaluated with a hit-rate measure, see also Malthouse and Blattberg (2005). In contrast to the hit-rate for levels, this hit-rate does not consider the level of CLV, but only the ordering of the customers with respect to CLV. For the example above, the ordering-based hit-rate measures how many customers with an actual CLV above 200—the top 25% based on actual CLV—have a predicted CLV that is in the top 25% of predicted CLVs.
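A rank-based version of the hit-rate can be sketched as below. This is a hedged illustration: the text gives the top-25% case as an example, and the sketch assumes the same logic is applied to all four groups. The function names and the toy data are hypothetical.

```python
def rank_quartiles(values):
    """Assign each customer to one of four equal-sized groups (0..3)
    based only on rank; ties are broken by position in the list."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 4 // n
    return groups


def ordering_hit_rate(actual, predicted):
    """Percentage of customers whose rank group by predicted CLV
    equals their rank group by actual CLV."""
    actual_groups = rank_quartiles(actual)
    predicted_groups = rank_quartiles(predicted)
    hits = sum(a == p for a, p in zip(actual_groups, predicted_groups))
    return 100 * hits / len(actual)


# Hypothetical CLV values for eight customers (illustration only).
actual = [50, 80, 120, 150, 180, 210, 260, 300]
predicted = [60, 70, 160, 140, 150, 250, 190, 310]

# A prediction that is 100 too high for everyone gets every level wrong,
# yet preserves the ordering of customers perfectly.
shifted = [a + 100 for a in actual]

print(ordering_hit_rate(actual, predicted))
print(ordering_hit_rate(actual, shifted))
```

The shifted prediction shows why the two hit-rates can diverge: a constant bias leaves the ordering intact, so the ordering-based measure is unaffected while the level-based measure deteriorates.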

To measure predictive performance at the customer base level, based on individual-level CLV predictions (Gupta et al. 2004), we calculate the percentage deviation from the true value of the total customer base.
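The customer-base measure amounts to summing the individual predictions and comparing the total with the realized total. A minimal sketch follows; the sign convention (negative values indicate undervaluation) and the toy numbers are assumptions made for the example.

```python
def customer_base_deviation(actual, predicted):
    """Percentage deviation of the predicted total customer-base value
    from the actual total; negative values indicate undervaluation."""
    total_actual = sum(actual)
    total_predicted = sum(predicted)
    return 100 * (total_predicted - total_actual) / total_actual


# Hypothetical CLV values for eight customers (illustration only).
actual = [50, 80, 120, 150, 180, 210, 260, 300]
predicted = [60, 70, 160, 140, 150, 250, 190, 310]

deviation = customer_base_deviation(actual, predicted)
print(round(deviation, 2))
```

Positive and negative individual errors cancel in the sum, so a model can value the customer base well even when its customer-level predictions are imprecise.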

5 Empirical results

5.1 Estimation results

The estimation results of the Probit-model for retention are reported in the first column of Table 5. All variables have the expected signs: owning a type of insurance, membership of the loyalty program, and having purchased an additional service in the last year make a customer more likely to stay. Lower retention rates are predicted for customers who cancelled an insurance policy in the last year. The second column of Table 5 presents the estimation results of the parametric hazard rate model while the final column presents the estimation results of the nonparametric hazard rate model, excluding the 39 estimates for the nonparametric hazard rate itself, corresponding to the 40 different relationship ages in our sample (and one parameter excluded for normalization). With so many parameters, the model might be somewhat over-parameterized and indeed the BIC selects the more parsimonious parametric hazard rate model. Moreover, the parametric hazard model is also preferred over the Probit model according to the BIC criterion. Note that this concerns model selection based on model fit, not based on predictive performance. The effects of the explanatory variables are very similar in the two proportional hazard models. Moreover, they are also similar to those in the choice model, except for the reversed sign and different scaling, which is as expected. This is in line with the result from Meyer (1990) that duration models with discrete time observations are equivalent to choice models.
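For reference, the BIC comparison mentioned above can be written in its standard textbook form. The symbols are not defined in the text and are introduced here only for illustration: \hat{L}_m is the maximized likelihood of model m, k_m its number of estimated parameters, and n the number of observations. The model with the lowest value is preferred.

```latex
\mathrm{BIC}_m = -2 \ln \hat{L}_m + k_m \ln n
```

Under this form, each of the 39 additional baseline hazard parameters of the nonparametric model adds \ln n to the penalty term, so that model is selected only if its log-likelihood improves enough to offset the penalty.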

Table 5 Estimation results Probit model for customer retention

The results from the multivariate Probit model are presented in Appendix B. All coefficients have the expected sign, at least when they are significantly different from zero. Purchasing a particular service makes a customer much more likely to repurchase that service in the next year. The loyalty program induces customers to buy more services and the recency variables seem to signal a change in the attitude of the customer. Customers who recently bought an additional service are more likely to buy even more, while the opposite holds for customers who recently cancelled an insurance policy. Also customers that are relatively new to the company are more likely to (continue to) buy services. The cross effects at the insurance level indicate that a number of insurances are grouped together. In particular, house, furniture, and liability insurances (insurance types 1, 3, and 4) are sold both individually and as a package. This is also confirmed by the estimated factor loading matrix, after varimax rotation, indicating that the purchase decisions on house (4), furniture (3) and liability (1) are highly correlated.

For the service-level duration models, most effects are also as expected. The most remarkable finding, however, is that fewer effects are statistically significant than in the service-level choice models. This has a natural explanation, as the duration model uses less information than the service-level choice models. The duration model for a service is estimated only for those customers currently purchasing that service. The parameter estimates are thus based on much smaller samples than those of the choice models. Moreover, choice models determine purchase probabilities using information on the differences between the customers who do and those who do not buy the service and on the customers that start or stop buying a service. The duration model concerns the process of continuing to buy a service, so it can only use the information on customers currently purchasing that service. Possibly for the same reason, the model with a single factor is selected based on the BIC criterion (footnote 6), and this factor is extremely similar to the first factor in the service-level choice models. As the two models are rather different and estimated independently, this provides clear evidence that the factor-based approach advocated by Kamakura et al. (2003) is able to detect dependencies between different model equations.

5.2 Predictive performance

The predictive performance of our models is presented in Table 6. We start with a discussion of the predictive performance for individual CLV levels in the first three columns. The first remarkable finding is the large difference in performance across the three measures we consider. The second result is the poor performance of the service-level duration model, no matter what criterion is used. This poor performance could be due to the type of information the model draws on and to its limited use of the data in estimation, as discussed in the previous section.

Table 6 Predictive performance at individual and customer base level

The best model for CLV level prediction, according to MAE, is the status quo model followed by the Tobit II model; for RMSE, it is the multivariate Probit model with either dependent or independent purchase decisions. According to the hit-rate criterion, the Tobit II model is ranked high, as well as the models accounting for retention, including the recently advanced bagging technique. That the status quo model does a particularly good job when the MAE is used is due to the relatively large number of customers that indeed do not change their purchase behavior over time. For these customers, the prediction error is zero. In contrast to this, the model makes large prediction errors for customers that defect or cross-buy. Most other models will not perfectly predict CLV for the customers that do not change their portfolio, but, in general, smaller errors are made for the customers that defect. Compared to MAE, RMSE punishes large mistakes much more. Our conclusion is that, in comparison with the status quo model and the retention-based CLV models, the multivariate Probit model makes more mistakes, though fewer big mistakes. Given the large differences across the measures, it is not clear what type of model should be favored, with the exception of the service-level duration models, because these models perform poorly on all criteria. Which model performs best depends on what type of prediction error is deemed important by the company (footnote 7).
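The contrast between MAE and RMSE can be made concrete with a small numerical sketch. The error values below are hypothetical and chosen only to mimic the pattern described above: a status-quo-type model that is exactly right for most customers but far off for a few, against a model that is slightly off for everyone.

```python
from math import sqrt


def mae(errors):
    """Mean absolute error of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return sqrt(sum(e ** 2 for e in errors) / len(errors))


# Hypothetical prediction errors for ten customers (illustration only).
# Status-quo-type model: zero error for the eight customers whose
# behavior does not change, large errors for two who defect or cross-buy.
errors_status_quo = [0] * 8 + [100, -100]
# Alternative model: a moderate error for every customer.
errors_alternative = [30, -30] * 5

print(mae(errors_status_quo), rmse(errors_status_quo))
print(mae(errors_alternative), rmse(errors_alternative))
```

With these numbers the status-quo-type model has the lower MAE but the higher RMSE, which is the same reversal in ranking that the two criteria produce for the status quo and multivariate Probit models.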

None of the models predicts customer-level CLV perfectly, as could be expected, but are the errors made by the models acceptable? Given the discrete nature of the purchase process for insurances—one either does or does not buy a particular type of insurance—large prediction errors should be expected. In comparison with predicting the overall mean, all models reduce MAE by almost 60% and RMSE by more than 40%. The models are thus clearly able to use customer specific information to improve customer-level CLV predictions, which is further illustrated by the hit rate that exceeds the 25% benchmark of random predictions by far.

In cases where only the ordering of customers is important, the status quo model performs very well. Its performance is the same as that of the profit regression model, the model accounting for average retention, or for retention based on monetary value, and the Tobit II model. Here it is important to realize that all these orderings are identical, so the status quo model might be favored because of its simplicity. Most models are better able to predict the ordering of the customers than the actual level of CLV, with the exceptions being the models accounting for retention only and the service-level duration models.

When using CLV models to value the whole customer base, all models predict a value that is below the actual value. This even holds for the status quo model, which would usually overvalue, as it does not account for customer defection. These predictions below the actual value of the customer base are due to the company’s success in cross-selling activities to the existing customers. Indeed, about 12% of the company’s total CLV is due to services customers did not buy at the time the CLV predictions are made. The best performance is shown by the profit regression model that accounts for this growth in customer profitability. In contrast, the retention-based models all attribute a similar value to the customer base, which is about 12% below its actual value, a substantial shortfall. This large difference indicates that ignoring the potential value embedded in cross-selling in multi-service industries can substantially harm the predictive performance. The models that take customer retention and cross-buying into account separately and then combine the two types of behavior into a single prediction of CLV have a performance that is in between the status quo model and the profit regression model (both of which do very well) and the models accounting for retention only. The amount of undervaluation of the customer base is 6.3% for the Tobit II model and for the multivariate Probit model with dependent purchase decisions. The Probit model with independent purchase decisions does even better, with an underestimation of total CLV of only 4.9%. Here the service-level duration models perform really poorly. An explanation is that these models not only ignore the value of cross-buying, but also incorporate the indirect effects of changes in one’s portfolio. Even though this is conceptually valid, it increases the consequences of ignoring cross-buying, as the effect of larger portfolios on future purchase behavior is not captured.

Thus, for the valuation of the total customer base in multi-service industries, it seems to be important to model not only retention, but also cross-buying. The good performance of the status quo model might have little generalizability. In our application the effects of customer defection and cross-buying are similar in size and therefore the net effect is small. In other markets, one of the effects is likely to dominate the other, resulting in larger prediction errors for the status quo model. A good alternative could be the profit regression model, which will pick up the net effect of cross-buying and defection. Also the models that model cross-buying and retention separately, i.e. the service-level choice models and the Tobit II model, perform rather well.

An interesting question is whether the predictive performance of the models varies over time. For example, one might expect the status quo model to predict extremely well in the short run, but worse in the long run. For the more complex models, this deterioration is expected to be smaller, as these models allow for changes in behavior that are more likely to occur as time passes. To investigate this phenomenon, we consider the predictive performance of all models for the profits earned in 2000 and in 2003, the first and last (fourth) year of our prediction period. The results are presented in Table 7. The surprising result is that the relative position of most models is the same in both periods, although, as expected, the predictions for the first year are better than those for the last year. The only model that does seem to change its relative position is the Tobit II model, which performs somewhat better in the first year than in the fourth year.

Table 7 Predictive performance at individual and customer base level, first and fourth year

Summarizing our results, there is no “best” model for the prediction of individual CLV levels. For segmentation purposes, the simple status quo model performs best. The more complex models are only interesting for the valuation of the total customer base, but here the profit regression and Tobit II models are also good alternatives. Overall, the performance of the complex service-level models with or without dependence in the purchase decisions is somewhat disappointing, especially given the large efforts required to obtain parameter estimates and CLV predictions for these models.

6 Discussion

In this paper we used a large variety of models to predict CLV. The models varied substantially in their degree of complexity, ranging from a simple status quo model to complex service-level models. The main conclusions of our research are: (1) simple models perform well and (2) focusing only on customer retention is not sufficient; cross-buying needs to be accounted for.

The fact that simple models perform well will be a comforting idea to practitioners, who frequently use relatively simple models. In general, management is reluctant to adopt complex models, even in cases where they perform very well (Verhoef et al. 2003). Thus, for the purpose of predicting individual CLV, the adoption of complex models by practitioners should not be expected. We realize that this conclusion is based on a single case study.

The more complex models still might have added value over the simple models, as they can aid in the targeting of customers for marketing activities. Only the service-level choice models investigate the decisions to purchase each insurance type. Such a model can therefore provide more insights for targeting customers at the individual insurance level. For example, the multivariate Probit model provides the likelihood that a customer will purchase an additional service (see also Kamakura et al. 2003); information that cannot be provided by the simple models. All service-level models also provide an estimate of the probability that a customer cancels a certain type of insurance, providing information for targeted service-level retention campaigns.

The statement that simple models work well does not hold for the models that only deal with customer retention. These models systematically underestimate customer value, as they do not account for cross-buying and the corresponding growth in profits. As a result, the simple profit regression model, which can deal with growth rates in profits, outperforms these models. The model that has a good performance on all criteria is the Tobit II model, which combines a regression model for profit growth with a Probit model for customer retention, thereby capturing both dimensions of customer behavior. The multivariate Probit model also accounts for these behaviors, but concerns decisions at the service level. The reduced parsimony of the model—it has many more parameters—might have harmed its predictive performance.

The duration models have a decent performance when applied at the relationship level. The usefulness of duration models for modeling service-level behavior, however, seems to be limited. A first problem is the difficulty of modeling cross-buying behavior with a duration model. Another drawback is that the duration models ignore the information that exists in the difference between current buyers and non-buyers, while this information is available and used by the choice models. At the relationship level, the same information is used by the two approaches, i.e. both analyze which customers in the current database are most likely to end their relationship. Note that these problems arise from the multiple services setting of our application, not from the specifics of the contractual setting; the latter only required an adaptation of the estimation procedure.

We want to stress again that our results are only based on a single application of the models in a relatively stable industry. However, given the nascent nature of CLV research, our conclusions might resonate beyond this case study. Moreover, the importance of cross-buying suggests that customer behavior is not that stable. It is, however, important that future research replicates this study in different industries, leading to empirical generalizations on this topic.

The predicted CLV in our models can be used to segment customers. Using this segmentation, firms can, for example, develop different service concepts for each segment. High value customers may be serviced rather fast and may choose their preferred channel. Low value customers receive lower service levels and are directed to low cost channels, such as the Internet (see also Malthouse and Blattberg 2005). We recommend that firms use these segmentations only if CLV can be predicted with reasonable accuracy. Our study provides customer intelligence managers with several methods for CLV prediction, which can be tested in their market.

A next step in the use of CLV in marketing is the development of an individual-specific marketing mix that optimizes each customer’s CLV (e.g. Rust and Verhoef 2005; Venkatesan and Kumar 2004). Current models aiming to do so usually consider only relatively short time periods (i.e. 1 year). Developing these optimal marketing mixes for individual customers is probably rather difficult for longer time periods, as the optimal marketing mix for the next few years will be highly dependent on environmental factors, such as the competitive setting and changes in regulation. So far, researchers have not touched upon this issue. Although it goes well beyond the scope of this study, it is certainly a very interesting avenue for future research.