1 Introduction

It might seem obvious that the probability of purchasing a product or brand depends on previous purchases. Appropriate econometric approaches start from a static model and extend it by dynamic variables that reflect the purchase history of a household (Meyer et al. 2017). The best-known example of such a dynamic variable is the exponentially smoothed measure introduced by Guadagni and Little (1983), which these authors call brand loyalty and add to a static multinomial logit brand choice model. Most brand choice models include this dynamic variable (Chiang 1991; Chintagunta 1993).

The situation for multicategory choice models, of which multivariate logit (MVL) and multivariate probit (MVP) models are the dominant functional forms, turns out to be completely different. MVP models as a rule do not include any dynamic variable at all. Several MVL models consider one dynamic purchase variable, log-transformed time since the last purchase of a category. On the other hand, these MVL models exclude exponentially smoothed category loyalties.

In contrast to the previous literature, we add exponentially smoothed category purchases, which we simply call category loyalties in the following, to the predictors. We investigate how category loyalties improve statistical performance compared to log-transformed time since the previous purchase. Our results clearly show that category loyalties improve performance more than log-transformed times. Nonetheless, keeping log-transformed times as predictors in addition to category loyalties leads to further improvements.

Homogeneous models may overestimate the effects of dynamic variables because they ignore that households may have different category preferences that are unrelated to previous purchases (Keane 1997). Therefore we also investigate finite mixture extensions of the MVL model (FM-MVL), which by allowing for heterogeneous preferences avoid this weakness.

We do not apply the MVL model with continuous heterogeneity because of its higher computational complexity, which explains why publications using this type of MVL model do not consider more than six categories (Gentzkow 2007; Kwak et al. 2015; Richards et al. 2018). In addition, related econometric models (multinomial logit and Tobit) with finite heterogeneity have been shown to outperform their continuous counterparts (Andrews et al. 2002; Ansari and Mela 2003; Schröder and Hruschka 2017).

Multicategory choice models include independent variables, e.g., marketing variables. Our research focuses on measuring the effects of marketing variables. We do not consider machine learning algorithms such as association rules (Agrawal and Srikant 1994; Hahsler et al. 2006) or topic models (Hruschka 2014), because they usually exclude independent variables.

Based on their global estimation performance, we compare two FM-MVL models in more detail. The basic FM-MVL model includes marketing variables as independent variables. Independent variables of the enlarged FM-MVL model consist of both marketing variables and dynamic variables. We investigate whether these two models differ with respect to coefficients and cross-category dependences. We also show that managerial implications for these two models differ. To this end, we measure the effect of marketing variables on purchase probabilities of the same category as well as on purchase probabilities of other categories. Our results demonstrate that the model without dynamic variables tends to overestimate own effects of marketing variables in many product categories. This positive omitted variable bias provides another explanation for the well-known problem of “overpromotion” in retailing.

2 Investigated dynamic purchase variables

This section is based on a thorough search of the literature in which probabilistic models, to which multivariate probit or logit models belong, serve to analyze multicategory choices, up to and including 2022. As already mentioned in the introduction, papers applying multivariate probit models do not include any dynamic purchase variable (Chib et al. 2002; Duvvuri et al. 2007; Manchanda et al. 1999; Hruschka 2013, 2017a, b, c).

Several recent relevant publications use probabilistic models with latent variables. Topic models replace choices of several product categories by a lower number of latent variables. Hruschka investigates two topic models, latent Dirichlet allocation and the correlated topic model, which do not include independent variables (Hruschka 2014). Jacobs et al. extend latent Dirichlet allocation to consider one independent variable (time of the first order at the retailer’s website), which is constant for each household (Jacobs et al. 2016). The probabilistic model of Ruiz et al. comprises three types of latent variables that compress purchases, prices and seasonal effects of products, respectively (Ruiz et al. 2020). These authors specify latent variables without dynamic effects. To summarize, publications on probabilistic models with latent variables ignore dynamic purchase variables just like publications on the multivariate probit model.

We can distinguish two groups of papers that apply variants of the MVL model. One group ignores dynamic purchase variables (Dippold and Hruschka 2013; Kwak et al. 2015; Richards et al. 2018). Papers of the other group consider one dynamic purchase variable, log-transformed time since the last purchase of the corresponding category, as one of the independent variables (Russell and Petersen 2000; Boztuğ and Hildebrandt 2008; Boztuğ and Reutterer 2008; Solnet et al. 2016). However, papers of this group do not investigate the importance of this dynamic purchase variable relative to other independent variables, e.g., marketing variables.

To log-transformed times since the last purchase, we add exponentially smoothed category loyalties in analogy to the exponentially smoothed brand loyalties, which are widespread in brand choice models. What authors of several MVL papers call loyalty in fact measures the long-run propensity of a household to make a category purchase (Russell and Petersen 2000; Boztuğ and Hildebrandt 2008; Boztuğ and Reutterer 2008; Solnet et al. 2016). Consequently, this variable is not dynamic and does not change across purchases of the same household.

We compute the loyalty of household m for category j at shopping visit t as follows:

$$\begin{aligned} \mathrm{loy}_{jmt} = \alpha \, y_{jmt-1} + (1-\alpha ) \, \mathrm{loy}_{jmt-1} \end{aligned}$$
(1)

\(0\le \alpha \le 1\) denotes the smoothing constant. The binary purchase incidence \(y_{jmt-1}\) equals one if household m purchases category j at the previous shopping visit \(t-1\). The current category loyalty depends on the previous purchase incidence \(y_{jmt-1}\) and the previous loyalty \(\mathrm{loy}_{jmt-1}\). In a manner similar to the brand loyalty of Guadagni and Little (1983) we set initial values \(\mathrm{loy}_{jm0}\) equal to the relative purchase frequency of the respective category j across all households and shopping visits (\(t=1\) denotes the first shopping visit). The lower the smoothing constant \(\alpha\) is, the more strongly past purchases are smoothed. This smoothing distinguishes category loyalties from log-transformed times, which may fluctuate considerably between shopping visits. We measure the importance of these two dynamic purchase variables relative to each other and to the other independent variables.
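The recursion in expression (1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code; the function name `category_loyalties` is hypothetical. Because expression (1) leaves \(y_{jm0}\) undefined, the sketch assumes that the loyalty at the first visit simply equals the initial value \(\mathrm{loy}_{jm0}\).

```python
def category_loyalties(purchases, alpha, initial_loyalty):
    """Exponentially smoothed loyalties for one household and one category.

    purchases: 0/1 purchase incidences y_1, ..., y_T in visit order.
    alpha: smoothing constant between 0 and 1.
    initial_loyalty: loy_0, e.g. the relative purchase frequency of the category.
    Returns the loyalties that apply at visits 1, ..., T.
    """
    if not purchases:
        return []
    # Assumption: the loyalty at the first visit equals the initial value.
    loyalties = [initial_loyalty]
    # The loyalty at visit t uses the purchase and the loyalty of visit t-1.
    for y_prev in purchases[:-1]:
        loyalties.append(alpha * y_prev + (1.0 - alpha) * loyalties[-1])
    return loyalties


if __name__ == "__main__":
    # Hypothetical household: purchase, no purchase, purchase.
    print(category_loyalties([1, 0, 1], alpha=0.2, initial_loyalty=0.5))
```

A small \(\alpha\) keeps most of the previous loyalty, so a single purchase moves the value only slightly.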

As higher category loyalties increase purchase probabilities, we expect positive response coefficients. The situation is less clear-cut for log-transformed times, though coefficients are positive according to the majority of studies (Russell and Petersen 2000; Boztuğ and Hildebrandt 2008; Solnet et al. 2016).

We also investigate whether the effects of marketing variables suffer from an omitted variable bias (Wooldridge 2013) due to ignoring dynamic purchasing variables. If the correlation between a dynamic variable and a marketing variable is positive and the effect of the dynamic variable on purchase probability is positive, a positive bias results, i.e., a model that excludes the dynamic variable overestimates the effect of the marketing variable.
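The sign argument can be made explicit for the textbook linear case. This serves only as an analogy: for the logit models considered here it is a heuristic for the direction of the bias, not an exact result. Assume a linear model with a marketing variable \(x_1\) (coefficient \(\beta _1\)) and a dynamic variable \(x_2\) (coefficient \(\beta _2\)); these symbols are introduced for illustration only. If \(x_2\) is omitted, the expected value of the resulting estimate \(\tilde{\beta }_1\) is:

$$\begin{aligned} E(\tilde{\beta }_1) = \beta _1 + \beta _2 \, \tilde{\delta }_1 \end{aligned}$$

\(\tilde{\delta }_1\) denotes the slope obtained by regressing \(x_2\) on \(x_1\), which has the same sign as the correlation between the two variables. With \(\beta _2 > 0\) and \(\tilde{\delta }_1 > 0\) the bias term \(\beta _2 \, \tilde{\delta }_1\) is positive.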

3 Investigated model variants

In this section, we present the two investigated variants of the MVL model, the homogeneous MVL model and its finite mixture extension. The J-dimensional column vector \(y_{mt}\) denotes market basket t of household m and consists of binary purchase indicators (J symbolizes the number of product categories). If household m purchases category j at purchase occasion t, the respective element \(y_{jmt}\) equals one. Vector \(x_{mt}\) consists of independent variables relevant for market basket t of household m.

3.1 Homogeneous multivariate logit model

In the homogeneous MVL model, each coefficient is constant across households. Extending the expression for the homogeneous MVL model without independent variables (also known as auto-logistic model) given in Besag (1972) we define the probability of market basket \(y_{mt}\) conditional on independent variables \(x_{mt}\) as follows:

$$\begin{aligned}&P(y_{mt} \vert x_{mt}) = \exp (y_{mt}' a + x_{mt}' b \, y_{mt} + 1/2 \, y_{mt}' V y_{mt}) /C \nonumber \\&\quad \text{ with } \quad C = \sum _{\upsilon \in \{0,1\}^J} \exp (\upsilon ' a + x_{mt}' b\, \upsilon + 1/2 \, \upsilon ' V \upsilon ) \end{aligned}$$
(2)

Expression (2) shows that computation of this probability requires division by the so-called normalization constant C that is obtained by summing over all possible market baskets defined by different binary vectors \(\upsilon\). Coefficients contained in the \((J \times J)\) matrix V measure pairwise interactions between categories. As a pairwise interaction of a category with itself does not make sense, all diagonal elements of V are zero. Off-diagonal elements are symmetric, i.e., \(V_{j1,j2} = V_{j2,j1}\). Column vector a consists of J constants. The \((L \times J)\) matrix b holds the effects of L independent variables on purchase probabilities. The homogeneous MVL model has been applied to market basket data by Russell and Petersen (2000) building upon earlier publications in statistics (Cox 1972; Besag 1974).
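For a small number of categories, expression (2) can be evaluated by brute force, which also makes the exponential growth of the sum visible. The following sketch is illustrative only; the function names are hypothetical, and it enumerates all \(2^J\) binary vectors, including the null basket, exactly as expression (2) is written.

```python
import itertools
import math


def basket_score(v, a, b, V, x):
    """Exponent of expression (2): v'a + x'b v + 1/2 v'V v."""
    J = len(v)
    L = len(x)
    linear = sum(v[j] * (a[j] + sum(x[l] * b[l][j] for l in range(L)))
                 for j in range(J))
    pairwise = 0.5 * sum(v[j] * V[j][k] * v[k]
                         for j in range(J) for k in range(J))
    return linear + pairwise


def normalization_constant(a, b, V, x):
    """Sum of exp(score) over all 2**J binary baskets (feasible for small J only)."""
    J = len(a)
    return sum(math.exp(basket_score(v, a, b, V, x))
               for v in itertools.product((0, 1), repeat=J))


def basket_probability(y, a, b, V, x):
    """Probability of basket y given x according to expression (2)."""
    return math.exp(basket_score(y, a, b, V, x)) / normalization_constant(a, b, V, x)


if __name__ == "__main__":
    # Hypothetical coefficients for J = 3 categories and L = 1 independent variable.
    a = [0.3, -0.2, 0.1]
    b = [[0.5, 0.0, -0.5]]
    V = [[0.0, 0.4, -0.3], [0.4, 0.0, 0.2], [-0.3, 0.2, 0.0]]
    print(basket_probability((1, 0, 1), a, b, V, x=[1.0]))
    print(2 ** 31)  # number of terms in C for 31 categories
```

Each call to `normalization_constant` visits \(2^J\) baskets, so this direct approach is practical only for a handful of categories.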

For the homogeneous MVL model, we can write the purchase probability of category j in market basket t of household m conditional on purchases of the other categories collected in vector \(y_{-jmt}\) and the independent variables \(x_{mt}\) as:

$$\begin{aligned} P(y_{jmt}=1 \vert y_{-jmt}, x_{mt}) = \varphi \left(a_{j} + b_{.j} x_{mt} + \sum _{l \ne j } V_ {j,l} \, y_{lmt}\right) \end{aligned}$$
(3)

\(\varphi\) denotes the binomial logistic function \(\varphi (Z) = 1/(1+\exp (-Z))\). We obtain the independent logit model that excludes interactions between categories by setting all coefficients in V equal to zero.
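Expression (3) translates directly into code. The sketch below is a hedged illustration with hypothetical function names; it assumes that `b` is stored as a list of L rows with J entries each, so that column j of `b` corresponds to \(b_{.j}\).

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conditional_purchase_probability(j, y, a, b, V, x):
    """P(y_j = 1 | purchases of the other categories, x) as in expression (3)."""
    J = len(y)
    z = a[j]
    z += sum(x[l] * b[l][j] for l in range(len(x)))          # effect of independent variables
    z += sum(V[j][k] * y[k] for k in range(J) if k != j)     # pairwise interactions
    return logistic(z)


if __name__ == "__main__":
    # Hypothetical two-category example with one interaction coefficient.
    V = [[0.0, 1.0], [1.0, 0.0]]
    print(conditional_purchase_probability(0, [0, 1], [0.0, 0.0], [[0.0, 0.0]], V, [0.0]))
```

Setting every entry of `V` to zero reduces the function to an independent binary logit per category.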

3.2 Finite mixture multivariate logit model

We also investigate the finite mixture extension of the MVL model (FM-MVL). Coefficients of the FM-MVL model differ between household segments. The purchase probability of category j in market basket t of household m conditional on purchases of the other categories and the independent variables \(x_{mt}\) is:

$$\begin{aligned} P(y_{jmt} &= 1 \vert y_{-jmt}, x_{mt}) = \sum _{s=1}^S u_{sm} \, P_s(y_{jmt}=1 \vert y_{-jmt}, x_{mt}) \nonumber \\ &= \sum _{s=1}^S u_{sm} \, \varphi \left(a_{sj} + b_{s.j} x_{mt} + \sum _{l \ne j } V_ {s j,l} \, y_{lmt}\right) \end{aligned}$$
(4)

S denotes the number of segments. \(u_{sm}\) is a binary membership indicator set to one if household m is assigned to segment s. \(P_s\) is the segment-specific conditional probability function.

4 Model estimation and evaluation

We exclude the null basket for which all purchase indicators \(y_j\) equal zero in accordance with previous related publications (Russell and Petersen 2000; Boztuğ and Reutterer 2008; Kwak et al. 2015). This way we model purchases conditional on the purchase of at least one category. Therefore, the number of possible market baskets is \(2^J-1\).

Maximum likelihood estimation of the MVL model requires computation of the so-called normalization constant obtained by summing over all possible market baskets (see expression (2)) in each iteration. For 31 categories we would have to deal with more than \(2.14 \times 10^{9}\) possible market baskets. Because of the impracticality of this approach, we resort to maximum pseudo-likelihood (MPL) estimation. In a simulation study Bel et al. (2018) compare MPL to maximum likelihood estimation for a maximum number of 12 alternatives. These authors conclude that MPL estimation leads to negligible efficiency losses only.

The pseudo-probability \(\tilde{P}_{jmt}\) of category j in market basket t of household m can be written for both the homogeneous MVL model and the FM-MVL model as:

$$\begin{aligned} \tilde{P}_{jmt} = P(y_{jmt}=1 \vert y_{-jmt}, x_{mt})^{y_{jmt}} \, (1- P(y_{jmt}=1 \vert y_{-jmt}, x_{mt})) ^{1-y_{jmt}} \end{aligned}$$
(5)

Expressions (3) and (4) show how to compute the conditional probability \(P(y_{jmt}=1 \vert y_{-jmt}, x_{mt})\) for the homogeneous MVL model and the FM-MVL model, respectively. \(y_{jmt}\) denotes the binary purchase indicator, which is set to one if basket t of household m contains category j. One can see from equation (5) that its first part is relevant if category j is purchased and its second part if category j is not purchased.

MPL estimation consists in maximizing the log pseudo-likelihood LPL across households, market baskets and categories:

$$\begin{aligned} LPL = \sum _{m=1}^M \sum _{t=1}^{T_m} \sum _{j=1}^J \log (\tilde{P}_{jmt}) \end{aligned}$$
(6)

\(T_m\) symbolizes the number of market baskets of household m. Due to the binary membership indicators given in expression (4) the same segment-specific conditional probability function is used for all market baskets of any household m. Equation (6) shows that the computation of log pseudo-likelihood values requires summing J logarithmic conditional probabilities for each basket. Summing across product categories makes MPL estimation feasible because it replaces summing across all possible baskets that would be necessary in ML estimation. In the case of an MVL model with pairwise interactions each of the J logarithmic conditional probabilities is related to the purchase incidences of the other \(J-1\) categories.
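Expressions (5) and (6) can be sketched for the homogeneous MVL model as follows. This is an illustration under stated assumptions rather than the estimation code used in the study: the function names are hypothetical, baskets of all households are passed as one flat list of `(y, x)` pairs, and logarithms are computed in a numerically stable form.

```python
import math


def log1p_exp(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_pseudo_likelihood(baskets, a, b, V):
    """Sum of log pseudo-probabilities over baskets and categories (expression (6)).

    baskets: list of (y, x) pairs, y a 0/1 list of length J, x a list of length L.
    """
    total = 0.0
    for y, x in baskets:
        J = len(y)
        for j in range(J):
            z = a[j]
            z += sum(x[l] * b[l][j] for l in range(len(x)))
            z += sum(V[j][k] * y[k] for k in range(J) if k != j)
            if y[j] == 1:
                total += -log1p_exp(-z)   # log P(y_j = 1 | ...)
            else:
                total += -log1p_exp(z)    # log (1 - P(y_j = 1 | ...))
    return total


if __name__ == "__main__":
    # Two hypothetical baskets, J = 2 categories, all coefficients zero.
    baskets = [([1, 0], [0.0]), ([0, 1], [0.0])]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(log_pseudo_likelihood(baskets, [0.0, 0.0], [[0.0, 0.0]], zeros))
```

The cost per basket grows with \(J^2\) because of the interaction sums, not with \(2^J\) as in expression (2).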

Estimation of the homogeneous MVL model turns out to be straightforward because LPL has only one local maximum. On the contrary, for the FM-MVL model LPL may have multiple local maxima. That is why we start estimation of the FM-MVL models ten times by randomly assigning each household to one of S segments. Our estimation approach for the FM-MVL model is akin to maximizing the classification likelihood (McLachlan and Basford 1988; Ngatchou-Wandji and Bulla 2013) replacing the intractable likelihood by segment-specific pseudo-probabilities.

We evaluate models by their log pseudo-likelihood on holdout data. This way we take the complexity of models into account. A model whose complexity is too high attains a worse (lower) log pseudo-likelihood value for the holdout data. In contrast to information criteria such as AIC or BIC, holdout validation has the advantage of not requiring assumptions about the true underlying model.

We randomly form two groups with about 2/3 of the households in the first group. We use data (estimation data) of the first group to estimate models. Data of the second group (holdout data) serve to evaluate models whose coefficients we estimate on data from the first group. We also use holdout log pseudo-likelihood values to decide on the number of segments S for each of the considered FM-MVL models. We select the model with S segments if both the model with a lower number of segments \(S-1\) and the model with a higher number of segments \(S+1\) attain lower holdout log pseudo-likelihood values. For our data this procedure leads to an unambiguous determination of the number of segments.
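The selection rule for the number of segments can be written as a short function. The sketch below is illustrative; the function name and the holdout values in the usage example are hypothetical and not results of the study.

```python
def select_number_of_segments(holdout_lpl):
    """Return every S whose holdout LPL exceeds that of both S-1 and S+1.

    holdout_lpl: dict mapping a number of segments S to the holdout
    log pseudo-likelihood of the model with S segments.
    """
    selected = []
    for S in sorted(holdout_lpl):
        if S - 1 in holdout_lpl and S + 1 in holdout_lpl:
            if holdout_lpl[S - 1] < holdout_lpl[S] and holdout_lpl[S + 1] < holdout_lpl[S]:
                selected.append(S)
    return selected


if __name__ == "__main__":
    # Hypothetical holdout values for models with one to four segments.
    print(select_number_of_segments({1: -100.0, 2: -90.0, 3: -95.0, 4: -97.0}))
```

A result with exactly one element corresponds to the unambiguous determination mentioned above.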

To make comparison of model performances easier, we also compute IAPP, the increase of the average pseudo-probability of the respective model over the average pseudo-probability of the least complex model, which is homogeneous and excludes both interactions and independent variables. Using the log pseudo-likelihood of the respective model LPL, the log pseudo-likelihood of the least complex model \(LPL_0\) and the total number of purchase visits across households \(n_v\) we determine this increase as follows:

$$\begin{aligned} IAPP = \exp (LPL/n_v) / \exp (LPL_0/n_v) - 1 = \exp ((LPL-LPL_0)/n_v) - 1. \end{aligned}$$
(7)

This expression shows that we define average pseudo-probabilities as geometric means, i.e., as \(\exp (LPL/n_v)\) and \(\exp ({LPL_0}/n_v)\), respectively. IAPP can be seen as a measure of relative model performance. IAPP values are positive if the average pseudo-probability of the respective model is greater than the average pseudo-probability of the least complex model. IAPP values are zero (negative) if the average pseudo-probability of the respective model equals (is lower than) the average pseudo-probability of the least complex model.
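Expression (7) amounts to a one-line computation. The sketch below uses hypothetical input values and a hypothetical function name; it returns IAPP as a fraction, exactly as expression (7) is written.

```python
import math


def iapp(lpl, lpl_0, n_v):
    """Increase of the average pseudo-probability over the least complex model.

    lpl: log pseudo-likelihood of the model of interest.
    lpl_0: log pseudo-likelihood of the least complex model.
    n_v: total number of purchase visits across households.
    """
    # expm1(z) computes exp(z) - 1 accurately for small z.
    return math.expm1((lpl - lpl_0) / n_v)


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    print(iapp(lpl=-90.0, lpl_0=-100.0, n_v=10))
```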

5 Model comparisons

We compare the best performing model without dynamic variables M0 to the best performing model with dynamic variables M1. To this end, we test whether average category constants and average coefficients differ between these two models. We also examine whether models M0 and M1 lead to different results on the dependences between categories.

Average category constants and average coefficients are determined by weighting segment-specific constants or coefficients by relative segment sizes. We determine the significance of a difference by means of the following t-statistic:

$$\begin{aligned} t\_\mathrm{stat} = \frac{ \sum _{s=1}^{S(M0)} \pi _{s(M0)} \gamma _{s(M0)} - \sum _{s=1}^{S(M1)} \pi _{s(M1)} \gamma _{s(M1)}}{\sqrt{\sum _{s=1}^{S(M1)} \pi _{s(M1)} \sigma _{s(M1)}^2 }} \end{aligned}$$
(8)

S(M0) and S(M1) denote the number of segments according to M0 and M1, respectively. \(\pi _{s(M0)}, \pi _{s(M1)}\) symbolize the relative size of segment s for M0 and M1, respectively. \(\gamma _{s(M0)}, \gamma _{s(M1)}\) is a category constant or coefficient of segment s for M0 and M1, respectively. \(\sigma _{s(M1)}\) denotes the standard error of the constant or coefficient of segment s for M1.
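The t-statistic of expression (8) can be sketched as follows. Function and argument names are hypothetical, and the numbers in the usage example are invented for illustration.

```python
import math


def t_statistic(pi_m0, gamma_m0, pi_m1, gamma_m1, se_m1):
    """t-statistic of expression (8) for one constant or coefficient.

    pi_m0, pi_m1: relative segment sizes of M0 and M1.
    gamma_m0, gamma_m1: segment-specific constants or coefficients.
    se_m1: segment-specific standard errors for M1.
    """
    average_m0 = sum(p * g for p, g in zip(pi_m0, gamma_m0))
    average_m1 = sum(p * g for p, g in zip(pi_m1, gamma_m1))
    # The denominator uses the size-weighted squared standard errors of M1 only.
    denominator = math.sqrt(sum(p * s ** 2 for p, s in zip(pi_m1, se_m1)))
    return (average_m0 - average_m1) / denominator


if __name__ == "__main__":
    # Hypothetical two-segment M0 and three-segment M1.
    print(t_statistic([0.5, 0.5], [1.0, 3.0],
                      [0.2, 0.3, 0.5], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))
```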

We measure the relation of a category j conditional on another category \(j^{'}\) by the average marginal effect with respect to the purchase pseudo-probability of category j. We classify two categories as purchase complements if the average marginal effect is positive and as purchase substitutes if the average marginal effect is negative. This definition is analogous to the one put forward by Betancourt and Gautschi (1990), who consider two products as purchase complements (purchase substitutes) if they are purchased jointly more (less) frequently than expected under stochastic independence.

The average marginal effect corresponds to the difference of the average pseudo-probability of a purchase of category j given a purchase of category \(j^{'}\) and the average pseudo-probability of a purchase of category j given a non-purchase of category \(j^{'}\) (Greene 2003). Note that we average across baskets by keeping the observed values of independent variables and the observed purchase incidences of categories other than j and \(j^{'}\).

We can write the average marginal effect for segment s of model M0 or M1 as:

$$\begin{aligned} \mathrm{ame}(j,j^{'})_{s(M0)}= & {} \tilde{P}_{s(M0)} (y_j=1 \vert y_{j^{'}}=1) - \tilde{P}_{s(M0)} (y_j=1 \vert y_{j^{'}}=0) \nonumber \\ \quad \text{ or } \nonumber \\ \mathrm{ame}(j,j^{'})_{s(M1)}= & {} \tilde{P}_{s(M1)} (y_j=1 \vert y_{j^{'}}=1) - \tilde{P}_{s(M1)} (y_j=1 \vert y_{j^{'}}=0) \end{aligned}$$
(9)

\(\tilde{P}_{s(M0)}, \tilde{P}_{s(M1)}\) denote pseudo-probabilities averaged across baskets, for segment s of model M0 and M1, respectively.
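For one segment, the average marginal effect of expression (9) can be sketched as below. This is an assumption-laden illustration with hypothetical names: it uses the conditional probability of expression (3) with the coefficients of the segment, sets the purchase indicator of category \(j^{'}\) to one and to zero in turn, and leaves all other observed values unchanged.

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def average_marginal_effect(j, j_prime, baskets, a, b, V):
    """Average over baskets of P(y_j=1 | y_j'=1) - P(y_j=1 | y_j'=0).

    baskets: list of (y, x) pairs with observed purchases y and variables x.
    a, b, V: coefficients of one segment.
    """
    total = 0.0
    for y, x in baskets:
        J = len(y)
        # Part of the logit that does not involve category j_prime.
        base = a[j] + sum(x[l] * b[l][j] for l in range(len(x)))
        base += sum(V[j][k] * y[k] for k in range(J) if k not in (j, j_prime))
        total += logistic(base + V[j][j_prime]) - logistic(base)
    return total / len(baskets)


if __name__ == "__main__":
    # Hypothetical example: a positive interaction yields a positive effect.
    V = [[0.0, 1.0], [1.0, 0.0]]
    baskets = [([0, 0], [0.0]), ([1, 1], [0.0])]
    print(average_marginal_effect(0, 1, baskets, [0.0, 0.0], [[0.0, 0.0]], V))
```

A positive value classifies the pair as purchase complements, a negative value as purchase substitutes.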

The standard error of the average marginal effect for segment s of model M1 is (Greene 2003):

$$\begin{aligned} \sigma (j,j^{'})_{s(M1)}= & {} \tilde{P}_{s(M1)} (y_j=1 \vert y_{j^{'}}=1) (1-\tilde{P}_{s(M1)} (y_j=1 \vert y_{j^{'}}=1))\nonumber \\&\quad -\, \tilde{P}_{s(M1)} (y_j=1 \vert y_{j^{'}}=0) (1- \tilde{P}_{s(M1)} (y_j=1 \vert y_{j^{'}}=0)) \end{aligned}$$
(10)

Finally, we compute the t-statistic of the difference of an average marginal effect between models M0 and M1 as follows:

$$\begin{aligned} t\_\mathrm{stat} = \frac{ \sum _{s=1}^{S(M0)} \pi _{s(M0)} \mathrm{ame}(j,j^{'})_{s(M0)} - \sum _{s=1}^{S(M1)} \pi _{s(M1)} \mathrm{ame}(j,j^{'})_{s(M1)}}{\sqrt{\sum _{s=1}^{S(M1)} \pi _{s(M1)} \sigma (j,j^{'})_{s(M1)}^2 }} \end{aligned}$$
(11)

6 Derivation of managerial implications

We investigate whether the two models M0 and M1 lead to different managerial implications. We consider the decision problem of choosing the category to be promoted by, e.g., a price cut, a feature, or a display. This decision depends on the effect a promotion has on purchases of the promoted category itself, the so-called own effect, as well as on cross effects, i.e., the effects on purchases of other categories. Note that we use a broad definition of promotion that includes display and feature advertising activities besides price reductions (Gedenk et al. 2010).

In a first run of our simulation approach, we set all marketing variables for all categories to zero. This way we estimate total purchase probabilities if no category is promoted. Then we estimate total purchase probabilities by setting one of the three marketing variables in one category j to one. The differences of total probabilities for this constellation and the total probabilities for no promotion measure the effects of the respective marketing variable. Such a difference represents the own effect of the marketing variable if it refers to the same category j. If a difference refers to another category, we get a cross effect of the marketing variable. We apply this procedure both for model M0 and model M1, which in the following enables us to compute differences of both own and cross effects between the two models.

As we base our estimation approach on pseudo-probabilities, we cannot directly determine purchase probabilities and have to resort to simulation. For each segment s of models M0 and M1, we generate simulated purchases by iterated Gibbs sampling from the conditional distribution (Besag 2004) given as:

$$\begin{aligned} \varphi \left(a_{sj} + b_{s.j} x + \sum _{l \ne j } V_ {s j,l} \, y_{l}\right) \end{aligned}$$
(12)

For model M0 we obtain segment-specific purchase probabilities by averaging simulated purchases and compute total purchase probabilities as averages of segment specific probabilities weighted by relative segment size. For model M1 the dynamic variables vary across baskets.

As computation times of Gibbs sampling for each observed market basket are prohibitively high, we cluster market baskets by K-means based on the dynamic variables. We use the averages of each cluster as values of the dynamic variables. In a first step, we obtain cluster-specific and segment-specific purchase probabilities by averaging simulated purchases. Averaging these probabilities weighted by cluster size (i.e., the number of baskets assigned to a cluster) in the next step gives segment-specific purchase probabilities. Finally, we obtain total purchase probabilities as averages of segment specific probabilities weighted by relative segment size.
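The simulation steps can be sketched as follows. This is a simplified illustration, not the procedure as implemented in the study: function names, the number of sweeps, the burn-in length and the seed are assumptions, and the sketch does not treat the null basket separately. The first function draws purchases from the conditional distribution (12) for one segment and one vector x of independent variables; the second computes weighted averages of probabilities, as needed for weighting by cluster size or by relative segment size.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gibbs_purchase_probabilities(a, b, V, x, n_sweeps=6000, burn_in=1000, seed=0):
    """Estimate purchase probabilities by averaging Gibbs draws after burn-in."""
    rng = random.Random(seed)
    J = len(a)
    # The part a_j + b_.j x of the logit stays fixed during sampling.
    linear = [a[j] + sum(x[l] * b[l][j] for l in range(len(x))) for j in range(J)]
    y = [rng.randint(0, 1) for _ in range(J)]   # random starting basket
    counts = [0] * J
    for sweep in range(n_sweeps):
        for j in range(J):
            z = linear[j] + sum(V[j][k] * y[k] for k in range(J) if k != j)
            y[j] = 1 if rng.random() < logistic(z) else 0
        if sweep >= burn_in:
            for j in range(J):
                counts[j] += y[j]
    kept = n_sweeps - burn_in
    return [c / kept for c in counts]


def weighted_average_probabilities(probabilities, weights):
    """Weighted average of several probability vectors (clusters or segments)."""
    total = sum(weights)
    J = len(probabilities[0])
    return [sum(w * p[j] for w, p in zip(weights, probabilities)) / total
            for j in range(J)]


if __name__ == "__main__":
    # Hypothetical segment with two categories and one independent variable.
    V = [[0.0, 0.5], [0.5, 0.0]]
    print(gibbs_purchase_probabilities([-1.0, 0.0], [[0.8, 0.0]], V, x=[1.0]))
```

Calling `gibbs_purchase_probabilities` once with a marketing variable set to zero and once with it set to one, and taking the difference of the resulting probabilities, gives simulated own and cross effects in the sense of this section.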

7 Empirical study

7.1 Data

Our data refer to 24,047 shopping visits to one specific grocery store over a one-year period made by a random sample of 1500 households. For each shopping visit, we compose a market basket from the IRI data set (Bronnenberg et al. 2008). We represent a market basket by a binary vector whose elements indicate whether a household purchases each of 31 product categories (see Table 1).

The average number of shopping visits per household amounts to 16.031, its standard deviation to 13.464. The average basket size (i.e., the number of purchased categories) is 3.852, its standard deviation 2.654.

Table 1 Product categories and abbreviations

Table 2 shows relative marginal purchase frequencies for the 31 categories, and Table 3 the highest 20 pairwise relative frequencies. Milk is the category most frequently purchased. Carbonated beverage and milk are the two categories most frequently purchased together.

Table 2 Relative marginal frequencies
Table 3 Relative pairwise frequencies

The variables household size (number of persons) and household income with three categories are constant across baskets of the same household. Average household size amounts to 1.415, its standard deviation to 0.493. Low, medium and high income have relative frequencies of 0.507, 0.332, and 0.161, respectively.

Three binary marketing variables (feature, display, and price reduction) indicate whether any brand of the respective category is on feature, is on display, or has its price reduced, respectively. Table 4 shows average values of these marketing variables for each category. Frozen dinner is the most frequently featured category. We see that carbonated beverage has the highest number of both displays and price reductions.

Table 4 Average values of marketing and dynamic variables

We consider two dynamic variables, time since the last purchase of a category and category loyalty. Table 4 also contains the average time in days since the last purchase and the average loyalty for each category using a smoothing constant \(\alpha =0.2\), which puts more weight on the loyalty of the previous shopping visit. This value of the smoothing constant leads to the best performing MVL models with category loyalty as additional independent variable according to a grid search over \([0.1, 0.2, 0.3, \ldots , 0.9]\). Given such a value, previous purchases are strongly smoothed.

Milk attains both the lowest average time and the highest category loyalty. Across all categories, we obtain a negative correlation of −0.542 between the two dynamic variables average time and loyalty, which indicates an unproblematic degree of collinearity.

7.2 Model evaluation results

Tables 5 and 6 contain the evaluation results for independent logit models and multivariate logit models, respectively. We do not show results for models with household attributes (household size, income) because adding these variables does not improve model performance.

By looking at both holdout log pseudo-likelihood values (LPL) and increases of the average pseudo-probability of the respective model (IAPP) over the average pseudo-probability of the least complex model (homogeneous, no interactions, no independent variables) we see that:

  • multivariate logit models that include pairwise interactions between categories are better than independent logit models no matter which independent variables (if any) are considered;

  • models with marketing variables are better than the corresponding models without independent variables;

  • features appear to be more important than price reductions, the latter appear to be more important than displays;

  • models with marketing and dynamic variables are better than models with marketing variables only;

  • category loyalties are more important than log-transformed times since the last category purchase specified as \(\log (1+time)\) like in Boztuğ and Reutterer (2008);

  • FM-MVL models perform better than their homogeneous counterparts except for models which include only features as independent variables.

The average pseudo-probability of the least complex model for the holdout data amounts to about 79% of the corresponding value for the estimation data. Consequently, more complex models as a rule have more room to improve performances in the holdout data. This fact is reflected by IAPP values of the same model, which are often higher for the holdout data compared to those for the estimation data.

Table 5 Evaluation of independent logit models
Table 6 Evaluation of multivariate logit models

The best performing model without dynamic variables, M0, is an FM-MVL model with two segments that includes interactions and considers the three marketing variables features, price reductions, and displays as independent variables. Its IAPP value amounts to 7.44. The overall best performing model M1 includes interactions and distinguishes three segments. M1 considers the two dynamic variables loyalties and time in addition to the three marketing variables as independent variables. M1 roughly doubles IAPP compared to M0, to a value of 15.28, which constitutes a substantial performance improvement.

We now discuss the average coefficients of the two dynamic variables in the best performing model M1. Averages are determined by weighting segment coefficients by relative segment sizes. For \(\log (1+time)\) we obtain only three significant coefficients, which are all positive (mayonnaise 0.151, peanut butter 0.075, toilet tissue 0.05). On the other hand, coefficients of loyalties are positive and significant for all categories. We obtain the lowest coefficient for razors (0.528), the highest for household cleaners (4.891).

7.3 Model comparison results

In this section, we compare the best performing model without dynamic variables M0 to the overall best performing model M1 in more detail. We start by investigating whether average category constants, average coefficients of marketing variables and average pairwise interaction coefficients differ between these two models. Table 7 shows average category constants and average coefficients of marketing variables for each model and their difference if the latter is significant. 28 of 31 category constants differ significantly, 25 of these are higher for M0. All category constants are negative.

All coefficients of marketing variables are positive, i.e., a feature (display, price reduction) increases the pseudo-probability of a purchase of the respective category. 23 of 31 feature coefficients differ significantly, 14 are higher for M0. 22 of 31 display coefficients differ significantly, 13 of these coefficients are higher for M0. 20 of 31 price reduction coefficients differ significantly. 16 of these coefficients are higher for M0.

Table 7 Average category constants and average coefficients of marketing variables

About 6%, 14%, 15%, and 65% of the 465 pairwise interactions differ significantly between models at p-values of 0.10, 0.05, 0.01 and 0.005, respectively. Therefore, about 60% of pairwise interaction coefficients differ significantly between models at p-values \(\le 0.05\). 204 of these interactions are higher for M0, 211 of these interactions are positive for M1. Table 8 contains the 20 average interaction coefficients with highest absolute differences between the two models. The lowest absolute t-statistic of these differences amounts to 12.91. Sixteen of these interactions are higher for M0, and 13 are positive for M1.

Table 8 Average pairwise interaction coefficients

We also examine whether models M0 and M1 lead to different results on the relations between categories. We measure the relation of a category j conditional on another category \(j^{'}\) by the average marginal effect with respect to the purchase pseudo-probability of category j. We average across baskets by keeping the observed values of independent variables and the observed purchase incidences of categories other than j and \(j^{'}\). 94% of marginal effects differ significantly between models, and 72% of these are higher for model M0.

As 76% of marginal effects are positive, most category pairs can be seen as purchase complements. Nonetheless, our results hint at a considerable number of substitutive relations between category pairs. Examples of substitutive relations indicated by model M1 are razors and milk, photo and yogurt as well as cigarettes and soup with average marginal effects of -0.294, -0.059, and -0.039, respectively.

Table 9 shows the 20 highest marginal effects in absolute size. These marginal effects are all positive and significantly lower for M1. The minimum absolute t-statistic of differences between the two models is 46.40.

Table 9 Category relations measured by average marginal effects

7.4 Managerial implications

In this section, we examine whether the two models M0 and M1 entail different managerial implications. Based on K-means for 62 dynamic variables (two variables in each of 31 product categories), we choose six clusters. This procedure drastically reduces computation time, as we only have to sample purchases for each cluster using cluster-specific arithmetic means of the dynamic variables, followed by weighting according to cluster sizes. Without clustering, sampling for each of the 24,047 market baskets using observed values of dynamic variables would be necessary (see also Sect. 6).
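The clustering shortcut can be sketched as follows. This is an illustrative toy version, not the procedure used for the reported results: it runs a plain Lloyd-type K-means on two-dimensional points instead of 62 dynamic variables, and the function `evaluate` is a hypothetical stand-in for the sampling of purchases at a cluster mean.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns cluster means and cluster labels."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the arithmetic mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def size_weighted_average(centers, labels, evaluate):
    """Evaluate once per cluster mean and weight by relative cluster size."""
    n = len(labels)
    return sum(labels.count(c) / n * evaluate(center)
               for c, center in enumerate(centers))

# Toy data: six baskets described by two dynamic variables (made-up values).
points = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.2],
          [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
centers, labels = kmeans(points, k=2)

def evaluate(center):
    # Hypothetical stand-in for sampling purchases at a cluster mean.
    return 0.05 + 0.10 * center[0]

approx = size_weighted_average(centers, labels, evaluate)
```

The saving comes from calling `evaluate` once per cluster rather than once per basket. With six clusters instead of 24,047 baskets, the number of evaluations drops accordingly.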

Table 10 shows the own effects of marketing variables that differ between the two models by at least 0.005 (i.e., half a percentage point) in absolute size. For features, differences of this size occur in 55% of the categories. For displays and price reductions, we see differences of this size in 39% and 32% of the categories. Most of these differences are positive (71%, 92%, and 90% for features, displays, and price reductions, respectively) and therefore indicate positive omitted variable bias, whose principle we have explained in Sect. 2.

Model M0 without dynamic variables frequently overestimates own effects by falsely attributing the effects of the omitted dynamic variables to the marketing variables. These positive omitted variable biases can be traced back to both positive effects of the more important dynamic variable, category loyalty, on purchase probabilities and positive correlations between loyalties and marketing variables. Across all categories, these correlations amount to 0.150, 0.130, and 0.143 for features, displays, and price reductions, respectively.
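The direction of this bias can be illustrated by the textbook linear regression analogue. This is only a sketch: the MVL model is nonlinear, so the expression conveys the sign of the bias rather than its size, and the symbols \(y\), \(x\), \(z\), \(\beta\), and \(\gamma\) are introduced here for illustration only. Let \(x\) denote a marketing variable and \(z\) the omitted loyalty, with \(\varepsilon\) uncorrelated with both. If \(z\) is left out, the estimated coefficient of \(x\) converges to

\[
y = \beta x + \gamma z + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_{\text{short}}
= \beta + \gamma\,\frac{\operatorname{Cov}(x,z)}{\operatorname{Var}(x)} .
\]

With \(\gamma > 0\) (loyalty increases purchases) and \(\operatorname{Cov}(x,z) > 0\) (marketing variables correlate positively with loyalty), the second term is positive, so the effect of the marketing variable is overestimated.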

Table 10 Own effects of marketing variables on purchase probabilities

Results for cross effects stand in marked contrast to those for own effects (see Table 11). We get only a few absolute differences between the two models of at least 0.005 (4.6%, 6.9%, and 0.8% of the \(930 =31 \times 30\) cross effects for features, displays, and price reductions, respectively). For these differences, all coefficients of M0 are higher. In other words, models M0 and M1 agree on the size of a clear majority of cross effects.

Table 11 Cross effects of marketing variables on purchase probabilities

We consider the number of purchases of any product category as the managerial objective to demonstrate implications of positive biases of own effects. Purchases equal the sum of purchase probabilities inferred for a model across households. If managers use the basic model M0 in spite of its worse statistical performance, they would schedule more sales promotion activities in many categories because they overestimate purchase increases. We assess the importance of a positive bias by expressing it as a percentage of the marginal purchase frequency of the respective category (see Table 12). These percentages measure how much managers overestimate purchase increases in a category in relative terms if they ignore dynamic variables by relying on model M0. On average, these percentages amount to 10.64, 14.66, and 10.26 for features, displays, and price reductions, respectively. Percentages higher than ten occur in nine categories. We even notice relative overestimations of at least 20% for features of diapers and household cleaners, for displays of beer & ale and of frozen pizza, as well as for price reductions of beer.
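The relative measure can be written out in a few lines. The sketch below takes the bias to be the difference of own effects between M0 and M1 and assumes that the marginal purchase frequency is the share of baskets containing the category. The function names and all numbers are hypothetical and do not come from Table 12.

```python
def expected_purchases(probabilities):
    """Expected number of purchases: sum of purchase probabilities across households."""
    return sum(probabilities)

def relative_bias_percent(own_effect_m0, own_effect_m1, marginal_frequency):
    """Own-effect bias of M0 relative to M1, as a percentage of the
    marginal purchase frequency of the category."""
    return 100.0 * (own_effect_m0 - own_effect_m1) / marginal_frequency

# Hypothetical values for one category and one marketing variable.
own_effect_m0 = 0.030       # own effect implied by the basic model
own_effect_m1 = 0.018       # own effect implied by the enlarged model
marginal_frequency = 0.08   # assumed share of baskets containing the category

bias_pct = relative_bias_percent(own_effect_m0, own_effect_m1, marginal_frequency)
```

With these made-up inputs the bias of 0.012 corresponds to roughly 15% of the marginal purchase frequency, i.e., a manager relying on M0 would overstate the purchase increase by about that share.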

Table 12 Positive own effect biases as percentages of relative marginal purchase frequencies

8 Conclusion

MVL models that allow for pairwise interactions between product categories and for latent heterogeneity clearly outperform their less complex counterparts. In a similar manner, adding dynamic variables leads to better model performance. Among dynamic variables, exponentially smoothed category loyalties, which previous publications have ignored, turn out to be more important than log-transformed times since the last category purchase.

Comparing two FM-MVL models, a basic model with marketing variables as independent variables and an enlarged model that in addition considers dynamic variables, shows that coefficients of marketing variables differ in a clear majority of categories. Most coefficients for features, displays, and price reductions are lower for the enlarged model. A majority of pairwise interactions differ significantly between the two models and are usually lower for the enlarged model. Almost all relations between categories measured by average marginal effects are different and usually lower for the enlarged model.

It also turns out that managerial implications for the two models differ. The basic model suffers from positive omitted variable biases, i.e., it overestimates the own effects of marketing variables on purchase probabilities in many product categories. The omitted variable bias provides another explanation for the well-known problem of “overpromotion” in retailing. It seems that if retailers ignore loyalty (the extent to which people would have bought a product anyway) they are inclined to promote their products too much.

We expect such biases in market situations characterized by two conditions. The first condition boils down to having many product categories with high or medium loyalty values. For the data analyzed here, the omitted variable bias of a marketing variable in a category (the difference of its own effects between models M0 and M1) increases with loyalty. This increase is reflected by correlations between category loyalties (averaged across all market baskets) and biases, which amount to 0.502, 0.346, and 0.216 for features, displays, and price reductions, respectively.

Taking purchase frequency as a proxy for loyalty, positive biases should be more common in categories with high or medium purchase frequencies. High or medium purchase frequencies are typical for categories such as food, detergents, cleaning products, hygienic products, cosmetics, pet food, some clothing products, footwear, fuel, alcohol, and digital entertainment products.

The other condition for positive biases requires positive correlations of marketing variables with loyalties across categories. Once again, we use purchase frequency as a proxy for category loyalty. A study of 331 different grocery categories shows that features, displays, and price reductions are more frequent in categories with high penetration and high purchase frequency than in categories with high penetration and low purchase frequency (Fader and Lodish 1990). High penetration means that a high percentage of households makes at least one purchase per year.

Let us qualify the discussion of the model comparison results. As we cannot be sure that the investigated models include the true model, we have to expect a bias corresponding to the distance between the true model and any estimated model. Clearly, a direct comparison to the unknown true model is out of the question. We had to rely on indirect comparison by using holdout data that the true model has generated. On the other hand, the basic model is too simple as its low performance shows.

Of course, several avenues for further related research on dynamic multicategory choice remain. One possibility consists in investigating the relevance of dynamic variables for non-food product categories (e.g., consumer electronics, apparel). Moreover, future research efforts could deal with models with alternative functional forms (e.g., the multivariate probit model).