Implications of parent brand inertia for multiproduct pricing


This paper explores and quantifies the importance of parent brand state dependence for forward-looking pricing outcomes in the area of umbrella branding and multi-product firms. We show through numerical simulations that loyalty (inertia) to the parent brand can decrease prices and reduce profits, as well as mitigate or even reverse the benefits of joint profit maximization relative to sub-brand profit maximization. These effects are mediated by brand asymmetries and the relative magnitude of sub-brand state dependence effects. Empirically, we focus on the yogurt category, where we consider parent brands with several sub-brands. Using household-level scanner data, we estimate the parameters that characterize consumer demand while flexibly accounting for consumer heterogeneity. We also estimate unobserved product costs based on a forward-looking price-setting game. Through counterfactual analysis, we study the overall effect of parent brand state dependence on prices and profits, as well as the empirical impact of joint profit maximization and of changes in firms' beliefs regarding consumer inertia. Our findings have implications for markets where demand is likely characterized by parent brand dynamics.

Figs. 1 to 7 (images not reproduced here)


  1.

    In our empirical application, a small number of household yogurt trips (roughly 3.4%) involve the purchase of more than one sub-brand. For our empirical analysis, we treat these outcomes as separate (contemporaneous) purchases, effectively allowing these consumers to be loyal to more than one product (or brand).

  2.

    DIC (defined in Eq. 7) is based on the deviance of the model, \(D(y,\hat{\theta}|M)=-2\times\log[l(\hat{\theta}|M)]\), which reflects the discrepancy between the model and the data. DIC is a more complete model selection metric than the deviance alone because it reflects coefficient uncertainty and penalizes model complexity (Gelman et al. 2004; Gamerman and Lopes 2006).

  3.

    For example, in a market with one consumer type and twelve sub-brands belonging to fewer than twelve parent brands, the state space has dimension eleven, (12 − 1) × 1. Adding a second consumer type doubles the dimension to twenty-two, (12 − 1) × 2. Additional details regarding the overall size of the state space (and the dimensions of the associated grid) and the resulting trade-off between the number of consumer types and the number of included brands are provided in the Appendix.

  4.

    In a previous version of the paper, we instead allowed for two consumer types, but a smaller set of sub-brands. While the main implications were qualitatively similar (results available upon request), the smaller set of products is insufficient to highlight the strategic impact of multi-product pricing. A potential advantage of specifying the pricing game with a single consumer type is that it makes it easier for firms to track the current state when applying our methodology in practice. It is significantly more costly to track the loyalty of different consumer types in conjunction with each type’s preference and price sensitivity parameters.

  5.

    Morrow and Skerlos (2010) prove and characterize the existence of Bertrand-Nash equilibrium between multi-product firms for a very general logit-based demand system. Their framework imposes relatively weak restrictions on the specification of utility functions and price effects. To overcome the problem that the logit-based profit functions of multi-product firms are not quasi-concave, they base their approach on fixed-point equations. Although they generalize their numerical fixed-point approach for equilibrium price computation to the mixed logit context, they do not prove existence there. In forward-looking settings, there are proofs of existence of equilibrium between single-product firms under standard logit demand. A relatively common approach is to prove the quasi-concavity of the profit function (i.e., current-period profits plus the value function for a given state vector), which in turn guarantees the existence of a unique profit-maximizing price. Besanko et al. (2010), and DHR (2009) for a special case of their model, prove the existence of equilibrium through quasi-concavity. Unfortunately, the same approach cannot be followed in the case of multi-product firms, as the profit function is no longer quasi-concave.

  6.

    Computing optimal prices requires knowing each firm’s costs. We first assume (for purposes of conducting a variety of theoretical exercises) that costs are known to the researcher. We will later describe a method for recovering cost estimates that are subsequently used in the counterfactual exercises provided post-estimation.

  7.

    These results are available from the authors upon request.

  8.

    With regard to the effect of PBSD on the benefits of joint profit maximization, we found that brand asymmetry did not add any new insights to the findings reported above, so we do not repeat a similar discussion here.

  9.

    Excluding Dannon, the remaining brands collectively had only eight new flavor introductions over the two years covered by our sample. While Dannon did introduce several new varieties during this period (to build its position with new SKUs and refresh existing labels), it is important to note that Dannon had enjoyed wide distribution in the US for many years prior to our sample period and is a well-established brand. Moreover, the new varieties mainly involved repositioning 0.5-pound SKUs to match the 0.375-pound size of the market leader, Yoplait.

  10.

    Preference estimates are negative because they are estimated against the normalized outside option, which has the greatest share in the sample since most consumers do not purchase yogurt every time they go to a grocery store.

  11.

    We note that, similar to the approach used by Dubé et al. (2010), we only test for the interaction term by comparing to a model specification which includes brand experience but not the interaction. Comparing to our full base model would not be appropriate since the models evaluated for this sub-section contain additional information.

  12.

    Note that, given our state-dependent model, we always need to condition the firm's profit function on a state vector. While the forward-looking costs are estimated based on a random sample of observed states, the myopic costs are calibrated based on only one state.

  13.

    The myopic cost estimate of Dannon Fruit on the Bottom is not statistically different from the forward-looking one. The case of Old Home 100 Calories is different, and we attribute its relatively low dynamic cost estimate to its strong preference coefficient in the demand model. For example, the average price of Old Home 100 Calories is only slightly lower than that of Old Home (1.34 vs. 1.38), while its mean posterior preference parameter is much higher (−2.8 vs. −4.3). The cost model infers that, since Old Home 100 Calories is priced comparably to Old Home despite tastes being so much in its favor, it must have a much lower cost. Note that, while the myopic estimate for the 100 Calories sub-brand is also lower, the difference is more pronounced in the forward-looking case.

  14.

    Results not shown, but available upon request. Note that this situation is not a coherent equilibrium, since both sets of firms now have inconsistent beliefs regarding the other type’s behavior (i.e. they are setting prices using policies consistent with different underlying games).

  15.

    We use a horizon of 2500 weeks (roughly 48 years). We have also tested horizons of 3000 and 3500 weeks, and the results do not change substantially, suggesting that the error from truncating the simulation at 2500 weeks is already negligible.
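As a back-of-the-envelope check on footnote 15, assuming that periods are weeks and using the discount factor β = 0.998 reported in the Appendix, the share of an infinite stream of constant per-period profits that lies beyond a simulation summing t = 0, ..., T is β^(T+1). This is a hedged sketch, not the authors' simulation code; `tail_share` is a hypothetical helper. For T = 2500, 3000 and 3500 it gives roughly 0.67%, 0.25% and 0.09%.

```python
BETA = 0.998  # discount factor from the Appendix (assumed to be weekly)

def tail_share(beta: float, horizon: int) -> float:
    """Share of sum_{t=0}^inf beta**t (constant profits) omitted when the
    forward simulation stops after t = horizon, i.e. beta**(horizon + 1)."""
    return beta ** (horizon + 1)

for weeks in (2500, 3000, 3500):
    print(f"T = {weeks} weeks (~{weeks / 52:.0f} years): "
          f"omitted share = {tail_share(BETA, weeks):.4%}")
```

With non-constant profit flows the omitted share depends on the path of profits, so this is only a rough guide to the size of the truncation error.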


  1. Anand, B., & Shachar, R. (2004). Brands as beacons: a new source of loyalty to multiproduct firms. Journal of Marketing Research, 41, 135.

  2. Anderson, S., & De Palma, A. (2006). Market performance with multiproduct firms. The Journal of Industrial Economics, 54(1), 95.

  3. Arie, G., & Grieco, P. (2014). Who pays for switching costs? Quantitative Marketing and Economics, 12(4), 379.

  4. Bajari, P., Benkard, L., & Levin, J. (2007). Estimating dynamic models of imperfect competition. Econometrica, 75(5), 1331.

  5. Besanko, D., Doraszelski, U., Kryukov, Y., & Satterthwaite, M. (2010). Learning-by-doing, organizational forgetting, and industry dynamics. Econometrica, 78(2), 453.

  6. Bronnenberg, B., Kruger, M., & Mela, C. (2008). Database paper: The IRI marketing data set. Marketing Science, 27(4), 745.

  7. Cabral, L. (2008). Small switching costs lead to lower prices. Working Paper, New York University.

  8. Cabral, L. (2016). Dynamic pricing in customer markets with switching costs. Review of Economic Dynamics, 20, 43.

  9. Doganoglu, T. (2010). Switching costs, experience goods and dynamic price competition. Quantitative Marketing and Economics, 8, 167.

  10. Doraszelski, U., & Satterthwaite, M. (2010). Computable Markov-perfect industry dynamics. RAND Journal of Economics, 41(2), 215.

  11. Draganska, M., & Jain, D. C. (2005). Consumer preferences and product-line pricing strategies: an empirical analysis. Marketing Science, 25(2), 164.

  12. Dubé, J. P., Hitsch, G., Rossi, P., & Vitorino, M. A. (2008). Category pricing with state dependent utility. Marketing Science, 27(3), 417.

  13. Dubé, J. P., Hitsch, G., & Rossi, P. (2009). Do switching costs make markets less competitive? Journal of Marketing Research, 46(4), 435.

  14. Dubé, J. P., Hitsch, G., & Rossi, P. (2010). State dependence and alternative explanations for consumer inertia. RAND Journal of Economics, 41(3), 417.

  15. Erdem, T. (1998). An empirical analysis of umbrella branding. Journal of Marketing Research, 35, 339.

  16. Erdem, T., & Sun, B. (2002). An empirical investigation of the spillover effects of advertising and sales promotions in umbrella branding. Journal of Marketing Research, 39, 408.

  17. Farrell, J., & Klemperer, P. (2007). Coordination and lock-in: competition with switching costs and network effects, Handbook of industrial organization, Vol. 3. Elsevier.

  18. Gamerman, D., & Lopes, H. (2006). Markov chain Monte Carlo: stochastic simulation for Bayesian inference, 2nd edn. Chapman & Hall/CRC.

  19. Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian data analysis, 2nd edn. Chapman & Hall/CRC.

  20. Horsky, D., & Pavlidis, P. (2011). Brand loyalty induced price promotions: an empirical investigation. Working paper, University of Rochester.

  21. Judd, K. (1998). Numerical methods in economics. Cambridge, Massachusetts: The MIT Press.

  22. Klemperer, P. (1995). Competition when consumers have switching costs: an overview with applications to industrial organization, macroeconomics and international trade. The Review of Economic Studies, 62(4), 515.

  23. Miklos-Thal, J. (2012). Linking reputations through umbrella branding. Quantitative Marketing and Economics, 10(3), 335.

  24. Morrow, R., & Skerlos, S. (2010). On the existence of Bertrand-Nash equilibrium prices under logit demand. Working Paper. University of Michigan.

  25. Rossi, P., Allenby, G., & McCulloch, R. (2005). Bayesian statistics and marketing. New York: Wiley.

  26. Seetharaman, P. B. (2004). Modelling multiple sources of state dependence in random utility models: a distributed lag approach. Marketing Science, 23(2), 263.

  27. Seetharaman, P. B., Ainslie, A., & Chintagunta, P. (1999). Investigating household state dependence effects across categories. Journal of Marketing Research, 36(4), 488.

  28. Viard, B. (2007). Do switching costs make markets more or less competitive? The case of 800-number portability. RAND Journal of Economics, 38(1), 146.


Author information



Corresponding author

Correspondence to Paul B. Ellickson.

Appendix: Additional details regarding computation and estimation

State space discretization

One of the challenges we had to overcome in order to compute optimal dynamic price policies is that the state space is continuous and therefore contains infinitely many points. Following the literature, we addressed this issue by approximating the continuous state space with a finite set of points and interpolating between them. The approximation begins by discretizing the state space on a multidimensional grid. Each axis of the grid corresponds to one dimension of the state space, namely the fraction of a particular consumer type loyal to a specific sub-brand. We discretize each axis of the grid with a finite number of points g, such that \(0=g_{jt}^{n1}<\ldots<g_{jt}^{nL}=1,\,\forall j,\,\forall n\). The grid is formed by the Cartesian product of these sets of points, restricted to the points \((g_{1t}^{nl_{1}},\ldots,g_{J-1,t}^{nl_{J-1}})\) that satisfy \(\sum_{k=1}^{J-1}g_{kt}^{nl_{k}}\leq 1\) for each consumer type n. Intuitively, this condition says that, for any consumer type n, the fractions of the market loyal to the sub-brands in the choice set must sum to one across all sub-brands; the fraction loyal to the J-th sub-brand is the residual \(1-\sum_{k=1}^{J-1}g_{kt}^{nl_{k}}\). The main computational challenge in solving a discretized version of a dynamic programming problem with a continuous state space is dimensionality: the number of grid points at which one must solve for the value and policy functions increases exponentially with the dimension of the state space. For the game described in Section 3, the state space has dimension \(G=(J-1)\times N\), i.e., the number of choice alternatives minus one, times the number of consumer types. Because the loyalty states of each type sum to one across sub-brands, only J − 1 loyalty states need to be tracked for each type. For a regular grid with L points on each axis, the total number of points would be \(L^{G}\). The grid for our problem is not rectangular but triangular (if we think about it in two dimensions; a simplex in higher dimensions), due to the requirement that the loyalty states of a consumer type across all sub-brands sum to one.
Even though this condition reduces the number of grid points significantly, the computational requirements of the grid still become exorbitant as the dimensionality of the state space increases. In practice, we use a grid consisting of six points in each dimension (0, 0.2, 0.4, 0.6, 0.8, 1). While the complete Cartesian product for such a grid would have roughly 363 million points (\(6^{11}=362{,}797{,}056\)), the condition that the state shares of all sub-brands sum to one limits our total number of grid points to 4368 (the number of ways to distribute five increments of 0.2 among the twelve sub-brands, \(\binom{16}{11}=4368\)). To see this, consider the case in which sub-brand A has a state share of 1: the only possible state for every other sub-brand is then a zero share. Similarly, when sub-brands A and B have state shares of 0.4 and 0.6, respectively, all other sub-brands must have zero state shares.

It is relatively easy to see that the dimensionality of the state space grows faster with the number of brands when the number of consumer types is higher. This is especially costly because the sum-to-one condition applies only within each consumer type and therefore yields no savings across types (within a type, this condition allowed us to work with 4368 grid points instead of roughly 363 million). An additional consumer type would imply another vector of sub-brand state shares (and hence another 4368 points to enumerate), so the full state space would be the product of the two consumer-type grids (\(4368^{2}\approx 19.1\) million points). This creates a trade-off between using more consumer types and using more choice alternatives. By settling on a single consumer type, we were able to include all twelve sub-brands of the sample in the pricing model, thereby ensuring internal consistency.
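The grid counts in the two paragraphs above can be reproduced with a stars-and-bars calculation. The sketch below assumes, as the text describes, six equally spaced points per axis and J − 1 = 11 tracked shares per consumer type whose sum is at most one; the helper names are hypothetical, and the brute-force enumerator is only practical for small cases (it would loop over all 6^11 combinations for the full problem).

```python
import itertools
import math

def simplex_grid_size(n_tracked: int, n_points: int) -> int:
    """Points whose n_tracked shares lie on {0, 1/(n_points-1), ..., 1} and sum
    to at most one: C(n_points - 1 + n_tracked, n_tracked) by stars and bars."""
    return math.comb(n_points - 1 + n_tracked, n_tracked)

def simplex_grid(n_tracked: int, n_points: int):
    """Brute-force enumeration of the same points (small cases only)."""
    steps = n_points - 1
    for combo in itertools.product(range(steps + 1), repeat=n_tracked):
        if sum(combo) <= steps:
            yield tuple(c / steps for c in combo)

J, N, L = 12, 1, 6                      # sub-brands, consumer types, points per axis
G = (J - 1) * N                         # state-space dimension
full_product = L ** G                   # rectangular grid without the sum-to-one restriction
per_type = simplex_grid_size(J - 1, L)  # restricted grid for one consumer type
print(G, full_product, per_type, per_type ** 2)  # → 11 362797056 4368 19079424
```

The last number is the size of the product grid if a second consumer type were added, which is the trade-off described above.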


During computation, we use polynomial-based interpolation whenever we need to evaluate the value or policy functions at points of the state space that are not on the grid. Our polynomial approximation function has the following general form:

$$\begin{array}{@{}rcl@{}} \hat{y}_{j}(s)= \hat{a}_{0}+\sum\limits_{n=1}^{N}\sum\limits_{i=1}^{J-1}\sum\limits_{m\in\{0.5,1,2,3\}}\hat{a}_{nim} \left( {s_{i}^{n}}\right)^{m} +\sum\limits_{n=1}^{N}\sum\limits_{k=1}^{K}\hat{a}_{nk}\prod\limits_{l\in I_{k}}{s_{l}^{n}} \end{array} $$

It includes all the first-, second-, and third-order terms of the state variables, their square roots, and several sets of (two-way, three-way, four-way, etc.) interactions between the states of different sub-brands for the same consumer type, where each \(I_{k}\) is the set of sub-brands entering the k-th interaction. In our implementation of the algorithm, the correlation between predicted polynomial values and actual values is about 0.99, while the mean absolute percentage error (MAPE) of the prediction is at most 0.1%. This suggests that the approximation works well in practice.
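A minimal least-squares version of this interpolation scheme is sketched below, together with the two fit diagnostics mentioned above (correlation and MAPE). It assumes NumPy, a single consumer type, three tracked shares on the six-point simplex grid, and only two-way interactions; the function names and the toy target function are hypothetical, and the authors' exact set of interaction terms is not reproduced.

```python
import itertools
import numpy as np

POWERS = (0.5, 1.0, 2.0, 3.0)  # exponents m in {0.5, 1, 2, 3}

def design_matrix(states, interactions):
    """Columns: a constant, s_i**m for every tracked share i and power m,
    and one product of shares for each index tuple in `interactions`."""
    states = np.atleast_2d(states)
    cols = [np.ones(states.shape[0])]
    for i in range(states.shape[1]):
        for m in POWERS:
            cols.append(states[:, i] ** m)
    for idx in interactions:
        cols.append(np.prod(states[:, list(idx)], axis=1))
    return np.column_stack(cols)

def fit_interpolant(grid, values, interactions):
    """Least-squares fit of the polynomial on the grid; returns a callable."""
    coef, *_ = np.linalg.lstsq(design_matrix(grid, interactions), values, rcond=None)
    return lambda s: design_matrix(s, interactions) @ coef

def fit_diagnostics(pred, actual):
    """Correlation between fitted and actual values, and MAPE."""
    corr = np.corrcoef(pred, actual)[0, 1]
    mape = np.mean(np.abs((pred - actual) / actual))
    return corr, mape

# Hypothetical example: one consumer type, three tracked shares, six points per axis.
steps = 5
grid = np.array([c for c in itertools.product(range(steps + 1), repeat=3)
                 if sum(c) <= steps]) / steps
pairs = list(itertools.combinations(range(3), 2))    # two-way interactions only

def toy_value(s):                                     # hypothetical smooth target
    s = np.atleast_2d(s)
    return 10.0 + np.log1p(s[:, 0] + 2.0 * s[:, 1]) - 0.5 * s[:, 2] ** 2

approx = fit_interpolant(grid, toy_value(grid), pairs)
print(fit_diagnostics(approx(grid), toy_value(grid)))
off_grid = np.array([0.13, 0.41, 0.27])               # a point between grid nodes
print(approx(off_grid), toy_value(off_grid))
```

Least squares is used here because the basis has fewer terms than grid points; the fitted polynomial then serves as the interpolant off the grid.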

Policy function iteration

The dynamic game analysis proceeds in two steps. In the first step, we compute optimal pricing policies for each point in the state space; in the second, we compute steady-state prices and shares for all sub-brands in the sample. The steps of the policy function iteration are as follows:

  1.

    Start with initial guesses for the value functions and price strategies. For all reported results, we initialize the value functions at zero and the policy functions at the prices that maximize static (single-period) profits, for each point in the state space: \(V_{f}^{0}(s)=0\) and \(\sigma_{f}^{0}(s)=\arg\max_{\{P_{j}\}_{j\in f}}\pi_{f}\left[s,P,\sigma_{-f}^{0}(s)\right]\,\,\forall s\in S,\,\forall f\). The initial policies are iterated until they are best responses to each other in the static game.

    • We experiment with different initial guesses, for the value and policy functions, to examine whether the equilibrium policies are the same or change depending on the starting values; the latter would imply the existence of multiple equilibria. It is noteworthy that all trials generated the same equilibrium policies.

  2.

    For each firm, given \(V_{f}^{iter-1}\) and \(\sigma^{iter-1}\), compute \(V_{f}^{iter}\); superscripts refer to iterations of the algorithm. Computing the value functions involves the purchase probabilities that determine both static and discounted future profits.

    (a)

      Iterate the Bellman equation until convergence, \(\frac{\max|V_{f}^{iter}-V_{f}^{iter-1}|}{\max|V_{f}^{iter}|}\leq\varepsilon_{v}\,\,\forall f\), then move to the next step. We set \(\varepsilon_{v}=10^{-7}\).

  3.

    For each firm, given \(V_{f}^{iter}\) and \(\sigma^{iter-1}\), compute \(\sigma^{iter}\). This involves finding the optimal price for f at each point on the grid, given the right-hand side of the Bellman equation and the rivals' policies. In practice, we iterate across firms, solving for optimal prices at all points of our state-space grid given rival prices from the previous iteration. Upon convergence, no firm can obtain a higher value function by changing its policy, so policies are best responses in each state, conditional on the value functions. To speed up our price optimization subroutine and to avoid local maxima, we start with a simple global search to identify the region containing the global maximum: we evaluate the right-hand side of the Bellman equation at 20-cent intervals to find the neighborhood of the solution. We then restrict our quasi-Newton optimization algorithm to a 40-cent neighborhood of the solution identified by this initial search.

    (a)

      If the computed policies converge for each firm, \(\frac{\max|\sigma_{f}^{iter}-\sigma_{f}^{iter-1}|}{\max|\sigma_{f}^{iter}|}\leq\varepsilon_{\sigma}\,\,\forall f\), stop the algorithm. We set \(\varepsilon_{\sigma}=10^{-5}\).

    (b)

      If not, update the policies and return to step 2.
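The price-optimization subroutine of step 3 (a coarse search in 20-cent steps followed by a bounded quasi-Newton refinement) and the relative convergence criterion of steps 2(a) and 3(a) can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes SciPy's L-BFGS-B as the bounded quasi-Newton method, reads the 40-cent neighborhood as ±20 cents around the best coarse price, and uses a hypothetical static single-product logit profit in place of the right-hand side of the Bellman equation so that the result can be checked against the logit first-order condition.

```python
import numpy as np
from scipy.optimize import minimize

def relative_change(new, old) -> float:
    """Convergence metric of steps 2(a) and 3(a): max|new - old| / max|new|."""
    new, old = np.asarray(new, dtype=float), np.asarray(old, dtype=float)
    return float(np.max(np.abs(new - old)) / np.max(np.abs(new)))

def best_price(objective, p_min=0.0, p_max=5.0, coarse_step=0.20, half_window=0.20):
    """Coarse grid search in 20-cent steps, then a bounded L-BFGS-B refinement
    within +/- half_window of the best coarse price."""
    coarse = np.arange(p_min, p_max + 1e-9, coarse_step)
    p0 = coarse[int(np.argmax([objective(p) for p in coarse]))]
    lo, hi = max(p_min, p0 - half_window), min(p_max, p0 + half_window)
    res = minimize(lambda x: -objective(x[0]), x0=np.array([p0]),
                   method="L-BFGS-B", bounds=[(lo, hi)])
    return float(res.x[0])

# Hypothetical check: a static single-product logit profit stands in for the
# right-hand side of the Bellman equation at one grid point.
A_J, ALPHA, COST = 2.0, 2.0, 0.5

def share(p):
    e = np.exp(A_J - ALPHA * p)
    return e / (1.0 + e)

def static_profit(p):
    return (p - COST) * share(p)

p_star = best_price(static_profit)
foc_gap = p_star - COST - 1.0 / (ALPHA * (1.0 - share(p_star)))  # logit FOC, ~0 at optimum
print(round(p_star, 3), abs(foc_gap) < 1e-3)
```

In the dynamic game, `objective` would be the current-period profit plus the discounted, interpolated continuation value for one firm at one grid point, holding rivals' prices fixed.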

Doraszelski and Satterthwaite (2010) give the following definition of a Markov-perfect equilibrium: "An equilibrium involves value and policy functions V and σ such that i) given σ, \(V_{f}\) solves the Bellman equation for all f and ii) given \(\sigma_{-f}(s)\) and \(V_{f}\), \(\sigma_{f}(s)\) solves the maximization problem on the right-hand side of the Bellman equation for all s and all f." Markov-perfect equilibria are by definition subgame perfect, meaning that firms follow optimal strategies at each possible state. Upon convergence, the algorithm described above satisfies these general conditions and thus computes a Markov-perfect equilibrium.

To ensure that our solution is a global maximum rather than merely a local one, as well as a best response for all profit-maximizing firms, we augmented our algorithm with an additional step that checks our equilibrium policies with a global optimizer that combines genetic algorithms with derivative-based search. When we used this global optimizer for our final policy iteration, the equilibrium policies did not change, which suggests that our original approach was suitably robust. Details about this optimizer can be found in Mebane and Sekhon (2011), "Genetic Optimization Using Derivatives: The rgenoud Package for R," Journal of Statistical Software, 42(11).

To complete the game specification, we also need to make assumptions about parameters that are part of the model but unobserved, namely the discount factor and the total size of the market. The parametrization used in the algorithm is \(\beta=0.998\) and \(MS=100{,}000\).

Steady state computation

To compute the steady state of a pricing game specification, we start from an initial state vector and, given the best-response price policies, first find the optimal price of each firm for that period. Next, we compute the states that prevail in the market under these prices and use them as the next period's state vector. We repeat the process until both the state variables and the prices of each product alternative converge. To check whether the resulting steady state is unique, we repeat the computation several times starting from different initial states. In all trials, the algorithm converged to the same steady state.
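The steady-state iteration can be sketched as a fixed-point loop on \(s_{t+1}=Q(P_t)\,s_t\), mirroring the transition \(Q^{n}(P_t)\times s_t^{n}\) used in the BBL section for a single consumer type. Everything model-specific here is an assumption for illustration: logit demand with an additive loyalty bonus for the sub-brand a consumer is currently loyal to, an outside option with zero utility, loyalty that changes only when a sub-brand is purchased, hypothetical parameter values, and a hypothetical linear price policy standing in for the computed equilibrium policies. Column k of Q holds the probabilities of moving from loyalty to k.

```python
import numpy as np

A = np.array([1.0, 0.8, 0.5])  # hypothetical sub-brand intercepts
ALPHA, GAMMA = 2.0, 1.0        # hypothetical price sensitivity and loyalty bonus

def policy(s: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the equilibrium price policies sigma(s)."""
    return 1.0 + 0.1 * s

def transition_matrix(p: np.ndarray) -> np.ndarray:
    """Q[j, k] = probability that a consumer loyal to k is loyal to j next period."""
    J = len(A)
    Q = np.empty((J, J))
    for k in range(J):
        v = A - ALPHA * p + GAMMA * (np.arange(J) == k)  # utilities; outside option = 0
        e = np.exp(v)
        denom = 1.0 + e.sum()
        Q[:, k] = e / denom
        Q[k, k] += 1.0 / denom  # no purchase: loyalty state unchanged (assumption)
    return Q

def steady_state(s0, tol=1e-12, max_iter=100_000):
    """Iterate s_{t+1} = Q(P(s_t)) s_t until the shares stop changing."""
    s = np.asarray(s0, dtype=float)
    for _ in range(max_iter):
        s_next = transition_matrix(policy(s)) @ s
        if np.max(np.abs(s_next - s)) < tol:
            return s_next, policy(s_next)
        s = s_next
    raise RuntimeError("no convergence within max_iter")

# Informal uniqueness check: start from every corner of the simplex.
results = [steady_state(np.eye(3)[k]) for k in range(3)]
for s_ss, p_ss in results:
    print(np.round(s_ss, 6), np.round(p_ss, 6))
```

Because each column of Q sums to one, the loop preserves the sum-to-one property of the loyalty shares at every iteration.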

BBL estimation procedure

Our implementation of the BBL estimator involves the following steps:

  1.

    We estimate policies in a first stage using spline-based regression. Policies for each sub-brand are estimated as very flexible functions of the loyalty shares of the various sub-brands (i.e., the observed state vectors).

  2.

    We define a set of inequalities H based on deviations from the estimated (optimal) policies by varying amounts (e.g., ±0.005, ±0.01, ±0.02), separately for each sub-brand.

  3.

    Using our first stage policies, we forward simulate “correct” and “perturbed” value functions for each inequality in H in the following way:

    (a)

      Draw a random state vector \(s_{0}\) and use it as a starting point.

    (b)

      Simulate current period profit flow \(\pi _{ft}(s_{t},\hat {\sigma }(s_{t}, v_{t}))\) using the current state vector and estimated policies. The estimated policies \(\hat {\sigma }(s_{t}, v_{t})\) also include i.i.d. random shocks drawn from the residuals of the first stage estimation.

    (c)

      Calculate next period’s state vector based on choice probabilities that correspond to the demand model, the policies and the state vector from the previous step. This is specified explicitly in our pricing game and is described in Section 3.3 and Eq. 9, \(s_{t+1}^{n}=g(P_{t},{s_{t}^{n}})=Q^{n}(P_{t})\times {s_{t}^{n}}\).

    (d)

      Continue the forward simulation until discounted period profit flows are negligible (see footnote 15).

    (e)

      Repeat for R different starting state vectors and average the results. This step averages over both the starting state vectors and the random shocks to the policies.

      $$ \hat{V}_{f}(s;\sigma;\theta)=\frac{1}{R}\sum\limits_{r=1}^{R}\left[\sum\limits_{t=0}^{T}\beta^{t}\pi_{ft}(s_{t},\hat{\sigma}(s_{t}, v_{t});\theta)\right] $$
  4.

    We use the approach described above to obtain "correct" and "perturbed" value functions for a set of inequalities that constitute the equilibrium conditions of our model. The final step of our approach recovers the cost parameters through a minimum distance estimator that penalizes violations of the equilibrium conditions. Denoting the correct and perturbed value functions by \(V_{f}(s;\sigma_{f},\sigma_{-f};\theta)\) and \(V_{f}(s;\sigma^{\prime}_{f},\sigma_{-f};\theta)\), respectively, the second-stage estimator minimizes the following objective function:

    $$ Q(\theta)=\frac{1}{H}\sum\limits_{h=1}^{H}\left( min\left\{V_{f}(s;\sigma_{f},\sigma_{-f};\theta)-V_{f}\left( s;\sigma^{\prime}_{f}, \sigma_{-f};\theta\right),0\right\}\right)^{2} $$
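Steps 3(a) to 3(e) amount to a Monte Carlo average of discounted profit paths, and step 4 penalizes only the inequalities that are violated. The sketch below is a generic, hypothetical implementation: the callables `profit_fn`, `next_state_fn`, `policy_fn`, `draw_state` and `draw_shock` are placeholders for the demand model, the transition in Eq. 9, the first-stage policy estimates and the residual draws; the defaults β = 0.998 and a 2500-period horizon mirror the parametrization above and footnote 15. The toy market at the end is illustrative only.

```python
import numpy as np

def simulate_value(profit_fn, next_state_fn, policy_fn, draw_state, draw_shock,
                   beta=0.998, horizon=2500, n_paths=100, rng=None):
    """Monte Carlo estimate of (1/R) sum_r sum_{t=0}^{T} beta^t pi(s_t, sigma_hat(s_t, v_t))."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(n_paths):                    # step (e): R starting states
        s = draw_state(rng)                     # step (a): random s_0
        value, disc = 0.0, 1.0
        for _ in range(horizon + 1):            # t = 0, ..., T (step (d))
            p = policy_fn(s) + draw_shock(rng)  # step (b): estimated policy plus shock
            value += disc * profit_fn(s, p)
            s = next_state_fn(s, p)             # step (c): s_{t+1} = g(P_t, s_t)
            disc *= beta
        total += value
    return total / n_paths

def bbl_objective(v_correct, v_perturbed):
    """Q = (1/H) sum_h min{V_correct - V_perturbed, 0}^2 over the H inequalities."""
    diff = np.asarray(v_correct) - np.asarray(v_perturbed)
    return float(np.mean(np.minimum(diff, 0.0) ** 2))

# Hypothetical toy market: profit (p - 0.5) * exp(-p) peaks at p = 1.5,
# and the state never changes, so only the policy matters.
rng = np.random.default_rng(0)
toy = dict(profit_fn=lambda s, p: (p - 0.5) * np.exp(-p),
           next_state_fn=lambda s, p: s,
           draw_state=lambda r: r.uniform(),
           draw_shock=lambda r: 0.0,
           horizon=500, n_paths=5, rng=rng)
v_hat = simulate_value(policy_fn=lambda s: 1.5, **toy)
v_up = simulate_value(policy_fn=lambda s: 1.5 + 0.02, **toy)
v_down = simulate_value(policy_fn=lambda s: 1.5 - 0.02, **toy)
print(bbl_objective([v_hat, v_hat], [v_up, v_down]))  # → 0.0 (no violated inequalities)
```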

Applying the methodology is facilitated by the fact that the value functions for the pricing model of this study can be written as a linear function of the structural parameters which, in this case, are costs. This allows for significant computational economies in that the forward simulation need only be done once, before the estimation, and not for every trial parameter vector of the estimation routine. For demonstration purposes, we describe the linear form of the model below.

$$\begin{array}{@{}rcl@{}} V_{f}(s;\sigma;\theta) \!&=& {\sum}_{t=0}^{\infty}\beta^{t}\left[{\sum}_{j\in f}(P_{jt}-c_{j})\times D_{jt}\times M\right]\\ &=&{\sum}_{t=0}^{\infty}\beta^{t}\left[{\sum}_{j\in f}P_{jt}\!\times\! D_{jt}\!\times\! M\!\right] \,-\, {\sum}_{j\in f}\!\left[{\sum}_{t=0}^{\infty}\beta^{t}(D_{jt}\!\times\! M)\!\right]\!\times\! c_{j}\\ \end{array} $$

The linearity of the value function allows us to construct empirical analogs of both the "true" and perturbed value functions using forward-simulated estimates of \(D_{jt}\), pricing policies, market sizes, and any trial values of the cost parameter vector.
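Because the value function is linear in costs, the forward simulation only needs to deliver two discounted sums per policy, revenue \(\sum_t \beta^t P_{jt} D_{jt} M\) and quantity \(\sum_t \beta^t D_{jt} M\), after which \(V = \text{revenue} - \text{quantity}\times c\) can be evaluated for any trial cost at negligible cost. The sketch below assumes a single sub-brand and, to stay self-contained and checkable, replaces the forward simulation with a hypothetical one-period logit market whose "observed" price is optimal for a chosen cost; all names and numbers are illustrative. Since each inequality is linear in c, the objective is zero on an interval, so the estimate is one point of an identified set.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

A_J, ALPHA, M = 2.0, 2.0, 100_000  # hypothetical intercept, price coefficient, market size
TRUE_COST = 0.5                     # cost used only to generate the "observed" policy

def share(p):
    e = np.exp(A_J - ALPHA * p)
    return e / (1.0 + e)

def revenue_and_quantity(p):
    """One-period stand-ins for sum_t beta^t P D M and sum_t beta^t D M."""
    q = share(p) * M
    return p * q, q

# "Observed" price: the logit optimum for TRUE_COST, where p - c = 1 / (alpha * (1 - share)).
p_obs = brentq(lambda p: p - TRUE_COST - 1.0 / (ALPHA * (1.0 - share(p))), 0.5, 5.0)

deltas = np.array([-0.02, -0.01, -0.005, 0.005, 0.01, 0.02])  # perturbations, as in step 2
rev0, qty0 = revenue_and_quantity(p_obs)            # computed once, reused for every c
rev_h, qty_h = revenue_and_quantity(p_obs + deltas)

def objective(c):
    """Q(c) with V = revenue - quantity * c for the correct and perturbed policies."""
    diff = (rev0 - qty0 * c) - (rev_h - qty_h * c)
    return float(np.mean(np.minimum(diff, 0.0) ** 2))

c_hat = minimize_scalar(objective, bounds=(0.0, 2.0), method="bounded").x

# Each inequality bounds c from one side; together they give the identified interval.
ratios = (rev0 - rev_h) / (qty0 - qty_h)
lower, upper = ratios[deltas < 0].max(), ratios[deltas > 0].min()
print(f"identified set ≈ [{lower:.4f}, {upper:.4f}], c_hat = {c_hat:.4f}")
```

Price increases yield upper bounds on c and price decreases yield lower bounds, which is why perturbations in both directions are needed.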

The demand parameters we use when solving for the cost parameters are Bayesian posterior estimates that correspond to the average household. We note that consistent estimation of the dynamic parameters requires consistent estimates of the demand parameters as inputs. Our Bayesian estimates are asymptotically equivalent to maximum likelihood estimates and therefore satisfy this requirement, provided the demand model is correctly specified.

About this article

Cite this article

Pavlidis, P., Ellickson, P.B. Implications of parent brand inertia for multiproduct pricing. Quant Mark Econ 15, 369–407 (2017).


  • State dependence
  • Multiproduct firms
  • Umbrella branding
  • Forward looking prices

JEL Classification

  • M2
  • M3
  • L1