Skip to main content
Log in

A comparison of semiparametric and heterogeneous store sales models for optimal category pricing

  • Regular Article
  • Published:
OR Spectrum Aims and scope Submit manuscript

Abstract

Category management requires sales response models helping to simultaneously estimate marketing mix effects for all brands of a product category. We, therefore, develop a general heterogeneity seemingly unrelated regression (SUR) model accommodating correlations between sales across brands. This model contains a latent class SUR model, the well-known hierarchical Bayesian SUR model and the homogeneous SUR model as special cases. We further propose a hierarchical Bayesian semiparametric SUR model based on Bayesian P-splines which comprises a homogeneous semiparametric SUR model as nested version. The results of an empirical application with store-level scanner data indicate that the flexible SUR approaches of modeling price response clearly outperform the various parametric (homogeneous and heterogeneous) SUR approaches with respect to not only predictive validity but also total expected category profits. In particular, functional flexibility turns out to be the primary driver for improving the predictive performance of a store sales model as heterogeneity pays off only once functional flexibility has been accounted for. Furthermore, since both flexible SUR models perform nearly equally well with respect to expected category profits, a uniform pricing strategy which is much less complex to implement than micromarketing can be recommended for our data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

Notes

  1. Further studies either account for heterogeneity by using only store dummies (e.g., Reibstein and Gatignon 1984) or by estimating sales response for each store separately (e.g., Mulhern and Leone 1991) or do not account for heterogeneity (e.g., Hall et al. 2010).

  2. Optimal prices based on a homogeneous semiparametric SUR model were already determined in Weber (2015).

  3. For simplicity, we assume that correlations between brands are the same for all stores, i.e. \(\varSigma _{i}=\varSigma \) \(\forall i\).

  4. Identification problems due to label switching (Celeux et al. 2000) are considered by determining identifiability constraints as described in Frühwirth-Schnatter et al. (2004). Subsequently, we use a constrained permutation sampler as suggested in Frühwirth-Schnatter (2001).

  5. Thus, our proposed model constitutes an extension of the hierarchical Bayesian semiparametric approach introduced by Lang et al. (2015).

  6. According to, e.g., Chintagunta et al. (2003), Kadiyali et al. (2000) and Sudhir (2001), retailers (as well as manufacturers) make their pricing decisions every week.

  7. Note that store-specific or micromarketing pricing strategies can be derived not only from the hierarchical Bayesian SUR model but also from the general heterogeneity SUR model since price coefficients are store-specific in the latter model as well.

  8. We set \(U=200\).

  9. Originally, the data is provided at the UPC-level and consists of 18 UPCs. In order to handle this number of products we aggregated highly correlated UPCs to 8 brands accounting for a market share of about 96 % in the refrigerated orange juice category (64 oz) during the considered time span.

  10. For details compare Steiner et al. (2007, p. 387).

  11. The notation which is not explained here is adopted from model (1).

  12. Homogeneity refers to the effects of all independent variables. The model, however, contains store-specific intercepts in order to account for differences in baseline sales across stores (like all other models do). A table summarizing all model specifications is given in Appendix 3.

  13. We will explain below why we abstain from estimating the PHetSM for a higher number of segments.

  14. The software R is free and available at http://cran.r-project.org/.

  15. It is not possible to compute the log model likelihood for the flexible models since this measure of fit is not applicable in models with improper priors as are used for the nonlinear functions of FHomSM and FHBSM.

  16. For Citrus Hill, PHomSM reveals a sum of absolute errors of 46912, FHomSM a much smaller one of 44230. The considerably worse RMSE value of FHomSM can be traced back to one single week in the validation sample that obtains a much higher weight in case of squared errors as compared to absolute errors. In order to stay conservative with this flexible model, we omit this week later in our comparison of expected profits across models.

  17. Please note that the high price interval consists of only one observed price in case of the store brand Dominick’s. Hence, the respective price elasticity (in absolute terms) is probably overestimated for high prices of Dominick’s.

  18. According to van Heerde et al. (2002), one should choose the model with the best predictive performance.

  19. For the sake of completeness, we further determined store-specific optimal prices based on the latent class model PLCSM and the general heterogeneity model PHetSM (note that this model revealed one empty segment) and plugged them into the profit function of FHBSM. Relative to expected profits of FHBSM, optimal prices obtained from these two models result in losses of about 8 % in case of PLCSM and 10 % in case of PHetSM.

  20. Note that based on the profit function of FHBSM optimal prices of this model have to provide the largest expected profit for the whole product category in each week. Nevertheless, optimal prices of the other models can lead to better results at the brand-level for some brands.

  21. We repeated our optimization exercise for PHomSM over four weeks by simulating a more elastic demand. Similar to Chintagunta et al. (2003), we found that expected profits are lower when accounting for price endogeneity. However, this is reasonable since expected unit sales are in consequence lower, too. Optimal prices turn out somewhat different while the average price level of the product category does not change according to expression (19).

References

  • Ailawadi KL, Beauchamp JP, Donthu N, Gauri D, Shankar V (2009) Communication and promotion decisions in retailing: a review and directions for future research. J Retail 85:42–55

    Article  Google Scholar 

  • Ainslie A, Rossi PE (1998) Similarities in choice behavior across product categories. Mark Sci 17:91–106

    Article  Google Scholar 

  • Ali MM, Törn A (2004) Population set-based global optimization algorithms: some modifications and numerical studies. Comput Oper Res 31:1703–1725

    Article  Google Scholar 

  • Allenby GM, Arora N, Ginter JL (1998) On the heterogeneity of demand. J Mark Res 35:384–389

    Article  Google Scholar 

  • Andrews RL, Currim IS, Leeflang PSH, Lim J (2008) Estimating the SCAN*PRO model of store sales: HB, FM or just OLS? Int J Res Mark 25:22–33

    Article  Google Scholar 

  • Basuroy S, Mantrala MK, Walters RG (2001) The impact of category management on retailer prices and performance: theory and evidence. J Mark 65:16–32

    Article  Google Scholar 

  • Bezawada R, Balachander S, Kannan PK, Shankar V (2009) Cross-category effects of aisle and display placements: a spatial modeling approach and insights. J Mark 73:99–117

    Article  Google Scholar 

  • Blattberg RC, Neslin SA (1990) Sales promotion: concepts. methods and strategies. Prentice Hall, Englewood Cliffs

  • Bolton RN, Shankar V (2003) An empirically derived taxonomy of retailer pricing and promotion strategies. J Retail 79:213–224

    Article  Google Scholar 

  • Brezger A, Steiner WJ (2008) Monotonic regression based on Bayesian P-splines: an application to estimating price response functions from store-level scanner data. J Bus Econ Stat 26:90–104

    Article  Google Scholar 

  • Celeux G, Hurn M, Robert CP (2000) Computational and inferential difficulties with mixture posterior distributions. J Am Stat Assoc 95:957–970

    Article  Google Scholar 

  • Chen Y, Hess JD, Wilcox RT (1999) Accounting profits versus marketing profits: a relevant metric for category management. Mark Sci 18:208–229

    Article  Google Scholar 

  • Chib S, Seetharaman PB, Strijnev A (2002) Analysis of multi-category purchase incidence decisions using IRI market basket data. Adv Econom 16:57–92

    Article  Google Scholar 

  • Chintagunta PK (2000) A flexible aggregate logit demand model. Working paper, University of Chicago

  • Chintagunta PK (2002) Investigating category pricing behavior at a retail chain. J Mark Res 39:141–154

    Article  Google Scholar 

  • Chintagunta PK, Dubé JP, Singh V (2003) Balancing profitability and customer welfare in a supermarket chain. Quant Mark Econ 1:111–147

    Article  Google Scholar 

  • Dobson PW, Waterson M (2008) Chain-store competition: customized vs. uniform pricing. Working paper, University of Warwick

  • Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer, Berlin

    Book  Google Scholar 

  • Elrod T, Russell G, Shocker A, Rao V, Bayus B, Caroll D, Kamakura W, Shankar V (2002) Inferring market structure from customer response to competing and complementary products. Mark Lett 13:221–232

    Article  Google Scholar 

  • Fahrmeir L, Kneib T, Lang S (2007) Regression - Modelle, Methoden und Anwendungen. Springer, Berlin

    Google Scholar 

  • Frühwirth-Schnatter S (2001) Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J Am Stat Assoc 96:194–209

    Article  Google Scholar 

  • Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York

    Google Scholar 

  • Frühwirth-Schnatter S, Tüchler R, Otter T (2004) Bayesian analysis of the heterogeneity model. J Bus Econ Stat 22:2–15

    Article  Google Scholar 

  • Frühwirth-Schnatter S, Tüchler R, Otter T (2005) Capturing unobserved consumer heterogeneity using the Bayesian heterogeneity model. In: Taudes A (ed) Adaptive information systems and modelling in economics and management science. Springer, Wien, pp 57–70

    Chapter  Google Scholar 

  • Gelfand AE, Sahu SK, Carlin BP (1995) Efficient parametrisations for normal linear mixed models. Biometrika 82:479–488

    Article  Google Scholar 

  • Greene WH (2008) Econometric analysis, 6th edn. Prentice Hall, New Jersey

    Google Scholar 

  • Hall JM, Kopalle PK, Krishna A (2010) Retailer dynamic pricing and ordering decisions: category management versus brand-by-brand approaches. J Retail 86:172–183

    Article  Google Scholar 

  • Hoch SJ, Lodish LM (1998) Store brands and category management. Working paper, The Wharton School, University of Pennsylvania

  • Hoch SJ, Kim BD, Montgomery AL, Rossi PE (1995) Determinants of store-level price elasticity. J Mark Res 32:17–29

    Article  Google Scholar 

  • Hruschka H (2006a) Relevance of functional flexibility for heterogeneous sales response models: a comparison of parametric and semi-nonparametric models. Eur J Oper Res 174:1009–1020

    Article  Google Scholar 

  • Hruschka H (2006) Statistical and managerial relevance of aggregation level and heterogeneity in sales response models. Mark J Res Manage 2(2006):94–102

    Google Scholar 

  • Hruschka H (2007) Clusterwise pricing in stores of a retail chain. OR Spectr 29:579–595

    Article  Google Scholar 

  • Inman JJ, Winer RS, Ferraro R (2009) The interplay between category characteristics and shopping trip factors on in-store decision making. J Mark 73:19–29

    Article  Google Scholar 

  • Kadiyali V, Chintagunta P, Vilcassim N (2000) Manufacturer-retailer channel interactions and implications for channel power: an empirical investigation of pricing in a local market. Mark Sci 19:127–148

    Article  Google Scholar 

  • Kamakura WA, Kang W (2007) Chain-wide and store-level analysis for cross-category management. J Retail 83:159–170

    Article  Google Scholar 

  • Khan RJ, Jain DC (2005) An empirical analysis of price discrimination mechanisms and retailer profitability. J Mark Res 42:516–524

    Article  Google Scholar 

  • Kim B, Blattberg RC, Rossi PE (1995) Modeling the distribution of price sensitivity and implications for optimal retail pricing. J Bus Econ Stat 13:291–303

    Google Scholar 

  • Kim B, Srinivasan K, Wilcox R (1999) Identifying price sensitive customers: the relative merits of demographic vs. purchase pattern information. J Retail 75:1–21

    Article  Google Scholar 

  • Klapper D (2000) Einflußgrößen von regulären Preiselastizitäten, Preisaktionselastizitäten und Kreuzpreiselastizitäten - Determinants of regular, promotional and cross price elasticities. OR Spectr 22:135–157

    Article  Google Scholar 

  • Koop G (2003) Bayesian econometrics. Wiley, Chichester

    Google Scholar 

  • Kopalle PK, Biswas D, Chintagunta PK, Fan J, Pauwels K, Ratchford BT, Sills JA (2009) Retailer pricing and competitive effects. J Retail 85:56–70

    Article  Google Scholar 

  • Kumar V, Leone RP (1988) Measuring the effect of retail store promotions on brand and store substitution. J Mark Res 25:178–185

    Article  Google Scholar 

  • Lang S, Brezger A (2004) Bayesian P-splines. J Comput Graph Stat 13:183–212

    Article  Google Scholar 

  • Lang S, Adebayo SB, Fahrmeir L, Steiner WJ (2003) Bayesian geoadditive seemingly unrelated regression. Comput Stat 18:263–292

    Article  Google Scholar 

  • Lang S, Steiner W, Weber A, Wechselberger P (2015) Accommodating heterogeneity and nonlinearity in price effects for predicting brand sales and profits. Eur J Oper Res 246:232–241

    Article  Google Scholar 

  • Lenk PJ, DeSarbo WS (2000) Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika 65:93–119

    Article  Google Scholar 

  • Levy M, Chen H, Ray S, Bergen M (2004) Asymmetric price adjustment in the small: an implication of rational inattention. Discussion Paper Series 04-23, Utrecht School of Economics

  • Luan YJ, Sudhir K (2010) Forecasting marketing-mix responsiveness for new products. J Mark Res 47:444–457

    Article  Google Scholar 

  • Manchanda P, Ansari A, Gupta S (1999) The “shopping basket”: a model for multicategory purchase incidence decisions. Mark Sci 18:95–114

    Article  Google Scholar 

  • Manchanda P, Rossi PE, Chintagunta P (2004) Response modeling with nonrandom marketing-mix variables. J Mark Res 41:467–478

    Article  Google Scholar 

  • Mebane WR Jr, Sekhon JS (2011) Genetic Optimization using derivatives: the rgenoud package for R. J Stat Softw 42:1–26

    Article  Google Scholar 

  • Montgomery AL (1997) Creating micro-marketing pricing strategies using supermarket scanner data. Mark Sci 16:315–337

    Article  Google Scholar 

  • Montgomery AL, Bradlow ET (1999) Why analyst overconfidence about the functional form of demand models can lead to overpricing. Mark Sci 18:569–583

    Article  Google Scholar 

  • Montgomery AL, Rossi PE (1999) Estimating price elasticities with theory-based priors. J Mark Res 36:413–423

    Article  Google Scholar 

  • Mulhern FJ, Leone RP (1991) Implicit price bundling of retail products: a multiproduct approach to maximizing store profitability. J Mark 55:63–76

    Article  Google Scholar 

  • Murray CC, Talukdar D, Gosavi A (2010) Joint optimization of product price, display orientation and shelf-space allocation in retail category management. J Retail 86:125–136

    Article  Google Scholar 

  • Natter M, Reutterer T, Taudes A (2007) An assortmentwide decision-support system for dynamic pricing and promotion planning in DIY retailing. Mark Sci 26:576–583

    Article  Google Scholar 

  • Nijs VR, Srinivasan S, Pauwels K (2007) Retail-price drivers and retailer profits. Mark Sci 26:473–487

    Article  Google Scholar 

  • Otter T, Gilbride TJ, Allenby GM (2011) Testing models of strategic behavior characterized by conditional likelihoods. Mark Sci 30:686–701

    Article  Google Scholar 

  • Petrin A, Train K (2010) A control function approach to endogeneity in consumer choice models. J Mark Res 47:3–13

    Article  Google Scholar 

  • Reibstein DJ, Gatignon H (1984) Optimal product line pricing: the influence of elasticities and cross-elasticities. J Mark Res 21:259–267

    Article  Google Scholar 

  • Rossi PE (2014) Even the rich can make themselves poor: a critical examination of IV methods in marketing applications. Mark Sci 33:655–672

    Article  Google Scholar 

  • Rossi PE, Allenby GM, McCulloch R (2005) Bayesian statistics and marketing. Wiley, Chichester

    Book  Google Scholar 

  • Russell GJ, Kamakura WA (1997) Modeling multiple category brand preference with household basket data. J Retail 73:439–461

    Article  Google Scholar 

  • Russell GJ, Petersen A (2000) Analysis of cross category dependence in market basket selection. J Retail 76:367–392

    Article  Google Scholar 

  • Sekhon JS, Mebane WR Jr (1998) Genetic optimization using derivatives. Political Anal. 7:187–210

    Article  Google Scholar 

  • Shankar V, Bolton RN (2004) An empirical analysis of determinants of retailer pricing strategy. Mark Sci 23:28–49

    Article  Google Scholar 

  • Shankar V, Kannan PK (2014) An across store analysis of intrinsic and extrinsic cross-category effects. Cust Needs Solut 1:143–164

    Article  Google Scholar 

  • Shankar V, Krishnamurthi L (1996) Relating price sensitivity to retailer promotional variables and pricing policy: an empirical analysis. J Retail 72:249–272

    Article  Google Scholar 

  • Shankar V, Inman JJ, Mantrala M, Kelley E, Rizley R (2011) Innovations in shopper marketing: current insights and future research issues. J Retail 87S:S29–S42

    Article  Google Scholar 

  • Silva-Risso JM, Bucklin RE, Morrison DG (1999) A decision support system for planning manufacturers’ sales promotion calendars. Mark Sci 18:274–300

    Article  Google Scholar 

  • Smith M, Kohn R (2000) Nonparametric seemingly unrelated regression. J Econ 98:257–281

    Article  Google Scholar 

  • Song I, Chintagunta PK (2006) Measuring cross-category price effects with aggregate store data. Manag Sci 52:1594–1609

    Article  Google Scholar 

  • Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit. J Roy Stat Soc: Ser B 64:583–639

    Article  Google Scholar 

  • Steiner WJ, Brezger A, Belitz C (2007) Flexible estimation of price response function using retail scanner data. J Retail Consum Serv 14:383–393

    Article  Google Scholar 

  • Sudhir K (2001) Structural analysis of manufacturer pricing in the presence of a strategic retailer. Mark Sci 20:244–264

    Article  Google Scholar 

  • Tellis GJ, Zufryden FS (1995) Tackling the retailer decision maze: which brands to discount, how much, when and why? Mark Sci 14:271–299

    Article  Google Scholar 

  • van Heerde HJ, Leeflang PSH, Wittink DR (2002) How promotions work: SCAN*PRO-based evolutionary model building. Schmalenbach Bus Rev 54:198–220

    Google Scholar 

  • Verbeke G, Lesaffre E (1996) A linear mixed-effects model with heterogeneity in the random-effects population. J Am Stat Assoc 91:217–221

    Article  Google Scholar 

  • Vilcassim NJ, Chintagunta PK (1995) Investigating retailer product category pricing from household scanner panel data. J Retail 71:103–128

    Article  Google Scholar 

  • Villas-Boas JM, Winer RS (1999) Endogeneity in brand choice models. Manag Sci 45:1324–1338

    Article  Google Scholar 

  • Weber A (2015) Modeling price response from store sales: The roles of store heterogeneity and functional flexibility. Shaker Verlag, Aachen

  • Wedel M, Zhang J (2004) Analyzing brand competition across subcategories. J Mark Res 41:448–456

    Article  Google Scholar 

  • Wedel M, Zhang J, Feinberg F (2004) A model-based approach to setting optimal retail markups. Working paper, University of Michigan

  • Weicker K (2007) Evolutionäre Algorithmen, 2nd edn. Teubner Verlag, Wiesbaden

    Google Scholar 

  • Zellner A (1962) An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J Am Stat Assoc 57:348–368

    Article  Google Scholar 

  • Zenor MJ (1994) The profit benefits of category management. J Mark Res 31:202–213

    Article  Google Scholar 

Download references

Acknowledgments

The data for our empirical study was provided by the James M. Kilts Center, GSB, University of Chicago. We further thank three anonymous referees for their critical comments and valuable recommendations on our way toward a publishable manuscript.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Anett Weber.

Appendices

Appendix 1: General heterogeneity SUR model

1.1 The marginal model

We use the partly marginalized Gibbs sampler suggested by Frühwirth-Schnatter et al. (2004) where the random effects are integrated out when sampling S and \(\gamma =(\alpha ,\beta ^{G}_{1},\ldots ,\beta ^{G}_{K})\). If segment membership is known the random effects \(\beta _{i}\) can be rewritten as

$$\begin{aligned} \beta _{i}= & {} \beta ^{G}_{k}+b_{i},\quad b_{i}\sim N(0,Q^{G}_{k}). \end{aligned}$$
(22)

Under the assumption that \(b_{i}\) and \(\epsilon _{i}\) are independent (Frühwirth-Schnatter 2006) the marginal model is obtained by substituting (22) into model (4):

$$\begin{aligned} y_{i}=X_{i}\alpha + W_{i}\beta ^{G}_{k} + \tilde{\epsilon _{i}}, \quad \tilde{\epsilon _{i}}\sim N(0,W_{i}Q^{G}_{k}(W_{i})^{\prime }+\varSigma \otimes I_{T_{i}}). \end{aligned}$$
(23)

Introducing indicator variables

$$\begin{aligned} H^{(k)}_{i}= & {} {\left\{ \begin{array}{ll} 1, &{} \text {if } S_{i}=k,\\ 0, &{} \text {otherwise}, \end{array}\right. }\quad k=1,\ldots ,K, \end{aligned}$$
(24)

leads to the following representation of the marginal model:

$$\begin{aligned} y_{i} = Z_{i}\gamma +\epsilon ^{*}_{i},\quad \epsilon ^{*}_{i} \sim N(0,V_{i}) \end{aligned}$$
(25)

with the design matrix

$$\begin{aligned} Z_{i}= & {} \left( X_{i}\ \ W_{i}H^{(1)}_{i}\ \ \cdots \ \ W_{i}H^{(K)}_{i}\right) \end{aligned}$$
(26)

and the covariance matrix

$$\begin{aligned} V_{i} = W_{i}H^{(1)}_{i}Q^{G}_{1}\left( W_{i}\right) ^{\prime }+ \cdots + W_{i}H^{(K)}_{i}Q^{G}_{K}\left( W_{i}\right) ^{\prime }+\varSigma \otimes I_{T_{i}}. \end{aligned}$$
(27)

1.2 Prior specifications

The parameter vectors \(\alpha \) and \(\beta ^{G}_{k}\) (\(k=1,\ldots ,K\)) are assumed to be a priori independent leading to a joint prior \(N(\mu _{0\gamma },\varSigma _{0\gamma })\) for \(\gamma =(\alpha ,\beta ^{G}_{1},\ldots ,\beta ^{G}_{K})\) with \(\mu _{0\gamma }=(\mu _{0\alpha },\mu _{0\beta ^{G}_{1}},\ldots ,\mu _{0\beta ^{G}_{K}})^{\prime }\). \(\varSigma _{0\gamma }\) is a block diagonal matrix with diagonal elements \((\varSigma _{0\alpha },\varSigma _{0\beta ^{G}_{1}},\ldots ,\varSigma _{0\beta ^{G}_{K}})\). We determined the prior means \(\mu _{0\alpha }\) for the fixed effects \(\alpha \) and \(\mu _{0\beta ^{G}_{k}}\) for the group-specific effects \(\beta ^{G}_{k}\) (\(k=1,\ldots ,K\)) by estimating a homogeneous SUR model via Gibbs sampling implemented in the R function rsurGibbs() contained in the R package bayesm (see Rossi et al. 2005 for details).

About the remaining parameters we stay nearly noninformative choosing \(\varSigma _{0\alpha }=\varSigma _{0\beta ^{G}_{1}}=\cdots =\varSigma _{0\beta ^{G}_{K}}=50I\), \(a_{0\varSigma }=1\) and \(B_{0\varSigma }=0.005I\) and \(e_{0k}=1\) for \(k=1,\ldots ,K\). The covariance matrices \(Q^{G}_{k}\) \((k=1,\ldots ,K)\) are defined as diagonal matrices with diagonal elements \((\tau ^{2}_{k1},\ldots ,\tau ^{2}_{kr})\) (Gelfand et al. 1995, p. 483) where \(r=\sum ^{M}_{m=1}r_{m}\) corresponds to the dimension of \(\beta _{i}\). Thus, inverse gamma priors \({IG}(a^{k}_{0h},b^{k}_{0h})\) are placed on the variance parameters \(\tau ^{2}_{kh}\) \((h=1,\ldots ,r)\) with hyperparameters \(a^{k}_{0h}=b^{k}_{0h}=0.001\) (\(h=1,\ldots ,r\), \(k=1,\ldots ,K\)). For our data used in the empirical application, we found that applying diagonal covariance matrices \(Q^{G}_{k}\) \((k=1,\ldots ,K)\) considerably improved the model fit of the hierarchical Bayesian SUR model and the general heterogeneity SUR model (measured by the model likelihood). In the context of a single (HB) regression model, Andrews et al. (2008) have shown that the predictive performance of the model can be improved (and besides model complexity will be reduced) by not estimating covariances between store-specific effects. The respective assumption that, for example, the price sensitivity of a store i is independent of the display activity in another store \(i^{\prime }\) is reasonable in the context of sales data belonging to one retail chain since customers usually shop in the store which is closest to their home (cf. Andrews et al. 2008, p. 25). If the covariance matrices \(Q^{G}_{k}\) (\(k=1,\ldots ,K\)) were non-diagonal matrices we could choose \(a_{0Q^{G}_{K}}=r+1\) and \(B_{0Q^{G}_{K}}=rI\) for \(k=1,\ldots ,K\) as hyperparameters for the inverse Wishart prior.

Parameter estimates are based on the last 2000 iterations of the MCMC algorithm. Due to high computing times the burn-in period was limited to 1000 iterations for the one-segment models and the unidentified versions of the two-segment models, and to 8000 iterations when the latent class SUR model had to be identified via constrained permutation sampling. Trace plots of sampled coefficients indicated that these numbers of burn-in iterations are sufficient to ensure convergence of the MCMC algorithm.

1.3 Bayesian inference

The joint posterior of the latent random effects \(\beta ^{I}=(\beta _{1},\ldots ,\beta _{N})\), the latent segment indicator \(S=(S_{1},\ldots ,S_{N})\) and all unknown parameters \(\phi =(\alpha ,\beta ^{G}_{1},\ldots ,\beta ^{G}_{K},\) \(Q^{G}_{1},\ldots ,Q^{G}_{K},\eta _{1},\ldots ,\eta _{K},\varSigma )\) given the data \(y=(y_{1},\ldots ,y_{N})\) is proportional to

$$\begin{aligned} \pi (\beta ^{I},S,\phi | y)\propto & {} f(y| \beta ^{I},S,\phi )\pi (\beta ^{I}| S,\phi )\pi (S|\phi )\pi (\phi ). \end{aligned}$$
(28)

For a fixed number K of segments the following sampling scheme results (compare Frühwirth-Schnatter et al. 2004, 2005, and the derivation of steps 4 (a2) and (b) in Weber (2015)):

  1. 1.

    Sample \(S_{i}\), \(i=1,\ldots ,N\), from

    $$\begin{aligned} P(S_{i}=k|\cdot )\propto & {} \eta _{k}\cdot f(y_{i}| \alpha ,\beta ^{G}_{k},Q^{G}_{k},\varSigma ),\quad k=1,\ldots ,K, \end{aligned}$$
    (29)

    where the likelihood \(f(y_{i}| \alpha ,\beta ^{G}_{k},Q^{G}_{k},\varSigma )\) is represented by the density of the normal distribution

    $$\begin{aligned} N(X_{i}\alpha + W_{i}\beta ^{G}_{k},W_{i}Q^{G}_{k}(W_{i})^{\prime }+\varSigma \otimes I_{T_{i}}). \end{aligned}$$
  2. 2.

    Sample \(\eta \) from the Dirichlet distribution \(D(e_{0,1}+N_{1},\ldots ,e_{0,K}+N_{K})\) with \(N_{k}=\#\{S_{i}=k\}\).

  3. 3.

    \(\alpha ,\beta ^{G}_{1},\ldots ,\beta ^{G}_{K}\) and \(\beta ^{I}\) are conditionally independent and can be sampled within two blocks.

    1. (a)

      Sample \(\gamma \) from the normal distribution \(N(\mu _{\gamma },\varSigma _{\gamma })\) with

      $$\begin{aligned} \varSigma _{\gamma }= & {} \left( \sum ^{N}_{i=1}\left( Z_{i}\right) ^{\prime }\left( V_{i}\right) ^{-1}Z_{i}+\left( \varSigma _{0\gamma }\right) ^{-1}\right) ^{-1}, \\ \mu _{\gamma }= & {} \varSigma _{\gamma }\left( \sum ^{N}_{i=1}\left( Z_{i}\right) ^{\prime }\left( V_{i}\right) ^{-1}y_{i}+\left( \varSigma _{0\gamma }\right) ^{-1}\mu _{0\gamma }\right) . \end{aligned}$$
    2. (b)

      Sample \(\beta _{i}\), \(i=1,\ldots ,N\), from the normal distribution \(N\left( \mu _{\beta _{i}},\varSigma _{\beta _{i}}\right) \) with

      $$\begin{aligned} \varSigma _{\beta _{i}}= & {} \left( \left( W_{i}\right) ^{\prime }\left( \varSigma \otimes I_{T_{i}}\right) ^{-1}W_{i} + \left( Q^{G}_{S_{i}}\right) ^{-1}\right) ^{-1}, \\ \mu _{\beta _{i}}= & {} \varSigma _{\beta _{i}}\left( \left( W_{i}\right) ^{\prime }\left( \varSigma \otimes I_{T_{i}}\right) ^{-1}\left( y_{i}-X_{i}\alpha \right) + \left( Q^{G}_{S_{i}}\right) ^{-1}\beta ^{G}_{S_{i}} \right) . \end{aligned}$$
  4. 4.

    The covariance matrices \(Q^{G}_{1},\ldots ,Q^{G}_{K}\) and \(\varSigma \) are conditionally independent so that sampling can be done within two blocks.

    1. (a1)

      If the covariance matrices \(Q^{G}_{k}\), \(k=1,\ldots ,K\), are non-diagonal matrices they can be sampled from the inverse Wishart distribution \({IW}(a_{Q_{k}},B_{Q_{k}})\) with

      $$\begin{aligned} a_{Q_{k}} = a_{0Q^{G}_{k}}+N_{k}/2, \quad B_{Q_{k}} = B_{0Q^{G}_{k}}+\frac{1}{2}\sum ^{N}_{i=1}H^{(k)}_{i} (\beta _{i}-\beta ^{G}_{k})^{\prime }(\beta _{i}-\beta ^{G}_{k}). \end{aligned}$$
    2. (a2)

      If the covariance matrices \(Q^{G}_{k}\), \(k=1,\ldots ,K\), are diagonal matrices of the form \(Q^{G}_{k} = \mathrm{diag}(\tau ^{2}_{k1},\ldots ,\tau ^{2}_{kr})\) \((k=1,\ldots ,K)\) the variance parameters \(\tau ^{2}_{kh}\), \(k=1,\ldots ,K\) and \(h=1,\ldots ,r\), can be sampled from the inverse gamma distribution \({IG}(a_{\tau _{kh}},b_{\tau _{kh}})\) with

      $$\begin{aligned} a_{\tau _{kh}} = a^{k}_{0h}+N_{k}/2, \quad b_{\tau _{kh}} = b^{k}_{0h}+\frac{1}{2}\sum ^{N}_{i=1}H^{(k)}_{i} (\beta _{ih}-\beta ^{G}_{kh})^{2}. \end{aligned}$$
    3. (b)

      Sample \(\varSigma \) from the inverse Wishart distribution \({IW}(a_{\varSigma },B_{\varSigma })\) with

      $$\begin{aligned} a_{\varSigma } = a_{0\varSigma }+\frac{1}{2}\sum ^{N}_{i=1}T_{i}, \quad B_{\varSigma } = B_{0\varSigma }+\frac{1}{2}\sum ^{N}_{i=1}A^{(i)} \end{aligned}$$

      with \(A^{(i)} \!=\! (Y_{i} \!-\! P_{i}^{*}B_{i})^{'}(Y_{i} \!-\! P_{i}^{*}B_{i}), Y_{i} \!=\! (y_{i1},...,y_{iM}), P_{i}^{*} \!=\! (P_{i1},...,P_{iM}), P_{im} = (X_{im},W_{im}), B_{i} = diag(\gamma _{i1},...,\gamma _{iM}) \,\mathrm{and}\, \gamma _{im} = (\alpha _{m},\beta _{im})^{'}.\)

The R code of the estimation procedure is available from the authors upon request.

Appendix 2: Hierarchical Bayesian semiparametric SUR model

Inference uses MCMC simulation, drawing from full conditionals of single parameters or blocks of parameters given the rest and the data. Let \(y_{m}=(y_{m11},\ldots ,y_{mNT})'\) and \(\eta _{m}=(\eta _{m11},\ldots ,\eta _{mNT})'\) denote the vector on the m-th response variable and the corresponding vector of predictors. Then the additive predictors in (7) can be written as

$$\begin{aligned} \eta _{m}~=~\sum _{j=1}^{M} A_{mj} X_{mj} \beta _{mj} ~+~ \sum _{l=0}^{L} \, W_{ml} \, Z \gamma _{ml}~+~V_m \delta _m, \end{aligned}$$
(30)

where \(A_{mj}\) is a \(n \times n\) diagonal matrix with possible entries \(1+\alpha _{m1j},\dots ,1+\alpha _{mNj}\) depending on the store \(i=1,\dots ,N\) a particular observation pertains to (with n being the total number of observations of brand m), \(X_{mj}\) corresponds to the design matrix for the jth price effect of brand m with elements given by the B-spline basis functions evaluated at the observed prices \(P_{jit}\), \(W_{ml}\) is the (diagonal) design matrix for the lth parametrically and store-specifically modeled variable \(D_{mit}^l\), Z is a 0/1 incidence matrix indicating if a particular observation belongs to store i, \(V_{m}\) is the design vector for the homogeneous parametric effect of \(E_t\), \(\beta _{mj} = (\beta _{mj1},\dots ,\beta _{mjO_{mj}})'\) is the vector of regression parameters for function \(f_{mj}\).

An alternative formulation in terms of the random coefficients is

$$\begin{aligned} \eta _{m}~=~\sum _{j=1}^{M} (f_{mj} + \tilde{X}_{mj} Z_{} \alpha _{mj}) ~+~ \sum _{l=0}^{L} \, W_{ml} \, Z \gamma _{ml} ~+~V_m\delta _m, \end{aligned}$$
(31)

where \(\tilde{X}_{mj} = \mathrm{diag}(f_{mj}(P_{m11}),\dots ,f_{mj}(P_{mNT}))\) and \(\alpha _{mj}=(\alpha _{m1j},\dots ,\alpha _{mNj})'\) the vector of random coefficients for function \(f_{mj}\).

Let \(\beta =(\ldots ,\beta _{mj}',\ldots )'\) and \(\alpha =(\ldots ,\alpha _{mj}',\ldots )'\) denote the stacked vector of all regression parameters, \(\tau ^2=(\dots ,\tau ^2_{mj},\dots )'\), \(\phi ^2=(\dots ,\phi ^2_{mj},\dots )'\), \(\psi ^2=(\dots ,\phi ^2_{ml},\dots )'\) the vectors of corresponding variances \(\tau _{mj}^2\), \(\phi _{mj}^2\), \(\psi ^2_{ml}\) and \(\delta =(\delta _1', \ldots ,\delta _M')'\) the stacked vector of all fixed effects parameters.

Posterior analysis is then based on

$$\begin{aligned}&{\pi \left( \beta ,\alpha , \tau ^2,\phi ^2, \psi ^2, \delta ,\varSigma \big | y\right) } \nonumber \\&\quad \propto f\left( y\big | \beta ,\alpha ,\delta ,\varSigma \right) \prod ^{M}_{m=1} \prod ^{M}_{j=1} \left[ \pi \left( \beta _{mj}\big |\tau ^{2}_{mj}\right) \pi \left( \alpha _{mj}\big |\phi ^{2}_{mj}\right) \pi \left( \tau ^{2}_{mj}\right) \pi \left( \phi ^{2}_{mj}\right) \right] \nonumber \\ \end{aligned}$$
(32)
$$\begin{aligned}&\qquad \times \prod ^{M}_{m=1} \prod ^{L}_{l=0} \left[ \pi \left( \gamma _{ml}\big |\psi ^{2}_{ml}\right) \pi (\psi ^2_{ml}) \right] \pi \left( \delta \right) \pi \left( \varSigma \right) \end{aligned}$$
(33)

with \(f(y|\cdot )\) denoting the likelihood of the data.

Parameters are estimated via Gibbs sampling within the following blocks (Lang et al. 2003):

  1. 1.

    Sample \(\beta _{mj}\), \(m=1,\ldots ,M\), \(j=1,\ldots ,M\). The full conditional for \(\beta _{mj}\) is Gaussian, \(\beta _{mj}|\cdot ~ \sim ~ N(\mu _{mj},~P_{mj}^{-1})\), with precision matrix

    $$\begin{aligned} P_{mj}= \frac{X_{mj}' A_{mj}^2 X_{mj}}{\sigma ^2_{m|-m}}~ + ~\frac{ K_{mj}}{\tau _{mj}^2} \end{aligned}$$

    and mean

    $$\begin{aligned} \mu _{mj}~ = ~P_{mj}^{-1}\left( \frac{1}{\sigma ^2_{m |-m}} X'_{mj}A_{mj} (y_{m} - o_{m})\right) . \end{aligned}$$

    Here, \(\sigma ^2_{m|-m}\) is the (conditional) variance

    $$\begin{aligned} \sigma ^2_{m |-m}~=~\sigma ^2_m - \varSigma _{m,-m}\varSigma _m^{-1}\varSigma _{m,-m}', \end{aligned}$$

    derived from partitioning \(\varSigma \) into

    $$\begin{aligned} \varSigma = \left( \begin{array}{cc} \sigma ^2_m &{}\quad \varSigma _{m,-m}\\ \varSigma _{m,-m}'&{}\quad \varSigma _m\\ \end{array}\right) \end{aligned}$$

    (after reordering for the mth component of the error variable). The vector \(o_{m}\) is an offset vector defined in Lang et al. (2003).

  2. 2.

    Sample \(\alpha _{mj}\), \(m=1,\ldots ,M\), \(j=1,\ldots ,M\). The full conditionals for the scaling factors are derived from (31) with \(\alpha _{mj} \, | \, \cdot \sim \, N(\mu _{mj},~P_{mj}^{-1})\) and

    $$\begin{aligned} P_{mj}= \frac{Z' \tilde{X}_{mj}^2 Z}{\sigma ^2_{m|-m}}~ + ~\frac{ I}{\phi _{mj}^2} \end{aligned}$$

    and mean

    $$\begin{aligned} \mu _{mj}~ = ~P_{mj}^{-1}\left( \frac{1}{\sigma ^2_{m |-m}} Z' \tilde{X}_{mj} (y_{m} - o_{m})\right) . \end{aligned}$$
  3. 3.

    Sample \(\gamma _{ml}\), \(m=1,\dots ,M\), \(l=0,\dots ,L\). Full conditionals are standard and omitted here.

  4. 4.

    Sample \(\delta _{m}\), \(m=1,\ldots ,M\). Full conditionals are standard and omitted here.

  5. 5.

    Sample \(\tau ^{2}_{mj}\), \(m=1,\ldots ,M\), \(j=1,\ldots ,M\). Full conditionals for the variance parameters \(\tau _{mj}^2\) are inverse Gamma distributions with parameters

    $$\begin{aligned} a_{mj}=0.001 + \frac{rank(K_{mj})}{2},\quad b_{mj} = 0.001 + \frac{1}{2}\beta _{mj}'K_{mj}\beta _{mj}. \end{aligned}$$
    (34)
  6. 6.

    Sample \(\phi ^{2}_{mj}\), \(m=1,\ldots ,M\), \(j=1,\ldots ,M\). Full conditionals for the variance parameters \(\phi _{mj}^2\) are inverse Gamma distributions with parameters

    $$\begin{aligned} a_{mj}=0.001+\frac{N}{2},\quad b_{mj}=0.001+ \frac{1}{2}\alpha _{mj}' \alpha _{mj}. \end{aligned}$$
    (35)
  7. 7.

    Sample \(\varSigma \). The full conditional for \(\varSigma \) is an inverse Wishart distribution with parameters

    $$\begin{aligned} a=1 +\frac{n}{2},\quad B=0.005I+\frac{1}{2} \sum _{i=1}^{N} \sum _{t=1}^T (y_{it}-\eta _{it})(y_{it}-\eta _{it})'. \end{aligned}$$
    (36)

    where \(Y_{it} = (y_{1it},\dots ,y_{Mit})'\) and \(\eta _{it} = (\eta _{1it},\dots ,\eta _{Mit})'\)

The complete sampling scheme can be found in Lang et al. (2003, pp. 270–271) in combination with Lang et al. (2015). The estimation procedure of the semiparametric SUR model is implemented in the software BayesX and the code is available from the authors.

Appendix 3: Overview of model specifications

Model

Specification

Modeled effects

  

Intercept

Price effects

Display and price ending effects

Holiday effect

PHomSM

Parametric, homogeneous

Store-specific

Equal across stores

Equal across stores

Equal across stores

PLCSM

Parametric, heterogeneous

Store-specific

Segment-specific

Segment-specific

Equal across stores

PHBSM

Parametric, heterogeneous

Store-specific

Store-specific

Store-specific

Equal across stores

PHetSM

Parametric, heterogeneous

Store-specific

Store-specific within segments

Store-specific within segments

Equal across stores

FHomSM

Semiparametric, homogeneous

Store-specific

Equal across stores and nonparametric

Equal across stores

Equal across stores

FHBSM

Semiparametric, heterogeneous

Store-specific

Store-specific and nonparametric

Store-specific

Equal across stores

Appendix 4: Optimization details

The optimization algorithm implemented in the R function genoud() is able to solve problems for objective functions that are nonlinear or even discontinuous in parameters and provides a high probability of finding the global optimum (Sekhon and Mebane 1998). The evolutionary algorithm starts with a population of trial solutions (in our application one trial solution corresponds to a vector of prices of the 8 brands). Then, a set of heuristic rules or operators (basically reproduction, mutation, crossover) is used to modify the trial solutions in order to increase their fitness values (i.e., the values of the function which is optimized). The selection of trial solutions for reproduction depends on their value of the objective function. The best trial solution is reproduced in each generation. The remaining trial solutions are recombined or mutated by applying the other operators (for details compare Sekhon and Mebane 1998, p. 192). That way, a new population (generation) results that “tends to be, on average, better than its predecessor” (Mebane and Sekhon 2011, p. 3). The following pseudo-code (adapted from Eiben and Smith 2003, p. 16) presents the general procedure:

figure a

Since the evolutionary algorithm is basically a genetic one where the code-strings are floating-point vectors (see, e.g., Ali and Törn 2004 for a description of genetic algorithms) its performance strongly depends on the population size (option pop.size) which must to be sufficiently large (Mebane and Sekhon 2011). In our application we chose \(pop.size=3000\) which leads to relatively high computing times but increases the probability of finding the global optimum. The maximum number of generations was set to \(max.generations=300\). The algorithm stops if the objective function is not improved anymore in a fixed number (wait.generations) of generations. We set \(wait.generations=10\). On average 25 generations were required to solve the optimization problems. The option Domains comprises lower and upper bounds for each variable which correspond to the observed price range for each price variable in our applications. Finally, \(boundary.enforcement=2\) ensures candidates that lie within the bounds specified in Domains. The R code is available from the authors upon request.

In our empirical application, optimization is done separately for each of the 89 weeks as well as separately for each store/segment in case of the heterogeneous models. The following table summarizes the size of the respective optimization problems for each model:

Model

Number of problems to solve

PHomSM, FHomSM

89 weeks \(=\) 89 problems to solve

 

1 problem \(=\) 1 run for all brands and all stores

PLCSM (2)

89 weeks \(\times \) 2 segments \(=178\) problems to solve

 

1 problem \(=\) 1 run for all brands and all stores of 1 segment

PHBSM, FHBSM

89 weeks \(\times \) 81 stores \(= 7209\) problems to solve

 

1 problem \(=\) 1 run for all brands (in 1 store)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Weber, A., Steiner, W.J. & Lang, S. A comparison of semiparametric and heterogeneous store sales models for optimal category pricing. OR Spectrum 39, 403–445 (2017). https://doi.org/10.1007/s00291-016-0459-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00291-016-0459-6

Keywords

Navigation