Cross channel effects of search engine advertising on brick & mortar retail sales: Meta analysis of large scale field experiments on Google.com

Abstract

We investigate the cross channel effects of search engine advertising on Google.com on sales in brick and mortar retail stores. Obtaining causal and actionable estimates in this context is challenging: brick and mortar store sales vary widely on a weekly basis; offline media dominate the marketing budget; search advertising and demand are contemporaneously correlated; and estimates have to be credible to overcome agency issues between the online and offline marketing groups. We report on a meta-analysis of a population of 15 independent field experiments, in which 13 well-known U.S. multi-channel retailers spent over $4 million in incremental search advertising. In test markets, category keywords were maintained in positions 1-3 for 76 product categories, with no search advertising on these keywords in the control markets. Outcomes measured include sales in the advertised categories, total store sales and Return on Ad Spending. We estimate the average effect of each outcome for this population of experiments using a Hierarchical Bayesian (HB) model. The estimates from the HB model provide causal evidence that increasing search engine advertising on broad keywords on Google.com had a positive effect on sales in brick and mortar stores for the advertised categories for this population of retailers. There also was a positive effect on total store sales. Hence the increase in sales in the advertised categories was incremental to the retailer net of any sales borrowed from non-advertised categories. The total store sales increase was a meaningful improvement compared to the baseline sales growth rates. The average Return on Ad Spend (ROAS) is positive, but does not break even on average, although several retailers achieved or exceeded break-even based only on brick and mortar sales. We examine the robustness of our findings to alternative assumptions about the data specific to this set of experiments.
Our estimates suggest online and offline are linked markets, that media planners should account for the offline effects in the planning and execution of search advertising campaigns, and that these effects should be adjusted by category and retailer. Extensive replication and a unique research protocol ensure that our results are general and credible.

Notes

  1.

    For a definition of “right rail” see http://www.sempo.org/?page=glossary&hhSearchTerms=%22right+and+rail%22

  2.

    http://www2.census.gov/retail/releases/historical/ecomm/12q4.pdf

  3.

    http://www.forrester.com/Forrester+Research+Online+Retail+Forecast+2012+To+2017+US/fulltext/-/E-RES90661

  4.

    According to data reported by Shopper Trak, foot traffic to retailers over the holiday season dropped by over 50% over the three holiday seasons spanning 2011 to 2013 (Banjo and FitzGerald 2014).

  5.

    For example Table 1 provides information on the media spending of each retailer in our data set. The media spending is rank ordered from highest spend to lowest spend for each retailer. As this table shows the retailers in our experiment spent heavily on newspapers, TV, free standing inserts, radio and direct mail.

  6.

    For example, “Savings account” is a broad search term and is also referred to as top of the funnel search term or a generic search term. “Citi Bank savings account” is a branded search term since it includes the brand name of the retailer.

  7.

    For an illustrative example, consider a well known retailer–Walmart. According to Investor FAQ’s on Walmart’s web site Walmart serves more than 245 million customers and members weekly worldwide (http://stock.walmart.com/faqs/, accessed on 1/27/2015). According to Wal-Mart’s web site “Every month more than 60 percent of Americans shop at Walmart”, (http://news.walmart.com/news-archive/2013/05/04/walmart-launches-national-advertising-campaign-to-show-the-real-walmart, accessed on 1/27/2015).

  8.

    Berndt (1991) makes a similar observation about seasonality and advertising.

  9.

    http://www.predictivetechnologies.com/

  10.

    In some experiments total store refers to total sales of all the categories in the store, in others it refers to a top level aggregation of categories that include the advertised categories.

  11.

    Defined as the total store sales or total sales for an aggregation of advertised and related non-advertised categories.

  12.

    Calculated based on data reported by the National Retail Federation.

  13.

    https://www.iab.com/wp-content/uploads/2016/04/IAB_Internet_Advertising_Revenue_Report_FY_2016.pdf

  14.

    https://www.emarketer.com/Article/US-Digital-Ad-Spending-Surpass-TV-this-Year/1014469

  15.

    http://www.iab.net/media/file/IABInternetAdvertisingRevenueReportFY2012POSTED.pdf (last accessed on October 31, 2013).

  16.

    We could not include three field experiments for which the relevant legal agreements could not be located or verified. The results of these field experiments were reviewed by the authors; they are consistent with our findings and would not change the main conclusions of our research.

  17.

    Due to the disclosure policy we do not provide descriptives such as annual sales, type of retail format (e.g., department store versus specialty store), focal categories, or number of stores.

  18.

    More details about APT and its clients can be obtained from the firm’s web site at www.predictivetechnologies.com

  19.

    Defined in Section 4.2.1.

  20.

    This was simply due to a data capture omission that occurred during the time it took to obtain the permissions for this study.

  21.

    APT expresses these as respectively, 0 to 0.05, 0.05 to 0.3, and 0.3 to 0.5. Note that 0.5 is the largest possible p-value in a one-sided test.

  22.

    Ad spending data was not disaggregated to individual test stores.

  23.

    See Maniadis et al. (2014) for a discussion of why we should be cautious of initial empirical findings in general.

  24.

    On this point, see Ioannidis (2005). In Corollary 1 he notes that research findings are more likely true in scientific fields that undertake large studies. For example in controlled clinical trials in cardiology several thousand subjects are randomized (Yusuf et al. 1984).

  25.

    Note that for a one tailed test, p-values are calculated as follows \(t_{n_{ij}-1}(1-p_{ij})=|\text {Lift}_{ij}|/\sigma _{ij}\).

  26.

    Calculated based on data reported by the National Retail Federation.

  27.

    The base line CAGR is 1.55%. If sales in the base year is 100, then one year later sales would be 101.55. Applying the incremental lift of 1.18% onto 101.55 we get 101.55 × 0.0118 = 1.20. So adding the incremental growth, sales one year later would be 101.55+1.20=102.75. So the new growth rate post search advertising on broad keywords is 2.75%. Compared to the pre-period growth rate of 1.55% the increase in growth was 77.5%. This back of the envelope analysis estimates growth based on a single year.

  28.

    This estimate was obtained by the authors based on an analysis of the Profit and Loss statements of the retailers who participated in the experiments.

  29.

    Since only p-value intervals are reported for many estimates it is not possible to generate a forest plot of the reported estimates.

  30.

    For the midpoint alternative, the p-values are fixed and treated as known quantities. For the uniform prior alternative, there is an additional layer in the hierarchical model, and p-values are drawn from that prior. Therefore, the results in these two scenarios will not be identical.
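
The back-of-the-envelope growth calculation in note 27 can be reproduced with a few lines of Python. This is only a sketch: the function name and its arguments are illustrative and do not come from the article, and it carries unrounded intermediate values, so the relative increase it returns can differ slightly from the rounded figures quoted in the note.

```python
def growth_with_lift(baseline_growth_pct, lift_pct, base_sales=100.0):
    """Return (new growth rate in %, relative increase in growth in %).

    Mirrors note 27: grow base-year sales by the baseline rate, then apply
    the incremental lift on top of the grown sales figure.
    """
    sales_after = base_sales * (1 + baseline_growth_pct / 100)
    incremental = sales_after * lift_pct / 100
    new_growth_pct = (sales_after + incremental - base_sales) / base_sales * 100
    relative_increase_pct = (
        (new_growth_pct - baseline_growth_pct) / baseline_growth_pct * 100
    )
    return new_growth_pct, relative_increase_pct


# Inputs taken from note 27: baseline CAGR of 1.55% and incremental lift of 1.18%.
new_growth, relative_increase = growth_with_lift(1.55, 1.18)
print(f"new growth rate: {new_growth:.2f}%")
print(f"relative increase in growth: {relative_increase:.1f}%")
```

As in the note, this covers a single year only.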

References

  1. Agarwal, A., Hosanagar, K., & Smith, M.D. (2008). Location, location, location: An analysis of profitability of position in online advertising markets. Journal of Marketing Research, 46(6), 1057–1073.

  2. Babapulle, M.N., Joseph, L., Bélisle, P., Brophy, J.M., & Eisenberg, M.J. (2004). A hierarchical Bayesian meta-analysis of randomised clinical trials of drug-eluting stents. The Lancet, 364(9434), 583–591.

  3. Bajari, P., & Hortacsu, A. (2003). The winner’s curse, reserve prices, and endogenous entry: Empirical insights from ebay auctions. RAND Journal of Economics, 329–355.

  4. Banjo, S., & FitzGerald, D. (2014). Stores confront new world of reduced shopper traffic. Wall Street Journal, 4.

  5. Berndt, E.R. (1991). The practice of econometrics: classic and contemporary. Addison-Wesley Reading MA.

  6. Berry, D.A., Berry, S.M., McKellar, J., & Pearson, T.A. (2003). Comparison of the dose-response relationships of 2 lipid-lowering agents: a bayesian meta-analysis. American Heart Journal, 145(6), 1036–1045.

  7. Blake, T., Nosko, C., & Tadelis, S. (2013). Consumer heterogeneity and paid search effectiveness: A large scale field experiment. NBER Working Paper, 1–26.

  8. Brucks, M. (1985). The effects of product class knowledge on information search behavior. Journal of Consumer Research, 12(1), 1–16.

  9. Brynjolfsson, E., Dick, A.A., & Smith, M.D. (2010). A nearly perfect market?. QME, 8(1), 1–33.

  10. DerSimonian, R., & Kacker, R. (2007). Random-effects model for meta-analysis of clinical trials: an update. Contemporary clinical trials, 28(2), 105–114.

  11. Eastlack, J.O. Jr, & Rao, A.G. (1989). Advertising experiments at the campbell soup company. Marketing Science, 8(1), 57–71.

  12. Farley, J.U., & Lehmann, D.R. (1986). Generalizing about market response models: Meta-analysis in marketing. Lexington: Lexington Books.

  13. Farley, J.U., Lehmann, D.R., & Sawyer, A. (1995). Empirical marketing generalization using meta-analysis. Marketing Science, 14(3_supplement), G36–G46.

  14. FitzGerald, D. (2013). Retail sales on thanksgiving, black friday rose 2.3 reports. http://online.wsj.com/news/articles/SB10001424052702304017204579230801763930942.

  15. Fulgoni, G.M., & Morn, M.P. (2009). Whither the click? how online advertising works. Journal of Advertising Research, 49(2), 134.

  16. Greenland, S. (1994). Invited commentary: a critical look at some popular meta-analytic methods. American Journal of Epidemiology, 140(3), 290–296.

  17. Holmes, E. (2014). Why online retailers like Bonobos, Boden, Athleta mail so many catalogs. Wall Street Journal. Accessed 18 Apr 2014.

  18. Hong, H., & Shum, M. (2006). Using price distributions to estimate search costs. The RAND Journal of Economics, 37(2), 257–275.

  19. Imbens, G.W., & Rubin, D.B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.

  20. Ioannidis, J.P.A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

  21. Joo, M., Wilbur, K.C., Cowgill, B., & Zhu, Y. (2013). Television advertising and online search. Management Science, 60(1), 56–73.

  22. Levitt, S.D., & List, J.A. (2007). Viewpoint: On the generalizability of lab behaviour to the field. Canadian Journal of Economics/Revue canadienne d’économique, 40(2), 347–370.

  23. Lewis, R.A., & Reiley, D.H. (2014). Online ads and offline sales: measuring the effect of retail advertising via a controlled experiment on Yahoo! Quantitative Marketing and Economics, 12(3), 235–266.

  24. Lieber, E., & Syverson, C. (2012). Online versus offline competition. Oxford Handbook of the Digital Economy, 189–223.

  25. Lodish, L.M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B., & Stevens, M.E. (1995). How tv advertising works: a meta-analysis of 389 real world split cable tv advertising experiments. Journal of Marketing Research, 125–139.

  26. Maniadis, Z., Tufano, F., & List, J.A. (2014). One swallow doesn’t make a summer: New evidence on anchoring effects. The American Economic Review, 104(1), 277–290.

  27. Naik, P.A., & Peters, K. (2009). A hierarchical marketing communications model of online and offline media synergies. Journal of Interactive Marketing, 23(4), 288–299.

  28. Narayanan, S., & Kalyanam, K. (2015). Position effects in search advertising and their moderators: A regression discontinuity approach. Marketing Science, 34(3), 388–407.

  29. Ratchford, B.T., Lee, M.-S., & Talukdar, D. (2003). The impact of the internet on information search for automobiles. Journal of Marketing Research, 193–209.

  30. Rutz, O.J., & Bucklin, R.E. (2011). From generic to branded: a model of spillover in paid search advertising. Journal of Marketing Research, 48(1), 87–102.

  31. Sahni, N.S. (2015). Effect of temporal spacing between advertising exposures: Evidence from online field experiment. Quantitative Marketing and Economics, 13(3), 203–247.

  32. Sahni, N.S. (2016). Advertising spillovers: evidence from online field experiments and implications for returns on advertising. Journal of Marketing Research, 53(4), 459–478.

  33. Sethuraman, R., Tellis, G.J., & Briesch, R.A. (2011). How well does advertising work? Generalizations from meta-analysis of brand advertising elasticities. Journal of Marketing Research, 48(3), 457–471.

  34. Shapiro, S., MacInnis, D.J., & Heckler, S.E. (1997). The effects of incidental ad exposure on the formation of consideration sets. Journal of Consumer Research, 24(1), 94–104.

  35. Sutton, A.J., & Higgins, J. (2008). Recent developments in meta-analysis. Statistics in medicine, 27(5), 625–650.

  36. Tellis, G.J. (1988). The price elasticity of selective demand: A meta-analysis of economic models of sales. Journal of Marketing Research (JMR) 25(4).

  37. Varian, H.R. (2007). Position auctions. International Journal of Industrial Organization, 25(6), 1163–1178.

  38. Verhoef, P.C., Neslin, S.A., & Vroomen, B. (2007). Multichannel customer management: Understanding the research-shopper phenomenon. International Journal of Research in Marketing, 24(2), 129–148.

  39. Yang, S., & Ghose, A. (2010). Analyzing the relationship between organic and sponsored search advertising: Positive, negative, or zero interdependence?. Marketing Science, 29(4), 602–623.

  40. Yusuf, S., Collins, R., & Peto, R. (1984). Why do we need some large, simple randomized trials?. Statistics in medicine, 3(4), 409–420.

  41. Zettelmeyer, F., Morton, F.S., & Silva-Risso, J. (2006). How the internet lowers prices: Evidence from matched survey and automobile transaction data. Journal of Marketing Research, 43(2), 168–181.

Author information

Corresponding author

Correspondence to Kirthi Kalyanam.

Appendix

Degrees of freedom of the t-test, for inferring standard errors

We inferred standard errors from reports of the estimated mean and p-value from a one-sided t-test. The degrees of freedom (df) of the t-test were {number of test stores + number of unique control stores − 2}, a quantity that was not available. (Recall that control stores could be matched to more than one test store.) Lacking the actual df, we used for \(n_{ij}\) the total number of stores in the chain, which is necessarily larger. The inferred standard error is not sensitive to this choice for df in the range used here, as we now show.

We show this for the combined lift estimate \(\overline {L}\). Note that

$$\begin{array}{@{}rcl@{}} & & p\text{-value}=\alpha \\ & \text{if and only if } & \text{Probability}(\overline{L}/\text{SE}>t_{\alpha,df})=\alpha\text{, because the test was one-sided} \\ & \text{if and only if } & \text{SE}=\overline{L}/t_{\alpha,df}, \end{array} $$
(8)

where \(t_{\alpha,df}\) is the 100(1 − α) percentile of the standard t distribution with df degrees of freedom. As df is increased, \(t_{\alpha,df}\) decreases, so SE increases. Thus, by using a df value larger than the correct but unknown df, we infer standard errors that are larger than the true values, which is conservative in the sense of acting as if each experiment was less informative than it actually was.

Further, \(t_{\alpha,df}\) is insensitive to df in the range we used; thus, the inferred standard errors are insensitive. For example, for α = 0.05, \(t_{\alpha,df}\) is 1.708 for df = 25; 1.671 for df = 60; 1.658 for df = 120; and 1.645 for df = ∞ (i.e., the normal distribution). Over this range of df, \(t_{\alpha,df}\) decreases by 3.7%, and the ends of this range differ by far more than our conservative df differs from the actual but unknown df.
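
The insensitivity argument can be checked numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and finds the one-sided critical value by bisection, then infers a standard error as SE = lift / \(t_{\alpha,df}\) as in Eq. 8. The function names, the integration settings, the search bracket of [0, 20] and the 2% lift are all assumptions made for illustration; they are not taken from the study.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Upper-tail critical value for alpha < 0.5, by bisection on [0, 20]."""
    lo, hi = 0.0, 20.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


def inferred_se(lift, alpha, df):
    """Standard error implied by a lift estimate and its one-sided p-value."""
    return abs(lift) / t_critical(alpha, df)


for df in (25, 60, 120):
    print(df, round(t_critical(0.05, df), 3))
print("normal limit", round(NormalDist().inv_cdf(0.95), 3))

# Hypothetical lift of 2% reported with a one-sided p-value of 0.05.
print(inferred_se(0.02, 0.05, 25), inferred_se(0.02, 0.05, 120))
```

The printed critical values should be close to those quoted above, and the inferred standard error grows only slightly as df increases, which is the conservative direction described in the text.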

Standard errors for return on ad spending (ROAS)

The data available for this study did not include standard errors for individual experiments’ ROAS estimates, and the corresponding p-values were reported in ranges. To avoid excessive conservatism in inferring standard errors for the ROAS estimates, we computed standard errors for ROAS as follows. The derivation has two steps. The first conditions on each experiment’s average (over test stores) expected sales (i.e., acts as if average expected sales is known), while the second step removes that conditioning (i.e., treats average expected sales as a random variable).

Conditioning on an experiment’s average expected sales \(\overline {ES}\), from Section 6.1,

$$\begin{array}{@{}rcl@{}} \overline{L} & = & \frac{1}{n\overline{ES}}\sum\limits_{s}(AS_{s}-ES_{s}) \\ \text{and }ROAS & = & \frac{1}{ADS}\sum\limits_{s}(AS_{s}-ES_{s}). \end{array} $$
(9)

Assume that \({\sum }_{s}(AS_{s}-ES_{s})\sim N(\mu ,\tau ^{2})\); then

$$ \begin{array}{ll}\overline{L} & \sim N\left( \frac{\mu}{n\overline{ES}},\frac{\tau^{2}}{n^{2}\overline{ES}^{2}}\right)\text{ and}\\ ROAS & \sim N\left( \frac{\mu}{ADS},\frac{\tau^{2}}{ADS^{2}}\right). \end{array} $$
(10)

We have \(\hat {\sigma }_{L}^{2}\), the estimated variance (square of standard error) for an experiment’s lift estimate; we want an estimate for the variance of ROAS, \(\hat {\sigma }_{R}^{2}\). From Eq. 10,

$$ \begin{array}{ll}\hat{\sigma}_{L}^{2} & =\frac{\hat{\tau}^{2}}{n^{2}\overline{ES}^{2}}\\ \text{so that }\hat{\tau}^{2} & =n^{2}\overline{ES}^{2}\hat{\sigma}_{L}^{2},\\ \text{Therefore }\hat{\sigma}_{R}^{2} & =\frac{\hat{\tau}^{2}}{ADS^{2}}\\ & =\frac{n^{2}\overline{ES}^{2}}{ADS^{2}}\hat{\sigma}_{L}^{2}. \end{array} $$
(11)
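
Eq. 11 amounts to rescaling the lift standard error by \(n\overline{ES}/ADS\). A minimal sketch follows; the function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def roas_se_conditional(n_test_stores, mean_expected_sales, ad_spend, se_lift):
    """Standard error of ROAS conditional on average expected sales (Eq. 11).

    sigma_R = (n * mean(ES) / ADS) * sigma_L, with lift expressed as a fraction
    and sales and ad spend expressed in the same currency units.
    """
    scale = n_test_stores * mean_expected_sales / ad_spend
    return scale * se_lift


# Hypothetical experiment: 50 test stores, average expected sales of 100,000
# per store, 250,000 of ad spend, and a lift standard error of 0.005.
print(roas_se_conditional(50, 100_000, 250_000, 0.005))
```

With these inputs the scale factor is 20, so a lift standard error of half a percentage point maps to a ROAS standard error of about 0.1 per unit of ad spend.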

Conditional on \(\overline {ES}\), then, we can estimate the standard error of ROAS. However, \(\overline {ES}\) is itself a random variable because test stores were randomly sampled. To remove this conditioning, note that by Eq. 9, conditional on the vector of expected sales for the test stores, ES,

$$ ROAS|\boldsymbol{ES}\sim N\left( \frac{n\overline{ES}}{ADS}\theta,\frac{n^{2}\overline{ES}^{2}}{ADS^{2}}{\sigma_{L}^{2}}\right). $$
(12)

where \(\theta =E(\bar {L})\). The variance of ROAS is thus

$$ \begin{array}{ll}\operatorname{Var}(ROAS) & =\operatorname{Var}[\operatorname{E}(ROAS|\boldsymbol{ES})]+\operatorname{E}[\operatorname{Var}(ROAS|\boldsymbol{ES})]\\ & =\operatorname{Var}\left( \frac{n\overline{ES}}{ADS}\theta\right)+\operatorname{E}\left( \frac{n^{2}\overline{ES}^{2}}{ADS^{2}}{\sigma_{L}^{2}}\right)\\ & =\frac{n^{2}\theta^{2}}{ADS^{2}}\operatorname{Var}(\overline{ES})+\frac{n^{2}{\sigma_{L}^{2}}}{ADS^{2}}\operatorname{E}(\overline{ES}^{2})\\ & =\frac{n^{2}\theta^{2}}{ADS^{2}}\operatorname{Var}(\overline{ES})+\frac{n^{2}{\sigma_{L}^{2}}}{ADS^{2}}\{\operatorname{Var}(\overline{ES})+[\operatorname{E}(\overline{ES})]^{2}\}\\ &=\frac{n^{2}{\sigma_{L}^{2}}}{ADS^{2}}[\operatorname{E}(\overline{ES})]^{2}+\frac{n^{2}(\theta^{2}+{\sigma_{L}^{2}})}{ADS^{2}}\operatorname{Var}(\overline{ES}). \end{array} $$
(13)

In the last line above, the first term is the conditional estimate of \({\sigma _{R}^{2}}\) with \(\operatorname {E}(\overline {ES})\) substituted for \(\overline {ES}\). Thus the second term is the primary consequence of removing the conditioning on \(\overline {ES}\).
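
The decomposition in the last line of Eq. 13 can be written as a small function, which makes it easy to see how much the second term adds. This is a sketch under the same normality assumptions as the derivation; the argument names and the numbers in the example are hypothetical.

```python
def roas_variance_unconditional(n, ad_spend, theta, var_lift,
                                mean_es, var_mean_es):
    """Var(ROAS) from the last line of Eq. 13.

    n           number of test stores
    ad_spend    ADS, the ad spending for the experiment
    theta       expected lift, E(L-bar)
    var_lift    sigma_L^2, variance of the lift estimate
    mean_es     E(ES-bar), expected value of average expected sales
    var_mean_es Var(ES-bar), variance of average expected sales
    """
    scale = n ** 2 / ad_spend ** 2
    conditional_term = scale * var_lift * mean_es ** 2
    extra_term = scale * (theta ** 2 + var_lift) * var_mean_es
    return conditional_term + extra_term


# Hypothetical inputs for illustration only.
args = dict(n=50, ad_spend=250_000, theta=0.01, var_lift=0.005 ** 2,
            mean_es=100_000)
print(roas_variance_unconditional(var_mean_es=0.0, **args))
print(roas_variance_unconditional(var_mean_es=2_000.0 ** 2, **args))
```

Setting `var_mean_es` to zero recovers the conditional estimate of Eq. 11, so the difference between the two printed values is the contribution of the second term.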

The data available for this study included, for each experiment, the sample mean and standard deviation of the \(ES_{s}\), so we can estimate \(\operatorname {Var}(\overline {ES})\) if we can make a plausible assumption about the correlation between \(ES_{s}\) and \(ES_{t}\) for test stores s and t. To do this, we assumed the \(ES_{s}\) were exchangeable with correlation r between each pair of test stores. Therefore,

$$ \begin{array}{ll}\operatorname{Var}(\overline{ES}) & =\frac{1}{n^{2}}\operatorname{Cov}\left( {\sum}_{s=1}^{n}ES_{s},{\sum}_{t=1}^{n}ES_{t}\right)\\ & =\frac{1}{n^{2}}\left( n\sigma_{ES}^{2}+n(n-1)r\sigma_{ES}^{2}\right)\\ & =\sigma_{ES}^{2}\left( \frac{1}{n}+\frac{n-1}{n}r\right). \end{array} $$
(14)
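
As a sanity check on Eq. 14, the closed form can be compared with a direct sum over the exchangeable covariance matrix. The sketch below assumes a common standard deviation \(\sigma_{ES}\) and a common pairwise correlation r, as in the text; the function names and example values are illustrative.

```python
def var_of_mean_closed_form(sd_es, n, r):
    """Var(ES-bar) under exchangeability with pairwise correlation r (Eq. 14)."""
    return sd_es ** 2 * (1 / n + (n - 1) / n * r)


def var_of_mean_brute_force(sd_es, n, r):
    """Same quantity, summing every entry of the n-by-n covariance matrix."""
    total = 0.0
    for s in range(n):
        for t in range(n):
            total += sd_es ** 2 * (1.0 if s == t else r)
    return total / n ** 2


# Hypothetical values: 40 test stores, standard deviation 15,000, r = 0.2.
print(var_of_mean_closed_form(15_000, 40, 0.2))
print(var_of_mean_brute_force(15_000, 40, 0.2))
```

The two limiting cases are a useful check: r = 0 gives the familiar \(\sigma_{ES}^{2}/n\), while r = 1 gives \(\sigma_{ES}^{2}\), so averaging over perfectly correlated stores does not reduce the variance.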

We roughly estimated the correlation coefficient r as \(n_{st}/10\), where \(n_{st}\) is the number of control stores shared by the s-th and t-th test stores. Simulations in which control stores were assigned at random to test stores gave the estimates of r in Table 9.

A standard error for each experiment’s ROAS was obtained by substituting Table 9’s estimated r’s and other known quantities (e.g., standard errors for sales lift) into Eq. 13. The unconditional and conditional estimates of \({\sigma _{R}^{2}}\) (with \(\operatorname {E}(\overline {ES})\) substituted for \(\overline {ES}\)) were very similar. Meta-analyses using the two different estimates gave very similar results, so Section 6.2.2 presents the simpler analysis using the conditional estimates of \({\sigma _{R}^{2}}\).

Cite this article

Kalyanam, K., McAteer, J., Marek, J. et al. Cross channel effects of search engine advertising on brick & mortar retail sales: Meta analysis of large scale field experiments on Google.com. Quant Mark Econ 16, 1–42 (2018). https://doi.org/10.1007/s11129-017-9188-7

Keywords

  • Search engine advertising
  • Cross channel impact
  • Field experiments
  • Bayesian meta analysis
  • Retail marketing
  • Advertising
  • Retail sales
  • Replication

JEL Classification

  • M31
  • M37
  • L86
  • C39
  • C21
  • C11
  • L81