Marketing response and temporal aggregation

This paper deals with inferring key parameters of marketing response at the true high frequency while data are partly or fully available only at a lower, temporally aggregated frequency. The familiar Koyck model turns out to be very useful for this purpose. Assuming this model for the high-frequency data makes it possible to infer the high-frequency parameters from modified Koyck-type models when only lower frequency data are available. This means that inference using the Koyck model is robust to temporal aggregation.


Introduction
This paper makes the case that the familiar Koyck (1954) or geometric lag model, which is often used to model marketing response¹ to, for example, advertising, is robust to temporal aggregation. The assumption is that there is an adequate Koyck model for the high-frequency data. The focus is on four cases: case (a) of high-frequency sales and high-frequency advertising, case (b) of high-frequency sales and aggregated advertising, case (c) of aggregated sales and higher frequency advertising, and the well-known case² (d) of both low-frequency sales and low-frequency advertising. The two cases (b) and (c), in which only one of the two variables is aggregated, have not been studied before.
Temporal aggregation is relevant for the analysis of marketing response, in particular when examining carry-over effects; see also Leone (1995). When the true marketing response process occurs at a high-frequency level, say hours, and the data are only available at a lower frequency level, say days, the estimation results from models for lower frequency data cannot be translated one-to-one to the true higher frequency process. This point was already made³ in the seminal paper by Clarke (1976), who argued that the model for aggregated data must differ from the model for the high-frequency data.
A key issue for temporal aggregation is that aggregation changes the model. That is, if one fits an econometric time series model, like the Koyck model, to sales and advertising data, then the model changes due to aggregation.⁴ Clarke (1976) was the first to recognize that not modifying the model leads to biased results and incorrect advertising duration intervals. In the present paper, it is demonstrated that the Koyck model is very useful even when the data are all or partly available in temporally aggregated format. In fact, it is shown that the Koyck model is robust to such aggregation. The focus is on just two variables for notational convenience, but extensions to more variables are conceptually straightforward.
¹ Nowadays, the model is also used to estimate such effects for various marketing variables like satisfaction, quality, distribution, and online chatter on a variety of dependent variables like sales, market shares, and even earnings and stock market returns. Examples of studies using versions of the Koyck model are Berkowitz et al. (2001), Breuer et al. (2011), Chessa and Murre (2007), Dekinder and Kohli (2008), Graham and Frankenberger (2011), Herrington and Dempsey (2005), Kappe et al. (2014), Prabhu et al. (2005), Tellis et al. (2000), Yoo and Mandhachitara (2003), Farace et al. (2019), and Villarroel Ordenes et al. (2019). Recent studies using the Koyck model in disciplines other than marketing are Mulchandani et al. (2019) and Acar and Temiz (2017).
² See Kanetkar et al. (1986), Leone (1983, 1986), and Weinberg and Weiss (1982), and recently Tellis and Franses (2006).
³ Today still, the impact of temporal aggregation receives much attention in the marketing literature; see, for example, Calli et al. (2012), Kappe et al. (2014), Lambrecht and Tucker (2013), Sethuraman et al. (2011), Sood et al. (2014), Tirunillai and Tellis (2012), and Xi et al. (2014).
⁴ See, for example, exercise 3.3 in Franses, van Dijk and Opschoor (2014, p. 75), which concerns the case where an autoregression of order 1 becomes an autoregressive moving average model of order (1,1). A classic study in this context is Working (1960).
The outline of the paper is as follows. Section 2 presents the Koyck model. Section 3 discusses four variants of temporal aggregation. Each variant is illustrated with a real-life example. The data source is Tellis et al. (2000). A scatter plot of sales against advertising in Fig. 2 suggests that there is a positive correlation between the two variables. Mathematical derivations are relegated to the technical appendix. Section 4 presents the results of some simulation experiments, and Section 5 concludes.

The Koyck model
The Koyck (1954) model, or geometric lag model, yields insights into the key parameters of marketing response. Sales are denoted as $y_t$ and advertising (or any other marketing-mix variable) as $x_t$, and $L$ is the familiar lag operator with $L^k y_t = y_{t-k}$. Deleting the intercept for notational convenience, the original Koyck model reads as follows:

$$y_t = \beta x_t + \beta\lambda x_{t-1} + \beta\lambda^2 x_{t-2} + \cdots + \varepsilon_t, \tag{1}$$

where $\varepsilon_t$ is an uncorrelated white noise process with mean 0 and variance $\sigma^2$, and $|\lambda| < 1$. When using the $L$ operator, it reads as follows:

$$y_t = \beta x_t + \beta\lambda L x_t + \beta\lambda^2 L^2 x_t + \cdots + \varepsilon_t \tag{2}$$

$$y_t = \beta\left(1 + \lambda L + \lambda^2 L^2 + \cdots\right) x_t + \varepsilon_t. \tag{3}$$

Hence, the infinite regression model in (1) can be written as follows:

$$y_t = \frac{\beta}{1 - \lambda L}\, x_t + \varepsilon_t. \tag{4}$$

This expression suggests what became known as the "Koyck transformation," i.e., when both sides of (4) are multiplied with $1 - \lambda L$, one obtains

$$y_t = \lambda y_{t-1} + \beta x_t + \varepsilon_t - \lambda\varepsilon_{t-1}. \tag{5}$$

The Koyck model has an autoregressive term $y_{t-1}$, a term involving current advertising $x_t$, and a so-called moving average term $\varepsilon_t - \lambda\varepsilon_{t-1}$. From the model parameters, one can derive the short-run (or current or direct) effect of advertising, using the partial derivative:

$$\frac{\partial y_t}{\partial x_t} = \beta. \tag{6}$$

The total (or carry-over) effect of advertising follows from

$$\frac{\partial y_t}{\partial x_t} + \frac{\partial y_{t+1}}{\partial x_t} + \frac{\partial y_{t+2}}{\partial x_t} + \cdots = \beta\left(1 + \lambda + \lambda^2 + \cdots\right) = \frac{\beta}{1 - \lambda}. \tag{7}$$

As the focus is on the direct effect and the carry-over effect, in practice, one usually considers the unrestricted version of (5), i.e.,

$$y_t = \lambda y_{t-1} + \beta x_t + \varepsilon_t - \theta\varepsilon_{t-1}, \tag{8}$$

where $\lambda$ and $\theta$ are not restricted to be equal from the start.⁵

Suppose that an analyst knows that the advertising response process works at the high-frequency data level, denoted as $t$, with $t = 1, 2, \ldots, N$. For example, $t$ can be associated with weeks within a period of 4 weeks. Suppose further that the sales and advertising data are only available after temporal aggregation at a lower frequency, denoted as $T$. For example, weekly data could be aggregated to four-weekly data. To introduce some formal notation, consider the polynomial $S(L)$ defined as

$$S(L) = 1 + L + L^2 + \cdots + L^{K-1}, \tag{9}$$

which amounts to a temporal aggregation of the high-frequency data over $K$ periods. In the case of weeks and hours, $K$ would be equal to 168. Hence, $T = 1, 2, \ldots, N/K$. Further, consider the notion of skip sampling at every $K$th observation at frequency $t$. This means that, for $t$ equal to $K$, $2K$, $3K$, and so on, there is an observation at the lower frequency $T$, with $T = 1, 2, 3, \ldots, N/K$. For the hourly case, where the first hour of the week ends at 1.00 AM on Monday morning, observation $K = 168$ concerns the hour ending at midnight on Sunday evening.
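The two operations just defined, aggregation with $S(L)$ and skip sampling at every $K$th observation, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses NumPy, the helper names `aggregate` and `skip_sample` are hypothetical and not taken from the paper, and the twelve "weekly" values are made up.

```python
import numpy as np

def aggregate(series, K):
    """Apply S(L) = 1 + L + ... + L^(K-1) and keep the values at t = K, 2K, ...

    This yields the N // K non-overlapping K-period sums.
    """
    series = np.asarray(series, dtype=float)
    n_low = len(series) // K           # number of complete low-frequency periods
    trimmed = series[: n_low * K]      # drop an incomplete trailing period, if any
    return trimmed.reshape(n_low, K).sum(axis=1)

def skip_sample(series, K):
    """Keep only the observations at t = K, 2K, 3K, ... (t counted from 1)."""
    series = np.asarray(series, dtype=float)
    return series[K - 1 :: K]

# Hypothetical weekly observations 1, 2, ..., 12 (made-up numbers).
weekly = np.arange(1, 13)
four_weekly = aggregate(weekly, 4)     # sums over weeks 1-4, 5-8, 9-12
last_week = skip_sample(weekly, 4)     # weeks 4, 8, 12 only
print(four_weekly, last_week)
```

Aggregation keeps the information of all $K$ observations in a period as a sum, whereas skip sampling discards $K - 1$ of them, which is why the two are treated separately in the text.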

Four cases of aggregation
In relation to the frequencies t and T, there are now four cases of potential interest and practical relevance.

High-frequency sales and high-frequency advertising
The first and simplest case is when the analyst has data on sales and advertising both at the high frequency t. A Koyck model as in (8) can be estimated using Maximum Likelihood for the illustrative data, where now also an intercept is included. The R² of this model is 0.682. The estimated short-run effect is 0.279, and the total long-run effect is 4.543. Suppose now that this model for the weekly data corresponds with the true frequency of the sales and advertising relationship. The topic of interest in this paper is that it can happen that one does not have the weekly data, but, for example, only four-weekly data. This can occur when commercials are broadcast only once per four weeks, while sales are measured per week. Or, the other way around, commercials are broadcast once per week, while sales are only measured at a four-weekly level.
It might be the case that one has (a) weekly data for both sales and advertising as above, (b) weekly data on sales but only four-weekly data for advertising, (c) four-weekly data on sales and weekly data on advertising, or (d) four-weekly data for both sales and advertising. The question is now whether in cases (b), (c), and (d), one can estimate the parameters of the true high (weekly) frequency model relating sales to advertising. The key assumption is of course that (a) amounts to the correct frequency, but, note again, this is here for illustration only. Whether it is true for the illustrative data is unknown, and therefore, later on, a simulation experiment will be carried out. In the high-frequency case, skip sampling will lead to suboptimal inference in terms of efficiency, as information will be lost. Consider

$$y_t = \lambda y_{t-1} + \beta x_t + \varepsilon_t - \lambda\varepsilon_{t-1}$$

and K periods later:

$$y_{t+K} = \lambda y_{t+K-1} + \beta x_{t+K} + \varepsilon_{t+K} - \lambda\varepsilon_{t+K-1}.$$

Skip sampling towards the frequency implied by K would allow the inclusion of $y_{t-1}$ and $x_t$ in the model, but not the moving average term with $\varepsilon_{t-1}$, $\varepsilon_{t+K-1}$, and so on. This, thus, leads to bias in estimating $\lambda$. So, when all high-frequency data are available, it is recommended to consider a model for the high-frequency data and not to temporally aggregate the high-frequency data. See also Tellis and Franses (2006) for evidence based on simulations.

High-frequency sales and low-frequency advertising
The second case (b) is where sales are observed at frequency t, while advertising is observed at the lower frequency T after aggregating over K units. In the Appendix, it is derived that the modified Koyck model becomes

$$Y_T = \lambda S(L) y_{t+K-1} + \beta X_T + u_T - \theta u_{T-1}. \tag{10}$$

The parameters in (10) can be estimated using the unrestricted maximum likelihood method.⁶ Note that the parameters in (10) are estimated for N/K observations instead of N, and temporal aggregation means loss of efficiency. For the running example with the data in Figs. 1 and 2, the model in (10) (with an intercept) is estimated for 25 effective observations. The R² of this model is 0.974. The short-run effect is 0.363, and the total long-run effect at the high frequency is a bit larger than the "true" high-frequency effect of 4.543, while the short-run effects are close to each other.

Low-frequency sales and high-frequency advertising
The third case⁷ is when sales are observed at frequency T, after aggregating over K units, while advertising is observed at the higher frequency t. In the Appendix, it is derived that the modified Koyck model becomes

$$Y_T = \lambda^K Y_{T-1} + \beta\left(1 + \lambda L + \lambda^2 L^2 + \cdots + \lambda^{K-1} L^{K-1}\right) S(L) x_t + \varepsilon_T - \lambda^K \varepsilon_{T-1}. \tag{11}$$

When the unrestricted version of this model is estimated, that is, when we replace $\varepsilon_T - \lambda^K \varepsilon_{T-1}$ in (11) by $\varepsilon_T - \theta\varepsilon_{T-1}$, then, for the running example, the estimation results obtained using iterative Maximum Likelihood for (11) give an estimate of $\lambda^4$ equal to 0.947, which gives $\hat{\lambda} = \sqrt[4]{0.947} = 0.986$. The R² of this model is 0.769. The short-run effect is, however, not significant. This may perhaps reflect that the weekly frequency cannot be assumed to be the true frequency.
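For case (c), the regressor that enters the model depends on the unknown decay parameter, so it has to be rebuilt for each trial value of that parameter. The sketch below shows one way to construct it with NumPy. It is an illustration only: the function name `case_c_regressor` is hypothetical, pre-sample advertising is assumed to be zero (so the first low-frequency value is based on an incomplete history), and the estimation loop itself is not shown.

```python
import numpy as np

def case_c_regressor(x, lam, K):
    """Build (1 + lam*L + ... + lam^(K-1) L^(K-1)) S(L) x_t at t = K, 2K, ...

    Assumption: advertising before the start of the sample equals zero.
    """
    x = np.asarray(x, dtype=float)
    geo = lam ** np.arange(K)              # weights 1, lam, ..., lam^(K-1)
    s_weights = np.ones(K)                 # weights of S(L) = 1 + L + ... + L^(K-1)
    combined = np.convolve(geo, s_weights) # coefficients of the product polynomial
    filtered = np.convolve(x, combined)[: len(x)]  # apply the lag polynomial to x_t
    return filtered[K - 1 :: K]            # skip-sample at every Kth observation

# Made-up example: constant advertising of 1, trial value lam = 0.5, K = 2.
# The product polynomial is (1 + 0.5 L)(1 + L) = 1 + 1.5 L + 0.5 L^2.
demo = case_c_regressor(np.ones(8), lam=0.5, K=2)
print(demo)
```

In an iterative scheme one would alternate between recomputing this regressor for the current trial value and re-estimating the model; the search strategy and stopping rule are left open here.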

Low-frequency sales and low-frequency advertising
Finally, the fourth case (d) arises where both sales and advertising are observed only after temporal aggregation at the low frequency T. Tellis and Franses (2006) conveniently show that when it is assumed⁸ that an advertising impulse occurs only once in each Kth period, and at the same time within that Kth period, (5) can become

$$Y_T = \lambda^K Y_{T-1} + \beta_1 X_T + \beta_2 X_{T-1} + \varepsilon_T - \lambda^K \varepsilon_{T-1}, \tag{12}$$

where $\beta_1$ and $\beta_2$ are functions of $\beta$ and $\lambda$, such that

$$\beta = \frac{(1 - \lambda)(\beta_1 + \beta_2)}{1 - \lambda^K}.$$

Tellis and Franses (2006) recommend that if aggregation is necessary, one should collect data such that the key assumption on the advertising process holds.
For the illustrative four-weekly data, the key estimation results for (12), where we replace $\varepsilon_T - \lambda^K \varepsilon_{T-1}$ in (12) by $\varepsilon_T - \theta\varepsilon_{T-1}$, give an estimate of $\lambda^4$ equal to 0.902, which gives $\hat{\lambda} = \sqrt[4]{0.902} = 0.975$. The R² of this model is 0.796. The short-run effect $\hat{\beta} = (1 - \hat{\lambda})(\hat{\beta}_1 + \hat{\beta}_2)/(1 - \hat{\lambda}^4)$ is 0.022, and the implied total long-run effect for the high-frequency data, $\hat{\beta}/(1 - \hat{\lambda})$, is about one fifth of the "true" high-frequency effect of 4.543. This result may perhaps be driven by the possibility that the advertising impulse does not occur only once in each four-weekly period, at least for these illustrative data.
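The conversion from the low-frequency estimates back to the high-frequency parameters is simple arithmetic, which the following sketch reproduces with the numbers reported above. The function names are hypothetical, and the inputs are the rounded values quoted in the text, so results agree with the text only up to rounding.

```python
def high_frequency_lambda(lam_K, K):
    """Recover the per-period decay parameter from its K-period counterpart lam**K."""
    return lam_K ** (1.0 / K)

def long_run_effect(beta, lam):
    """Total (carry-over) effect beta / (1 - lam) of the Koyck model."""
    return beta / (1.0 - lam)

K = 4
lam_c = high_frequency_lambda(0.947, K)   # case (c): estimate of lambda^4 is 0.947
lam_d = high_frequency_lambda(0.902, K)   # case (d): estimate of lambda^4 is 0.902

# Case (d): short-run effect 0.022 as reported in the text.
lr_d = long_run_effect(0.022, lam_d)
share_of_benchmark = lr_d / 4.543         # relative to the case (a) long-run effect

print(round(lam_c, 3), round(lam_d, 3), round(share_of_benchmark, 2))
```

Because the decay parameter is close to one, small differences in its estimate translate into large differences in the long-run effect, which helps explain why the cases diverge more in carry-over than in persistence.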

Simulation experiments
The empirical results in the previous section, for just a single illustrative case, seem in part to confirm that the Koyck model is robust to temporal aggregation, at least after proper modification. Cases (c) and (d) did not work so well in the illustration, although the parameter $\lambda$ is estimated at similar values across the cases. As this is just a single empirical case with actual data, we now turn to simulation experiments.
The data-generating process (DGP) is

$$y_t = \lambda y_{t-1} + \beta x_t + \varepsilon_t - \lambda\varepsilon_{t-1},$$

with $\varepsilon_t \sim N(0, 1)$ and $y_0 = 0$, and where $x_t$ is the absolute value of a draw from a $N(0, 1)$ distribution halfway through the K-period and zero otherwise. So, $x_t$ has a positive value once in every K periods, where the size of the value can change over time. Tellis and Franses (2006) use the same format for the simulations. Here, we set K = 5, so that $x_t$ obtains a positive value at observation 3 within each block of K = 5 observations. The sample size is set at 1000. The short-run effect is set at $\beta = 5$, and we set the decay parameter at $\lambda = 0.8$. Hence, the true carry-over effect is $5/(1 - 0.8) = 25$. Table 1 reports the estimates of $\lambda$, $\beta$, and $\beta/(1 - \lambda)$, averaged over 100 replications, which is a reasonable number for a sample size of 1000. Each time, as in the illustration before, we use an unrestricted version of the Koyck model, in terms of the moving average part. The simulation results seem to confirm the theory that the Koyck model is robust to temporal aggregation for cases (b) and (d), although we observe some bias for case (c). This last bias seems to be caused by (on average) too small an estimate of $\beta$ and too small an estimate of $\lambda$.
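A single draw from a data-generating process of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code used for Table 1: it uses NumPy, the function name `simulate_koyck` and the seed are hypothetical, the moving average coefficient is taken to equal the decay parameter, and the pre-sample error at time 0 is drawn from the same normal distribution.

```python
import numpy as np

def simulate_koyck(n=1000, K=5, beta=5.0, lam=0.8, pulse_pos=3, seed=0):
    """Simulate y_t = lam*y_{t-1} + beta*x_t + eps_t - lam*eps_{t-1} with y_0 = 0.

    x_t is |N(0,1)| at observation `pulse_pos` within each block of K, else 0.
    Assumption: the MA coefficient equals lam and eps_0 is a N(0,1) draw.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + 1)        # eps[0] plays the role of eps_0
    x = np.zeros(n + 1)                     # index 0 is unused, so x[t] is x_t
    pulse_times = np.arange(pulse_pos, n + 1, K)   # t = 3, 8, 13, ... for K = 5
    x[pulse_times] = np.abs(rng.standard_normal(len(pulse_times)))
    y = np.zeros(n + 1)                     # y[0] = 0 as in the text
    for t in range(1, n + 1):
        y[t] = lam * y[t - 1] + beta * x[t] + eps[t] - lam * eps[t - 1]
    return y[1:], x[1:]

y, x = simulate_koyck()
carry_over = 5.0 / (1.0 - 0.8)              # true carry-over effect, 25 up to rounding
print(len(y), int((x > 0).sum()), round(carry_over, 6))
```

Repeating such a draw 100 times, aggregating or skip sampling the series as required by each case, and averaging the estimates would mirror the design described above; the estimation step is deliberately left out of this sketch.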

Conclusion
This paper has shown that the Koyck (1954) model is a useful model to estimate advertising response at the true high frequency, even when the analyst has temporally aggregated sales data or temporally aggregated advertising data, or both. Inference using the Koyck model is robust to temporal aggregation. An empirical example, in part, and a simulation exercise, almost fully, supported the theoretical claims. Further research should concern more illustrations to see how the Koyck model fares in other empirical settings. Also, more theoretical results can be derived for the case in which the Koyck model is extended to more than a single explanatory variable.
The practical implications are that, given a situation of partial or full temporal aggregation of the data, a practitioner can retrieve the proper current and carry-over effects of marketing efforts on marketing response at the high frequency.

Appendix
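Case (b): sales are observed at the high frequency t, while advertising is observed at the lower frequency T after aggregating over K units.
The modified Koyck model for this case is

$$Y_T = \lambda S(L) y_{t+K-1} + \beta X_T + u_T - \theta u_{T-1}.$$

Here, $Y_T$ is the temporally aggregated sales variable in a K-period interval, where $S(L) y_{t+K-1}$ can simply be constructed from the available high-frequency sales data at time $t = K, 2K, 3K$, and so on, and where $u_T - \theta u_{T-1}$ is a first-order moving average process with mean zero and where $u_T$ has variance $\sigma_u^2$.
Case (c): sales are observed at frequency T, after aggregating over K units, while advertising is observed at the higher frequency t.
To see how this translates to the Koyck model, one can replace $y_{t-1}$ on the right-hand side of (5) by

$$y_{t-1} = \lambda y_{t-2} + \beta x_{t-1} + \varepsilon_{t-1} - \lambda\varepsilon_{t-2}$$

and repeat this K times to obtain

$$y_t = \lambda^K y_{t-K} + \beta\left(1 + \lambda L + \lambda^2 L^2 + \cdots + \lambda^{K-1} L^{K-1}\right) x_t + \varepsilon_t - \lambda^K \varepsilon_{t-K}.$$

Multiplying both sides of this last expression with $S(L)$ gives

$$S(L) y_t = \lambda^K S(L) y_{t-K} + \beta\left(1 + \lambda L + \lambda^2 L^2 + \cdots + \lambda^{K-1} L^{K-1}\right) S(L) x_t + S(L)\varepsilon_t - \lambda^K S(L)\varepsilon_{t-K}.$$

Skip sampling at each Kth observation results in a model for the temporally aggregated data like

$$Y_T = \lambda^K Y_{T-1} + \beta\left(1 + \lambda L + \lambda^2 L^2 + \cdots + \lambda^{K-1} L^{K-1}\right) S(L) x_t + \varepsilon_T - \lambda^K \varepsilon_{T-1},$$

where $\varepsilon_T = S(L)\varepsilon_t$ at $t = K, 2K, 3K$, and so on. With high-frequency data on advertising, the analyst can rely on an iterative Maximum Likelihood method to alternate between estimating $\lambda$ and creating the relevant observations for $\left(1 + \lambda L + \lambda^2 L^2 + \cdots + \lambda^{K-1} L^{K-1}\right) S(L) x_t$.

Table 1 Average estimates of $\lambda$, $\beta$, and $\beta/(1 - \lambda)$, when averaged over 100 replications, K = 5, sample size is 1000. In the data-generating process, we set $\lambda = 0.8$ and $\beta = 5$. Column headings: Cases; $\beta_1 + \beta_2$ in case (d).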
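The K-step substitution for case (c) can be checked numerically. The sketch below simulates a short high-frequency Koyck series and confirms that both the K-step identity and its aggregated counterpart hold up to floating-point error. The parameter values, sample length, seed, and helper name `S` are arbitrary choices for illustration only, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, beta, K, n = 0.8, 5.0, 4, 60          # arbitrary illustrative values
x = rng.standard_normal(n)
eps = rng.standard_normal(n)

# High-frequency Koyck model, started at y[0] = 0.
y = np.zeros(n)
for t in range(1, n):
    y[t] = lam * y[t - 1] + beta * x[t] + eps[t] - lam * eps[t - 1]

# Distributed lag (1 + lam*L + ... + lam^(K-1) L^(K-1)) x_t, valid for t >= K - 1.
dl = np.convolve(x, lam ** np.arange(K))[:n]

# K-step identity, checked for t = K, ..., n - 1.
rhs = lam**K * y[:-K] + beta * dl[K:] + eps[K:] - lam**K * eps[:-K]
gap_high = np.max(np.abs(y[K:] - rhs))

def S(z, t):
    """S(L) z_t: the sum of z over the K periods ending at t."""
    return z[t - K + 1 : t + 1].sum()

# Aggregated identity at skip-sampled times t = 2K, 3K, ...
ends = np.arange(2 * K, n, K)
gap_low = max(
    abs(S(y, t) - (lam**K * S(y, t - K) + beta * S(dl, t)
                   + S(eps, t) - lam**K * S(eps, t - K)))
    for t in ends
)
print(gap_high < 1e-9, gap_low < 1e-9)
```

The check only confirms the algebra of the substitution; it says nothing about how well the parameters can be estimated from aggregated data, which is what the simulation experiments address.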