1 Introduction

The advertising industry plays a prominent role in driving consumer behaviour in several fields, e.g., food advertising (Harris et al. 2009) and public service advertising (O’Keefe and Reid 2020). In general, the literature is rich in studies analysing the relationship between advertising and sales volumes, brand purchase choices, and consumer psychology (see, e.g., Raj 1982; Krishnamurthi and Raj 1985; Guadagni and Little 1983; Snyder and DeBono 1985; Putrevu 2001).

However, there are few specific quantitative studies on advertising costs. For instance, Danaher and Rust (1996) treat advertising as an investment and find the level of expenditure that maximises the return on investment; this maximisation is constrained to a set budget and does not address the timing of an advertising campaign. Martín-Oliver and Salas-Fumás (2008) study how investment in advertising, together with other inputs such as labour, physical, and IT capital, affects the demand for deposit and loan bank services through a static model of profit maximisation. In contrast, we aim to estimate the value of a contract between a television network and a company willing to advertise its business on this network, without budget constraints and valuing the advertising airtime itself. To this end, we compute this value with a real options approach, which is widely discussed in the literature (see, e.g., Lo and Lan 2008).

A key aspect of real option pricing is that it is not possible to build replicating portfolios, because the underlying assets are not traded. However, it can be possible to relate the valuation of real projects to quoted assets with the same level of risk as the non-traded ones (see Borison (2005) and Smith and Nau (1995)). We can identify four main pricing methods. The first is the Black and Scholes (B&S) option pricing model (Black and Scholes 1973), applied to real options by McDonald and Siegel (1986). The second most popular method is the so-called Binomial Option Pricing Model (BOPM), which limits the underlying asset movement to two choices: up by a factor u and down by a factor d. The first contribution of the BOPM for financial options is by Cox et al. (1979); in the real options context, Kellogg and Charnes (2000) used decision trees and binomial lattice methods to value biotech companies as the sum of the values of their drug development programs, and Di Bari et al. (2023) study the impact of polarity scores on real option valuation using a binomial approach. An alternative approach is to employ the Monte Carlo method for valuing options (see, e.g., Boyle et al. 1997; Glasserman 2004).Footnote 1 In particular, for the evaluation of real options with the Monte Carlo method see Abdel Sabour and Poulin (2006). Finally, we note the probabilistic present worth analysis, developed by Carmichael et al. (2011), who compute the present value as the sum of all the discounted cash flows at each period using only expected values and variances. Carmichael (2016) later presented a complete collection of plain and compound real options.

Each of these pricing methods has strengths and weaknesses. The B&S model presents the following limitations: the arbitrage principle is not applicable to real options, as real assets are not traded; the geometric Brownian motion may be a suitable model for stock price movements, but its applicability to real assets is not straightforward (Damodaran 1999; Newton et al. 2001); the computation of volatility in real options analysis is difficult (Amram and Kulatilaka 1998; Kodukula and Papudesu 2006; Lewis et al. 2008); contrary to real options, financial options are usually exercised instantaneously (Damodaran 1999; Lewis et al. 2008); and while the decision pertaining to a financial option cannot change the value of the underlying asset, the same is not true for real options (Newton et al. 2001). The BOPM is useful for pricing vanilla options with early exercise opportunities because of its accuracy and rapid convergence, but it can be difficult to adapt to complex situations, as noted by Fadugba et al. (2012). Contrary to the previous methods, the Monte Carlo method can handle options of any complexity; however, its weakness is the computational effort required, which can be substantial in more complex settings. Finally, the probabilistic present worth analysis can be used with any kind of distribution for the asset value, but it has been employed using only the first two moments, leading to estimates that may be biased and can be improved.

In our work, we adopt the present worth method to compute the value of a contract between a television network and a company willing to advertise on this network. Contrary to the option valuation in Carmichael (2016), we employ a Markov reward chain to model the number of viewers of the advertisement, thus including a time dependence in the computation of the present value. Moreover, we do not limit the analysis to the expected values and variances of the cash flows, but also include higher moments. We then obtain the probability distribution with the maximum entropy approach of Mead and Papanicolaou (1984).

Specifically, we consider an option that gives the holder the right, in exchange for a price paid today, to choose at a future exercise time whether or not to air the advertisement.

Our methodology is based on Markov chains because of the high versatility and robustness they have demonstrated in describing a variety of real-world problems (see, e.g., D’Amico and Villani 2021; Petronio et al. 2014; Kalligeris et al. 2021; De Blasis 2020). Other examples of works that have used Markov chains in the context of options are D’Amico (2006), D’Amico (2008) and Duan et al. (2003).

The contribution of this work is twofold. On the one hand, we propose a new methodology to price an option whose underlying asset is a television advertisement. In particular, this strategy requires the calculation of the expected value of the payoff function of a European call option depending on the sum of discounted cash flows. To this end, we compute the moments of this sum through a general Markovian reward process by identifying new recursive equations. On the other hand, by employing these moments as constraints within an entropy optimisation problem, we derive the density function of the underlying asset.

The methodology is applied to advertising data freely provided by the Italian Auditel website. The dataset comprises aggregated data referring to eight time slots for each month. The application to real data shows that the results are robust to variations of the general parameters of the model. Moreover, accuracy increases when higher-order moments are used, contrary to the classical B&S model, which uses only two moments.

The paper is organised as follows. In Sect. 2, we present the mathematical model for option pricing. Then, in Sect. 3, an application of the model is proposed and, finally, in Sect. 4, the conclusions of our work are given.

2 The model

We assume that company A wants to advertise its business and company B is a television network. To this end, they sign an option contract that gives company A the possibility to advertise its business on a television channel owned by company B. At time \(t=0\) the option has a price equal to P, and it can be exercised at the strike price K at a specific time \(t_0\). Exercising the option allows company A to air multiple advertisements at future times \(t_0,t_1,\ldots ,t_n\). At maturity, cash flows are generated depending on the number of viewers \(N(t_i)\) tuned in at time \(t_i\), through a function \(G:{\mathbb {N}}_0 \rightarrow {\mathbb {R}}^+\), where \(G=G(N(t_i))\). The cash flow related to the option is illustrated in Fig. 1; note that the times \(t_i\) are not necessarily equally spaced.

Fig. 1: Cash flow of the option

Obviously, the value assumed by \(G(N(t_i))\) depends on the estimate that company A will make.

Let r be a fixed discount rate.Footnote 2 We compute the sum of all the discounted cash flows as

$$\begin{aligned} Z=\sum _{i=1}^n e^{-(t_i-t_0)r} G(N(t_i)), \end{aligned}$$
(1)

and, since the considered option is a European call option, we can define the payoff function as

$$\begin{aligned} W=\max \left( Z-K, 0\right) . \end{aligned}$$
(2)

Finally, we can compute the option price as

$$\begin{aligned} {\mathbb {E}}[W]=\int _{K}^{\infty }(z-K)\cdot f_Z(z) dz, \end{aligned}$$
(3)

where \(f_Z\) is the density function of the random variable Z.
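Once an estimate of \(f_Z\) is available (as obtained in Sect. 2.1), the integral in Eq. (3) can be evaluated by numerical quadrature. A minimal sketch, assuming \(f_Z\) is given as a Python callable and the infinite support is truncated at an upper bound `z_max`:

```python
import numpy as np

def option_price(f_z, K, z_max, n_grid=100_000):
    """Approximate E[max(Z - K, 0)] = int_K^z_max (z - K) f_Z(z) dz
    by the trapezoidal rule; z_max truncates the (infinite) support."""
    z = np.linspace(K, z_max, n_grid)
    return np.trapz((z - K) * f_z(z), z)
```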

Unfortunately, we do not know the density function \(f_Z\). However, one possibility to find it is to employ the maximum entropy approach by Mead and Papanicolaou (1984), which requires the knowledge of the moments \({\mathbb {E}}[Z^k]\), \(k \in {\mathbb {N}}\).

2.1 Method of moments

In the classical moment problem, the positive density \(f_Z(z)\) is sought from the knowledge of its first \(N+1\) power moments. However, infinitely many densities share the same \(N+1\) moments. To overcome this issue, Mead and Papanicolaou (1984) proposed the maximum-entropy approach, which leads to the construction of a sequence of approximations. The solution of the problem coincides with the solution of the constrained maximisation of the following function:

$$\begin{aligned} S(f_Z(z))=-\int _{a}^{b}[f_Z(z)\ln {f_Z(z)}-f_Z(z)]dz+\sum _{k=0}^N \lambda _k \left( \int _{a}^{b} z^kf_Z(z)dz-\mu _k \right) , \end{aligned}$$

where a and b are the extremes of the distribution support, \(\mu _k\), with \(0\le k\le N\), are the first \(N+1\) true moments, and \(\lambda _k\) are the Lagrange multipliers. Setting the partial derivatives of the function equal to zero, \(\frac{\partial S}{\partial f_Z(z)}=0\) and \(\frac{\partial S}{\partial \lambda _k}=0\), the authors obtain the maximum of the entropy, with general solution

$$\begin{aligned} f_Z(z)=e^{-\lambda _0-\sum _{k=1}^N \lambda _kz^k}. \end{aligned}$$

Now, to find the maximum entropy solution, they solved the following system of \(N+1\) equations,

$$\begin{aligned} \int _{a}^{b}z^kf_Z(z)dz=\mu _k, \quad k=0,1,\ldots ,N. \end{aligned}$$
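A minimal numerical sketch of this procedure is given below; it solves the moment-matching system for the Lagrange multipliers with a generic root finder. The routine names are illustrative, and for the computations in Sect. 3 we rely on the dedicated PyMaxEnt package; for numerical stability the support \([a,b]\) should be rescaled (e.g., to \([0,1]\)) before solving.

```python
import numpy as np
from scipy.optimize import fsolve

def maxent_density(mu, a, b, n_grid=2000):
    """Maximum-entropy density on [a, b] matching the power moments
    mu[0], ..., mu[N] (with mu[0] = 1). Returns a callable f(z)."""
    z = np.linspace(a, b, n_grid)
    powers = np.vstack([z**k for k in range(len(mu))])   # rows: z^0, ..., z^N

    def residuals(lam):
        # f(z) = exp(-lambda_0 - sum_k lambda_k z^k)
        f = np.exp(-(lam[:, None] * powers).sum(axis=0))
        # moment mismatch: int z^k f(z) dz - mu_k, for every k
        return np.trapz(powers * f, z, axis=1) - np.asarray(mu)

    lam = fsolve(residuals, np.zeros(len(mu)))
    return lambda x: np.exp(-sum(l * x**k for k, l in enumerate(lam)))
```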

2.2 Moments computation

In order to compute the moments of the random variable Z, we assume that the number of viewers behaves like a Markov reward process. The related Markov chain \(\{J_{n}\}_{n\in {\mathbb {N}}}\) has state space \(E=\left\{ 1, 2,\ldots ,s\right\} \), indicating the regimes of the viewers. The chain satisfies the following Markov property:

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}(J_{i+1}=j_{i+1}| J_{i}=j_i,J_{i-1}=j_{i-1},\ldots ,J_0=j_0 )\\&\quad ={\mathbb {P}}(J_{i+1}=j_{i+1}|J_{i}=j_i)\\&\quad =p_{j_i,j_{i+1}},\\ \end{aligned} \end{aligned}$$
(4)

where \(p_{j_i,j_{i+1}}\) represents the probability of reaching state \(j_{i+1}\) starting from state \(j_i\). These probabilities, which together make up the transition probability matrix \({\textbf{P}}\), are easily estimated by counting how many times the chain moves from state \(j_i\) to state \(j_{i+1}\) and dividing by the total number of transitions out of state \(j_i\). Each state of the Markov process indicates the listening regime at any time. For instance, if \(s=3\), \(J_i=1\) could be associated with a low audience level, \(J_i=2\) with a medium level, and \(J_i=3\) with a high level.
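As an illustration, a minimal sketch of this estimator, assuming the regime sequence has already been encoded as integers \(0,\ldots,s-1\):

```python
import numpy as np

def estimate_transition_matrix(chain, s):
    """MLE of P: counts[i, j] is the number of observed transitions
    i -> j; each row is normalised by the total transitions out of i."""
    counts = np.zeros((s, s))
    for i, j in zip(chain[:-1], chain[1:]):
        counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```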

Conditional on the regime \(J_i=l\), with \(l\in E\), the random number of listeners has a specific cumulative distribution function:

$$\begin{aligned} F_l(y)={\mathbb {P}}(N(t_i)\le y \mid J_{i}=l), \,\,\,\,\,\,\,\,\,\,\,\,\, \forall i\in {\mathbb {N}}_0. \end{aligned}$$
(5)
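In the computations that follow, the conditional distributions \(F_l\) enter only through integrals of the form \(\int_0^\infty (G(y))^m f_l(y)\,dy\). A natural nonparametric estimator, sketched here under the assumption that each audience observation has been labelled with its regime, replaces these integrals with sample averages within each state:

```python
import numpy as np

def conditional_gain_moments(viewers, chain, s, G, max_order):
    """Estimate E[G(N)^m | state = l] for m = 1..max_order as the
    empirical mean of G(N)^m over the observations labelled l.
    Returns gbar with gbar[m - 1, l] = estimated m-th conditional moment."""
    gbar = np.zeros((max_order, s))
    for l in range(s):
        g_l = G(viewers[chain == l])
        for m in range(1, max_order + 1):
            gbar[m - 1, l] = np.mean(g_l ** m)
    return gbar
```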

We define the k-th moment of Z, conditional on the starting state \(J_{t_{0}}=j_0\) of our process and referred to the vector \({\textbf{t}}_0^n =(t_0,t_1,\ldots ,t_n)\) of times at which the advertisement will be aired, as follows:

$$\begin{aligned} M_{j_0}^{(k)}({\textbf{t}}_0^n):={\mathbb {E}}\left[ \left( \sum _{a=1}^n e^{-r(t_a-t_0)} G(N(t_a))\right) ^k \bigg |J_{t_{0}}=j_0 \right] . \end{aligned}$$
(6)

We split the previous equation into two addends as follows,

$$\begin{aligned} M_{j_0}^{(k)}({\textbf{t}}_0^n)={\mathbb {E}}\left[ \left( e^{-r(t_1-t_0)}G(N(t_1))+\sum _{a=2}^n e^{-r(t_a-t_0)} G(N(t_a))\right) ^k \bigg |J_{t_{0}}=j_0 \right] . \end{aligned}$$
(7)

Then, applying Newton’s binomial formula, we obtain

$$\begin{aligned} M_{j_0}^{(k)}({\textbf{t}}_0^n)=\sum _{m=0}^{k}\left( {\begin{array}{c}k\\ m\end{array}}\right) {\mathbb {E}}\biggl [ \left( e^{-r(t_1-t_0)}G(N(t_1))\right) ^m\cdot \left( \sum _{a=2}^n e^{-r(t_a-t_0)} G(N(t_a))\right) ^{k-m} \bigg |J_{t_{0}}=j_0 \biggr ], \end{aligned}$$
(8)

and, separating the terms for \(m=k\) and \(m=0\), we can write it as the sum of the following three expected values,

$$\begin{aligned} \begin{aligned} M_{j_0}^{(k)}({\textbf{t}}_0^n)&= {\mathbb {E}}\left[ \left( e^{-r(t_1-t_0)}G(N(t_1))\right) ^k|J_{t_{0}}=j_0\right] \\&\quad +\sum _{m=1}^{k-1}\left( {\begin{array}{c}k\\ m\end{array}}\right) {\mathbb {E}}\left[ \left( e^{-r(t_1-t_0)}G(N(t_1))\right) ^m\left( \sum _{a=2}^n e^{-r(t_a-t_0)}G(N(t_a))\right) ^{k-m} \bigg |J_{t_{0}}=j_0 \right] \\ {}&\quad +{\mathbb {E}}\left[ \left( \sum _{a=2}^n e^{-r(t_a-t_0)}G(N(t_a))\right) ^{k} \bigg |J_{t_{0}}=j_0 \right] . \end{aligned} \end{aligned}$$
(9)

We now calculate these expected values. For the first addend, the tower property of conditional expectation gives

$$\begin{aligned} {\mathbb {E}}\left[ {\mathbb {E}} \left[ \left( e^{-r(t_1-t_0)}G(N(t_1))\right) ^k|J_{t_{1}}, J_{t_{0}}=j_0\right] |J_{t_{0}}=j_0 \right] . \end{aligned}$$
(10)

We proceed to compute the internal expected value as

$$\begin{aligned} e^{-rk(t_1-t_0)} \int _{0}^{\infty }(g(y))^kf_{J_{t_1}}(y)dy, \end{aligned}$$
(11)

where \(f_l\) denotes the density associated with the conditional distribution \(F_l\) in Eq. (5) and \(g\) coincides with the gain function \(G\). This means that the value assumed by the first addend of Eq. (9) is

$$\begin{aligned} \sum _{j_1\in E} p_{j_0,j_1}^{(t_1-t_0)}e^{-rk(t_1-t_0)}\int _{0}^{\infty }(g(y))^kf_{j_1}(y)dy. \end{aligned}$$
(12)

The quantity \(p_{j_0,j_1}^{(t_1-t_0)}\) is the probability of being in state \(j_1\) at time \(t_1\) starting from state \(j_0\) at time \(t_0\), obtainable as the \((j_0,j_1)\) entry of the \((t_1-t_0)\)-th power of the transition matrix \({\textbf{P}}\). Similarly, we calculate the expected value of the second addend of Eq. (9), this time conditioning on both the chain state and the number of tuned viewers,

$$\begin{aligned} \sum _{m=1}^{k-1}\left( {\begin{array}{c}k\\ m\end{array}}\right) {\mathbb {E}} \biggl [ {\mathbb {E}}\biggl [ \left( e^{-r(t_1-t_0)}G(N(t_1))\right) ^m\cdot \left( \sum _{a=2}^n e^{-r(t_a-t_0)}G(N(t_a))\right) ^{k-m} \bigg |J_{t_{1}}, N(t_1) \biggr ]\bigg |J_{t_{0}}=j_0 \biggr ]. \end{aligned}$$
(13)

Given \(J_{t_{1}}\) and \(N(t_1)\), the first factor is known and comes out of the innermost expected value,

$$\begin{aligned} \sum _{m=1}^{k-1}\left( {\begin{array}{c}k\\ m\end{array}}\right) {\mathbb {E}} \biggl [ \left( e^{-r(t_1-t_0)}G(N(t_1))\right) ^m\cdot {\mathbb {E}}\biggl [ \left( \sum _{a=2}^n e^{-r(t_a-t_0)}G(N(t_a))\right) ^{k-m} \bigg |J_{t_{1}}, N(t_1) \biggr ]\bigg |J_{t_{0}}=j_0 \biggr ]. \end{aligned}$$
(14)

Now, the internal expected value is computed as follows:

$$\begin{aligned}&{\mathbb {E}}\left[ \left( \sum _{a=2}^n e^{-r(t_a-t_0)}G(N(t_a))\right) ^{k-m} \bigg |J_{t_{1}}, N(t_1) \right] \\ {}&\quad ={\mathbb {E}}\left[ \left( \sum _{a=2}^n e^{-r(t_a-t_1+t_1-t_0)}G(N(t_a))\right) ^{k-m} \bigg |J_{t_{1}}, N(t_1) \right] \\ {}&\quad =e^{-r(k-m)(t_1-t_0)}{\mathbb {E}}\left[ \left( \sum _{a=2}^n e^{-r(t_a-t_1)}G(N(t_a))\right) ^{k-m} \bigg |J_{t_{1}}, N(t_1) \right] \\&\quad =e^{-r(k-m)(t_1-t_0)}M_{j_{1}}^{(k-m)}({\textbf{t}}_1^n). \end{aligned}$$

By substituting the previous value into the second addend of Eq. (9), we obtain

$$\begin{aligned} \begin{aligned}&\sum _{m=1}^{k-1}\left( {\begin{array}{c}k\\ m\end{array}}\right) {\mathbb {E}} \left[ \left( e^{-r(t_1-t_0)}G(N(t_1))\right) ^m e^{-r(k-m)(t_1-t_0)} M_{j_{1}}^{(k-m)}({\textbf{t}}_1^n)\bigg |J_{t_{0}}=j_0 \right] \\&\quad =\sum _{m=1}^{k-1}\left( {\begin{array}{c}k\\ m\end{array}}\right) e^{-r(t_1-t_0)m}\sum _{j_1\in E}p_{j_0,j_1}^{(t_1-t_0)}\int _0^\infty (g(y))^m f_{j_1}(y)dy\\ {}&\quad \cdot e^{-r(k-m)(t_1-t_0)}\cdot M_{j_{1}}^{(k-m)}({\textbf{t}}_1^n). \end{aligned} \end{aligned}$$

After performing the necessary calculations, we arrive at the following form,

$$\begin{aligned} \sum _{m=1}^{k-1}\left( {\begin{array}{c}k\\ m\end{array}}\right) e^{-rk(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1} ^{(t_1-t_0)}M_{j_1}^{(k-m)}({\textbf{t}}_1^n)\int _{0}^{\infty }(g(y))^m f_{j_1}(y)dy. \end{aligned}$$
(15)

The final step is to determine the expected value of the third addend of Eq. (9), which is equal to

$$\begin{aligned}&{\mathbb {E}}\left[ {\mathbb {E}}\left[ \left( \sum _{a=2}^n e^{-r(t_a-t_0)} G(N(t_a))\right) ^{k}\bigg |J_{t_{1}} \right] \bigg |J_{t_{0}}=j_0 \right] \\ {}&\quad ={\mathbb {E}}\left[ {\mathbb {E}}\left[ \left( \sum _{a=2}^n e^{-r(t_a-t_0+t_1-t_1)} G(N(t_a))\right) ^{k}\bigg |J_{t_{1}} \right] \bigg |J_{t_{0}}=j_0 \right] \\ {}&\quad =\sum _{j_1\in E}p_{j_0,j_1}^{(t_1-t_0)}e ^{-rk(t_1-t_0)}M_{j_{1}}^{(k)}({\textbf{t}}_1^n). \end{aligned}$$

Replacing the three addends in Eq. (9) yields the moment equation,

$$\begin{aligned} \begin{aligned} M_{j_0}^{(k)}({\textbf{t}}_0^n)&=e^{-rk(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1}^{(t_1-t_0)}\int _{0}^{\infty }(g(y))^kf_{j_1}(y)dy\\&\quad +\sum _{m=1}^{k-1}\left( {\begin{array}{c}k\\ m\end{array}}\right) e^{-rk(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1} ^{(t_1-t_0)}M_{j_1}^{(k-m)}({\textbf{t}}_1^n)\int _{0}^{\infty }(g(y))^m f_{j_1}(y)dy\\ {}&\quad +e^{-r(t_1-t_0)k}\sum _{j_1\in E} p_{j_0,j_1}^{(t_1-t_0)} M_{j_1}^{(k)}({\textbf{t}}_1^n). \end{aligned} \end{aligned}$$
(16)

At this point, it is possible to find the first moment by setting \(k=1\): the middle binomial sum in Eq. (16) is empty and we obtain the following result,

$$\begin{aligned} \begin{aligned} M_{j_0}^{(1)}({\textbf{t}}_0^n)&=e^{-r(t_1-t_0)}\sum _{j_1 \in E} p_{j_0,j_1}^{(t_1-t_0)}\int _{0} ^{\infty }(g(y))f_{j_1}(y)dy\\&\quad +e^{-r(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1}^{(t_1-t_0)} M_{j_1}^{(1)}({\textbf{t}}_1^n). \end{aligned} \end{aligned}$$

After some rearrangements, we have

$$\begin{aligned} M_{j_{0}}^{(1)}({\textbf{t}}_{0}^n)=e^{-r(t_1-t_{0})}\sum _{j_{1}\in E}p_{j_{0},j_1}^{(t_1-t_{0})}\left[ \int _{0}^{\infty }(g(y))f_{j_1}(y)dy +M_{j_1}^{(1)}({\textbf{t}}_1^n) \right] . \end{aligned}$$
(17)

To solve the previous equation, we set \(t_0=t_{n-1}\), which is equivalent to saying that \(t_1=t_{n}\). Thus, we obtain

$$\begin{aligned} M_{j_{n-1}}^{(1)}({\textbf{t}}_{n-1}^n)=e^{-r(t_n-t_{n-1})}\sum _{j_n\in E}p_{j_{n-1},j_n}^{(t_n-t_{n-1})}\left[ \int _{0}^{\infty }(g(y))f_{j_n}(y)dy +M_{j_n}^{(1)}({\textbf{t}}_n^n) \right] . \end{aligned}$$

The moment \(M_{j_n}^{(1)}({\textbf{t}}_n^n)\) is null by definition, since for \(t_0=t_n\) the sum of discounted cash flows is empty. Therefore, the formula becomes

$$\begin{aligned} M_{j_{n-1}}^{(1)}({\textbf{t}}_{n-1}^n)=e^{-r(t_n-t_{n-1})}\sum _{j_n\in E} p_{j_{n-1},j_n}^{(t_n-t_{n-1})} \int _{0}^{\infty }(g(y))f_{j_n}(y)dy. \end{aligned}$$
(18)

We note that all terms on the right-hand side are known and that this calculation must be done for all possible values of \(j_{n-1}\in E\). Subsequently, we set \(t_0=t_{n-2}\), thus we have

$$\begin{aligned} M_{j_{n-2}}^{(1)}({\textbf{t}}_{n-2}^n)=e^{-r(t_{n-1}-t_{n-2})}\sum _{j_{n-1}\in E} p_{j_{n-2},j_{n-1}}^{(t_{n-1}-t_{n-2})} \cdot \left[ \int _{0}^{\infty }(g(y))f_{j_{n-1}}(y)dy +M_{j_{n-1}}^{(1)}({\textbf{t}}_{n-1}^n) \right] . \end{aligned}$$

The moment \(M_{j_{n-1}}^{(1)}({\textbf{t}}_{n-1}^n)\) is known from the previous step, hence the formula becomes,

$$\begin{aligned} \begin{aligned} M_{j_{n-2}}^{(1)}({\textbf{t}}_{n-2}^n)&=e^{-r(t_{n-1}-t_{n-2})}\sum _{j_{n-1}\in E} p_{j_{n-2},j_{n-1}}^{(t_{n-1}-t_{n-2})} \Bigg [ \int _{0}^{\infty }(g(y))f_{j_{n-1}}(y)dy \\&\quad +e^{-r(t_n-t_{n-1})}\sum _{j_n\in E} p_{j_{n-1},j_n}^{(t_n-t_{n-1})} \int _{0}^{\infty }(g(y))f_{j_n}(y)dy \Bigg ]. \end{aligned} \end{aligned}$$
(19)

At this point, we can continue the recursion, setting \(t_0=t_{n-3}\) and so on down to the actual initial time \(t_0\), to finally obtain the first moment.
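A minimal sketch of this backward recursion, assuming integer time lags (so that \(p^{(t_{i+1}-t_i)}\) is a matrix power of \(\mathbf{P}\)) and a vector `gbar1` holding the per-state expected gains \(\int_0^\infty g(y) f_l(y)\,dy\), e.g. estimated as in Sect. 2.2:

```python
import numpy as np

def first_moment(P, times, gbar1, r):
    """Backward recursion of Eqs. (17)-(19): returns the vector of
    M^{(1)}_j(t_0^n) over all starting states j."""
    M1 = np.zeros(P.shape[0])                  # M^{(1)}(t_n^n) = 0
    for i in range(len(times) - 1, 0, -1):     # steps t_{n-1}, ..., t_0
        dt = times[i] - times[i - 1]
        Pdt = np.linalg.matrix_power(P, dt)    # (t_{i+1} - t_i)-step probabilities
        M1 = np.exp(-r * dt) * (Pdt @ (gbar1 + M1))
    return M1
```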

To compute the second moment, we set \(k=2\) in (16) and we obtain

$$\begin{aligned} \begin{aligned} M_{j_0}^{(2)}({\textbf{t}}_0^n)&=e^{-2r(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1}^{(t_1-t_0)}\int _{0}^{\infty }(g(y))^2f_{j_1}(y)dy\\&\quad +\sum _{m=1}^{2-1}\left( {\begin{array}{c}2\\ m\end{array}}\right) e^{-2r(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1} ^{(t_1-t_0)}M_{j_1}^{(2-m)}({\textbf{t}}_1^n)\int _{0}^{\infty }(g(y))^m f_{j_1}(y)dy\\ {}&\quad +e^{-2r(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1}^{(t_1-t_0)} M_{j_1}^{(2)}({\textbf{t}}_1^n), \end{aligned} \end{aligned}$$

thus,

$$\begin{aligned} \begin{aligned} M_{j_0}^{(2)}({\textbf{t}}_0^n)&=e^{-2r(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1}^{(t_1-t_0)}\int _{0}^{\infty }(g(y))^2f_{j_1}(y)dy\\&\quad +2e^{-2r(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1} ^{(t_1-t_0)}M_{j_1}^{(1)}({\textbf{t}}_1^n)\int _{0}^{\infty }(g(y)) f_{j_1}(y)dy\\ {}&\quad +e^{-2r(t_1-t_0)}\sum _{j_1\in E} p_{j_0,j_1}^{(t_1-t_0)} M_{j_1}^{(2)}({\textbf{t}}_1^n). \end{aligned} \end{aligned}$$
(20)

We set \(t_0=t_{n-1}\), which is identical to saying \(t_1=t_{n}\), to find the solution,

$$\begin{aligned} \begin{aligned} M_{j_{n-1}}^{(2)}({\textbf{t}}_{n-1}^n)&=e^{-2r(t_n-t_{n-1})}\sum _{j_n\in E} p_{j_{n-1},j_n}^{(t_n-t_{n-1})}\int _{0}^{\infty }(g(y))^2f_{j_n}(y)dy\\&\quad +2e^{-2r(t_n-t_{n-1})}\sum _{j_n\in E} p_{j_{n-1},j_n} ^{(t_n-t_{n-1})}M_{j_n}^{(1)}({\textbf{t}}_{n}^n)\int _{0}^{\infty }(g(y)) f_{j_n}(y)dy\\ {}&\quad +e^{-2r(t_n-t_{n-1})}\sum _{j_n\in E} p_{j_{n-1},j_n}^{(t_n-t_{n-1})} M_{j_n}^{(2)}({\textbf{t}}_{n}^n). \end{aligned} \end{aligned}$$

The moments \(M_{j_n}^{(1)}({\textbf{t}}_n^n)\) and \(M_{j_n}^{(2)}({\textbf{t}}_{n}^n)\) are null by definition, as they correspond to the case \(t_0=t_n\). Therefore, the formula becomes

$$\begin{aligned} M_{j_{n-1}}^{(2)}({\textbf{t}}_{n-1}^n)=e^{-2r(t_n-t_{n-1})}\sum _{j_n\in E} p_{j_{n-1},j_n}^{(t_n-t_{n-1})}\int _{0}^{\infty }(g(y))^2f_{j_n}(y)dy. \end{aligned}$$
(21)

We note that all terms on the right-hand side are known. Subsequently, we set \(t_0=t_{n-2}\) and \(t_1=t_{n-1}\), thus obtaining

$$\begin{aligned} \begin{aligned} M_{j_{n-2}}^{(2)}({\textbf{t}}_{n-2}^n)&=e^{-2r(t_{n-1}-t_{n-2})}\sum _{j_{n-1}\in E} p_{j_{n-2},j_{n-1}}^{(t_{n-1}-t_{n-2})}\int _{0}^{\infty }(g(y))^2f_{j_{n-1}}(y)dy\\&\quad +2e^{-2r(t_{n-1}-t_{n-2})}\sum _{j_{n-1}\in E} p_{j_{n-2},j_{n-1}} ^{(t_{n-1}-t_{n-2})}M_{j_{n-1}}^{(1)}({\textbf{t}}_{n-1}^n)\int _{0}^{\infty }(g(y)) f_{j_{n-1}}(y)dy\\ {}&\quad +e^{-2r(t_{n-1}-t_{n-2})}\sum _{j_{n-1}\in E} p_{j_{n-2},j_{n-1}}^{(t_{n-1}-t_{n-2})} M_{j_{n-1}}^{(2)}({\textbf{t}}_{n-1}^n). \end{aligned} \end{aligned}$$

Now, we substitute the previously calculated \(M_{j_{n-1}}^{(1)}({\textbf{t}}_{n-1}^n)\) and \(M_{j_{n-1}}^{(2)}({\textbf{t}}_{n-1}^n)\), and we have

$$\begin{aligned} \begin{aligned} M_{j_{n-2}}^{(2)}({\textbf{t}}_{n-2}^n)&=e^{-2r(t_{n-1}-t_{n-2})}\sum _{j_{n-1}\in E} p_{j_{n-2},j_{n-1}}^{(t_{n-1}-t_{n-2})}\int _{0}^{\infty }(g(y))^2f_{j_{n-1}}(y)dy\\&\quad +2e^{-2r(t_{n-1}-t_{n-2})}\sum _{j_{n-1}\in E} p_{j_{n-2},j_{n-1}}^{(t_{n-1}-t_{n-2})}e^{-r(t_n-t_{n-1})}\sum _{j_{n}\in E} p_{j_{n-1},j_n}^{(t_n-t_{n-1})}\\&\quad \cdot \int _{0}^{\infty }(g(y))f_{j_n}(y)dy\int _{0}^{\infty }(g(y)) f_{j_{n-1}}(y)dy\\ {}&\quad +e^{-2r(t_{n-1}-t_{n-2})}\sum _{j_{n-1}\in E} p_{j_{n-2},j_{n-1}}^{(t_{n-1}-t_{n-2})}e^{-r(t_n-t_{n-1})}\\ {}&\quad \cdot \sum _{j_n\in E} p_{j_{n-1},j_n}^{(t_n-t_{n-1})}\int _{0}^{\infty }(g(y))^2f_{j_n }(y)dy. \end{aligned} \end{aligned}$$
(22)

Finally, we can recursively set \(t_0=t_{n-3}\) up to \(t_0\), to obtain the second moment.

Higher-order moments can be computed in the same way as the first and second moments.
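A minimal sketch of the general recursion in Eq. (16), under the same assumptions as the first-moment sketch above, with `gbar[m - 1, l]` approximating \(\int_0^\infty (g(y))^m f_l(y)\,dy\):

```python
import numpy as np
from math import comb

def moments_up_to(P, times, gbar, r, k_max):
    """Backward recursion of Eq. (16): returns M with
    M[k - 1, j] = M^{(k)}_j(t_0^n) for k = 1..k_max."""
    M = np.zeros((k_max, P.shape[0]))          # M^{(k)}(t_n^n) = 0
    for i in range(len(times) - 1, 0, -1):     # backwards over airing times
        dt = times[i] - times[i - 1]
        Pdt = np.linalg.matrix_power(P, dt)
        M_new = np.zeros_like(M)
        for k in range(1, k_max + 1):
            inner = gbar[k - 1] + M[k - 1]     # m = k and m = 0 addends
            for m in range(1, k):              # middle binomial addends
                inner = inner + comb(k, m) * gbar[m - 1] * M[k - m - 1]
            M_new[k - 1] = np.exp(-r * k * dt) * (Pdt @ inner)
        M = M_new
    return M
```

Under these assumptions, a call such as `moments_up_to(P, [4, 12, 20], gbar, 0.03873, 10)` would produce the ten conditional moments needed in the case study of Sect. 3.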

3 Case study

We propose an application of the previously described method to audience data from Italian television. In particular, we employ the data provided by Auditel and available at https://www.auditel.it/en/data/. We obtain the monthly Average Minute Rating (AMR), which represents the average number of viewers for each of the following time slotsFootnote 3: 2:00 AM–7:00 AM, 7:00 AM–9:00 AM, 9:00 AM–12:00 PM, 12:00 PM–3:00 PM, 3:00 PM–6:00 PM, 6:00 PM–8:30 PM, 8:30 PM–10:30 PM and 10:30 PM–2:00 AM. As an example, we let “Canale 5” be the channel on which the advertisement will be broadcast, and we collected the data for this channel from January 2017 to December 2021.

In this way, the AMR time series consists of eight time slots per month, for 12 months over five years, resulting in 480 observations.

To get an overview, we report the summary statistics in Table 1, which shows that the AMR series takes values from 359,774 to 4,756,853 viewers, with a mean of 1,945,300 and a standard deviation of 1,002,336.

Table 1 Descriptive statistics of AMR

In addition, we plot the full dataset in Fig. 2, which shows both monthly and yearly seasonality. Therefore, to eliminate the periodic component, we deseasonalise the series with a moving average whose window equals eight, the number of time slots in each month.

Fig. 2: Time series of the Average Minute Rating (AMR) with rolling average, expressed in \(10^6\)

Fig. 3: Histogram of the discretised states distribution superimposed on the moving-average histogram

To apply the model and compute its parameters, we discretise the moving-average series into a 3-state space. Specifically, we assign to the lower state all values between the minimum and \(\mu -\frac{\sigma }{2}\), to the middle state the observations between \(\mu -\frac{\sigma }{2}\) and \(\mu +\frac{\sigma }{2}\), and to the higher state the values between \(\mu +\frac{\sigma }{2}\) and the maximum.
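A minimal sketch of this discretisation, where `ma` is the deseasonalised moving-average series as a NumPy array:

```python
import numpy as np

def discretise(ma):
    """Map the moving-average series to states {0, 1, 2} (low, medium,
    high) using the thresholds mu - sigma/2 and mu + sigma/2."""
    mu, sigma = ma.mean(), ma.std()
    return np.digitize(ma, [mu - sigma / 2, mu + sigma / 2])
```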

To get an overview of the data distribution we report the histogram in Fig. 3 along with the frequencies of the discretized observations. The central dashed vertical line represents the mean of the distribution.

Using maximum likelihood estimation (see, e.g., Billingsley 1961; Bharucha-Reid 1962), we compute the transition probability matrix as follows:

$$\begin{aligned} {\textbf{P}}= \begin{bmatrix} 0.965 & 0.035 & 0.000 \\ 0.038 & 0.902 & 0.060 \\ 0.000 & 0.045 & 0.955 \end{bmatrix}. \end{aligned}$$
(23)

We observe higher values along the main diagonal; thus, the probability of remaining in the starting state for one period is very high. From this matrix, we obtain the following stationary distribution

$$\begin{aligned} \mathbf {\pi }= \begin{bmatrix} 0.311&0.296&0.393 \end{bmatrix}, \end{aligned}$$
(24)

that is, the probabilities of the three audience levels in the stationary regime. Note that the distribution is approximately uniform across the states.
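A minimal sketch of this computation, extracting the stationary distribution as the normalised left eigenvector of \(\mathbf{P}\) associated with the eigenvalue 1:

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1,
    normalised so that its entries sum to one."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()
```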

Since they are needed for the calculation of the moments, we plot the Empirical Cumulative Distribution Functions (ECDFs) of States 1, 2, and 3 together with the global ECDF in Fig. 4. We can see that the ECDFs of States 2 and 3 grow faster than the ECDF of State 1.

To check that the Markov model is an appropriate choice for the data considered, we perform the independence test of Basawa and Rao (1980):

$$\begin{aligned} H_0:p_{i,j}=\phi _j, \quad H_1:p_{i,j}\ne \phi _{j},\quad i,j\in E, \end{aligned}$$
(25)

where \(\phi _j\) is a component of the vector

$$\begin{aligned} \varvec{\phi }=(\phi _1,\phi _2,\phi _3)=\left( \frac{\sum _{i=1}^{3}n_{i1}}{n}, \frac{\sum _{i=1}^{3}n_{i2}}{n}, \frac{\sum _{i=1}^{3}n_{i3}}{n}\right) , \end{aligned}$$
(26)

where \(n_{ij}\) is the number of times the chain moves from state i to state j.

The null hypothesis states that the probability of going from state i to state j does not depend on the starting state i. If this were the case, the moving-average sequence would not depend on the previous state and, instead of being represented by a Markov chain, the observations would constitute a sequence of independent and identically distributed random variables. We use the statistic proposed by Basawa and Rao (1980) to test the independence hypothesis:

$$\begin{aligned} S=\sum _{i,j\in E}\frac{(n_{ij}-n_i\phi _j)^2}{n_i{\phi }_j}, \end{aligned}$$
(27)

where \(n_{i}=\sum _{j\in E}n_{ij}\). The statistic S has a limiting \(\chi ^2\) distribution with 4 degrees of freedom. Its observed value is 785.252, so we reject the null hypothesis at the \(1\%\) significance level. Therefore, we can assume that there is dependence on the previous state, and Markov chains are a suitable modelling tool.
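A minimal sketch of the test, assuming the transition counts \(n_{ij}\) have been collected in a matrix:

```python
import numpy as np
from scipy.stats import chi2

def independence_test(counts):
    """Basawa-Rao statistic S of Eq. (27) from the count matrix
    counts[i, j] = n_ij; returns S and its chi-square p-value with
    (s - 1)^2 degrees of freedom (4 for a 3-state chain)."""
    n_i = counts.sum(axis=1, keepdims=True)      # n_i
    phi = counts.sum(axis=0) / counts.sum()      # phi_j, Eq. (26)
    expected = n_i * phi                         # n_i * phi_j
    S = ((counts - expected) ** 2 / expected).sum()
    return S, chi2.sf(S, (counts.shape[0] - 1) ** 2)
```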

Fig. 4: Empirical Cumulative Distribution Function (ECDF) of the whole dataset and of each state

For our case study, we assume the advertisement is aired according to the time vector \(\varvec{t}_0^2=[4,12,20]\). According to the presented model, the time at which the option is exercised is \(t_0\), but we could also have chosen a different time to carry out the evaluation.Footnote 4 For simplicity, let the gain function be a linear function of the number of viewers tuned in at each time, \(G(N(t_i))=N(t_i)\). Since we do not know the risk level of the investment, we set the discount rate to \(r=3.873\%\), the rate of Italian Treasury Bonds.Footnote 5

First, we calculate the first ten moments conditional on each state using Eq. (16). Results are shown in Table 2.

Table 2 First 10 moments conditional on States 1, 2 and 3, in millions

Fig. 5: Density functions from the knowledge of 2, 3, 4, 5 and 10 moments, conditional on State 1

Fig. 6: Density functions from the knowledge of 2, 3, 4, 5 and 10 moments, conditional on State 2

Fig. 7: Density functions from the knowledge of 2, 3, 4, 5 and 10 moments, conditional on State 3

Then, we apply the method of moments with the maximum entropy approach to determine the density functions generated using different orders of moments. For the implementation of this approach, we employ the Python code PyMaxEnt by Saad and Ruai (2019). Figures 5, 6, and 7 show the density functions obtained from the knowledge of 2, 3, 4, 5 and 10 moments, conditional on each state. It can be seen, for instance, that for State 3 the shape of the distribution changes as the order of moments increases: including more moments yields a more detailed density function. This behaviour is less evident for State 2.

Employing the density function derived from the knowledge of the first ten moments, we calculate the option price using Eq. (3). The results are summarised in Table 3. We note that the option price decreases as the strike price increases and increases with the state.

Table 3 Option prices in euros for different levels of the strike price K, conditional on different states, from the knowledge of 10 moments

As a robustness test for our model, we compare the results with those obtained employing a parametric lognormal distribution, which is compatible with a B&S model. We recall that the random variable \(X=e^{N}\) follows the lognormal distribution with parameters \((\mu ,\sigma ^{2})\) if \(N=\log X\) follows the normal distribution \({\mathcal {N}}(\mu ,\sigma ^{2})\). Its probability density function is:

$$\begin{aligned} f(x)={\frac{e^{-{\frac{(\ln x-\mu )^{2}}{2\sigma ^{2}}}}}{x{\sqrt{2\pi }}{\sigma }}}, \end{aligned}$$
(28)

with expected value and variance as follows:

$$\begin{aligned} {\mathbb {E}}[X]=e^{\mu +{\frac{\sigma ^{2}}{2}}}, \end{aligned}$$
(29)
$$\begin{aligned} {\text {Var}}(X)=e^{2\mu +\sigma ^{2}}(e^{\sigma ^{2}}-1). \end{aligned}$$
(30)
Table 4 Option prices in euros for different levels of the strike price K, conditional on different states, under the lognormality assumption using only the first two moments

Using the first two moments of our parametric distributions, conditional on each state, we find the prices in Table 4. Similarly to the previous table, prices decrease as the strike price increases. The comparison of Tables 3 and 4 reveals that the values are qualitatively similar. It can be noted that for the first state the values from our model are lower than those calculated under the lognormality assumption, while those for States 2 and 3 are higher, at least up to \(K=\) 3,500,000. Since the values in Table 3 are generated from the knowledge of ten moments, they contain more information.
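For reference, Eqs. (29)-(30) can be inverted to recover \((\mu,\sigma^2)\) from the first two conditional moments, after which the expectation in Eq. (3) admits a standard closed form for the lognormal density. A minimal sketch, not necessarily the exact procedure behind Table 4:

```python
import numpy as np
from scipy.stats import norm

def lognormal_params(mean, var):
    """Invert Eqs. (29)-(30): (mu, sigma^2) from mean and variance."""
    sigma2 = np.log(1.0 + var / mean**2)
    return np.log(mean) - sigma2 / 2.0, sigma2

def lognormal_call(mean, var, K):
    """E[max(Z - K, 0)] for a lognormal Z with the given mean/variance:
    E[Z] * Phi(d2 + sigma) - K * Phi(d2), with d2 = (mu - ln K) / sigma."""
    mu, sigma2 = lognormal_params(mean, var)
    sigma = np.sqrt(sigma2)
    d2 = (mu - np.log(K)) / sigma
    return mean * norm.cdf(d2 + sigma) - K * norm.cdf(d2)
```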

4 Conclusion

In the modern economy, the advertising sector has attracted rapidly growing interest, as it plays a key role in the sale of products and services: advertising today is understood as a commercial vehicle for the realisation of profit. With our work, we investigated how to price an option that allows an advertisement to be aired at certain times in exchange for a strike price. First, we assumed that the number of viewers of the advertisement is driven by a Markov chain. Then, we adopted the present worth method, calculating the moments of the sum of all the discounted cash flows generated by the viewers of the advertisement.

To get the fair price of the option, we found the density function of the discounted cash flows via the method of moments with the maximum entropy approach by Mead and Papanicolaou (1984).

Applying this model to a concrete case, we found that the price value depends on the number of moments that are used for the calculation. Certainly, the way in which the number of viewers is converted into monetary amounts also influences the option price, along with the rate at which the amounts are discounted over time, the vector of the times in which the advertisement is aired, and the value of the strike price.

The accuracy of the result also depends on the accuracy of the available data; since our data are monthly averages, a sensitivity analysis is proposed as future work.

This paper leaves several open questions that can be addressed in future studies. Specifically, comparisons with further models could be considered, or the application diversified to other case studies. Moreover, it would be interesting to investigate the extension from a television network to a mobile platform with apps and software, to reach different viewers with on-demand requests. Finally, it would be possible to value the business as a compound option, in which n phases of an investment are involved.