1 Introduction

The problem of optimizing advertising strategies has always been of paramount importance in the field of marketing. Starting from the pioneering works of Vidale and Wolfe (1957) and Nerlove and Arrow (1962), this topic has evolved into a full-fledged field of research and modeling. Realizing the impossibility of describing all existing classical approaches and results, we refer the reader to the review article of Sethi (1977) (that analyzes the literature prior to 1975) and a more recent paper by Feichtinger et al. (1994) (covering the results up to 1994) and references therein.

It is worth noting that the Nerlove–Arrow approach, which laid the foundation for numerous modern dynamic advertising models, assumed no time lag between advertising expenditure and its impact on the goodwill stock. However, many empirical studies (see, for example, Leone 1995) clearly indicate a “memory” phenomenon that is often called the “distributed lag” or “carryover” effect: advertising does not have an immediate impact; rather, its influence is spread over a period of time varying from several weeks to several months. This shortcoming of the basic Nerlove–Arrow model gave rise to many modifications of the latter aimed at modeling distributed lags. For a long time, nevertheless, the vast majority of dynamic advertising models with distributed lags were formulated in a deterministic framework (see e.g. Sethi 1977, §2.6 and Feichtinger et al. 1994, Section 2.3).

In recent years, however, there have been several landmark papers that consider the Nerlove–Arrow-type model with memory in a stochastic setting. Here, we refer primarily to the series of papers (Gozzi and Marinelli 2005; Gozzi et al. 2009) (see also the more recent work of Li and Chen 2020), where the goodwill stock is modeled via a linear Brownian diffusion with delay of the form

$$\begin{aligned} dX^u(t) = \left( \alpha _0 X^u(t) + \int _{-r}^0 \alpha _1(s) X^u(t+s)\,ds + \beta _0 u(t) + \int _{-r}^0 \beta _1(s) u(t+s)\, ds \right) dt + \sigma dW(t), \end{aligned}$$
(1.1)

where \(X^u\) is interpreted as the product’s goodwill stock and u is the spending on advertising. The corresponding optimal control problem in this case was solved using the so-called lift approach: equation (1.1) was rewritten as a stochastic differential equation (without delay) in a suitable Hilbert space, and then infinite-dimensional optimization techniques (either dynamic programming principle or maximum principle) were applied.

In this article, we present an alternative stochastic model that also takes the carryover effect into account. Instead of the delay approach described above, we incorporate the memory into the model by means of the Volterra kernel \(K \in L^2([0,T])\) and consider the controlled Volterra Ornstein-Uhlenbeck process of the form

$$\begin{aligned} X^u(t)=X(0)+\int _0^t K(t-s)\Big (\alpha u(s)-\beta X^u(s)\Big )ds+\sigma \int _0^t K(t-s)\, dW(s), \end{aligned}$$
(1.2)

where \(\alpha ,\beta ,\sigma > 0\) and \(X(0) \in {\mathbb {R}}\) are constants (see e.g. Abi et al. 2019, Section 5 for more details on affine Volterra processes of this type). Note that such goodwill dynamics can be regarded as a combination of the deterministic lag models described in Feichtinger et al. (1994, Section 2.3) and the stochastic Ornstein-Uhlenbeck-based model presented by Rao (1986). The main difference from (1.1) is that the memory is incorporated into the noise as well as into the drift, since the stochastic environment (represented by the noise) tends to form “clusters” over time. Indeed, in reality positive increments are likely to be followed by positive increments (if conditions are favourable for the goodwill during some period of time), and negative increments tend to follow negative increments (under unfavourable conditions). This behaviour of the noise cannot be reflected by a standard Brownian driver but can easily be incorporated into the model (1.2).
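
Since, for a given control u, (1.2) is a linear Volterra integral equation driven by a Wiener integral, its trajectories are easy to simulate on a time grid. Below is a minimal numerical sketch (not part of the model itself), assuming a simple left-endpoint Euler discretization of both integrals; the function name, the example kernel and all parameter values are illustrative placeholders.

```python
import numpy as np

def simulate_volterra_ou(K, u, X0=0.0, alpha=1.0, beta=1.0, sigma=1.0,
                         T=2.0, n_steps=400, seed=0):
    """Left-endpoint Euler scheme for
    X(t) = X0 + int_0^t K(t-s)(alpha*u(s) - beta*X(s)) ds + sigma*int_0^t K(t-s) dW(s)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    dW = rng.normal(0.0, np.sqrt(dt), n_steps)
    X = np.empty(n_steps + 1)
    X[0] = X0
    for i in range(1, n_steps + 1):
        s = t[:i]                       # grid points strictly before t[i]
        Kvals = K(t[i] - s)             # kernel evaluated at t_i - s_j > 0
        X[i] = (X0
                + np.sum(Kvals * (alpha * u(s) - beta * X[:i])) * dt
                + sigma * np.sum(Kvals * dW[:i]))
    return t, X

# example: fractional kernel and constant advertising spending
t, X = simulate_volterra_ou(K=lambda r: r ** 0.3, u=lambda s: np.ones_like(s))
```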

Our goal is to solve an optimization problem of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} X^u(t)=X(0)+\int _0^t K(t-s)\Big (\alpha u(s)-\beta X^u(s)\Big )ds+\sigma \int _0^t K(t-s) dW(s),\\ J(u):= \mathbb {E}\left[ -\int _0^Ta_1u^2(s)ds+a_2X^u(T)\right] \rightarrow \max , \end{array}\right. } \end{aligned}$$
(1.3)

where \(a_1,a_2 > 0\) are given constants. The set of admissible controls for the problem (1.3), denoted by \(L^2_a:= L^2_a(\Omega \times [0,T])\), is the space of square integrable real-valued stochastic processes adapted to the filtration generated by W. Note that the process \(X^u\) is well defined for any \(u\in L^2_a\) since, for almost all \(\omega \in \Omega \), equation (1.2), treated pathwise, can be regarded as a deterministic linear Volterra integral equation of the second kind, which has a unique solution (see e.g. Tricomi 1985).

The optimization problem (1.3) for underlying Volterra dynamics has been studied by several authors (see, e.g. Agram and Øksendal 2015, Yong 2006 and the bibliography therein). Contrary to most of the works cited above, we will not solve this problem by means of a maximum principle approach. Even though this method yields necessary and sufficient conditions for the optimal control of (1.3), we cannot apply it directly, as we deal with low regularity conditions on the coefficients of the drift and volatility. Furthermore, this method has another notable drawback in practice: its application typically involves computing conditional expectations, which is substantially challenging due to the absence of Markovianity. Another possible way to solve the optimal control problem (1.3) is to obtain an explicit solution of the forward equation (1.2), plug it into the performance functional and solve the resulting maximization problem using differential calculus in Hilbert spaces. However, even though this method seems appealing, obtaining the required explicit representation of \(X^u\) in terms of u may be tedious and burdensome. Instead, we will use the approach introduced in Abi et al. (2021), Di and Giordano (2022), which is in the same spirit as the one in Gozzi and Marinelli (2005), Gozzi et al. (2009), Li and Chen (2020) mentioned above: we rewrite the original forward stochastic Volterra integral equation as a stochastic differential equation in a suitable Hilbert space and then apply standard optimization techniques in infinite dimensions (see e.g. Fabbri et al. 2017; Fuhrman and Tessitore 2002). Moreover, the shape of the corresponding infinite-dimensional Hamilton-Jacobi-Bellman equation allows us to obtain an explicit solution to the latter by exploiting the “splitting” method from Gozzi et al. (2009, Section 3.3).

We notice that, while the optimization problem (1.3) is closely related to the one considered in Abi et al. (2021), there are several important differences between that work and ours. In particular, Abi et al. (2021) requires the kernel to have the form

$$\begin{aligned} K(t) = \int _{{\mathbb {R}}_+} e^{-\theta t} \mu (d\theta ), \end{aligned}$$
(1.4)

where \(\mu \) is a signed measure such that \(\int _{{\mathbb {R}}_+} (1 \wedge \theta ^{-1/2}) |\mu |(d\theta ) < \infty \). Although there are some prominent examples of such kernels, not all kernels K are of this type; furthermore, even if a particular K admits such a representation in theory, it may not be easy to find the explicit shape of \(\mu \). In contrast, our approach works for all Hölder continuous kernels without any restriction on their shape and allows us to obtain explicit approximations \({\hat{u}}_n\) of the optimal control \({\hat{u}}\). The lift procedure presented here is also different from the one used in Abi et al. (2021) (although both are specific cases of the technique presented in Cuchiero and Teichmann 2020).

The lift used in the present paper was introduced in Cuchiero and Teichmann (2020) and then generalized to the multi-dimensional case in Cuchiero and Teichmann (2019), but the approach itself can be traced back to Carmona and Coutin (1998). It should also be emphasised that this method has its own limitations: in order to perform the lift, the kernel K must admit a representation of the form \(K(t) = \langle g, e^{t{\mathcal {A}}} \nu \rangle _{{\mathbb {H}}}\), \(t\in [0,T]\), where g and \(\nu \) are elements of some Hilbert space \({\mathbb {H}}\) and \(\{e^{t{\mathcal {A}}},~t\in [0,T]\}\) is a uniformly continuous semigroup acting on \({\mathbb {H}}\) with \({\mathcal {A}}\in {\mathcal {L}}({\mathbb {H}})\); in general, it may be hard to find feasible \({\mathbb {H}}\), g, \(\nu \) and \({\mathcal {A}}\). Here, we work with Hölder continuous kernels K and overcome this issue by approximating the kernel with Bernstein polynomials (which turn out to enjoy a simple representation of the required type). We then solve the optimal control problem for the forward process with the approximated kernel in place of the original one and study the convergence of the corresponding solutions.

The paper is organised as follows. In Sect. 2, we present our approach in the case of a liftable K (i.e. K having a representation in terms of \({\mathbb {H}}\), g, \(\nu \) and \({\mathcal {A}}\) mentioned above). Namely, we describe the lift procedure, recall the necessary results from stochastic optimal control theory in Hilbert spaces and derive an explicit representation of the optimal control \({\hat{u}}\) by solving the associated Hamilton-Jacobi-Bellman equation. In Sect. 3, we introduce a liftable approximation for general Hölder continuous kernels, give convergence results for the solution to the approximated problem and discuss some numerical aspects of the latter. In Sect. 4, we illustrate the application of our technique with examples and simulations.

2 Solution via Hilbert space-valued lift

2.1 Preliminaries

Let us begin with some simple results showing that \(X^u\) and the optimization problem (1.3) are well defined for any \(u\in L^2_a\).

Theorem 1

Let \(K\in L^2([0,T])\). Then, for any \(u \in L^2_a\),

1) the forward Volterra Ornstein-Uhlenbeck-type equation (1.2) has a unique solution;

2) there exists a constant \(C>0\) such that

    $$\begin{aligned} \sup _{t\in [0,T]} {\mathbb {E}}[|X^u(t)|^2] \le C(1+ \Vert u\Vert ^2_2), \end{aligned}$$

    where \(\Vert \cdot \Vert _2\) denotes the standard \(L^2(\Omega \times [0,T])\) norm;

3) \(|J(u)| < \infty \).

Proof

Item 1) is evident since, for almost all \(\omega \in \Omega \), equation (1.2), treated pathwise, can be regarded as a deterministic linear Volterra integral equation of the second kind, which has a unique solution (see e.g. Tricomi 1985). Next, it is straightforward to deduce that

$$\begin{aligned} \mathbb {E}\left[ |X^u(t)|^2\right]&\le C\bigg (1+ \mathbb {E}\left[ \left( \int _0^t K(t-s) u(s)ds\right) ^2\right] + \mathbb {E}\left[ \left( \int _0^t K(t-s) X^u(s)ds\right) ^2\right] \\&\qquad + \mathbb {E}\left[ \left( \int _0^t K(t-s) dW(s)\right) ^2\right] \bigg )\\&\le C \left( 1 + \Vert K\Vert _2^2 \Vert u\Vert _2^2 + \Vert K\Vert _2^2 \int _0^t \mathbb {E}\left[ |X^u(s)|^2\right] ds + \Vert K\Vert _2^2\right) \\&\le C \left( 1 + \Vert u\Vert _2^2 + \int _0^t \mathbb {E}\left[ |X^u(s)|^2\right] ds\right) . \end{aligned}$$

Now, item 2) follows from Gronwall’s inequality. Finally, \(\mathbb {E}[X^u(t)]\) satisfies the deterministic Volterra equation of the form

$$\begin{aligned} \mathbb {E}[X^u(t)]= - \beta \int _0^t K(t-s) \mathbb {E}[X^u(s)] ds + X(0) + \alpha \int _0^t K(t-s) \mathbb {E}[u(s)]ds \end{aligned}$$

and hence can be represented in the form

$$\begin{aligned} \begin{aligned} \mathbb {E}[X^u(t)]&= X(0) + \alpha \int _0^t K(t-s)\mathbb {E}[u(s)]ds - \beta \int _0^t R_\beta (t,s) X(0) ds \\&\quad - \alpha \beta \int _0^t R_\beta (t,s)\int _0^s K(s-v) \mathbb {E}[u(v)] dv ds \\&=: X(0) + {\mathcal {L}} u, \end{aligned} \end{aligned}$$

where \(R_\beta \) is the resolvent of the corresponding Volterra integral equation and the operator \({\mathcal {L}}\) is linear and continuous. Hence J(u) can be re-written as

$$\begin{aligned} J(u) = -a_1\langle u,u \rangle _{L^2(\Omega \times [0,T])} + a_2 (X(0) + {\mathcal {L}} u), \end{aligned}$$
(2.1)

which immediately implies that \(|J(u)| < \infty \). \(\square \)

2.2 Construction of Markovian lift and formulation of the lifted problem

As anticipated above, in order to solve the optimization problem (1.3), we will rewrite \(X^u\) in terms of a Markovian Hilbert space-valued process \({\mathcal {Z}}^u\) using the lift presented in Cuchiero and Teichmann (2020) and then apply the dynamic programming principle in Hilbert spaces. We start with a description of the core idea behind Markovian lifts in the case of liftable kernels.

Definition 1

Let \({\mathbb {H}}\) denote a separable Hilbert space with the scalar product \(\langle \cdot , \cdot \rangle \). A kernel \(K\in L^2([0,T])\) is called \({\mathbb {H}}\)-liftable if there exist \(\nu , g\in {\mathbb {H}}\), \(\Vert \nu \Vert _{{\mathbb {H}}} = 1\), and a uniformly continuous semigroup \(\{e^{t{\mathcal {A}}},~t\in [0,T]\}\) acting on \({\mathbb {H}}\), \({\mathcal {A}}\in \mathcal L({\mathbb {H}})\), such that

$$\begin{aligned} K(t)=\langle g, e^{t{\mathcal {A}}}\nu \rangle , \quad t\in [0,T]. \end{aligned}$$
(2.2)

For examples of liftable kernels, we refer to Sect. 4 and to Cuchiero and Teichmann (2020).

Consider a controlled Volterra Ornstein-Uhlenbeck process of the form (1.2) with a liftable kernel \(K(t) = \langle g, e^{t{\mathcal {A}}}\nu \rangle \), \(\Vert \nu \Vert _{{\mathbb {H}}} = 1\), and denote \(\zeta _0:=\frac{X(0)}{\Vert g\Vert _{{\mathbb {H}}}^2}g\) and

$$\begin{aligned} dV^u(t):=(\alpha u(t)-\beta X^u(t))dt+\sigma dW(t). \end{aligned}$$

Using the fact that \(X(0)=\langle g, \zeta _0\rangle \), we can now rewrite (1.2) as follows:

$$\begin{aligned} X^u(t)&= X(0)+\int _0^t K(t-s) dV^u(s)\\&=\langle g, \zeta _0 \rangle + \int _0^t\langle g,e^{(t-s){\mathcal {A}}}\nu \rangle dV^u(s)\\&=\left\langle g, \zeta _0+\int _0^t e^{(t-s){\mathcal {A}}} \nu dV^u(s) \right\rangle \\&=:\langle g,{\widetilde{{\mathcal {Z}}}}_t^u\rangle , \end{aligned}$$

where \({\widetilde{{\mathcal {Z}}}}^u_t:=\zeta _0+\int _0^t e^{(t-s){\mathcal {A}}}\nu dV^u(s)\). It is easy to check that \({\widetilde{{\mathcal {Z}}}}^u\) is the unique solution of the infinite dimensional SDE

$$\begin{aligned} {\widetilde{{\mathcal {Z}}}}_t^u = \zeta _0+ \int _0^t \Big ({\mathcal {A}}({\widetilde{{\mathcal {Z}}}}_s^u-\zeta _0) + (\alpha u(s)-\beta \langle g,{\widetilde{{\mathcal {Z}}}}_s^u\rangle ) \nu \Big )ds + \int _0^t\sigma \nu dW(s) \end{aligned}$$

and thus the process \(\{{\mathcal {Z}}_t^u, t\in [0,T]\}\) defined as \({\mathcal {Z}}^u_t:= {\widetilde{{\mathcal {Z}}}}^u_t-\zeta _0\) satisfies the infinite dimensional SDE of the form

$$\begin{aligned} {\mathcal {Z}}_t^u=\int _0^t\Big ({\bar{{\mathcal {A}}}}{\mathcal {Z}}_s^u-\beta \langle g,\zeta _0\rangle \nu + \alpha u(s) \nu \Big ) ds +\int _0^t \sigma \nu dW(s), \end{aligned}$$

where \({\bar{{\mathcal {A}}}}\) is the linear bounded operator on \({\mathbb {H}}\) such that

$$\begin{aligned} {\bar{{\mathcal {A}}}} z:={\mathcal {A}}z-\beta \langle g, z \rangle \nu , \quad z\in {\mathbb {H}}. \end{aligned}$$
(2.3)

These findings are summarized in the following theorem.

Theorem 2

Let \(\{X^u(t), t\in [0,T]\}\) be a Volterra Ornstein-Uhlenbeck process of the form (1.2) with the \({\mathbb {H}}\)-liftable kernel \(K(t) = \langle g, e^{t{\mathcal {A}}}\nu \rangle \), \(g,\nu \in \mathbb H\), \(\Vert \nu \Vert _{{\mathbb {H}}} = 1\), \({\mathcal {A}}\in {\mathcal {L}}(\mathbb H)\). Then, for any \(t\in [0,T]\),

$$\begin{aligned} X^u(t) = \langle g, \zeta _0 \rangle + \langle g, {\mathcal {Z}}^u_t \rangle , \end{aligned}$$
(2.4)

where \(\zeta _0:=\frac{X(0)}{\Vert g\Vert _{{\mathbb {H}}}^2}g\) and \(\{{\mathcal {Z}}^u_t,~t\in [0,T]\}\) is the \({\mathbb {H}}\)-valued stochastic process given by

$$\begin{aligned} {\mathcal {Z}}_t^u=\int _0^t\Big ({\bar{{\mathcal {A}}}}{\mathcal {Z}}_s^u-\beta \langle g,\zeta _0\rangle \nu + \alpha u(s) \nu \Big ) ds +\int _0^t \sigma \nu dW(s) \end{aligned}$$
(2.5)

and \({\bar{{\mathcal {A}}}} \in {\mathcal {L}}({\mathbb {H}})\) is such that

$$\begin{aligned} {\bar{{\mathcal {A}}}} z:={\mathcal {A}}z-\beta \langle g, z \rangle \nu , \quad z\in {\mathbb {H}}. \end{aligned}$$

Using Theorem 2, one can rewrite the performance functional J(u) from (1.3) as

$$\begin{aligned} J^g (u) = \mathbb {E}\left[ -\int _0^Ta_1u^2(s)ds+a_2\langle g,{\mathcal {Z}}^u_T\rangle \right] +a_2\langle g,\zeta _0\rangle , \end{aligned}$$
(2.6)

where the superscript g in \(J^g\) is used to highlight dependence on the \({\mathbb {H}}\)-valued process \({\mathcal {Z}}^u\). Clearly, maximizing (2.6) is equivalent to maximizing

$$\begin{aligned} J^g (u) - a_2\langle g,\zeta _0\rangle = \mathbb {E}\left[ -\int _0^Ta_1u^2(s)ds+a_2\langle g,{\mathcal {Z}}^u_T\rangle \right] . \end{aligned}$$

Finally, for the sake of notation and coherence with the literature, we will sometimes write our maximization problem as a minimization one by simply noticing that the maximization of the performance functional \(J^g (u) - a_2\langle g,\zeta _0\rangle \) can be reformulated as the minimization of

$$\begin{aligned} {\bar{J}}^g(u):= - J^g (u) + a_2\langle g,\zeta _0\rangle = \mathbb {E}\left[ \int _0^Ta_1u^2(s)ds - a_2\langle g,{\mathcal {Z}}^u_T\rangle \right] . \end{aligned}$$
(2.7)

Remark 1

Using arguments similar to those in the proof of Theorem 1, it is straightforward to check that \(J^g\) and \({\bar{J}}^g\) are continuous w.r.t. u.

In other words, in the case of an \({\mathbb {H}}\)-liftable kernel K, the original optimal control problem (1.3) can be replaced by the following one:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\mathcal {Z}}_t^u=\int _0^t\Big ({\bar{{\mathcal {A}}}}{\mathcal {Z}}_s^u-\beta \langle g,\zeta _0\rangle \nu + \alpha u(s)\nu \Big ) ds +\int _0^t \sigma \nu dW(s),\\ {\bar{J}}^g(u):= \mathbb {E}\left[ \int _0^Ta_1u^2(s)ds - a_2\langle g,{\mathcal {Z}}^u_T\rangle \right] \rightarrow \min , \end{array}\right. } \quad u\in L^2_a. \end{aligned}$$
(2.8)

Remark 2

The machinery described above can also be generalized to strongly continuous semigroups on Banach spaces, see e.g. Cuchiero and Teichmann (2019, 2020). However, for our purposes, it is sufficient to consider the case where \({\mathcal {A}}\) is a linear bounded operator on a Hilbert space.

2.3 Solution to the lifted problem

In order to solve the optimal control problem (2.8), we intend to use the dynamic programming approach as in Fabbri and Russo (2017). A comprehensive overview of this method for more general optimal control problems can also be found in Fabbri et al. (2017) and Fuhrman and Tessitore (2002).

Denote by \({\widetilde{\sigma }}\) an element of \({\mathcal {L}}({\mathbb {R}}, {\mathbb {H}})\) acting as

$$\begin{aligned} {\widetilde{\sigma }} x = x\sigma \nu , \quad x\in {\mathbb {R}}, \end{aligned}$$

and consider the Hamilton-Jacobi-Bellman (HJB) equation associated with the problem (2.8) of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{\partial }{\partial t}v(t,z)=-\frac{1}{2} \text {Trace}\Big ({\widetilde{\sigma }} {\widetilde{\sigma }}^* \nabla ^2v(t,z) \Big )-\langle \nabla v(t,z), {\bar{{\mathcal {A}}}} z \rangle -{\mathcal {H}}(t,z, \nabla v(t,z) ),\\ v(T,z) =-\langle a_2 g,z\rangle , \end{array}\right. } \end{aligned}$$
(2.9)

where by \(\nabla v\) we denote the partial Gateaux derivative w.r.t. the spatial variable z and the Hamiltonian functional \({\mathcal {H}}: [0,T]\times {\mathbb {H}}\times {\mathbb {H}}\rightarrow {\mathbb {R}}\) is defined as

$$\begin{aligned} {\mathcal {H}}(t,z,\xi ):= \inf _{u\in {\mathbb {R}}}\left\{ a_1u^2 + \big \langle \xi , -\beta \langle g, \zeta _0\rangle \nu + \alpha u \nu \big \rangle \right\} = -\frac{\alpha ^2 \langle \xi , \nu \rangle ^2}{4 a_1} -\beta \langle g, \zeta _0\rangle \langle \xi , \nu \rangle . \end{aligned}$$

Proposition 1

The HJB equation (2.9) associated with the lifted problem (2.8) admits a classical solution (in the sense of Fabbri and Russo 2017, Definition 3.4) of the form

$$\begin{aligned} v(t,z)=\langle w(t),z\rangle +c(t), \end{aligned}$$
(2.10)

where

$$\begin{aligned} w(t) = - a_2 e^{-(t-T){\bar{{\mathcal {A}}}}^*} g, \quad t\in [0,T], \end{aligned}$$
(2.11)

\({\bar{{\mathcal {A}}}}^* = {\mathcal {A}}^* - \beta \langle \nu , \cdot \rangle g\), and

$$\begin{aligned} c(t) = -\int _t^T \left( \beta X(0) \langle w(s),\nu \rangle + \frac{\alpha ^2}{4 a_1}\langle w(s),\nu \rangle ^2\right) ds, \quad t\in [0,T]. \end{aligned}$$
(2.12)

Proof

Let us solve the HJB equation (2.9) explicitly using the approach presented in Gozzi et al. (2009, Section 3.3). Namely, we will look for the solution in the form (2.10), where w(t) and c(t) are (unknown) functions such that \(\frac{\partial }{\partial t} v\) and \(\nabla v\) are well-defined. In this case,

$$\begin{aligned} \frac{\partial }{\partial t} v(t,z)=\langle w'(t),z\rangle +c'(t),\quad \nabla v(t,z)=w(t), \quad \nabla ^2v(t,z)=0, \end{aligned}$$

and, recalling that \(\langle g, \zeta _0 \rangle = X(0)\), we can rewrite the HJB equation (2.9) as

$$\begin{aligned} {\left\{ \begin{array}{ll} \langle w'(t),z\rangle + \langle z, {\bar{{\mathcal {A}}}}^* w(t)\rangle + c'(t) - \beta X(0) \langle w(t),\nu \rangle - \frac{\alpha ^2}{4 a_1}\langle w(t),\nu \rangle ^2 =0\\ \langle w(T),z\rangle +c(T) =-\langle a_2 g,z\rangle . \end{array}\right. } \end{aligned}$$

Now it would be sufficient to find w and c that solve the following systems:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\langle w'(t),z\rangle + \langle z, {\bar{{\mathcal {A}}}}^* w(t)\rangle } = 0\\ {\langle w(T),z\rangle +\langle a_2 g,z\rangle } = 0 \end{array}\right. }; \quad {\left\{ \begin{array}{ll} {c'(t) - \beta X(0) \langle w(t),\nu \rangle - \frac{\alpha ^2}{4 a_1}\langle w(t),\nu \rangle ^2} =0\\ {c(T)} = 0. \end{array}\right. } \end{aligned}$$
(2.13)

Noticing that the first system in (2.13) has to hold for all \(z\in {\mathbb {H}}\), we can solve

$$\begin{aligned} {\left\{ \begin{array}{ll} w'(t) + {\bar{{\mathcal {A}}}}^* w(t) = 0, \\ w(T)+a_2 g = 0 \end{array}\right. } \end{aligned}$$

instead, which is a simple linear equation whose solution has the form (2.11). It is then easy to see that c has the form (2.12) and

$$\begin{aligned} v(t,z) = \langle w(t), z \rangle + c(t), \quad t\in [0,T]. \end{aligned}$$

It remains to note that (2.10)–(2.12) is indeed a classical solution to (2.9) in the sense of Fabbri and Russo (2017, Definition 3.4).

\(\square \)

Let us now identify v in (2.10)–(2.12) with the value function of the lifted optimal control problem (2.8) using the result presented in Fabbri and Russo (2017, Theorem 4.1).

Theorem 3

(Verification theorem) Let v be the solution (2.10)–(2.12) to the HJB equation (2.9) associated with the lifted optimal control problem (2.8). Then

1) \(\inf _{u\in L^2_a} {\bar{J}}^g(u) = v(0,0)\);

2) the optimal control \({\hat{u}}\) minimizing \({\bar{J}}^g\) in (2.8) has the form

    $$\begin{aligned} {\hat{u}}(t)=-\frac{\alpha }{2 a_1}\langle w(t),\nu \rangle =\frac{\alpha a_2}{2 a_1 } \left\langle g, e^{(T-t){\bar{{\mathcal {A}}}}}\nu \right\rangle , \end{aligned}$$
    (2.14)

    where \({\bar{{\mathcal {A}}}} = {\mathcal {A}}-\beta \langle g, \cdot \rangle \nu \).

In particular, \({\hat{u}}\) given by (2.14) solves the original optimal control problem (1.3).

Proof

It is straightforward to check that the coefficients of the forward equation in (2.8) satisfy Fabbri and Russo (2017, Hypothesis 3.1), whereas the cost functional \({\bar{J}}^g(u)\) satisfies the conditions of Fabbri and Russo (2017, Hypothesis 3.3). Moreover, the term \(-\beta \langle g, \zeta _0\rangle \nu \) in (2.8) satisfies condition (i) of Fabbri and Russo (2017, Theorem 3.7) and, since v given by (2.10)–(2.12) is a classical solution to the HJB equation (2.9), condition (ii) of Fabbri and Russo (2017, Theorem 3.7) holds automatically. Finally, it is easy to see that v has sufficient regularity as required in Fabbri and Russo (2017, Theorem 4.1). Therefore, both statements of Theorem 3 immediately follow from Fabbri and Russo (2017, Theorem 4.1). \(\square \)

Remark 3

The approach described above can be extended by lifting to Banach space-valued stochastic processes. See Di and Giordano (2022) for more details.

3 Approximate solution for forwards with Hölder kernels

The crucial assumption in Sect. 2 that allowed us to apply the optimization techniques in Hilbert spaces was the liftability of the kernel. However, in practice it is often hard to find a representation of the required type for a given kernel, and even if such a representation is available, it is not always convenient from the implementation point of view. For this reason, we provide a liftable approximation for the Volterra Ornstein-Uhlenbeck process (1.2) for a general \(C^h\)-kernel K, where \(C^h([0,T])\) denotes the set of h-Hölder continuous functions on [0, T].

This section is structured as follows: first, we approximate an arbitrary \(C^h\)-kernel by a liftable one in a uniform manner and introduce a new optimization problem in which the forward dynamics is obtained from the original one by replacing the kernel K with its liftable approximation. Afterwards, we prove that the optimal value of the approximated problem converges to the optimal value of the original problem and give an estimate for the rate of convergence. Finally, we discuss some numerical aspects that are useful from the implementation point of view.

Remark 4

In what follows, by C we will denote any positive constant the particular value of which is not important and may vary from line to line (and even within one line). By \(\Vert \cdot \Vert _2\) we will denote the standard \(L^2(\Omega \times [0,T])\)-norm.

3.1 Liftable approximation for Volterra Ornstein-Uhlenbeck processes with Hölder continuous kernels

Let \(K\in C([0,T])\), \({\mathbb {H}}=L^2({\mathbb {R}})\), the operator \({\mathcal {A}}\) be the 1-shift operator acting on \({\mathbb {H}}\), i.e.

$$\begin{aligned} ({\mathcal {A}}f)(x) = f(x+1), \quad f\in {\mathbb {H}}, \end{aligned}$$

and let \(K_n\) denote the Bernstein polynomial approximation of K of order \(n\ge 0\), i.e.

$$\begin{aligned} \begin{aligned} K_n(t)&= \frac{1}{T^n}\sum _{k=0}^n K\left( \frac{Tk}{n}\right) \left( {\begin{array}{c}n\\ k\end{array}}\right) t^k(T-t)^{n-k}\\&=: \sum _{k=0}^n \kappa _{n,k} t^k, \quad t\in [0,T], \end{aligned} \end{aligned}$$
(3.1)

where

$$\begin{aligned} \kappa _{n,k}:= \frac{1}{T^k}\sum _{i=0}^k (-1)^{k-i} K\left( \frac{iT}{n}\right) \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( {\begin{array}{c}n-i\\ k-i\end{array}}\right) . \end{aligned}$$
(3.2)

Observe that

$$\begin{aligned} (e^{t{\mathcal {A}}} \mathbbm {1}_{[0,1]})(x) = \sum _{k=0}^\infty \frac{t^k}{k!} \left[ {\mathcal {A}}^k \mathbbm {1}_{[0,1]}\right] (x) = \sum _{k=0}^\infty \frac{t^k}{k!} \mathbbm {1}_{[-k,-k+1]}(x) \end{aligned}$$

and hence \(K_n\) is \({\mathbb {H}}\)-liftable as

$$\begin{aligned} K_n (t) = \left\langle g_n, e^{t{\mathcal {A}}} \nu \right\rangle _{{\mathbb {H}}} = \sum _{k=0}^n \kappa _{n,k} t^k, \quad t\in [0,T], \end{aligned}$$

with \(g_n:= \sum _{k=0}^n k! \kappa _{n,k} \mathbbm {1}_{[-k,-k+1]}\) and \(\nu := \mathbbm {1}_{[0,1]}\).
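
For illustration, the coefficients \(\kappa _{n,k}\) from (3.2) and the weights of \(g_n\) are straightforward to compute numerically. The following is a small Python sketch of this computation; the helper names and the example kernel are ours and purely illustrative.

```python
import numpy as np
from math import comb, factorial

def bernstein_coeffs(K, n, T):
    """Monomial coefficients kappa_{n,k} of the Bernstein approximation
    K_n(t) = sum_{k=0}^n kappa_{n,k} t^k, following (3.2)."""
    kappa = np.zeros(n + 1)
    for k in range(n + 1):
        kappa[k] = sum((-1) ** (k - i) * K(i * T / n) * comb(n, i) * comb(n - i, k - i)
                       for i in range(k + 1)) / T ** k
    return kappa

def lift_g(kappa):
    """Weights of g_n = sum_k k! kappa_{n,k} 1_{[-k,-k+1]}: entry k is the
    coefficient in front of the indicator 1_{[-k,-k+1]}."""
    return np.array([factorial(k) * c for k, c in enumerate(kappa)])

# example: fractional kernel K(t) = t^{0.3} on [0, 2]
kappa = bernstein_coeffs(lambda t: t ** 0.3, n=10, T=2.0)
g = lift_g(kappa)
Kn = lambda t: sum(c * t ** k for k, c in enumerate(kappa))  # K_n(t) itself
```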

By the well-known approximation property of Bernstein polynomials, for any \(\varepsilon > 0\) there exists \(n = n(\varepsilon ) \in {\mathbb {N}}_0\) such that

$$\begin{aligned} \sup _{t\in [0,T]} \left| K(t) - K_n(t) \right| < \varepsilon . \end{aligned}$$

Moreover, if additionally \(K \in C^h([0,T])\) for some \(h\in (0,1)\), Mathe (1999, Theorem 1) guarantees that for all \(t\in [0,T]\)

$$\begin{aligned} |K(t)-K_n(t)|\le H\left( \frac{t(T-t)}{n}\right) ^{h/2} \le \frac{H T^h}{2^h} n^{-\frac{h}{2}}, \end{aligned}$$
(3.3)

where \(H>0\) is such that

$$\begin{aligned} |K(t)-K(s)|\le H |t-s|^h, \quad s,t\in [0,T]. \end{aligned}$$
(3.4)
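
As a quick sanity check, one may compare the empirical uniform error of the Bernstein approximation with the bound (3.3). The sketch below does this for the fractional kernel \(K(t)=t^{0.3}\) (for which one may take \(h=0.3\) and \(H=1\)); it only illustrates (3.1) and (3.3) and is not part of the algorithm.

```python
import numpy as np
from math import comb

def bernstein(K, n, T, t):
    """Bernstein polynomial (3.1) of K on [0, T], evaluated at t (vectorized in t)."""
    t = np.asarray(t, dtype=float)
    return sum(K(T * k / n) * comb(n, k) * (t / T) ** k * (1 - t / T) ** (n - k)
               for k in range(n + 1))

# uniform error vs. the bound (3.3) for K(t) = t^{0.3}, h = 0.3, H = 1, T = 2
K, T, h, H = (lambda t: t ** 0.3), 2.0, 0.3, 1.0
grid = np.linspace(0.0, T, 1001)
for n in (5, 20, 80):
    err = float(np.max(np.abs(K(grid) - bernstein(K, n, T, grid))))
    bound = H * T ** h / 2 ** h * n ** (-h / 2)
    print(n, round(err, 4), round(bound, 4))
```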

Now, consider a controlled Volterra Ornstein-Uhlenbeck process \(\{X^u(t),~t\in [0,T]\}\) of the form (1.2) with the kernel \(K \in C^h([0,T])\) satisfying (3.4). For a given admissible u define also a stochastic process \(\{X^u_n (t),~t\in [0,T]\}\) as a solution to the stochastic Volterra integral equation of the form

$$\begin{aligned} X_n^u(t) = X(0)+\int _0^t K_n(t-s) \Big (\alpha u(s)-\beta X_n^u(s)\Big )ds +\sigma \int _0^t K_n(t-s) dW(s), \quad t\in [0,T], \end{aligned}$$
(3.5)

where \(K_n (t) = \sum _{k=0}^n \kappa _{n,k} t^k\) with \(\kappa _{n,k}\) defined by (3.2), i.e. the Bernstein polynomial approximation of K of degree n.

Remark 5

It follows from Azmoodeh et al. (2014, Corollary 4) that both stochastic processes \(\int _0^t K(t-s)dW(s)\) and \(\int _0^t K_n(t-s)dW(s)\), \(t\in [0,T]\), have modifications that are Hölder continuous of any order strictly less than \(h \wedge \frac{1}{2}\). From now on, these modifications will be used.

Now we move to the main result of this subsection.

Theorem 4

Let \(K\in C^h([0,T])\) and let \(X^u\), \(X^u_n\) be given by (1.2) and (3.5) respectively. Then there exists \(C>0\), which does not depend on n or u, such that for any admissible \(u\in L^2_a\):

$$\begin{aligned} \sup _{t\in [0,T]} \mathbb {E}\left[ |X^u(t) -X^u_{n}(t)|^2\right] \le C (1+\Vert u\Vert ^2_2) n^{-h}. \end{aligned}$$

Proof

First, by Theorem 1, there exists a constant \(C>0\) such that

$$\begin{aligned} \sup _{t\in [0,T]}\mathbb {E}[|X^u(t)|^2]\le C(1+ \Vert u\Vert _2^2). \end{aligned}$$
(3.6)

Consider an arbitrary \(\tau \in [0,T]\), and denote \(\Delta (\tau ):=\sup _{t\in [0,\tau ]} \mathbb {E}\left[ |X^u(t) -X^u_{n}(t)|^2\right] \). Then

$$\begin{aligned} \Delta (\tau )&=\sup _{t\in [0,\tau ]}\mathbb {E}\Bigg [\Bigg |\int _0^t K(t-s)\Big (\alpha u(s)-\beta X^u(s)\Big )ds\\&\quad -\int _0^t K_n(t-s)\Big (\alpha u(s)-\beta X_n^u(s)\Big )ds\\&\quad +\int _0^t \sigma \Big ( K(t-s)-K_n(t-s)\Big )dW(s)\Bigg |^2\Bigg ]\\&\le C \sup _{t\in [0,\tau ]}\mathbb {E}\Bigg [ \int _0^t\Bigg | \Big ( K(t-s)-K_n (t-s) \Big )u(s) \Bigg |^2 ds\Bigg ]\\&\quad +C\sup _{t\in [0,\tau ] } \mathbb {E}\Bigg [ \int _0^t \Bigg |K_n(t-s)\Big (X^u(s)-X_n^u(s)\Big )\Bigg |^2 ds\Bigg ]\\&\quad + C\sup _{t\in [0,\tau ]}\mathbb {E}\Bigg [ \int _0^t\Bigg | X^u(s) \Big ( K(t-s)-K_n(t-s)\Big )\Bigg |^2 ds\Bigg ]\\&\quad +C \sup _{t\in [0,\tau ]}\mathbb {E}\left[ \Bigg |\int _0^t \Big (K(t-s)-K_n(t-s)\Big ) dW(s)\Bigg |^2 \right] . \end{aligned}$$

Note that, by (3.3) we have that

$$\begin{aligned} \sup _{t\in [0,\tau ]}\mathbb {E}\Bigg [ \int _0^t\Bigg | \Big ( K(t-s)-K_n (t-s) \Big )u(s) \Bigg |^2 ds\Bigg ]&\le Cn^{-h} \Vert u \Vert ^2_2. \end{aligned}$$

Moreover, since the kernels \(\{K_n,~n\ge 1\}\) are uniformly bounded due to their uniform convergence to K, it holds that

$$\begin{aligned} \sup _{t\in [0,\tau ] } \mathbb {E}\Bigg [ \int _0^t \Bigg |K_n(t-s)\Big (X^u(s)-X_n^u(s)\Big )\Bigg |^2 ds\Bigg ]&\le C \int _0^\tau \Delta (s)ds \end{aligned}$$

with C not dependent on n, and from (3.3), (3.6) one can deduce that

$$\begin{aligned} \sup _{t\in [0,\tau ]}\mathbb {E}\Bigg [ \int _0^t\Bigg | X^u(s) \Big ( K(t-s)-K_n(t-s)\Big )\Bigg |^2 ds\Bigg ] \le C n^{-h} (1+ \Vert u \Vert ^2_2). \end{aligned}$$

Lastly, by the Itô isometry and (3.3),

$$\begin{aligned} \sup _{t\in [0,\tau ]}\mathbb {E}\left[ \Bigg |\int _0^t \Big (K(t-s)-K_n(t-s)\Big ) dW(s)\Bigg |^2 \right] \le C n^{-h}. \end{aligned}$$

Hence

$$\begin{aligned} \Delta (\tau ) \le Cn^{-h} (1+\Vert u\Vert _2^2)+ C \int _0^\tau \Delta (s) ds, \end{aligned}$$

where C is a positive constant (recall that it may vary from line to line). The final result follows from Gronwall’s inequality. \(\square \)

3.2 Liftable approximation of the optimal control problem

As noted before, our aim is to find an approximate solution to the optimization problem (1.3) by solving the liftable problem of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} X_n^u(t)=X(0)+\int _0^t K_n(t-s)\Big (\alpha u(s)-\beta X_n^u(s)\Big )ds+\sigma \int _0^t K_n(t-s) dW(s),\\ J_n(u):= \mathbb {E}\left[ -\int _0^Ta_1 u^2(s) ds+a_2X_n^u (T) \right] \rightarrow \max , \end{array}\right. } \end{aligned}$$
(3.7)

where the maximization is performed over \(u\in L^2_a\). In (3.7), \(K_n\) is the Bernstein polynomial approximation of \(K\in C^h([0,T])\), i.e.

$$\begin{aligned} K_n(t) = \langle g_n, e^{t{\mathcal {A}}} \nu \rangle , \quad t\in [0,T], \end{aligned}$$

where \({\mathcal {A}}\in {\mathcal {L}} \left( {\mathbb {H}}\right) \) acts as \(({\mathcal {A}}f)(x) = f(x+1)\), \(\nu = \mathbbm {1}_{[0,1]}\) and \( g_n = \sum _{k=0}^n k! \kappa _{n,k} \mathbbm {1}_{[-k,-k+1]}\) with \(\kappa _{n,k}\) defined by (3.2). Due to the liftability of \(K_n\), the problem (3.7) falls within the framework of Sect. 2, so, by Theorem 3, the optimal control \({\hat{u}}_n\) has the form (2.14):

$$\begin{aligned} {\hat{u}}_n(t) = \frac{\alpha a_2}{2a_1 }\langle g_n, e^{(T-t)\bar{\mathcal {A}}_n} \nu \rangle , \quad t\in [0,T], \end{aligned}$$
(3.8)

where \({\bar{{\mathcal {A}}}}_n:= {\mathcal {A}}-\beta \langle g_n, \cdot \rangle \nu \). The goal of this subsection is to prove the convergence of the optimal performance under the approximated dynamics to the actual optimal value, i.e.

$$\begin{aligned} J_n({\hat{u}}_n) \rightarrow \sup _{u\in L^2_a} J(u), \quad n \rightarrow \infty , \end{aligned}$$

where J is the performance functional from the original optimal control problem (1.3).

Proposition 2

Let the kernel \(K \in C^h([0,T])\). Then

$$\begin{aligned} \sup _{n\in {\mathbb {N}}} J_n (u) \rightarrow - \infty \qquad \text {as } \Vert u\Vert _2 \rightarrow \infty , \end{aligned}$$
(3.9)
$$\begin{aligned} J(u) \rightarrow - \infty \qquad \text {as } \Vert u\Vert _2 \rightarrow \infty , \end{aligned}$$
(3.10)

where \(\Vert \cdot \Vert _2\) denotes the standard \(L^2(\Omega \times [0,T])\) norm.

Proof

We prove only (3.9); the proof of (3.10) is the same. Let \(u\in L^2_a\) be fixed. For any \(n\in {\mathbb {N}}\) denote

$$\begin{aligned} G_n(t):= \int _0^t K_n(t-s) dW(s), \quad t\in [0,T], \end{aligned}$$

and notice that for any \(t\in [0,T]\) we have that

$$\begin{aligned} |X_n^u(t)|&\le X(0) + \alpha \int _0^t |K_n (t-s)| |u(s)| ds + \beta \int _0^t |K_n (t-s)||X^u_n (s)| ds + \sigma \left| G_n(t)\right| \\&\le C\left( 1 + \left( \int _0^T u^2(s) ds\right) ^{\frac{1}{2}} + \int _0^t |X_n^u(s)|ds + \sup _{r\in [0,T]}\left| G_n(r)\right| \right) , \end{aligned}$$

where \(C > 0\) is a deterministic constant that does not depend on n, t or u (here we used the fact that \(K_n \rightarrow K\) uniformly on [0, T]). Whence, for any \(n\in {\mathbb {N}}\),

$$\begin{aligned} {\mathbb {E}} \left[ |X_n^u(t)| \right] \le C\left( 1 + \Vert u\Vert _2 + \int _0^t \mathbb {E}\left[ |X_n^u(s)|\right] ds + \mathbb E\left[ \sup _{r\in [0,T]}\left| G_n(r) \right| \right] \right) . \end{aligned}$$
(3.11)

Now, let us prove that there exists a constant \(C>0\) such that

$$\begin{aligned} \sup _{ n\in {\mathbb {N}} } {\mathbb {E}}\left[ \sup _{r\in [0,T]}\left| G_n(r) \right| \right] < C. \end{aligned}$$

First note that, by Remark 5, for each \(n\in {\mathbb {N}}\) and \(\delta \in \left( 0, \frac{h}{2} \wedge \frac{1}{4}\right) \) there exists a random variable \(\Upsilon _n = \Upsilon _n (\delta )\) such that

$$\begin{aligned} \left| G_n(r_1) - G_n(r_2)\right| \le \Upsilon _n |r_1 - r_2|^{h\wedge \frac{1}{2} - 2\delta } \end{aligned}$$

and whence

$$\begin{aligned} \sup _{r\in [0,T]}\left| G_n(r) \right| \le T^{h\wedge \frac{1}{2} - 2\delta } \Upsilon _n. \end{aligned}$$

Thus it is sufficient to check that \(\sup _{n\in {\mathbb {N}}}\mathbb {E}\Upsilon _n < \infty \). It is known from Azmoodeh et al. (2014) that one can put

$$\begin{aligned} \Upsilon _n:= C_\delta \left( \int _0^T \int _0^T \frac{|G_n (x) - G_n (y)|^{p}}{|x-y|^{(h\wedge \frac{1}{2} - \delta )p+1}}dxdy \right) ^{\frac{1}{p}}, \end{aligned}$$

where \(p:=\frac{1}{\delta }\) and \(C_\delta > 0\) is a constant that does not depend on n. Let \(p' > p\). Then the Minkowski integral inequality yields

$$\begin{aligned} \begin{aligned} \left( \mathbb {E}\Upsilon ^{p'}_n \right) ^{\frac{p}{p'}}&= C_\delta ^{p} \left( \mathbb {E}\left[ \left( \int _0^T \int _0^T \frac{|G_n (x) - G_n (y)|^{p}}{|x-y|^{(h\wedge \frac{1}{2} - \delta )p+1}}dxdy \right) ^{\frac{p'}{p}}\right] \right) ^{\frac{p}{p'}}\\&\le C_\delta ^{p} \int _0^T \int _0^T \frac{\left( \mathbb {E}\left[ |G_n (x) - G_n (y)|^{p'}\right] \right) ^{\frac{p}{p'}}}{|x-y|^{(h\wedge \frac{1}{2} - \delta )p+1}}dxdy. \end{aligned} \end{aligned}$$
(3.12)

Note that, by Mathe (1999, Proposition 2), every Bernstein polynomial \(K_n\) that corresponds to K is Hölder continuous of the same order h and with the same constant H, i.e.

$$\begin{aligned} |K_n(t)-K_n(s)|\le H|t-s|^h, \quad s,t\in [0,T], \end{aligned}$$

whenever

$$\begin{aligned} |K(t)-K(s)|\le H|t-s|^h, \quad s,t\in [0,T]. \end{aligned}$$

This implies that there exists a constant C which does not depend on n such that

$$\begin{aligned}&\mathbb {E}\left[ |G_n (x) - G_n (y)|^{p'}\right] \\&\quad = C \left( \int _0^{x\wedge y} (K_n(x-s) - K_n(y-s))^2ds + \int ^{x\vee y}_{x\wedge y} K^2_n(x \vee y-s)ds \right) ^{\frac{p'}{2}}\\&\quad \le C|x-y|^{p'(h\wedge \frac{1}{2})}. \end{aligned}$$

Plugging the bound above to (3.12), we get that

$$\begin{aligned} \left( \mathbb {E}\left[ \Upsilon ^{p'}_n\right] \right) ^{\frac{p}{p'}}&\le C \int _0^T \int _0^T |x-y|^{(h\wedge \frac{1}{2}) p - (h\wedge \frac{1}{2} -\delta )p-1}dxdy\\&= C \int _0^T \int _0^T |x-y|^{ -1 + \delta p}dxdy\\&< C, \end{aligned}$$

where \(C>0\) denotes, as always, a deterministic constant that does not depend on n, t, u and may vary from line to line.

Therefore, there exists a constant, again denoted by C and not depending on n, t or u, such that

$$\begin{aligned} \sup _{n\in {\mathbb {N}}} {\mathbb {E}}\left[ \Upsilon _{n} \right] < C \end{aligned}$$

and thus, by (3.11),

$$\begin{aligned} {\mathbb {E}} \left[ |X_n^u(t)| \right] \le C\left( 1 + \Vert u\Vert _2 + \int _0^t \mathbb {E}\left[ |X_n^u(s)|\right] ds\right) . \end{aligned}$$

By Gronwall’s inequality, there exists \(C>0\) which does not depend on n such that

$$\begin{aligned} {\mathbb {E}} \left[ |X_n^u(T)| \right] \le C(1+\Vert u\Vert _2), \end{aligned}$$

and so

$$\begin{aligned} \sup _{n\in {\mathbb {N}}} J_n (u) \le C(1+ \Vert u\Vert _2) - \Vert u\Vert _2^2 \rightarrow -\infty , \quad \Vert u\Vert _2\rightarrow \infty . \end{aligned}$$

\(\square \)

Theorem 5

Let \(K \in C^h([0,T])\) and \(K_n\) be its Bernstein polynomial approximation of order n. Then there exists a constant \(C>0\) such that

$$\begin{aligned} \Big |J_n({\hat{u}}_n) - \sup _{u\in L^2_a} J(u)\Big | \le C n^{-\frac{h}{2}}. \end{aligned}$$
(3.13)

Moreover, \({\hat{u}}_n\) is “almost optimal” for J in the sense that there exists a constant \(C>0\) such that

$$\begin{aligned} \Big |J({\hat{u}}_n) - \sup _{u\in L^2_a} J(u)\Big | \le C n^{-\frac{h}{2}}. \end{aligned}$$

Proof

First, note that for any \(r \ge 0\)

$$\begin{aligned} \sup _{u\in B_r} \Big |J_n(u) - J(u)\Big | \le C (1+r^2)^{\frac{1}{2}} n^{-\frac{h}{2}}, \end{aligned}$$
(3.14)

where \(B_r:= \{u\in L^2_a:~\Vert u\Vert _2 \le r\}\). Indeed, by the definitions of J and \(J_n\) and by Theorem 4, for any \(u\in B_r\):

$$\begin{aligned} \begin{aligned} \Big |J_n(u) - J(u)\Big |&=\Big |\mathbb {E}[X_n^u(T)-X^u(T)]\Big | \le C(1 + \Vert u \Vert ^2_2)^{\frac{1}{2}} n^{-\frac{h}{2}}\\&\le C (1+r^2)^{\frac{1}{2}} n^{-\frac{h}{2}}. \end{aligned} \end{aligned}$$
(3.15)

In particular, this implies that there exists \(C>0\) that does not depend on n such that \(J(0) - C < J_n(0)\), so, by Proposition 2, there exists \(r_0>0\) that does not depend on n such that \(\Vert u\Vert _2 > r_0\) implies

$$\begin{aligned} J_n(u)< J(0) - C < J_n(0), \quad n\in {\mathbb {N}}. \end{aligned}$$

In other words, all optimal controls \({\hat{u}}_n\), \(n\in {\mathbb {N}}\), must lie in the ball \(B_{r_0}\), and \(\sup _{u\in L^2_a} J(u) = \sup _{u\in B_{r_0}} J(u)\). This, together with the uniform convergence of \(J_n\) to J over bounded subsets of \(L^2_a\) and the estimate (3.14), implies that there exists \(C>0\) not dependent on n such that

$$\begin{aligned} \Big |J_n({\hat{u}}_n) - \sup _{u\in L^2_a} J(u)\Big | \le C n^{-\frac{h}{2}}. \end{aligned}$$
(3.16)

Finally, taking into account (3.14) and (3.16) as well as the definition of \(B_{r_0}\),

$$\begin{aligned} \Big |J({\hat{u}}_n) - \sup _{u\in L^2_a} J(u)\Big |&\le \Big |J({\hat{u}}_n) - J_n({\hat{u}}_n)\Big | + \Big |J_n({\hat{u}}_n) - \sup _{u\in L^2_a} J(u)\Big |\\&\le \Big |J({\hat{u}}_n) - J_n({\hat{u}}_n)\Big | + \Big |J_n({\hat{u}}_n) - \sup _{u\in B_{r_0}} J(u)\Big | \\&\le C n^{-\frac{h}{2}}. \end{aligned}$$

This completes the proof. \(\square \)

Theorem 6

Let \(K \in C^h([0,T])\) and \({\hat{u}}_n\) be defined by (3.8). Then the optimization problem (1.3) has a unique solution \({\hat{u}} \in L^2_a\) and

$$\begin{aligned} {\hat{u}}_n \rightarrow {\hat{u}}, \quad n\rightarrow \infty , \end{aligned}$$

in the weak topology of \(L^2(\Omega \times [0,T])\).

Proof

By (2.1), the performance functional J can be represented in a linear-quadratic form as

$$\begin{aligned} J(u) = -a_1\langle u,u \rangle _{L^2(\Omega \times [0,T])} + a_2 (X(0) + {\mathcal {L}} u), \end{aligned}$$

where \({\mathcal {L}}: L^2(\Omega \times [0,T]) \rightarrow L^2(\Omega \times [0,T])\) is a continuous linear operator. Then, by Allaire (2007, Theorem 9.2.6), there exists a unique \({\hat{u}} \in L^2(\Omega \times [0,T])\) that maximizes J and, moreover, \({\hat{u}}_n \rightarrow {\hat{u}}\) weakly as \(n\rightarrow \infty \). Furthermore, since all \({\hat{u}}_n\) are deterministic, so is \({\hat{u}}\); in particular, it is adapted to the filtration generated by W, which implies that \({\hat{u}} \in L^2_a\). \(\square \)

3.3 Algorithm for computing \({\hat{u}}_n\)

The explicit form of \({\hat{u}}_n\) given by (3.8) is not very convenient from the implementation point of view since one has to compute \(e^{(T-t) {\bar{{\mathcal {A}}}}_n}\nu =e^{(T-t)\bar{\mathcal {A}}_n}\mathbbm {1}_{[0,1]}\), where \({\bar{{\mathcal {A}}}}_n:= {\mathcal {A}}-\beta \langle g_n, \cdot \rangle \mathbbm {1}_{[0,1]}\), \(({\mathcal {A}}f)(x) = f(x+1)\). A natural way to simplify the problem is to truncate the series

$$\begin{aligned} \sum _{k=0}^\infty \frac{(T-t)^k}{k!}\bar{\mathcal {A}}_{n}^k\mathbbm {1}_{[0,1]} \approx \sum _{k=0}^M \frac{(T-t)^k}{k!}{\bar{{\mathcal {A}}}}_{n}^k\mathbbm {1}_{[0,1]} \end{aligned}$$

for some \(M\in {\mathbb {N}}\). However, even after replacing \(e^{(T-t){\bar{{\mathcal {A}}}}_n}\) in (3.8) with its truncated version, we still need to be able to compute \(\bar{\mathcal {A}}_{n}^k\mathbbm {1}_{[0,1]}\) for a given \(k \in {\mathbb {N}} \). An algorithm for doing so is presented in the proposition below.

Proposition 3

For any \(k\in {\mathbb {N}} \cup \{0\}\),

$$\begin{aligned} \bar{\mathcal {A}}_n^k\mathbbm {1}_{[0,1]}=\sum _{i=0}^k\gamma ({i,k})\mathbbm {1}_{[-i,-i+1]}, \end{aligned}$$

where \(\gamma ({0,0})=1\) and, for all \(k\ge 1\),

$$\begin{aligned} \gamma ({i,k})= {\left\{ \begin{array}{ll} \gamma ({i-1,k-1}), &{} i=1,...,k\\ \sum _{j=0}^{(k-1)\wedge n}(-\beta )j! \kappa _{n,j}\gamma ({j,k-1}), &{} i=0. \end{array}\right. } \end{aligned}$$

Proof

The proof proceeds by induction on k. The case \(k=0\) is obvious. Now let

$$\begin{aligned} \bar{\mathcal {A}}_n^{k-1}\mathbbm {1}_{[0,1]}=\sum _{i=0}^{k-1}\gamma ({i,k-1})\mathbbm {1}_{[-i,-i+1]}. \end{aligned}$$

Then

$$\begin{aligned} {\bar{{\mathcal {A}}}}_n^{k}\mathbbm {1}_{[0,1]}&={\bar{{\mathcal {A}}}}_n\Big (\bar{\mathcal {A}}_n^{k-1}\mathbbm {1}_{[0,1]}\Big )\\&= \sum _{i=0}^{k-1}\gamma ({i,k-1}){\bar{{\mathcal {A}}}}_n\mathbbm {1}_{[-i,-i+1]}\\&=\sum _{i=1}^{k}\gamma ({i-1,k-1})\mathbbm {1}_{[-i,-i+1]}\\&\quad +\mathbbm {1}_{[0,1]}(-\beta )\left\langle \sum _{j=0}^{k-1}\gamma ({j,k-1})\mathbbm {1}_{[-j,-j+1]}, \sum _{j=0}^n j! \kappa _{n,j}\mathbbm {1}_{[-j,-j+1]} \right\rangle \\&=\sum _{i=1}^{k}\gamma ({i-1,k-1})\mathbbm {1}_{[-i,-i+1]}+\mathbbm {1}_{[0,1]}\sum _{j=0}^{(k-1)\wedge n}(-\beta )j! \kappa _{n,j}\gamma ({j,k-1}). \end{aligned}$$

\(\square \)

Finally, consider

$$\begin{aligned} {\hat{u}}_{n,M}(t)&:= \frac{\alpha a_2}{2a_1 }\left\langle g_n, \sum _{k=0}^M \frac{(T-t)^k}{k!}{\bar{{\mathcal {A}}}}_{n}^k\mathbbm {1}_{[0,1]} \right\rangle \nonumber \\&= \frac{\alpha a_2}{2a_1 }\left\langle \sum _{i=0}^n i! \kappa _{n,i} \mathbbm {1}_{[-i,-i+1]} , \sum _{k=0}^M \sum _{i=0}^k \frac{(T-t)^k}{k!}\gamma ({i,k})\mathbbm {1}_{[-i,-i+1]} \right\rangle \nonumber \\&= \frac{\alpha a_2}{2a_1 }\left\langle \sum _{i=0}^n i! \kappa _{n,i} \mathbbm {1}_{[-i,-i+1]} , \sum _{i=0}^M \left( \sum _{k=i}^M \frac{(T-t)^k}{k!}\gamma ({i,k})\right) \mathbbm {1}_{[-i,-i+1]} \right\rangle \nonumber \\&= \frac{\alpha a_2}{2a_1 }\sum _{i=0}^{n \wedge M} \sum _{k=i}^M \frac{i! \kappa _{n,i}\gamma ({i,k})}{k!}(T-t)^k \nonumber \\&= \frac{\alpha a_2}{2a_1 }\sum _{k=0}^M \left( \sum _{i=0}^{k\wedge n} \frac{i! \kappa _{n,i}\gamma ({i,k})}{k!}\right) (T-t)^k, \end{aligned}$$
(3.17)

where \(\kappa _{n,i}\) are defined by (3.2) and \(\gamma ({i,k})\) are from Proposition 3.
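
For completeness, the computation above can be summarized in a short Python sketch: it builds the table \(\gamma (i,k)\) by the recursion of Proposition 3 and then evaluates \({\hat{u}}_{n,M}(t)\) via (3.17). The function names and the example parameters are ours; the coefficients \(\kappa _{n,k}\) in the usage example are computed directly from (3.2).

```python
from math import comb, factorial

def gamma_table(kappa, beta, M):
    """gamma(i, k) from Proposition 3, stored as gamma[k][i] for 0 <= i <= k <= M."""
    n = len(kappa) - 1
    gamma = [[0.0] * (k + 1) for k in range(M + 1)]
    gamma[0][0] = 1.0
    for k in range(1, M + 1):
        for i in range(1, k + 1):
            gamma[k][i] = gamma[k - 1][i - 1]
        gamma[k][0] = sum(-beta * factorial(j) * kappa[j] * gamma[k - 1][j]
                          for j in range(min(k - 1, n) + 1))
    return gamma

def u_hat_nM(t, kappa, T, M, alpha=1.0, beta=1.0, a1=1.0, a2=1.0):
    """Approximate optimal control u_hat_{n,M}(t) from (3.17)."""
    n = len(kappa) - 1
    gamma = gamma_table(kappa, beta, M)
    coeffs = [sum(factorial(i) * kappa[i] * gamma[k][i] for i in range(min(k, n) + 1))
              / factorial(k) for k in range(M + 1)]
    return alpha * a2 / (2 * a1) * sum(c * (T - t) ** k for k, c in enumerate(coeffs))

# example: K(t) = t^{0.3}, n = 20, M = 50, T = 2, with kappa_{n,k} as in (3.2)
K, n, T = (lambda t: t ** 0.3), 20, 2.0
kappa = [sum((-1) ** (k - i) * K(i * T / n) * comb(n, i) * comb(n - i, k - i)
             for i in range(k + 1)) / T ** k for k in range(n + 1)]
print(round(u_hat_nM(0.0, kappa, T, M=50), 4))
```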

Theorem 7

Let \(n\in {\mathbb {N}}\) be fixed and \(M \ge T\Vert {\bar{{\mathcal {A}}}}_n\Vert _{{\mathcal {L}}}\), where \(\Vert \cdot \Vert _{\mathcal L}\) denotes the operator norm. Then, for all \(t\in [0,T]\),

$$\begin{aligned} |{\hat{u}}_n (t) - {\hat{u}}_{n,M}(t)| \le \frac{\alpha a_2}{2a_1 } \Vert g_n \Vert e^{ (T-t)\Vert {\bar{{\mathcal {A}}}}_n\Vert _{{\mathcal {L}}} } \left( 1 - e^{ - \frac{(T-t)\Vert {\bar{{\mathcal {A}}}}_n\Vert _{{\mathcal {L}}}}{M+1} }\right) . \end{aligned}$$

Moreover,

$$\begin{aligned} \sup _{t\in [0,T]}|{\hat{u}}_n (t) - {\hat{u}}_{n,M}(t)| \le \frac{\alpha a_2}{2a_1 } \Vert g_n \Vert e^{ T\Vert {\bar{{\mathcal {A}}}}_n\Vert _{{\mathcal {L}}} } \left( 1 - e^{ - \frac{T\Vert {\bar{{\mathcal {A}}}}_n\Vert _{{\mathcal {L}}}}{M+1} }\right) \rightarrow 0, \quad M \rightarrow \infty . \end{aligned}$$

Proof

One has to prove the first inequality and the second one then follows. It is clear that

$$\begin{aligned} |{\hat{u}}_n (t) - {\hat{u}}_{n,M}(t)| \le \frac{\alpha a_2}{2a_1 } \Vert g_n \Vert \left\| \sum _{k=M+1}^\infty \frac{(T-t)^k}{k!}{\bar{{\mathcal {A}}}}_n^k \mathbbm {1}_{[0,1]}\right\| \end{aligned}$$

and, if \(M \ge (T-t)\Vert {\bar{{\mathcal {A}}}}_n\Vert _{{\mathcal {L}}}\), we have that

$$\begin{aligned} \left\| \sum _{k=M+1}^\infty \frac{(T-t)^k}{k!}{\bar{{\mathcal {A}}}}_n^k \mathbbm {1}_{[0,1]}\right\|&\le \sum _{k=M+1}^\infty \frac{\left( (T-t)\left\| {\bar{{\mathcal {A}}}}_n\right\| _{\mathcal L}\right) ^k}{k!}\\&\le e^{ (T-t)\Vert {\bar{{\mathcal {A}}}}_n\Vert _{{\mathcal {L}}} } \left( 1 - e^{ - \frac{(T-t)\Vert {\bar{{\mathcal {A}}}}_n\Vert _{{\mathcal {L}}}}{M+1} }\right) , \end{aligned}$$

where we used a well-known bound on the tail probabilities of the Poisson distribution (see e.g. Samuel 1965). \(\square \)

4 Examples and simulations

Example 1

(monomial kernel) Let \(N \in {\mathbb {N}}\) be fixed. Consider an optimization problem of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} X^u(t) = X(0)+\int _0^t(t-s)^{N}\Big (u(s)-X^u(s)\Big )ds+\int _0^t(t-s)^{N} dW(s),\\ \mathbb {E}\left[ X^u(T) - \int _0^Tu^2(s)ds\right] \rightarrow \max , \end{array}\right. } \end{aligned}$$
(4.1)

where, as always, we optimize over \(u\in L^2_a\). The kernel \(K(t) = t^{N}\) is \({\mathbb {H}}\)-liftable,

$$\begin{aligned} t^{N}=\langle {N}! \mathbbm {1}_{[-N,-N+1]},e^{ t{\mathcal {A}}}\mathbbm {1}_{[0,1]}\rangle , \end{aligned}$$

where \(({\mathcal {A}}f)(x) = f(x+1)\), \(f\in {\mathbb {H}}\). By Theorem 3, the optimal control for the problem (4.1) has the form

$$\begin{aligned} {\hat{u}}(t) = \frac{N!}{2} \langle \mathbbm {1}_{[-N,-N+1]}, e^{(T-t){\bar{{\mathcal {A}}}}}\mathbbm {1}_{[0,1]} \rangle , \end{aligned}$$

where \({\bar{{\mathcal {A}}}} ={\mathcal {A}}-N!\langle \mathbbm {1}_{[-N,-N+1]}, \cdot \rangle \mathbbm {1}_{[0,1]} \). In this simple case, we are able to find an explicit expression for \(e^{(T-t){\bar{{\mathcal {A}}}}}\mathbbm {1}_{[0,1]}\). Indeed, it is easy to see that, for any \(p \in {\mathbb {N}}\cup \{0\}\) and \(q=0,1,...,N\),

$$\begin{aligned} {\bar{{\mathcal {A}}}}^{p(N+1)+q} \mathbbm {1}_{[0,1]} = \sum _{j=0}^p (-1)^{p-j} (N!)^{p-j} \mathbbm {1}_{[-j(N+1) - q, -j(N+1) - q + 1]} \end{aligned}$$

and whence

$$\begin{aligned}&\langle \mathbbm {1}_{[-N,-N+1]}, e^{(T-t){\bar{{\mathcal {A}}}}}\mathbbm {1}_{[0,1]} \rangle \\&\quad = \left\langle \mathbbm {1}_{[-N,-N+1]}, \sum _{p=0}^\infty \sum _{q=0}^{N} \frac{(T-t)^{pN + p + q}}{(pN+p+q)!} \sum _{j=0}^p (-1)^{p-j} (N!)^{p-j} \mathbbm {1}_{[-j(N+1) - q, -j(N+1) - q + 1]} \right\rangle \\&\quad = \sum _{p=0}^\infty \frac{(T-t)^{pN + p + N}}{(pN+p+N)!} (-1)^{p} (N!)^{p}\\&\quad = (T-t)^{N} E_{N+1, N+1}(-N!(T-t)^{N+1}), \end{aligned}$$

where \(E_{a,b}(z):= \sum _{p=0}^\infty \frac{z^p}{\Gamma (ap+b)}\) is the Mittag-Leffler function. This, in turn, implies that

$$\begin{aligned} {\hat{u}}(t) = \frac{N!(T-t)^{N}}{2} E_{N+1, N+1}(-N!(T-t)^{N+1}). \end{aligned}$$
(4.2)
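
The closed form (4.2) is easy to evaluate numerically by truncating the Mittag-Leffler series; a minimal sketch (with a truncation level chosen by us, which is ample for these parameter values) is given below.

```python
import math

def mittag_leffler(a, b, z, terms=40):
    """Truncated series E_{a,b}(z) ~ sum_{p < terms} z^p / Gamma(a*p + b)."""
    return sum(z ** p / math.gamma(a * p + b) for p in range(terms))

def u_hat_monomial(t, N=2, T=2.0):
    """Closed-form optimal control (4.2) for problem (4.1) with K(t) = t^N
    (there alpha = beta = sigma = 1 and a1 = a2 = 1, so the prefactor is 1/2)."""
    fN = math.factorial(N)
    return 0.5 * fN * (T - t) ** N * mittag_leffler(N + 1, N + 1, -fN * (T - t) ** (N + 1))

print([round(u_hat_monomial(t), 4) for t in (0.0, 0.5, 1.0, 1.5, 2.0)])
```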

In Fig. 1, the black curve depicts the optimal control \({\hat{u}}\) computed for the problem (4.1) with \(K(t) = t^2\) and \(T=2\) using (4.2); the other curves are the approximated optimal controls \({\hat{u}}_{n,M}\) (as in (3.17)) computed for \(n=1,2,5,10\) and \(M=20\).

Fig. 1

Optimal control of Volterra Ornstein-Uhlenbeck process with monomial kernel \(K(t) = t^2\) (in black) and control approximants \({\hat{u}}_{n,M}\)

Fig. 2

Optimal advertising strategies for control problems with kernels \(K_1\)–\(K_3\) from Example 2; plots related to the kernel \(K_i\) are contained in the ith column. Panels a1–a3 depict the graphs of kernels \(K_1\)–\(K_3\); each of b1–b3 represents a sample path of the corresponding \(X^u_i(t)\) under optimal control with \(\alpha =0\) (orange) and \(\alpha =1\) (blue) as well as the approximated optimal control \({\hat{u}}_{n,M}\) itself (green). Panels c1–c3 show \({\hat{u}}_{n,M}\) for \(\alpha =1\) (blue), \(\alpha = 1.5\) (orange) and \(\alpha = 2\) (green; in all three cases \(\beta = 1\)), whereas d1–d3 plot the behaviour of \({\hat{u}}_{n,M}\) for \(\beta =1\) (blue), \(\beta = 1.5\) (orange) and \(\beta = 2\) (green; in all three cases \(\alpha = 1\))

Remark 6

The solution of the problem (4.1) described in Example 1 should be regarded only as an illustration of the optimization technique via the infinite-dimensional lift: in fact, the kernel K in this example is degenerate, and thus the forward equation in (4.1) can be solved explicitly. This means that other, finite-dimensional, techniques could have been used in this case.

Example 2

(fractional and gamma kernels) Consider three optimization problems of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} X_i^u(t) = \int _0^t K_i(t-s) \Big (\alpha u(s)- \beta X_i^u(s)\Big )ds + \int _0^t K_i(t-s) dW(s),\\ \mathbb {E}\left[ X_i^u(T) - \int _0^Tu^2(s)ds\right] \rightarrow \max , \end{array}\right. }\quad i=1,2,3, \end{aligned}$$
(4.3)

\(u\in L^2_a\), where the kernels are chosen as follows: \(K_1(t):= t^{0.3}\) (fractional kernel), \(K_2(t):= t^{1.1}\) (smooth kernel) and \(K_3(t):= e^{-t}t^{0.3}\) (gamma kernel). In these cases, we apply all the machinery presented in Sect. 3 to find \({\hat{u}}_{n,M}\) for each of the optimal control problems described above. In our simulations, we choose \(T=2\), \(n=20\), \(M=50\); the mesh of the partition for simulating sample paths of \(X^u\) is set to be 0.05, \(\sigma = 1\), \(X(0) = 0\).

Figure 2 depicts the approximated optimal controls for different values of \(\alpha \) and \(\beta \). Note that the gamma kernel \(K_3(t)\) (third column) is of particular interest in optimal advertising. This kernel captures the peculiarities of the empirical data (see Leone 1995), since the dependence on the past comes into play only after a certain amount of time (like a delayed effect) and its relevance declines as time goes forward.
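
For the reader's convenience, here is a possible end-to-end sketch of such a simulation for the gamma kernel \(K_3\). It assumes that the helper functions bernstein_coeffs, u_hat_nM and simulate_volterra_ou from the earlier sketches are in scope; the parameter values match the ones listed above.

```python
import numpy as np

# gamma kernel K_3(t) = exp(-t) * t^{0.3}; parameters as in the simulations above
K3 = lambda t: np.exp(-t) * t ** 0.3
T, n, M, sigma, X0 = 2.0, 20, 50, 1.0, 0.0
alpha, beta, a1, a2 = 1.0, 1.0, 1.0, 1.0

kappa = bernstein_coeffs(K3, n, T)       # liftable Bernstein approximation of K_3
u_opt = lambda s: np.array([u_hat_nM(si, kappa, T, M, alpha, beta, a1, a2)
                            for si in np.atleast_1d(s)])
t, X = simulate_volterra_ou(K3, u_opt, X0, alpha, beta, sigma, T,
                            n_steps=int(T / 0.05))  # mesh 0.05 as above
```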

Remark 7

Note that the stochastic Volterra integral equation from (4.3) can sometimes be solved explicitly for certain kernels (e.g. via the resolvent method). For instance, the solution \(X^u\) which corresponds to the fractional kernel of the type \(K(t) = t^h\), \(h>0\), and \(\beta = 1\) has the form

$$\begin{aligned} X^u(t) = \Gamma (h+1) \int _0^t (t-s)^h E_{h+1,h+1} \left( -\Gamma (h+1)(t-s)^{h+1} \right) \left( \alpha u(s) ds + dW(s) \right) , \quad t\in [0,T], \end{aligned}$$

where \(E_{a,b}\) again denotes the Mittag-Leffler function. Given the explicit solution, one could solve the optimization problem (4.3) by plugging this expression for \(X^u\) into the performance functional and applying standard minimization techniques in Hilbert spaces. However, as mentioned in the introduction, this leads to tedious calculations that are complicated to implement, whereas our approach yields the approximated solution in a relatively simple manner.