1 Introduction

From the modelling point of view, the main idea of time changing a stochastic process is to represent a stochastic process \(X= X_t\), \(t\ge 0\), by a base process \(L= L_t\), \(t \ge 0\), with well-known structure, and a (possibly stochastic) perturbation \(\Lambda = \Lambda _t\), \(t \ge 0\), of the time line

$$\begin{aligned} X_t = L_{\Lambda _t}. \end{aligned}$$

This is a way to change the speed of motion along the trajectories of L, as if the time line were an elastic band. This modelling potential, as well as the ease of simulation, has fascinated many researchers. See, e.g., [10, 48]. The concept of time perturbation is formally defined as follows.
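As a minimal numerical sketch of this "elastic band" picture (assuming, purely for illustration, a Brownian base process and a deterministic rate function, neither of which is imposed by the definition), one can sample \(X_t = W_{\Lambda _t}\) directly from the Gaussian increments of the perturbed clock:

```python
import numpy as np

rng = np.random.default_rng(0)

def time_changed_bm(grid, rate, rng):
    """Sample X_t = W_{Lambda_t} on a time grid, Lambda_t = int_0^t rate(s) ds.

    Uses that, for a Brownian base and a deterministic rate, the increment
    W_{Lambda_t} - W_{Lambda_s} is N(0, Lambda_t - Lambda_s).
    """
    dt = np.diff(grid)
    dLam = rate(grid[:-1]) * dt            # left-point rule for the clock
    incs = rng.normal(0.0, np.sqrt(dLam))  # Gaussian increments, variance dLam
    return np.concatenate([[0.0], np.cumsum(incs)])

# A clock that alternates between slow and fast stretches of "elastic" time
grid = np.linspace(0.0, 1.0, 2001)
x = time_changed_bm(grid, lambda s: 1.0 + np.sin(2 * np.pi * s) ** 2, rng)
```

The path x moves faster where the rate is large and slower where it is small, which is exactly the reparametrisation of time described above.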

Definition 1

A general time change process \(\Lambda = \Lambda _t\), \(t\ge 0\), is a stochastically continuous, non-decreasing càdlàg stochastic process starting at 0.

In our work, time changed processes are used as models for the noise driving the dynamics. As we shall see, such noises can be embedded in the framework of general martingales when dealing with the calculus perspective. However, they also benefit from more explicit statistical properties, which constitute a good trade-off between generality and specification and are in themselves interesting from the modelling perspective. Indeed, we have examples both from the point of view of stochastic volatility in finance (see, e.g., [8, 14]) and of turbulence in physics (see, e.g., [7, 11]). Using time changed noises, one can obtain both Markovian and non-Markovian structures and also include clustering effects, as discussed in [47, Chapter IV, 3e] and confirmed in the study [44, Chapter 3] on market microstructure.

The potential of time change in modelling arises from the pivotal results of Dambis [17] and Dubins and Schwarz [30], which clarify that, for any continuous local martingale \(M= M_t\), \({t \ge 0}\), with \(M_0=0\) and \(\langle M \rangle _\infty =+\infty \), there exists a Brownian motion \(W=W_t\), \({t \ge 0}\), such that \(M_t =W_{\langle M \rangle _t}\) for every \(t \ge 0\). As a particular case we can consider martingales arising from classical Itô stochastic integrals

$$\begin{aligned} M_t= \int _0^t \sigma _s dw_s, \quad t\ge 0, \end{aligned}$$

with respect to a Brownian motion w. Then, the result above leads to the representation

$$\begin{aligned} M_t = W_{\Lambda _t}, \quad \Lambda _t= \langle M \rangle _t = \int _0^t\sigma ^2_s ds \end{aligned}$$
(1)

where W is another Brownian motion. These considerations reflect the scaling property of Brownian motion, namely \(c W_t \, {\mathop {=}\limits ^{d}} \, W_{c^2 t }\) \((c>0)\). In this respect, it is also interesting to remark that, among all Lévy processes, besides Brownian motion, only \(\alpha \)-stable processes share a property similar to (1). Indeed, Rosinski and Woyczynski [43, Theorem 3.1] proved that stochastic integrals \(X_t = \int _0^t \sigma _s d\mathcal {L}_s\) with respect to an \(\alpha \)-stable Lévy process \(\mathcal {L}\) \((\alpha \in (0,2])\) can be represented as

$$\begin{aligned} X_t = L_{\Lambda _t}, \quad \Lambda _t = \int _0^t\sigma ^\alpha _s ds \end{aligned}$$
(2)

where L is another \(\alpha \)-stable Lévy process. Here again we observe that \(\alpha \)-stable processes enjoy the scaling property \(\sigma L_t \, {\mathop {=}\limits ^{d}} \, L_{\sigma ^\alpha t}\). Representations (1) and (2) triggered the use of time change in stochastic volatility modelling, which started with the works [5, 8, 13, 14] and is still of great interest nowadays. See, e.g., [11, 15, 34, 48].
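The equality in law in (1) can be checked by simulation. The sketch below takes the illustrative deterministic choice \(\sigma _s = s\) (any square integrable volatility would do) and compares the sample variance of \(M_T=\int _0^T \sigma _s dw_s\) with that of \(W_{\Lambda _T}\), \(\Lambda _T = \int _0^T \sigma _s^2 ds\):

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, T = 10000, 400, 1.0
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)[:-1]
sigma = t                                  # deterministic volatility sigma_s = s

# M_T = int_0^T sigma_s dw_s, simulated path-wise by an Euler sum
dw = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
M_T = (sigma * dw).sum(axis=1)

# Representation (1): M_T =d W_{Lambda_T} with Lambda_T = int_0^T sigma_s^2 ds
Lam_T = (sigma ** 2 * dt).sum()            # Riemann sum, close to T^3/3
W_Lam = rng.normal(0.0, np.sqrt(Lam_T), n_paths)
```

Both samples are centred Gaussian with (approximately) the same variance \(\Lambda _T = 1/3\), as the representation predicts.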

For a given stochastic process X, the problem of the very existence of a representation of the form \(X_t \, {\mathop {=}\limits ^{d}} \, W_{\Lambda _t}\) in terms of a Brownian motion and a family of stopping times \(\Lambda _t\), \(t \ge 0\), is the so-called Skorokhod embedding problem, which has driven researchers to discuss different solutions and to provide extensions beyond W as base process, possibly including jumps. For a survey on the subject we refer to [38].

On the other hand, the problem of determining the time change \(\Lambda \) given the observations of X, with the base process fixed (as a Brownian motion W, or even a Lévy process L), is known as the recovery problem; it was first studied in [50], see also [45].

From the probabilistic perspective, two classes of time change processes have turned out to be successful: the subordinators (i.e. non-decreasing Lévy processes) and the absolutely continuous time change processes, characterised by

$$\begin{aligned} \Lambda _t = \int _0^t \lambda _s \, ds, \end{aligned}$$

with \(\lambda \) a non-negative integrable process representing the time change rate. In the case of subordinators, the time changed Lévy process \(X:= L_\Lambda \) is still a Lévy process; in the case of absolutely continuous time changes (with non-constant rate), the process X exits the Lévy family and even the Markov family, thereby making it possible to describe forms of time dependence within the dynamics. Time changed Lévy processes with absolutely continuous time change are widely used in applications to finance and beyond. As an illustration, we mention the representation of affine processes in terms of these time changed processes, see [36]. We recall that affine processes are popular in the modelling of, e.g., interest rates and energy futures.
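To see how a random rate takes X outside the Gaussian (and Markov) world, the following sketch time changes a Brownian motion by \(\Lambda _t = \int _0^t \lambda _s \, ds\) with a hypothetical squared Ornstein–Uhlenbeck rate (an illustrative choice of non-negative integrable process, not one used in the paper). The terminal value is then a variance mixture of Gaussians, which exhibits excess kurtosis whenever \(\Lambda _T\) is random:

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps, T = 50000, 200, 1.0
dt = T / n_steps

# Rate lambda_s = y_s^2 with y an Ornstein-Uhlenbeck process started at 0;
# positivity of the rate holds by construction.
y = np.zeros(n_paths)
Lam_T = np.zeros(n_paths)
for _ in range(n_steps):
    y += -0.5 * y * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
    Lam_T += y ** 2 * dt                      # Lambda_T = int_0^T lambda_s ds

# Conditionally on Lambda_T, the terminal value X_T = W_{Lambda_T} is Gaussian
X_T = rng.normal(0.0, np.sqrt(Lam_T))

# Kurtosis of a Gaussian variance mixture: 3 E[Lam^2]/E[Lam]^2 > 3
kurt = np.mean(X_T ** 4) / np.mean(X_T ** 2) ** 2
```

The sample kurtosis comes out clearly above the Gaussian value 3, reflecting the heavy tails produced by the random clock.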

In the present work, we consider time change processes of the absolutely continuous type. Those noises are the stochastic driving forces of the controlled dynamics we study. The goal of this article is to discuss some techniques of stochastic optimisation for such dynamics with emphasis on the role of information and the probabilistic structure of the noise. In Sect. 2 we give a description of the framework, while in the rest of the paper we summarise an approach via maximum principle to stochastic control, providing sufficient conditions to verify optimality. This is based on [23, 24, 29]. In Sect. 3 we lay out the ideas and justify the technical use of different information flows. We achieve the result exploiting backward stochastic differential equations (BSDEs) driven by time changed Lévy noises. In Sect. 4 we see how to adapt these ideas to the case of Volterra type controlled dynamics. For this, a substantially modified Hamiltonian functional has to be used together with the NA-derivative [20, 21]. The role of stochastic differentiation will be discussed. In Sect. 5 we extend the approach to treat forward–backward systems of equations.

As a side note, we remark that there are extensions of the concept of time change to infinite dimensions, such as Hilbert space valued time changes, which first appeared in [37], were then used in, e.g., [4], and have recently been connected to infinite dimensional stochastic volatility models as, e.g., in [12]. Further extensions are explored in the direction of the so-called meta time changes, which are general forms of distortion of space-time, in the sense that not only time is perturbed, but also the spatial component. These are essentially non-negative random measures extending the concept of subordination. See [9].

2 The noise: time-space random fields

The stochastic elements refer to the complete probability space \(( \Omega , \mathcal {F}, \mathbb {P})\). The noise is represented as a random field on the time-space domain, here given by \(\mathbb {X}:= [0,T] \times \mathbb {R}\) \((T<\infty )\), where the spatial component represents the jump size in the paths. We can define two disjoint complementary sets

$$\begin{aligned} \mathfrak {X}_B := [0,T]\times \{0\}, \quad \mathfrak {X}_H:= [0,T]\times {\mathbb {R}_0}\quad ({\mathbb {R}_0}:= \mathbb {R}\setminus \{0\}), \end{aligned}$$

where the first set accommodates the random field of continuous nature (no jumps corresponds to jump size zero) and the second accommodates the random fields with jumps. We remark that our work can deal with further spatial components, for example of the type \(\mathbb {X}:= [0,T] \times \mathbb {R}\times \mathbb {R}^d\). In this case the extra spatial component may assume different meanings: in physics, it could be the actual space; in finance, it can represent the space of available assets, as in, e.g., [18, 22].

The noise is modelled by the random field \(\mu \), which is defined as the mixture

$$\begin{aligned} \mu (\Delta , \omega ) := B\left( \Delta \cap \mathfrak {X}_B,\omega \right) + \tilde{H}\left( \Delta \cap \mathfrak {X}_H, \omega \right) , \quad \Delta \subseteq \mathbb {X}, \omega \in \Omega , \end{aligned}$$
(3)

of two components on the Borel sets \(\mathcal {B}_\mathbb {X}\) of \(\mathbb {X}\): a conditional Gaussian measure B and a centred conditional Poisson measure \(\tilde{H}\), described here below.

To proceed with the formal definition, we first introduce yet another random measure, which will be directly connected with the concept of time change. We adopt the framework introduced in [29].

Definition 2

We define the random measure \(\Lambda \) on \(\mathcal {B}_\mathbb {X}\) by

$$\begin{aligned} \Lambda (\Delta ) := \int _0^T \mathbf {1}_\Delta (s,0) \, \lambda ^B_s \, ds + \int _0^T\!\! \int _{{\mathbb {R}_0}} \mathbf {1}_\Delta (s,z) \, \lambda ^H_s \,\nu (dz) \, ds, \quad \Delta \in \mathcal {B}_\mathbb {X}, \end{aligned}$$

where the stochastic process \(\lambda =(\lambda ^B,\lambda ^H)\) has non-negative, stochastically continuous components in \(L^1(d\mathbb {P}\times dt)\). The set of these processes is denoted \(\mathcal {L}\). The \(\mathbb {P}\)-augmented filtration generated by \(\Lambda \) is denoted \(\mathbb {F}^\Lambda = \left\{ \mathcal {F}^\Lambda _t, \ t \in [0,T]\right\} \). Set \(\mathcal {F}^\Lambda := \mathcal {F}^\Lambda _T\).

Here above \(\nu \) is a \(\sigma \)-finite measure on the Borel sets \(\mathcal {B}_{{\mathbb {R}_0}}\) of \({\mathbb {R}_0}\) satisfying \(\int _{{\mathbb {R}_0}}z^2\nu (dz)<\infty \). The structure of the noise (3) is linked to the two following components.

Definition 3

The conditional Gaussian measure B is a random measure on \(\mathcal {B}_{\mathfrak {X}_B}\) such that

  1. (A1)

    \(\mathbb {P}\left( B(\Delta ) \le x \,\Big \vert \mathcal {F}^\Lambda \right) = \mathbb {P}\left( B(\Delta ) \le x \,\Big \vert \Lambda ^B(\Delta ) \right) = \Phi \left( \frac{x}{\sqrt{ \Lambda ^B(\Delta )}}\right) \), \(x\in \mathbb {R}\), \(\Delta \subseteq \mathfrak {X}_B\), where \(\Phi \) is the standard normal cumulative distribution function.

  2. (A2)

    \(B(\Delta _1)\) and \(B(\Delta _2)\) are conditionally independent given \(\mathcal {F}^\Lambda \) whenever \(\Delta _1\) and \(\Delta _2\) are disjoint sets.

The conditional Poisson measure H is a random measure on \(\mathcal {B}_{\mathfrak {X}_H}\) such that

  1. (A3)

    \(\mathbb {P}\left( H(\Delta ) = k \,\Big \vert \mathcal {F}^\Lambda \right) = \mathbb {P}\left( H(\Delta ) = k \,\Big \vert \Lambda ^H(\Delta ) \right) = \frac{\Lambda ^H(\Delta )^k}{k!} e^{-\Lambda ^H(\Delta )}\), \(k\in \mathbb {N}\), \(\Delta \subseteq \mathfrak {X}_H\),

  2. (A4)

    \(H(\Delta _1)\) and \(H(\Delta _2)\) are conditionally independent given \(\mathcal {F}^\Lambda \) whenever \(\Delta _1\) and \(\Delta _2\) are disjoint sets.

Furthermore we assume that

  1. (A5)

    B and H are conditionally independent given \(\mathcal {F}^\Lambda \).

We refer to [32] or [35] for the existence of conditional distributions in Definition 3. In (3) we use the centred conditional Poisson measure, which is the signed random measure \(\tilde{H}\) given by

$$\begin{aligned} \tilde{H}(\Delta ) = H(\Delta ) - \Lambda ^H(\Delta ), \quad \Delta \subset \mathfrak {X}_H. \end{aligned}$$
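Property (A3) says that H is a doubly stochastic (Cox-type) measure: Poisson conditionally on \(\Lambda ^H\). A quick simulation (with a purely illustrative exponential law for the random mass \(\Lambda ^H(\Delta )\)) shows that, unconditionally, H is over-dispersed, i.e. its variance exceeds its mean, unlike a plain Poisson variable:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100000

# Random intensity mass Lambda^H(Delta) over a fixed set Delta; the
# exponential law below is an illustrative choice, not one from the paper.
lam_H = rng.exponential(2.0, n)

# (A3): conditionally on Lambda^H(Delta), the count H(Delta) is Poisson
H = rng.poisson(lam_H)

# For a mixed Poisson: E[H] = E[Lambda], Var[H] = E[Lambda] + Var[Lambda]
mean, var = H.mean(), H.var()
```

Here the theoretical values are \(\mathbb {E}[H(\Delta )] = 2\) and \(\mathrm {Var}(H(\Delta )) = 2 + 4 = 6\), so the variance-to-mean ratio is 3, a simple signature of the random clock.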

It is easy to verify that the random measure \(\mu \) in (3) is a martingale random field, see e.g. [22, Definition 2.1], with respect to both the following right-continuous information flows \(\mathbb {F}\) and \(\mathbb {G}\). The first filtration is

$$\begin{aligned} \mathbb {F} :=\left\{ \mathcal {F}_t, \, t \in [0,T]\right\} , \quad \text { where } \mathcal {F}_t:=\bigcap _{r>t}\mathcal {F}_r^\mu , \end{aligned}$$

and \(\mathbb {F} ^\mu :=\left\{ \mathcal {F}_t^\mu , \, t \in [0,T]\right\} \) is the \(\mathbb {P}\)-augmented filtration generated by the values \(\mu (\Delta )\), \(\Delta \subset [0,t]\times \mathbb {R}\), \(t \in [0,T]\). The second filtration is

$$\begin{aligned} \mathbb {G} :=\left\{ \mathcal {G}_t, \ t \in [0,T]\right\} , \quad \text { where } \mathcal {G}_t:=\mathcal {F}_t^\mu \vee \mathcal {F}^\Lambda . \end{aligned}$$

Notice that \(\mathcal {G}_T=\mathcal {F}_T\) at the horizon T, while the initial information differs in the two flows: \(\mathcal {G}_0=\mathcal {F}^\Lambda \), whereas \(\mathcal {F}_0\) is trivial. We can understand \(\mathbb {G} \) as an enlargement of \(\mathbb {F}\) by the future values of \(\Lambda \), or, conversely, regard \(\mathbb {F}\) as partial information with respect to \(\mathbb {G}\). From now on, we set \(\mathcal {F}:= \mathcal {F}_T= \mathcal {G}_T\).

The filtration \(\mathbb {G}\) has little real world interpretation, since it includes information about the future values of \(\Lambda \). However, it will be used technically in the sequel to exploit the distributional nature of the noise from the point of view of the martingale representation theorem. Indeed, we recall that, given a square integrable martingale \(\mathcal {M}\) with respect to some reference filtration

$$\begin{aligned} \mathbb {F} ^{\mathcal {M}} = \left\{ \mathcal {F}^{\mathcal {M}}_t, \ t \in [0,T]\right\} , \end{aligned}$$

any square integrable \(\mathcal {F}_T^{\mathcal {M}}\)-measurable random variable \(\xi \) admits representation

$$\begin{aligned} \xi = \xi ^{\perp } + \int \varphi \ d\mathcal {M} \end{aligned}$$
(4)

by means of a unique stochastic integrand \(\varphi \). Here \(\xi ^{\perp }\) is a stochastic remainder orthogonal to all stochastic integrals with respect to \(\mathcal {M}\) (Kunita–Watanabe integral representation). It is well known that \(\xi ^{\perp }\) is a constant whenever \(\mathcal {M}\) is a Gaussian or a centred Poisson random measure, or a mixture of the two, and the reference filtration \(\mathbb {F}^{\mathcal {M}}\) is generated by \(\mathcal {M}\); see [16, 19] and also [21, 27]. In addition, we can show that

$$\begin{aligned} \xi ^{\perp } = \mathbb {E}[\xi \vert \mathcal {F}^\Lambda ] \end{aligned}$$

in the case when the martingale \(\mathcal {M}\) is given by \(\mu \) in (3) with respect to the filtration \(\mathbb {G}\). See [29, Theorem 3.5]. We stress that these stochastic integral representations stand at the foundation of the solutions of BSDEs. Specifically, the following explicit stochastic representation theorem is used in the sequel. See the original version in [20] and the versions for random fields appearing in [22, 28].

Theorem 1

For any \(\xi \in L^2(d\mathbb {P})\) we have

$$\begin{aligned} \xi&= \mathbb {E}[\xi \vert \mathcal {F}^\Lambda ] +\int _\mathbb {X}\mathfrak {D}_{t,z} \xi \ \mu (dtdz) \nonumber \\&= \mathbb {E}[\xi \vert \mathcal {F}^\Lambda ] +\int _{\mathfrak {X}_B}\mathfrak {D}_{t,0} \xi \, B(dt) + \int _{\mathfrak {X}_H} \mathfrak {D}_{t,z} \xi \ \tilde{H}(dtdz), \end{aligned}$$
(5)

by means of the NA-derivative \(\mathfrak {D}\xi \) which is defined as the \(L^2(\Lambda \times d\mathbb {P})\) limit

$$\begin{aligned} \mathfrak {D}\xi = \lim _{n \rightarrow \infty } \varphi _n \end{aligned}$$
(6)

of the simple predictable random fields \((\varphi _n)_{n\in \mathbb {N}}\) in the form

$$\begin{aligned} \varphi _n(t,z) := \sum _{k=1}^{K_n} \mathbb {E}\left[ \xi \frac{\mu (\Delta _{nk})}{ \mathbb {E}[\Lambda (\Delta _{nk}) \vert \mathcal {G}_{s_{nk}}]} \Big \vert \mathcal {G}_{s_{nk}} \right] \mathbf {1}_{\Delta _{nk}}(t,z), \quad (t,z) \in \mathbb {X}, \end{aligned}$$
(7)

defined on a dissecting system \((\Delta _{nk})_{k=1,\ldots ,K_n; n\in \mathbb {N}} \) with \(\Delta _{nk} = (s_{nk}, u_{nk}] \times B_{nk}\) and \(B_{nk}\) belonging to a countable semiring generating \(\mathcal {B}_\mathbb {R}\).

From (6), (7) we see that \(\mathfrak {D}\xi \in \mathcal {I}^\mathbb {G}\), which is the space of Itô integrands, i.e. \(\mathbb {G}\)-predictable processes in \(L^2(d\Lambda \times d\mathbb {P})\). Furthermore, by (5) it is easy to see that the NA-derivative is the dual of the Itô integral. For later use, we recall that any \(\varphi \in \mathcal {I}^\mathbb {G}\) can be regarded as a stochastic process with values in \(\mathcal {Z}\), which is the space of functions \(\phi :\mathbb {R}\rightarrow \mathbb {R}\) such that

$$\begin{aligned} \vert \phi (0)\vert ^2+ \int \limits _{\mathbb {R}_0}\vert \phi (z)\vert ^2 \,\nu (dz) < \infty . \end{aligned}$$

Representation (5) is explicit, compared to (4), because it indicates, besides existence, the nature of the integrand. In fact, formulae (6), (7) depend only on the random variable \(\xi \) to be represented and on the framework \((\Omega , \mathbb {G}, \mathbb {P}, \mu )\). This type of stochastic integral representation is in line with the well-known Clark–Ocone formula, in which the integrand is given in terms of the projection of the Malliavin derivative on the information flow. We remark that the NA-derivative coincides with the integrand of the Clark–Ocone formula whenever the integrator \(\mathcal {M}= \mu \) is a Brownian motion or a centred Poisson random measure, or a mixture of the two, and \(\xi \) is Malliavin differentiable. The domain of the Malliavin derivative is contained in, but not equal to, \(L^2(d\mathbb {P})\).
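The limit (6), (7) can be probed numerically. In the classical Brownian case (deterministic \(\Lambda \) given by the Lebesgue measure, so that conditioning on \(\mathcal {G}_{s}\) reduces to conditioning on \(\mathcal {F}_{s}\)), for \(\xi = W_T^2\) the Clark–Ocone integrand is \(2W_t\). The Monte Carlo sketch below evaluates one term of (7) on \(\Delta =(s,u]\), conditioning on the illustrative value \(W_s = 1\), and recovers \(2W_s = 2\):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400000
s, u, T, w_s = 0.5, 0.6, 1.0, 1.0                # condition on W_s = 1.0

dW = rng.normal(0.0, np.sqrt(u - s), n)          # W_u - W_s ~ N(0, u - s)
rest = rng.normal(0.0, np.sqrt(T - u), n)        # W_T - W_u, independent
xi = (w_s + dW + rest) ** 2                      # xi = W_T^2

# Discrete NA-derivative estimate on Delta = (s, u]:
# E[ xi * mu(Delta) / E[Lambda(Delta)] | F_s ], here mu = dW, Lambda = Lebesgue
phi = np.mean(xi * dW) / (u - s)
```

The estimate phi is close to 2, the value of the Clark–Ocone integrand \(2W_t\) at \(t=s\) on the event \(W_s=1\).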

To conclude this section, we recall the following result [46, Theorem 3.1] (see also [32]), which explicitly connects the noise \(\mu \) in (3) with the concept of time change.

Theorem 2

Let \(W_t\), \(t \in [0,T]\), be a Brownian motion and \(N_t\), \(t \in [0,T]\), be a centred pure jump Lévy process with Lévy measure \(\nu \). Assume that both W and N are independent of \(\Lambda \). Then B satisfies (A1)–(A2) if and only if, for any t,

$$\begin{aligned} B_t {\mathop {=}\limits ^{d}} W_{\Lambda _t^B}, \end{aligned}$$

and \(\eta _t := \int _0^t\int _{{\mathbb {R}_0}} z \,\tilde{H}(ds,dz)\), \(t \in [0,T]\), satisfies (A3)–(A4) if and only if, for any t,

$$\begin{aligned} \eta _t {\mathop {=}\limits ^{d}} N_{\hat{\Lambda }^H_t} \end{aligned}$$

with \(\hat{\Lambda }^H_t := \int _0^t \lambda _s^H \,ds\), for \(t \in [0,T]\).
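The first equivalence of Theorem 2 can be illustrated by checking moments of the conditionally Gaussian variable against those of the subordinated Brownian motion. In the sketch below the clock value \(\Lambda ^B_T\) is given an illustrative Gamma law (any non-negative integrable choice would do); then (A1) forces \(\mathbb {E}[B_T^2] = \mathbb {E}[\Lambda ^B_T]\) and \(\mathbb {E}[B_T^4] = 3\,\mathbb {E}[(\Lambda ^B_T)^2]\), exactly as for \(W_{\Lambda ^B_T}\):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200000

# Random clock value Lambda_T^B; Gamma(shape=2, scale=1) is an illustrative
# choice, so E[Lambda] = 2 and E[Lambda^2] = 6.
Lam = rng.gamma(2.0, 1.0, n)

# (A1): conditionally on Lambda, B_T ~ N(0, Lambda), i.e. B_T =d W_{Lambda_T^B}
B = rng.normal(0.0, np.sqrt(Lam))

# Moments of the variance mixture: E[B^2] = E[Lambda] = 2,
# E[B^4] = 3 E[Lambda^2] = 18
m2, m4 = np.mean(B ** 2), np.mean(B ** 4)
```
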

3 A sufficient maximum principle

Hereafter we highlight the use of information to tackle optimal control problems for dynamics driven by time-changed Lévy noises. We remark immediately that these problems cannot be solved by dynamic programming methods, since the state process is, in general, not Markovian. The approach we present here is via the maximum principle. We also make use of the enlarged filtration \(\mathbb {G}\) to exploit the structure of the noise. We refer to [29] for the original presentation of the main ideas.

We aim to find the optimal control \(\hat{u}\) providing the maximal value over all admissible controls \(\mathcal {A}^{\mathbb {F}}\),

$$\begin{aligned} J(\hat{u}) = \sup _{u\in \mathcal {A}^{\mathbb {F}}} J(u), \end{aligned}$$
(8)

where the performance functional is

$$\begin{aligned} J(u) = \mathbb {E}\left[ \int \limits _0^T F_t(\lambda _t,u_t,X_{t}) \,dt + G(X_T) \right] . \end{aligned}$$
(9)

Here \(G(x,\omega )\), \(x\in \mathbb {R}\), \(\omega \in \Omega \), is a random function, concave and differentiable in x a.s., and \(F_t(\lambda , u,x,\omega )\), \(t\in [0,T]\), \(\lambda \in [0,\infty )^2\), \(u\in \mathcal {U}\), \(x\in \mathbb {R}\), \(\omega \in \Omega \), is an \(\mathbb {F}\)-adapted stochastic process differentiable in x a.s. Here \(\mathcal {U}\subseteq \mathbb {R}\) is a closed convex set.

Problem (8), (9) appears standard in stochastic control; however, it is the state process X that presents elements of novelty, given the driving noise \(\mu \), which provides a non-Markovian framework in general. The state process \(X=X_t\), \(t\in [0,T]\), follows the dynamics

$$\begin{aligned} dX_t&= b_t(\lambda _t,u_t, X_{t\text {-}}) \,dt + \int \limits _\mathbb {R}\kappa _t(z,\lambda _t, u_t,X_{t\text {-}}) \,\mu (dtdz), \quad X_0 \in \mathbb {R}, \end{aligned}$$
(10)

where \(b_t(\lambda ,u,x)\) and \(\kappa _t(z,\lambda ,u,x)\), \(t\in [0,T]\), \(\lambda \in [0,\infty )^2\), \(z\in \mathbb {R}\), \(u\in \mathcal {U}\), \(x\in \mathbb {R}\) are \(\mathbb {F}\)-predictable stochastic processes differentiable in x a.s. All partial derivatives with respect to some variable x are denoted by \(\partial _x\) here and in the sequel. Sufficient conditions for the existence of an \(\mathbb {F}\)-adapted strong solution to (10) can be found in [33].
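A strong solution to (10) can be approximated by an Euler scheme in which the Gaussian part of \(\mu \) contributes an increment of conditional variance \(\lambda ^B_t\,dt\) and the jump part a compensated conditional Poisson count with rate \(\lambda ^H_t\). The sketch below uses purely illustrative coefficients (not taken from the paper) and, for simplicity, constant rates and a finite \(\nu \) concentrated at \(z=1\):

```python
import numpy as np

rng = np.random.default_rng(6)

def euler_step(x, u, lam_B, lam_H, dt, rng):
    """One Euler step of (10) for the illustrative coefficients
    b = -x + u, kappa(0) = 0.2, kappa(z) = 0.1*z, nu = point mass at z = 1.

    dB has conditional variance lam_B*dt; jumps arrive at conditional rate
    lam_H*dt and are compensated, as in the definition of H-tilde.
    """
    dB = rng.normal(0.0, np.sqrt(lam_B * dt))
    dN = rng.poisson(lam_H * dt)                 # H((t, t+dt] x {1})
    comp = lam_H * dt                            # compensator Lambda^H
    return x + (-x + u) * dt + 0.2 * dB + 0.1 * (dN - comp)

# One path under a constant control u = 0.5 and constant rates
T, n = 1.0, 1000
dt = T / n
x = 0.0
for _ in range(n):
    x = euler_step(x, 0.5, lam_B=1.0, lam_H=2.0, dt=dt, rng=rng)
```

Since both noise terms are centred, the mean of \(X_T\) follows the ODE \(\dot{m} = -m + u\), i.e. \(m(T) = 0.5(1-e^{-T})\) here, which gives a simple sanity check for the scheme.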

In line with the maximum principle (MP) approach, we introduce a Hamiltonian functional \(\mathcal {H}: [0,T]\times [0,\infty )^2\times \mathcal {U}\times \mathbb {R}\times \mathbb {R}\times \mathcal {Z}\times \Omega \rightarrow \mathbb {R}\) given by the stochastic function

$$\begin{aligned} \mathcal {H}_t(\lambda ,u,x,p,q) =&\; F_t(\lambda ,u,x) + b_t(\lambda ,u,x) p + \kappa _t(0,\lambda ,u,x) q(0) \lambda ^B \nonumber \\&+ \int \limits _{\mathbb {R}_0}\kappa _t(z,\lambda ,u,x) q(z) \,\lambda ^H \,\nu (dz). \end{aligned}$$
(11)

Corresponding to an admissible pair (u, X), we have the couple \((p,q) \in L^2(dt\times d\mathbb {P}) \times \mathcal {I}^\mathbb {G}\), which is the solution to the BSDE

$$\begin{aligned} p_t&= \partial _x G(X_T)+ \int \limits _t^T \partial _x \mathcal {H}_s(\lambda , u_s, X_{s}, p_{s},q_s)\,ds - \int \limits _t^T \int \limits _\mathbb {R}q_s(z) \,\mu (ds,dz). \end{aligned}$$
(12)

Here \(\partial _x \mathcal {H}_t = \frac{\partial }{\partial x} \mathcal {H}_t(\lambda , u, x,p, q)\) and we note that \(\mathcal {H}\) is differentiable in x by the assumptions on F, b and \(\kappa \). A solution to Eq. (12) is studied in [29, Theorem 4.5], where sufficient conditions are given to ensure that the pair \(\left( \partial _x \mathcal {H}, \partial _x G(X_T)\right) \) constitutes standard parameters (see [29, Definition 4.1]). In our context these read: there exists \(K_1>0\) such that

$$\begin{aligned} \Big \vert \partial _x \kappa _t(0,\lambda _t,u_t,X_{t\text {-}}) \Big \vert \sqrt{\lambda ^B_t}&\le K_1 \quad dt\times d\mathbb {P}\text {-a.e}, \end{aligned}$$
(13)
$$\begin{aligned} \int \limits _{{\mathbb {R}_0}} \left( \partial _x \kappa _t(z,\lambda _t,u_t,X_{t\text {-}}) \right) ^2\nu (dz) \sqrt{\lambda ^H_t}&\le K_1 \quad dt\times d\mathbb {P}\text {-a.e}, \end{aligned}$$
(14)
$$\begin{aligned} \big \vert \partial _x b_t(\lambda _t,u_t,X_{t\text {-}}) \big \vert&\le K_1 \quad dt\times d\mathbb {P}\text {-a.e} . \end{aligned}$$
(15)

Remark 1

We have to be aware that, to find a solution to the BSDE (12), we work under the filtration \(\mathbb {G}\), as it is the representation theorem in the form (5) (see Theorem 1) that provides the existence of the solution, as shown in [29, Theorem 4.5]. Note that, if another filtration were used, then the BSDE would present an additional term (an orthogonal martingale) as an intrinsic part of the solution. This would not be suitable for pursuing a maximum principle type result, as it would involve the explicit use of the quadratic variation of the orthogonal martingale, which is not explicitly known.

Definition 4

The admissible controls in (8), (9) are \(\mathbb {F}\)-predictable stochastic processes \(u: [0,T]\times \Omega \rightarrow \mathcal {U}\), such that the corresponding state process X in (10) has a unique strong solution in \(L^2(dt\times d\mathbb {P})\), the adjoint equation (12) has a unique solution in \(L^2(dt \times d\mathbb {P})\times \mathcal {I}^\mathbb {G}\), and

$$\begin{aligned} \mathbb {E}\left[ \int \limits _0^T \vert F_t( \lambda _t,u_t,X_{t\text {-}})\vert ^2 \,dt + \vert G(X_T) \vert + \vert \partial _x G(X_T) \vert ^2 \right] < \infty . \end{aligned}$$

The couple (u, X) is called an admissible pair. The set of admissible controls is denoted by \(\mathcal {A}^\mathbb {F}\).

Once the Hamiltonian functional (11) is evaluated at the solution (p, q) to (12), we obtain a \(\mathbb {G}\)-adapted functional. Thus we cannot directly apply the maximum principle to \(\mathcal {H}\) to solve (8) (where solutions should actually be \(\mathbb {F}\)-adapted), as could be done in the case of controlled jump-diffusions, see e.g. [39]. In our context, we then need to "project" the Hamiltonian onto the information flow \(\mathbb {F}\), regarded as "partial" information with respect to \(\mathbb {G}\); here we are inspired by [25]. Then we have the following result.

Theorem 3

Let \(\lambda \in \mathcal {L}\) be fixed. Let \(\hat{u}\in \mathcal {A}^{\mathbb {F}}\). Denote the corresponding state process as \(\hat{X}\) with solution \((\hat{p},\hat{q})\) of the adjoint equation (12). Set

$$\begin{aligned} \mathcal {H}_t^{\mathbb {F}}(\lambda _t,u,x,\hat{p}_{t}, \hat{q}_t ) :&= \mathbb {E}\left[ \mathcal {H}_t(\lambda _t, u, x,\hat{p}_{t}, \hat{q}_t ) \,\big \vert \mathcal {F}_t \right] \nonumber \\&= F_t(\lambda _t,u,x) + b_t(\lambda _t,u,x) \mathbb {E}\left[ \hat{p}_{t} \,\big \vert \mathcal {F}_t \right] + \kappa _t(0,\lambda _t, u,x) \mathbb {E}\left[ \hat{q}_t(0) \,\big \vert \mathcal {F}_t \right] \lambda ^B_t \nonumber \\&\quad + \int \limits _{\mathbb {R}_0}\kappa _t(z,\lambda _t,u,x) \mathbb {E}\left[ \hat{q}_t(z) \,\big \vert \mathcal {F}_t \right] \,\lambda _t^H \,\nu (dz) \end{aligned}$$

for all \(t\in [0,T]\). If

$$\begin{aligned} h_t^{\mathbb {F}}(x) := \mathop {\mathrm {ess\,sup}}\limits _{u\in \mathcal {U}} \mathcal {H}_t^{\mathbb {F}} \left( \lambda _t,u, x, \hat{p}_{t}, \hat{q}_{t} \right) \end{aligned}$$
(16)

exists and is a concave function in x for all \(t\in [0,T]\) and

$$\begin{aligned} \mathcal {H}_t^{\mathbb {F}}(\lambda _t,\hat{u}_t,\hat{X}_t,\hat{p}_{t}, \hat{q}_t ) =h_t^{\mathbb {F}}( \hat{X}_t), \end{aligned}$$
(17)

then \((\hat{u},\hat{X})\) is an optimal pair for (8), (9).
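Condition (17) is typically verified by pointwise maximisation of the projected Hamiltonian over \(\mathcal {U}\). As an illustration (with a hypothetical specification, not taken from the paper), if \(F_t = -u^2/2\), \(b_t = u\) and \(\kappa \) does not depend on u, then \(\mathcal {H}^{\mathbb {F}}_t(u) = -u^2/2 + u\,\mathbb {E}[\hat{p}_t \vert \mathcal {F}_t] + \text {(terms free of }u)\) is concave in u and maximised by clipping \(\mathbb {E}[\hat{p}_t \vert \mathcal {F}_t]\) to \(\mathcal {U}\):

```python
import numpy as np

def argmax_projected_hamiltonian(p_proj, u_min=-1.0, u_max=1.0):
    """Maximiser of H^F(u) = -u^2/2 + u*p_proj over the closed convex set
    U = [u_min, u_max]: the unconstrained optimum u = p_proj, clipped to U.

    p_proj stands for the (hypothetical) projection E[p_t | F_t].
    """
    return float(np.clip(p_proj, u_min, u_max))

u_star = argmax_projected_hamiltonian(0.3)
```

The clipping step is where the closedness and convexity of \(\mathcal {U}\) enter: the first-order condition is corrected at the boundary of the control set.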

In the sequel we write \(\hat{b}_s = b_s(\lambda _s,\hat{u}_s,\hat{X}_{s\text {-}})\), etc., for the coefficients associated with the admissible pair \((\hat{u},\hat{X})\) and the solution \((\hat{p},\hat{q})\) of the adjoint equation (12), and \(b_s = b_s(\lambda _s,u_s,X_{s\text {-}})\), etc., for the coefficients associated with another arbitrary admissible pair (u, X). In addition, \(\hat{\mathcal {H}}_s(u,x)= \mathcal {H}_s(\lambda _s,u,x,\hat{p}_{s},\hat{q}_{s})\).

In the proof we assume that the integrals are well defined. Sufficient conditions can be given to ensure this: for instance, we can assume (13)–(15) and that there exists a \(K_2>0\) such that

$$\begin{aligned}&\big \vert b_t(\lambda ,u,x) - b_t(\lambda ,u,x') \big \vert \le K_2 \vert x-x' \vert \end{aligned}$$
(18)
$$\begin{aligned}&\big \vert \kappa _t(0,\lambda ,u,x)-\kappa _t(0,\lambda ,u,x') \big \vert \le K_2 \big \vert x-x' \big \vert \end{aligned}$$
(19)
$$\begin{aligned}&\int _{\mathbb {R}_0}\big \vert \kappa _t(z,\lambda ,u,x) - \kappa _t(z,\lambda ,u,x') \big \vert ^2 \nu (dz) \le K_2 \vert x-x' \vert ^2 \end{aligned}$$
(20)

\(\mathbb {P}\text {-a.s.},\) for all \(t \in [0,T]\), \(\lambda \in [0,\infty )^2\), \(u\in \mathcal {U}\) and all \(x,x' \in \mathbb {R}\).

Proof

Observe that

$$\begin{aligned} J(\hat{u}) - J(u) = \mathbb {E}\left[ G( \hat{X}_T) - G(X_T) \right] + \mathbb {E}\left[ \int \limits _0^T \left\{ \hat{F}_s - F_s \right\} \,ds \right] = I_1 + I_2 . \end{aligned}$$

Since \(\hat{X}_0-X_0 = 0\) and G is concave, we have

$$\begin{aligned} I_1= \mathbb {E}\left[ G (\hat{X}_T) - G(X_T) \right] \ge \mathbb {E}\left[ \partial _x G(\hat{X}_T) \left( \hat{X}_T - X_T \right) \right] = \mathbb {E}\left[ \hat{p}_T \left( \hat{X}_T - X_T \right) \right] . \end{aligned}$$

By Itô's formula, we have

$$\begin{aligned} I_1 \ge&\mathbb {E}\Big [ \int \limits _0^T \big \{ - \big ( \hat{X}_{s} - X_{s} \big ) \partial _x \hat{\mathcal {H}}_s(\hat{u}_s,\hat{X}_{s}) + \hat{p}_{s} \big (\hat{b}_s- b_s \big ) \,\big \} ds \\&+ \int \limits _0^T\int \limits _{\mathbb {R}} \Big \{ \hat{p}_{s\text {-}} \big ( \hat{\kappa }_s(z)-\kappa _s(z) \big ) + \big (\hat{X}_{s\text {-}} - X_{s\text {-}} \big ) \hat{q}_s(z) \Big \} \,\mu (ds,dz) \\&+ \int \limits _0^T \big ( \hat{\kappa }_s(0)-\kappa _s(0) \big ) \hat{q}_s(0) \,\lambda ^B_s \,ds + \int \limits _0^T\int \limits _{\mathbb {R}_0}\big ( \hat{\kappa }_s(z)-\kappa _s(z) \big ) \hat{q}_s(z) \,\nu (dz)\lambda ^H_sds \Big ] \\ =&\mathbb {E}\Big [ \int \limits _0^T \big \{ -\big ( \hat{X}_{s} - X_{s} \big ) \partial _x \hat{\mathcal {H}}_s(\hat{u}_s,\hat{X}_{s}) + \hat{p}_{s\text {-}} \big (\hat{b}_s- b_s \big ) \,\big \} ds \\&+ \int \limits _0^T\int \limits _\mathbb {R}\big ( \hat{\kappa }_s(z)-\kappa _s(z)\big ) \hat{q}_s(z) \,\Lambda (dsdz) \Big ]. \end{aligned}$$

Furthermore, from the Hamiltonian functional (11), we have

$$\begin{aligned} I_2=&\mathbb {E}\Big [ \int \limits _0^T \big \{ \hat{F}_s - F_s \big \} \,ds \Big ] =\mathbb {E}\Big [ \int \limits _0^T \Big \{ \hat{\mathcal {H}}_s(\hat{u}_s,\hat{X}_{s}) -\hat{\mathcal {H}}_s(u_s,X_{s}) - \big (\hat{b}_s -b_s\big ) \hat{p}_{s}\\&- \big (\hat{\kappa }_s(0)-\kappa _s(0) \big )\hat{q}_s(0) \lambda ^B_s - \int \limits _{\mathbb {R}_0}\big ( \hat{\kappa }_s(z)-\kappa _s(z)\big ) \hat{q}_s(z) \,\nu (dz) \lambda ^H_s \Big \} \,ds \Big ] \\ =&\mathbb {E}\Big [ \int \limits _0^T \Big \{ \hat{\mathcal {H}}_s(\hat{u}_s,\hat{X}_{s}) - \hat{\mathcal {H}}_s(u_s,X_{s}) -\big (\hat{b}_s -b_s\big ) \hat{p}_{s} \Big \} \,ds \\&- \int \limits _0^T\int \limits _\mathbb {R}\big ( \hat{\kappa }_s(z)-\kappa _s(z)\big )\hat{q}_s(z) \,\Lambda (dsdz) \Big ]. \end{aligned}$$

Hence

$$\begin{aligned}&J(\hat{u})-J(u) \nonumber \\&\quad \ge \mathbb {E}\Big [ \int \limits _0^T\mathbb {E}\big [ \hat{\mathcal {H}}_s(\hat{u}_s,\hat{X}_{s}) -\hat{\mathcal {H}}_s(u_s,X_{s})-\big ( \hat{X}_{s} - X_{s} \big ) \partial _x \hat{\mathcal {H}}_s(\hat{u}_s,\hat{X}_{s}) \big \vert \mathcal {F}_s \big ]\,ds \Big ] \nonumber \\&\quad = \mathbb {E}\Big [ \int \limits _0^T \Big \{ \hat{F}_s - F_s + \big ( \hat{b}_s-b_s \big ) \mathbb {E}\big [ \hat{p}_{s} \,\big \vert \mathcal {F}_s \big ] - \big (\hat{X}_{s}-X_{s}\big ) \big ( \partial _x \hat{F}_s + \partial _x \hat{b}_s \mathbb {E}\big [ \hat{p}_{s} \,\big \vert \mathcal {F}_s \big ] \big ) \Big \} \,ds \Big ] \nonumber \\&\qquad +\mathbb {E}\Big [ \int \limits _0^T \int \limits _\mathbb {R}\Big \{ \big (\hat{\kappa }_s(z) -\kappa _s(z) \big ) \mathbb {E}\big [ \hat{q}_s(z) \,\big \vert \mathcal {F}_s \big ] - \big (\hat{X}_{s}-X_{s}\big ) \partial _x \hat{\kappa }_s(z) \mathbb {E}\big [ \hat{q}_s(z) \,\big \vert \mathcal {F}_s \big ] \Big \} \,\Lambda (dsdz) \Big ] \nonumber \\&\quad = \mathbb {E}\Big [ \int \limits _0^T \hat{\mathcal {H}}_s^\mathbb {F}(\hat{u}_s,\hat{X}_{s}) - \hat{\mathcal {H}}_s^\mathbb {F}(u_s,X_{s}) -\partial _x \hat{\mathcal {H}}_s^\mathbb {F}(\hat{u}_s,\hat{X}_{s}) \big ( \hat{X}_{s} - X_{s} \big ) \,ds \Big ]. \end{aligned}$$
(21)

We can show that the integrand above is non-negative \(dt \times d\mathbb {P}\)-a.e. by applying a separating hyperplane argument (see [42, Chapter 5, Section 23]) to the concave function (16) and exploiting the maximality (17). \(\square \)

4 MP for Volterra dynamics

In a fashion similar to Sect. 3 we consider the control problem

$$\begin{aligned} J(\hat{u}) = \sup _{u\in \mathcal {A}^{\mathbb {F}}} J(u), \end{aligned}$$
(22)

associated with the performance functional

$$\begin{aligned} J(u) = \mathbb {E}\Big [ \int \limits _0^T F_t(\lambda _t,u_t,X_{t}) \,dt + G(X_T) \Big ]. \end{aligned}$$
(23)

Cf. (8), (9). However, we now study a state process X with Volterra type dynamics:

$$\begin{aligned} X_t = X_0 + \int \limits _0^t b_s(t,\lambda _s,u_s, X_{s}) \,ds + \int \limits _0^t\int \limits _\mathbb {R}\kappa _s(t,z,\lambda _s, u_s,X_{s\text {-}}) \,\mu (dsdz), \end{aligned}$$
(24)

where \(X_0\in \mathbb {R}\) and the coefficients are given by the mappings

$$\begin{aligned}&b:[0,T]\times [0,T]\times [0,\infty )^2\times \mathcal {U}\times \mathbb {R}\times \Omega \longrightarrow \mathbb {R}, \\&\kappa :[0,T]\times [0,T]\times \mathbb {R}\times [0,\infty )^2\times \mathcal {U}\times \mathbb {R}\times \Omega \longrightarrow \mathbb {R}. \end{aligned}$$

We assume \(b_\cdot (t,\lambda ,u,x,\cdot )\) and \(\kappa _\cdot (t,z,\lambda ,u,x,\cdot )\) to be \(\mathbb {F}\)-predictable for all \(t \in [0,T]\), \(\lambda \in [0,\infty )^2\), \(u\in \mathcal {U}\), \(x\in \mathbb {R}\) and \(z\in \mathbb {R}\). We also require that they are \(\mathcal {C}^1\) with respect to t and to x. For later use, in order to apply the transformation rule (see [41, Theorem 3.3]), we assume that, for all \(z\in \mathbb {R}\), \(\lambda \in [0,\infty )^2\), \(u\in \mathcal {U}\), \(x\in \mathbb {R}\), the partial derivative of \(\kappa \) with respect to t (denoted \(\partial _t \kappa _s(t,z,\lambda ,u,x)\)) is locally bounded (uniformly in t) and satisfies

$$\begin{aligned} \vert \partial _t\kappa _s(t_1,z,\lambda ,u,x)- \partial _t \kappa _s(t_2,z,\lambda ,u,x)\vert \le K_3 \vert t_1-t_2 \vert , \end{aligned}$$
(25)

for some \(K_3>0\) and for each fixed \(s\le t\), \(\lambda \in [0,\infty )^2\), \(u\in \mathcal {U}\), \(x\in \mathbb {R}\). Sufficient conditions for the existence of a strong \(\mathbb {F}\)-adapted solution to (24) in \(L^2(dt\times d\mathbb {P})\) are studied in [23, Theorem 3.2].
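To fix ideas, a minimal Euler-type discretisation of dynamics of the form (24) can be sketched as follows. All coefficients below are illustrative choices of ours (an exponential kernel, a Brownian component only, a deterministic toy time-change rate), not the paper's; the point of the sketch is that the kernel is re-evaluated at the current time t over the whole past, which is what produces the memory effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative coefficients (ours, not the paper's): an exponential Volterra
# kernel in the drift and in the Brownian part of the time-changed noise.
def b(s, t, u, x):
    return np.exp(-(t - s)) * (u - x)   # plays the role of b_s(t, lambda, u, x)

def kappa(s, t):
    return 0.3 * np.exp(-(t - s))       # plays the role of kappa_s(t, 0, ...)

T, N = 1.0, 200
dt = T / N
times = np.linspace(0.0, T, N + 1)

# Toy deterministic time-change rate lambda^B_t: the Brownian increment of the
# time-changed noise has (conditional) variance lambda^B_t dt.
lamB = 1.0 + 0.5 * np.sin(2.0 * np.pi * times)
dB = rng.normal(0.0, 1.0, N) * np.sqrt(lamB[:-1] * dt)

u = 0.0                                 # a frozen control, for illustration only
X = np.empty(N + 1)
X[0] = 1.0
for i in range(1, N + 1):
    t = times[i]
    # Volterra structure: the whole past [0, t) enters through the kernel
    # evaluated at the *current* time t, so X_t is recomputed from scratch.
    drift = sum(b(times[j], t, u, X[j]) for j in range(i)) * dt
    noise = sum(kappa(times[j], t) * dB[j] for j in range(i))
    X[i] = X[0] + drift + noise
```

Note the quadratic cost in N: unlike a Markovian Euler scheme, each step revisits the full trajectory, mirroring the non-Markovianity discussed below.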

In the Volterra framework the dynamics produce a memory effect that, together with the noise \(\mu \), makes the state process X clearly non-Markovian. So, as in Sect. 3, we propose a maximum principle approach. However, we see that we have to modify the Hamiltonian functional to accommodate the Volterra features.

Hereafter, the general space \(\Phi _S\) is the space of measurable functions on [0, T] with values in S. We introduce the Hamiltonian functional:

$$\begin{aligned} \mathcal {H}:[0,T]\times \Phi _{[0,\infty )^2}\times \Phi _{\mathcal {U}} \times \Phi _\mathbb {R}\times \Phi _{L^2(d\mathbb {P})} \times \Phi _{\mathcal {Z}} \times \Omega \longrightarrow \mathbb {R}, \end{aligned}$$

as the mapping given by the sum

$$\begin{aligned} \mathcal {H}_t(\lambda ,u,x,p,q):=H_0(t) +H_1(t) \end{aligned}$$
(26)

of the two components

$$\begin{aligned} H_0(t)&:=F_t(\lambda _t,u_t,x_t)+b_t(t,\lambda _t,u_t,x_t)p_t +\kappa _t(t,0,\lambda _t,u_t,x_t) q_t(0)\lambda ^B_t\\&\quad +\int _{{\mathbb {R}_0}}\kappa _t(t,z,\lambda _t,u_t,x_t)q_t(z)\lambda _t^H\nu (dz)\\ H_1(t)&:=\int _0^t\partial _t b_s(t,\lambda _s,u_s,x_s)ds\ p_t +\int _0^t \partial _t \kappa _s(t,0,\lambda _s,u_s,x_s) \mathfrak {D}_{s,0}p_t \lambda ^B_s ds \\&\quad + \int _0^t\!\!\!\int _{\mathbb {R}_0}\partial _t \kappa _s(t,z,\lambda _s,u_s,x_s) \mathfrak {D}_{s,z}p_t \lambda ^H_s \nu (dz)ds. \end{aligned}$$

As we see here above, we have used the NA-derivative (6), (7). Stochastic differentiation was first used in the maximum principle approach in [1], where the controlled dynamics were jump-diffusions driven by a standard Brownian motion and a centred Poisson random measure. There the Malliavin derivative was used. We recall that Malliavin calculus is tailored to the type of noise and does not extend easily to general martingales as integrators. See, e.g., [26]. We find an extension to the case of processes with conditionally independent increments in [49]. However, the very use of Malliavin calculus has the drawback that the domains of the operators involved are strict subspaces of \(L^2(d\mathbb {P})\), and it is difficult to verify whether the random variables of interest belong to such domains (particularly in view of the dependence of the variables on the different controls u). To circumvent this problem, [2] proposes to use an extension of the Malliavin calculus to the white noise setting. See [26] and see [3] for the case of Volterra type dynamics. The white noise framework is a generalisation of Malliavin calculus to distribution spaces and again depends strongly on the nature of the noise. At present there is no extension of such a framework to the case of time-change Lévy noises, and this is a topic of current research.

For those reasons, we use here the NA-derivative (6), (7) and its duality with the Itô integral, see Theorem 1. The NA-derivative is well defined with respect to all square integrable martingales and its domain is the whole of \(L^2(d\mathbb {P})\). All our variables belong to this domain.
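As a simple sanity check (our observation, not a claim from the references): if b and \(\kappa \) do not depend on their second time argument, the kernel derivatives vanish, \(H_1\) drops out, and (26) reduces to a Hamiltonian of the same form as in Sect. 3:

$$\begin{aligned} \partial _t b_s(t,\lambda ,u,x)=\partial _t \kappa _s(t,z,\lambda ,u,x)\equiv 0 \quad \Longrightarrow \quad H_1(t)\equiv 0, \qquad \mathcal {H}_t(\lambda ,u,x,p,q)=H_0(t). \end{aligned}$$

In particular, the NA-derivative of the adjoint process enters the Hamiltonian only through the genuinely Volterra part of the dynamics.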

For \(\lambda \in \mathcal {L}\) and for (u, X) in (24), we introduce the adjoint equation as the BSDE

$$\begin{aligned} p_t = \,&\partial _x G(X_T)+\int _t^T\partial _x\mathcal {H}_s(\lambda ,u,X,p,q)\,ds -\int _t^T\!\!\!\int _{\mathbb {R}}q_s(z)\mu (dsdz). \end{aligned}$$
(27)

Observe that (27) is a true BSDE, as the terms \(\partial _x H_1\) in the driver \(\partial _x\mathcal {H}\) are integrated (see [23, Remark 3.3]). This equation admits a \(\mathbb {G}\)-adapted solution following [29]. Hence we can implement the same form of “projection” onto the filtration \(\mathbb {F}\) to solve the optimal control problem (22), which is studied over the following set of admissible controls.

Definition 5

The set \(\mathcal {A}^{\mathbb {F}}\) of admissible controls in problem (22), (23) consists of \(\mathbb {F}\)-predictable stochastic processes \(u:[0,T] \times \Omega \longrightarrow \mathcal {U}\) such that X in (24) and (p, q) in (27) have a unique strong solution in \(L^2(dt\times d\mathbb {P})\) and \(L^2(dt\times d\mathbb {P}) \times \mathcal {I}^\mathbb {G}\), respectively, and

$$\begin{aligned} \mathbb {E}\left[ \int _0^T \vert F_s(\lambda _s, u_s, X_s) \vert ^2 ds+ \vert G( X_T) \vert + \vert \partial _x G (X_T) \vert ^2 \right] < \infty . \end{aligned}$$

The couple (u, X) is an admissible pair.

Theorem 4

Fix \(\lambda \in \mathcal {L}\). Let \(\hat{u} \in \mathcal {A}^\mathbb {F} \) and assume that the corresponding solutions \(\hat{X}\), \((\hat{p},\hat{q})\) of (24) and (27) exist. Define the mapping \(\mathcal {H}^\mathbb {F} \) as

$$\begin{aligned} \mathcal {H}^\mathbb {F} _t(\lambda ,u,x,\hat{p},\hat{q}) :=\mathbb {E}\left[ \mathcal {H}_t(\lambda ,u,x,\hat{p},\hat{q} ) \vert \mathcal {F}_t\right] = H_0^\mathbb {F} (t) +H_1^\mathbb {F} (t) , \end{aligned}$$
(28)

for \(t \in [0,T]\), \(u\in \Phi _\mathcal {U}\), \(x\in \Phi _\mathbb {R}\), where

$$\begin{aligned} H_0^\mathbb {F} (t)&:=F_t(\lambda _t,u_t,x_t)+b_t(t,\lambda _t,u_t,x_t)\mathbb {E}[\hat{p}_t\vert \mathcal {F}_t]+\kappa _t(t,0,\lambda _t,u_t,x_t)\mathbb {E}[\hat{q}_t(0)\vert \mathcal {F}_t]\lambda ^B_t\nonumber \\&\quad +\int _{\mathbb {R}_0}\kappa _t(t,z,\lambda _t,u_t,x_t)\mathbb {E}[\hat{q}_t(z) \vert \mathcal {F}_t]\lambda ^H_t\nu (dz)\nonumber \\ H_1^\mathbb {F} (t)&:=\int _0^t\partial _t b_s(t,\lambda _s,u_s,x_s) ds\ \mathbb {E}[\hat{p}_t\vert \mathcal {F}_t] + \int _0^t \partial _t\kappa _s (t,0,\lambda _s, u_s,x_s) \mathbb {E}[\mathfrak {D}_{s,0} \, \hat{p}_t\vert \mathcal {F}_t] \lambda ^B_s ds \\&\quad + \int _0^t\!\!\!\int _{\mathbb {R}_0}\partial _t\kappa _s (t,z,\lambda _s, u_s,x_s) \mathbb {E}[\mathfrak {D}_{s,z}\hat{p}_t\vert \mathcal {F}_t]\lambda ^H_s \nu (dz) ds. \end{aligned}$$

Assume that, for fixed t, the function

$$\begin{aligned} h^\mathbb {F}_t(x) := \mathop {\mathrm {ess\,sup}}\limits _{u\in \mathcal {U}}\mathcal {H}^\mathbb {F} _t(\lambda ,u,x,\hat{p},\hat{q}) \end{aligned}$$
(29)

exists and is concave. Also assume that, for all \(t \in [0,T]\),

$$\begin{aligned} h^\mathbb {F}_t(\hat{X}) =\mathcal {H}^\mathbb {F} _t(\lambda ,\hat{u},\hat{X},\hat{p},\hat{q}). \end{aligned}$$
(30)

Then \(\hat{u}\) is an optimal control and \((\hat{u},\hat{X})\) is an optimal pair for problem (22).

To ease the reading, we introduce some notation in the same style as in Sect. 3. For a given \(\hat{u} \in \mathcal {A}^{\mathbb {F}}\), \(\hat{X}\) represents the associated controlled dynamics. Also, \(b_s(t):=b_s(t,\lambda _s,u_s,X_s), \quad \hat{b}_s(t):=b_s(t,\lambda _s,\hat{u}_s,\hat{X}_s),\) and similarly for \(\kappa \), \(\hat{\kappa }\), F, \(\hat{F}\), G, \(\hat{G}\). We will also write \(\mathcal {H}_s^{u}:=\mathcal {H}_s(\lambda ,u, X,\hat{p},\hat{q}), \quad \mathcal {H}^{\hat{u}}_s:=\mathcal {H}_s(\lambda ,\hat{u}, \hat{X},\hat{p},\hat{q})\) and similarly for the other functionals in (26), (28).

Proof

The arguments proceed in a fashion similar to those in Theorem 3; however, the presence of the Volterra structure needs to be considered with care. Below we present only the arguments specific to the Volterra feature of the dynamics (24).

For \(u\in \mathcal {A}^\mathbb {F} \) with corresponding controlled dynamics X, we consider \(J(u)-J(\hat{u})=I_1+I_2\), where

$$\begin{aligned} I_1&:=\mathbb {E}\left[ \int _0^T F_t(\lambda _t,u_t,X_t) - F_t(\lambda _t,\hat{u}_t,\hat{X}_t ) dt\right] , \\ I_2&:=\mathbb {E}\left[ G(X_T)-G(\hat{X}_T)\right] . \end{aligned}$$

From the definition of \(H_0^\mathbb {F}(t)\) we have

$$\begin{aligned} I_1&=\mathbb {E}\left[ \int _0^T\!\!\!\left\{ H_0^{\mathbb {F} ,u}(t)-H_0^{\mathbb {F} ,\hat{u}}(t)-[b_t(t)-\hat{b}_t(t)]\mathbb {E}\left[ \hat{p}_t \vert \mathcal {F}_t\right] \right\} dt\right. \\&\left. \quad -\int _0^T\!\!\!\int _{\mathbb {R}}[\kappa _t(t,z)-\hat{\kappa }_t(t,z)]\mathbb {E}\left[ \hat{q}_t(z) \vert \mathcal {F}_t\right] \Lambda (dtdz)\right] . \end{aligned}$$

In the study of \(I_2\) we exploit the concavity of G, use the Itô formula for the product and the duality between the NA-derivative and the Itô integral. Finally we obtain

$$\begin{aligned} I_2&\le \mathbb {E}\left[ \int _0^T\!\!\left\{ \hat{p}_t\left( \left( b_t(t)-\hat{b}_t(t)\right) +\int _0^t\left( \partial _t b_s(t)-\partial _t \hat{b}_s(t)\right) ds\right. \right. \right. \nonumber \\&\left. \left. \quad +\int _0^t\!\!\!\int _{\mathbb {R}}\left( \partial _t \kappa _s(t,z)-\partial _t \hat{\kappa }_s(t,z)\right) \mu (dsdz)\right) \right\} dt\nonumber \\&\quad -\int _0^T\partial _x \mathcal {H}^{\hat{u}}_t\left( X_t-\hat{X}_t \right) dt+\int _0^T\left\{ [\kappa _t(t,0)-\hat{\kappa }_t(t,0)]\hat{q}_t(0)\lambda ^B_t\right. \nonumber \\&\left. \left. \quad + \int _{\mathbb {R}_0}[\kappa _t(t,z)-\hat{\kappa }_t(t,z)]\hat{q}_t(z)\nu (dz)\lambda ^H_t\right\} dt\right] . \end{aligned}$$
(31)

Now notice that,

$$\begin{aligned} \mathbb {E}\Big [\int _0^T\Big (\int _0^t\!\!\!\int _{\mathbb {R}}&\partial _t \kappa _s(t,z)\mu (dsdz)\Big )\hat{p}_t dt\Big ] \\&=\int _0^T\mathbb {E}\left[ \int _0^t\!\!\!\int _{\mathbb {R}}\partial _t \kappa _s(t,z)\mathfrak {D}_{s,z}\hat{p}_t \Lambda (dsdz)\right] dt\nonumber \\&=\mathbb {E}\left[ \int _0^T \!\!\!\int _0^t \!\!\!\int _\mathbb {R}\partial _t \kappa _s(t,z)\mathfrak {D}_{s,z} \hat{p}_t \Lambda (dsdz)dt \right] . \end{aligned}$$

The use of the Itô formula is justified by the transformation rule [41, Theorem 3.3] and assumption (25). Then we can substitute the above into (31).

The claim is then achieved thanks to (30), following a separating hyperplane argument applied to the concave function (29). \(\square \)

In the arguments above we have assumed that the integrals are well defined. Sufficient conditions on the coefficients of (24) can be imposed in the same spirit as the ones proposed in Sect. 3.

5 MP for forward–backward systems

Inspired by the concept of recursive utility introduced in [31], we see that the performance functional in problem (22)

$$\begin{aligned} J(\hat{u}) = \sup _{u\in \mathcal {A}^{\mathbb {F}}} J(u), \end{aligned}$$

can include evaluations involving a backward differential equation of the form

$$\begin{aligned} J(u):=\mathbb {E}\left[ \int _0^TF_t(\lambda _t, u_t,X_t,Y_t) dt+G(X_T)+ \Gamma (Y_0 )\right] \end{aligned}$$
(32)

where

$$\begin{aligned} Y_t&=h(X_T) + \int _t^T g_s(\lambda _s,u_s,X_{s},Y_{s},\Theta _s)ds \nonumber \\&\quad - \int _t^T\!\!\!\int _\mathbb {R}\Theta _s(z)\mu (dsdz)- \int _t^T dM_s \end{aligned}$$
(33)

is a BSDE driven by the noise \(\mu \) considered under the filtration \(\mathbb {F}\). The driver \(g_\cdot (\lambda ,u,x,y,\theta )\) is an \(\mathbb {F}\)-adapted process in \(L^2(dt\times d\mathbb {P})\) for all \(\lambda \in [0,\infty )^2\), \(u\in \mathcal {U}\), \(x\in \mathbb {R}\), \(y\in \mathbb {R}\), \(\theta \in \mathcal {Z}\), \(C^1\) with respect to both s and x, and with partial derivatives in \(L^2(d\mathbb {P})\). Also we assume that h(x) is \(\mathcal {F}_T\)-measurable for all \(x \in \mathbb {R}\) and \(C^1\) with respect to x, a.s. For simplicity we assume that it is bounded and with bounded derivative. These conditions are sufficient to guarantee that some integrals occurring later in the proofs are well defined. They can be weakened in specific models. The BSDE (33) is studied under the filtration \(\mathbb {F}\) within the setting of general martingale noise. This yields the presence of the orthogonal martingale M as an intrinsic part of the solution, which derives from the stochastic integral representations of type (4).

The forward dynamics of X are given in (24), with Volterra features and satisfying the same assumptions presented in Sect. 4. Observe that (24)–(33) is a partially coupled system, the solution of which poses no particular challenge: one first solves (24), and then the value of \(X_T\) is used in the terminal value of (33). For the existence of a strong solution \((Y,\Theta ,M)\) to (33) we can refer, e.g., to [40].
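The two-stage structure can be illustrated numerically. The sketch below is a toy example of ours (scalar Brownian forward dynamics, no jumps, no orthogonal martingale part, and conditional expectations replaced by Longstaff–Schwartz polynomial regression), not the paper's setting: it first solves the forward equation on all Monte Carlo paths and only then propagates the backward equation from the terminal value \(h(X_T)\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy partially coupled system (illustrative coefficients, not the paper's):
#   forward:  dX_t = -0.5 X_t dt + 0.3 dW_t,                    X_0 = 1
#   backward: Y_t  = cos(X_T) + int_t^T (-0.1 Y_s) ds - (martingale terms)
T, N, M = 1.0, 50, 20000
dt = T / N

# Stage 1: the forward equation is solved first, on every path.
X = np.empty((M, N + 1))
X[:, 0] = 1.0
for i in range(N):
    X[:, i + 1] = (X[:, i] - 0.5 * X[:, i] * dt
                   + 0.3 * rng.normal(0.0, np.sqrt(dt), M))

# Stage 2: X_T feeds the terminal condition; backward induction with the
# conditional expectations approximated by regression on X at each time.
Y = np.cos(X[:, -1])
for i in range(N - 1, -1, -1):
    target = Y + (-0.1 * Y) * dt              # Y_{t_{i+1}} + g dt
    basis = np.vander(X[:, i], 4)             # cubic polynomial basis in X
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    Y = basis @ coef                          # projection ~ E[target | F_{t_i}]

Y0 = float(Y.mean())                          # Monte Carlo estimate of Y_0
```

The martingale integrands are never needed to advance the recursion here, which is precisely why the partially coupled case is benign compared with a fully coupled one.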

Here above \(\Gamma (x)\) is a random variable for all \(x \in \mathbb {R}\), concave and differentiable in x a.s., while F and G satisfy the same conditions introduced earlier in Sect. 3.

In [6] we see one of the first works in which a coupled forward–backward system considers the backward equation with respect to a general filtration (thus presenting an orthogonal martingale part). In that work, however, the driving noise is only a Brownian motion. The goal of that paper is to prove the existence of an optimal control.

In our context we see that a maximum principle can be obtained by an adequate modification of the Hamiltonian functional, which will in turn be coupled with a forward–backward system of equations. Indeed we have

$$\begin{aligned} \mathcal {H}:[0,T]\times \Phi _{[0,\infty )^2}\times \Phi _{\mathcal {U}}\times \Phi _\mathbb {R}\times \Phi _\mathbb {R}\times \Phi _{\mathcal {Z}}\times \mathbb {R}\times \Phi _\mathbb {R}\times \Phi _{L^2(d\mathbb {P})} \times \Phi _{\mathcal {Z}} \times \Omega \longrightarrow \mathbb {R}, \end{aligned}$$

given by

$$\begin{aligned} \mathcal {H}_t(\lambda ,u, x,y,\theta ,y^0, \zeta ,p,q)&:=H_0(t) +H_1(t), \end{aligned}$$
(34)

where

$$\begin{aligned} H_0(t):= & {} F_t(\lambda _t,u_t,x_t,y_t) + b_t(t,\lambda _t,u_t,x_t) p_t + \kappa _t(t,0,\lambda _t,u_t,x_t) q_t(0)\lambda _t^B\\&+\int _{{\mathbb {R}_0}}\kappa _t(t,z,\lambda _t,u_t,x_t)q_t(z)\lambda _t^H\nu (dz)+g_t(\lambda _t,u_t,x_t,y_t,\theta _t)\zeta _t\\ H_1(t):= & {} \int _0^t\partial _t b_s(t,\lambda _s,u_s,x_s)ds\ p_t+ \int _0^t \partial _t\kappa _s(t,0,\lambda _s,u_s,x_s)\mathfrak {D}_{s,0} \, p_t \, \lambda _s^Bds \\&+\int _0^t\!\!\!\int _{\mathbb {R}_0}\partial _t\kappa _s(t,z,\lambda _s,u_s,x_s)\mathfrak {D}_{s,z}p_t \lambda _s^H \nu (dz)ds. \end{aligned}$$

As in Sect. 4, the NA-derivative (6), (7) appears in the presence of Volterra structures. Also, we see that the functional \(H_0\) contains elements from the backward differential equation (33) appearing in the performance functional. For \(\lambda \in \mathcal {L}\) and in association to the coupled system (24)–(33), we then consider the functional

$$\begin{aligned} \mathcal {H}(t) := \mathcal {H}_t(\lambda , u, X, Y, \Theta , y^0, \zeta ,p,q), \end{aligned}$$

and the adjoint forward–backward system \((\zeta ,(p,q))\) here below.

For \(\theta \in \mathcal {Z}\), we denote by \(\partial _{\theta _{0}}\) the partial derivative with respect to \(\theta (0)\), while \(\nabla _{\theta _{z}}\) denotes the Fréchet derivative with respect to \(\theta (z)\), \(z\ne 0\). We also denote by \(\frac{d}{d\nu }\nabla _{\theta _{z}}\mathcal {H}(t)\) the Radon–Nikodym derivative of \(\nabla _{\theta _{z}}\mathcal {H}(t)\) with respect to \(\nu (dz)\).

The process \(\zeta \) satisfies the stochastic forward equation

$$\begin{aligned} {\left\{ \begin{array}{ll} d\zeta _t =\partial _y\mathcal {H}(t)dt+\partial _{\theta _{0}}\mathcal {H}(t)B(dt)+\int _{{\mathbb {R}_0}}\frac{d}{d\nu }\nabla _{\theta _{z}}\mathcal {H}(t)\widetilde{H}(dtdz) \\ \zeta _0=\partial _y\Gamma (y^0). \end{array}\right. } \end{aligned}$$
(35)

Later on, the value \(y^0 = Y_0\) will be associated with the initial value of the backward dynamics Y.

The stochastic backward equation is

$$\begin{aligned} {\left\{ \begin{array}{ll} dp_t=-\partial _x\mathcal {H}(t)dt+\int _\mathbb {R}q_t(z)\mu (dtdz) \\ p_T=\partial _x G(X_T)+\partial _x h(X_T)\, \zeta _T. \end{array}\right. } \end{aligned}$$
(36)

For the sequel we assume:

  • \(\mathcal {H}(t)\) is well defined, differentiable with respect to \(x,y,\theta (0),u\) and Fréchet differentiable with respect to \(\theta (z)\), \(z\ne 0\).

  • \(\frac{d}{d\nu }\nabla _{\theta _{z}}\mathcal {H}(t)\) is well defined.

As already noticed in Sect. 4, we remark that (36) is a true BSDE since the term \(\partial _x H_1\) in the driver \(\partial _x\mathcal {H}\) is integrated (see [23, Remark 3.3]). Also, the forward–backward system (35), (36) is partially coupled and studied under the filtration \(\mathbb {G}\). The solution of (35) leads to the terminal condition in (36). Sufficient conditions for the existence of strong solutions to (35) and (36) are well known. See, e.g., [29, 33].

In this framework the admissible controls are given as follows.

Definition 6

The admissible controls in the stochastic optimal control problem with performance (32) are \(\mathbb {F}\)-predictable stochastic processes \(u: [0,T]\times \Omega \rightarrow \mathcal {U}\), such that the corresponding forward–backward system \((X,(Y,\Theta ,M))\) in (24)–(33) has a unique strong solution and the adjoint forward–backward system \((\zeta , (p,q))\) in (35), (36) has a unique solution, and

$$\begin{aligned} \mathbb {E}\Bigg [ \int \limits _0^T \vert F_t( \lambda _t,u_t,X_{t},Y_t)\vert ^2 \,dt + \vert G(X_T) \vert + \vert \partial _x G(X_T) \vert ^2 + \vert \Gamma (Y_0) \vert + \vert \partial _y \Gamma (Y_0) \vert ^2\Bigg ] < \infty . \end{aligned}$$

The set of admissible controls is still denoted by \(\mathcal {A}^\mathbb {F}\).

Following the approach already suggested in the previous sections, we present the maximum principle exploiting a form of “projection” of the Hamiltonian functional on the filtration \(\mathbb {F}\).

Theorem 5

Fix \(\lambda \in \mathcal {L}\). Let \(\hat{u} \in \mathcal {A}^\mathbb {F} \) and assume that the corresponding solutions \(\hat{X}\), \((\hat{Y},\hat{\Theta },\hat{M})\) of (24) and (33) exist, together with the solutions \(\hat{\zeta }\), \((\hat{p}, \hat{q})\) of (35), (36), with \(y^0=\hat{Y}_0\) as the initial value. Define the \(\mathbb {F}\)-Hamiltonian functional as

$$\begin{aligned} \mathcal {H}^{\mathbb {F} }_t (\lambda ,u, x,y,\theta ,\hat{Y}_0, \hat{\zeta }, \hat{p},\hat{q}) :=\mathbb {E}\left[ \mathcal {H}_t(\lambda ,u, x,y,\theta , \hat{Y}_0, \hat{\zeta },\hat{p},\hat{q}) \vert \mathcal {F}_t\right] \end{aligned}$$

for \(t \in [0,T]\), \(u \in \Phi _{\mathcal {U}}\), \(x,y \in \Phi _\mathbb {R}\), \(\theta \in \Phi _{\mathcal {Z}}\), which is naturally given by the sum of the following two functionals

$$\begin{aligned} H^{\mathbb {F} }_0(t)&:=F_t(\lambda _t,u_t,x_t,y_t) + b_t(t,\lambda _t,u_t,x_t)\mathbb {E}[\hat{p}_t \vert \mathcal {F}_t] +\kappa _t(t,0,\lambda _t,u_t,x_t)\mathbb {E}[\hat{q}_t(0)\vert \mathcal {F}_t]\lambda _t^B\\&\quad +\int _{{\mathbb {R}_0}}\kappa _t(t,z,\lambda _t,u_t,x_t)\mathbb {E}[\hat{q}_t(z)\vert \mathcal {F}_t]\lambda _t^H\nu (dz)+g_t(\lambda _t,u_t,x_t,y_t,\theta _t)\mathbb {E}[\hat{\zeta }_t \vert \mathcal {F}_t]\\ H^{\mathbb {F} }_1(t)&:= \int _0^t\partial _t b_s(t,\lambda _s,u_s,x_s)ds\ \mathbb {E}[\hat{p}_t\vert \mathcal {F}_t] +\int _0^t \partial _t\kappa _s(t,0,\lambda _s,u_s,x_s)\mathbb {E}[\mathfrak {D}_{s,0} \hat{p}_t\vert \mathcal {F}_t] \lambda ^B_s ds\\&\quad +\int _0^t\!\!\!\int _{\mathbb {R}_0}\partial _t\kappa _s(t,z,\lambda _s,u_s,x_s)\mathbb {E}[\mathfrak {D}_{s,z} \hat{p}_t\vert \mathcal {F}_t] \lambda ^H_s \nu (dz) ds. \end{aligned}$$

Assume that

  • the functionals

    $$\begin{aligned} x, y, \theta \longmapsto \mathop {\mathrm {ess\,sup}}\limits _{u\in \mathcal {U}} \mathcal {H}^\mathbb {F} _t(\lambda ,u,x, y, \theta , \hat{Y}_0, \hat{\zeta }, \hat{p}, \hat{q} ) \end{aligned}$$
    (37)

    exist and are concave for all t,

  • and, for all \(t \in [0,T]\),

    $$\begin{aligned} \sup _{u\in \mathcal {U}} \mathcal {H}^\mathbb {F} _t(\lambda ,u,\hat{X},\hat{Y},\hat{\Theta }, \hat{Y}_0, \hat{\zeta }, \hat{p}, \hat{q}) =\mathcal {H}^\mathbb {F} _t(\lambda ,\hat{u},\hat{X},\hat{Y},\hat{\Theta }, \hat{Y}_0, \hat{\zeta }, \hat{p}, \hat{q}) . \end{aligned}$$
    (38)

Then \(\hat{u}\) is an optimal control for the forward–backward system (24)–(33) with performance (32).

6 Concluding remarks

In this work we have summarised how stochastic control for time changed Lévy dynamics can be addressed in the framework of maximum principles. To exploit the structure of the time changed noise, we have used different filtrations: a technical one, which anticipates the information on the time change process, and the natural one generated by the noise. Correspondingly, we have used different types of Hamiltonian functionals. The use of different information flows has an impact on the possibility to solve backward stochastic differential equations, which in turn depend on different forms of stochastic integral representations.

In our work we have seen that, with appropriate modifications of the Hamiltonian functionals, we can also deal with Volterra type dynamics, which involve memory in the coefficients and not only in the noise via time change. The work has progressed to include forward–backward systems of equations.

We remark that in Sect. 5 the forward–backward system involved a Volterra type of forward dynamics (24) and a backward stochastic differential equation (33). It is natural to ask whether one could extend this to Volterra type backward equations. It turns out that this extension is rather tricky. Indeed, if the time change were deterministic, then we would be able to progress in our solution. In this case the filtrations involved would coincide, \(\mathbb {F}=\mathbb {G}\), and we would not have to deal with orthogonal martingale parts in the solution of a Volterra version of Eq. (33). See [24] for details. On the contrary, if the time change is genuinely stochastic, then the Volterra version of equation (33) would present an orthogonal component, which would have to be dealt with. For this, some assumptions on the structure of the orthogonal martingale part would be necessary, but at the time being there are no results that would guarantee those assumptions.

In this presentation we have discussed only sufficient maximum principles, which are forms of verification that an “educated” guess of a possible optimal control is indeed optimal. Necessary maximum principles are useful to identify those guesses. A version of a necessary maximum principle is available in the framework of Sects. 3 and 4, see [23]. The results are not yet polished enough in the case of Volterra type dynamics as presented in Sect. 5. There, as well, some structure on the orthogonal martingale part of (33) would be necessary.