Markovian lifts of positive semidefinite affine Volterra-type processes

  • Christa Cuchiero
  • Josef Teichmann
Open Access


We consider stochastic partial differential equations appearing as Markovian lifts of matrix-valued (affine) Volterra-type processes from the point of view of the generalized Feller property (see, e.g., Dörsek and Teichmann in A semigroup point of view on splitting schemes for stochastic (partial) differential equations, 2010. arXiv:1011.2651). We introduce in particular Volterra Wishart processes with fractional kernels and values in the cone of positive semidefinite matrices. They are constructed from matrix products of infinite dimensional Ornstein–Uhlenbeck processes whose state space is the set of matrix-valued measures. Parallel to that we also consider positive definite Volterra pure jump processes, giving rise to multivariate Hawkes-type processes. We apply these affine covariance processes for multivariate (rough) volatility modeling and introduce a (rough) multivariate Volterra Heston-type model.


Keywords

Stochastic partial differential equations; Affine processes; Wishart processes; Hawkes processes; Stochastic Volterra processes; Rough volatility models

Mathematics Subject Classification

60H15 60J25 

JEL Classification

C.5 G.1 

1 Introduction

It is the goal of this article to investigate the results of Cuchiero and Teichmann (2018) on infinite dimensional Markovian lifts of stochastic Volterra processes in a multivariate setup: We are mainly interested in the case where the stochastic Volterra processes take values in the cone of positive semidefinite matrices \(\mathbb {S}^d_+\). We shall concentrate on the affine case due to its relevance for tractable rough covariance modeling, extending rough volatility (see, e.g., Alòs et al. 2007; Gatheral et al. 2018; Bayer et al. 2016) to a setting of d “roughly correlated” assets.

Viewing stochastic Volterra processes from an infinite dimensional perspective allows one to dissolve the generic non-Markovianity of the volatility process, which at first sight appears to be naturally low-dimensional. Indeed, this approach makes it possible to go beyond the univariate case considered so far and to treat multivariate rough covariance models for more than one asset. Moreover, the considered Markovian lifts allow one to apply the full machinery of affine processes. We refer to the introduction of Cuchiero and Teichmann (2018) for an overview of theoretical and practical advantages of Markovian lifts in the context of Volterra-type processes.

Let us start now by explaining why the matrix-valued positive definite case is actually more involved than the scalar one in \(\mathbb {R}_+\), where, for instance, the Volterra Cox–Ingersoll–Ross process takes values. The latter appears as variance process in a rough Heston model (see, e.g., El Euch 2019; Abi Jaber and El Euch 2019; Alòs and Yang 2017). Consider now a standard Wishart process on \(\mathbb {S}_+^d\), as defined in Bru (1991), Cuchiero et al. (2011), of the form
$$\begin{aligned} \mathrm{d} X_t = (d-1) {\text {Id}}_{d} \mathrm{d}t + \sqrt{X_t} \mathrm{d}W_t + \mathrm{d}W_t^\top \sqrt{X_t}, \quad X_0 \in \mathbb {S}^d_+. \end{aligned}$$
Here \( \sqrt{.} \) denotes the matrix square root, \({\text {Id}}_d\) the identity matrix and W a \(d \times d\) matrix of Brownian motions. The (necessary) presence of the dimension d in the drift is an obvious obstruction to infinite dimensional versions of this equation, which could be projected to obtain Volterra-type equations by the variation of constants formula; see Cuchiero and Teichmann (2018) for such a projection on \(\mathbb {R}_+\). In order to circumvent this difficulty, we present two approaches in this paper:
  • We develop a theory of infinite dimensional affine Markovian lifts of pure jump positive semidefinite Volterra processes.

  • We develop a theory of squares of Gaussian processes in a general setting to construct infinite dimensional analogs of Wishart processes. Their finite dimensional projections, however, look different from naively conjectured Volterra Wishart processes following the role model of Volterra Cox–Ingersoll–Ross processes. They are also different in dimension one, as outlined below.

The jump part appears natural and comes without any further probabilistic problem when constrained to finite variation jumps. Note that in the (non-Volterra) case of affine processes on positive semidefinite matrices, quadratic variation jumps are not possible either (see Mayerhofer 2012). With the generalized Feller approach from Dörsek and Teichmann (2010), Cuchiero and Teichmann (2018), we obtain a new class of stochastic Volterra processes taking values in \(\mathbb {S}_+^d\) of the form
$$\begin{aligned} V_t= h(t)+\int _0^t( K(t-s) V_s + V_s K(t-s) )\mathrm{d}s + \int _0^t K(t-s) \mathrm{d}N_s + \int _0^t \mathrm{d}N_s K(t-s), \end{aligned} \tag{1.2}$$
where \(h: \mathbb {R}_+ \rightarrow \mathbb {S}^d_+\) is some deterministic function, K a (potentially fractional) kernel in \(L^2(\mathbb {R}_+, \mathbb {S}_+^d)\) and N a pure jump process of finite variation with jump sizes in \(\mathbb {S}^d_+\), whose compensator is a linear function in V. This allows us, for instance, to define a multivariate Hawkes process \(\widehat{N}\) with values in \(\mathbb {N}_0^d\) given by the diagonal entries of N, i.e., \({\text {diag}}(N)=\widehat{N}\), where the compensator of \(\widehat{N}_i\) is given by \(\int _0^{\cdot }V_{s,ii} \mathrm{d}s\) (see Example 4.16). By means of the affine transform formula for the infinite dimensional lift of (1.2), we are able to derive an expression for the Laplace transform of \(V_t\) which can be computed by means of matrix Riccati–Volterra equations.

The difficulty of the continuous part arises from geometric constraints, which can, however, be circumvented by building squares of unconstrained processes. Let us illustrate the idea in a finite dimensional setting: Let W be an \(n \times d \) matrix of Brownian motions and let \(\nu \) be a matrix in \(\mathbb {R}^{d \times dk}\) consisting of k submatrices \(\nu _i \in \mathbb {R}^{d \times d}\), \(i=1, \ldots , k\), i.e., \(\nu =(\nu _1, \ldots , \nu _k)\).

Define now a Gaussian process with values in \(\mathbb {R}^{n \times dk}\) by \(\gamma := W\nu \). Then, by Itô’s product formula, the \(\mathbb {R}^{dk \times dk}\)-valued process \(\gamma _t^\top \gamma _t \) satisfies the following equation
$$\begin{aligned} \mathrm{d} \gamma _t^\top \gamma _t = n \nu ^\top \nu \mathrm{d}t + \nu ^\top \mathrm{d}W_t^\top \gamma _t + \gamma _t^\top \mathrm{d}W_t \nu . \end{aligned} \tag{1.3}$$
Following Bru (1991, Subsection 5.2) and setting \(\lambda _t:=\gamma _t^\top \gamma _t \), this can, however, also be written via a \( kd \times kd \) matrix of independent Brownian motions B satisfying
$$\begin{aligned} \sqrt{\gamma _t^\top \gamma _t} \mathrm{d} B_t \sqrt{\nu ^\top \nu } = \gamma _t^\top \mathrm{d}W_t \nu \end{aligned} \tag{1.4}$$
in the more familiar form
$$\begin{aligned} \mathrm{d} \lambda _t=n \nu ^\top \nu \mathrm{d}t + \sqrt{\nu ^\top \nu } \mathrm{d}B_t^\top \sqrt{\lambda _t} + \sqrt{\lambda _t} \mathrm{d}B_t \sqrt{\nu ^\top \nu }. \end{aligned} \tag{1.5}$$
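The drift term of the square construction can be checked numerically. The following Python sketch samples \(\gamma _t = W_t \nu \) at a fixed time and compares the Monte Carlo mean of \(\gamma _t^\top \gamma _t\) with \(n t \nu ^\top \nu \), which is what the drift predicts when the process starts at zero. The dimensions, the matrix \(\nu \), the sample size and the helper name `sample_lambda` are illustrative assumptions and do not come from the text.

```python
import numpy as np

def sample_lambda(n, nu, t, num_paths, seed=0):
    """Sample lambda_t = gamma_t^T gamma_t with gamma_t = W_t nu.

    W_t is an n x d matrix with independent N(0, t) entries (Brownian
    motions started at 0 and observed at time t); nu has shape (d, d*k).
    Returns an array of shape (num_paths, d*k, d*k).
    """
    rng = np.random.default_rng(seed)
    d = nu.shape[0]
    w = np.sqrt(t) * rng.standard_normal((num_paths, n, d))
    gamma = w @ nu                              # shape (num_paths, n, d*k)
    return np.swapaxes(gamma, 1, 2) @ gamma     # gamma^T gamma, path by path

# Hypothetical parameters: n = 3, d = 2, k = 2, nu = (nu_1, nu_2).
n, d, k, t = 3, 2, 2, 0.5
nu = np.hstack([np.eye(d), np.array([[1.0, 0.5], [0.0, 2.0]])])

lam = sample_lambda(n, nu, t, num_paths=100_000)
mc_mean = lam.mean(axis=0)                  # Monte Carlo estimate of E[lambda_t]
drift_mean = n * t * nu.T @ nu              # integral of the drift n nu^T nu over [0, t]
min_eig = np.linalg.eigvalsh(lam).min()     # smallest eigenvalue over all samples

print(np.round(mc_mean - drift_mean, 3))
print(min_eig)
```

Every sample is positive semidefinite by construction, so no geometric constraint has to be enforced on the driving Gaussian process; this is the point of working with squares.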
Our article is devoted to analyzing the situation where the index variable \( \nu \) becomes continuous, which is the only possible form of an infinite dimensional Wishart process. We believe that generalized Feller processes are the right arena to achieve this purpose. In this article, we choose measure spaces, but an analogous analysis can be done in the setting of function spaces as, for instance, the Hilbert space setting of Filipović (2001); see Cuchiero and Teichmann (2018, Section 5.2). In the measure-valued setting, we proceed as follows: Let \(\gamma \) be an infinite dimensional Ornstein–Uhlenbeck process taking values in \(\mathbb {R}^{n \times d}\)-valued regular Borel measures on \(\mathbb {R}_+\). Then, Volterra Wishart processes arise as finite dimensional projections of \(\gamma ^{\top }(\mathrm{d}x_1)\gamma (\mathrm{d}x_2)\) on \(\mathbb {S}_+^d\) and can be written as
$$\begin{aligned} V_t&= h(t)+ n \int _0^t K(t-s) K(t-s) \mathrm{d}s \\&\quad + \int _0^t K(t-s) \mathrm{d}W^{\top }_s Y(t,s) + \int _0^t Y(t,s)^{\top } \mathrm{d}W_s K(t-s), \end{aligned} \tag{1.6}$$
where h and K are as in (1.2), W an \(n \times d \) matrix of Brownian motions and \(Y(t,s)= \int _0^{\infty } e^{-x (t-s)}\gamma _s(\mathrm{d}x)\). As explained in Remark 5.4, \(V_t\) corresponds to the matrix square of a Volterra Ornstein–Uhlenbeck process \(X_t\), obtained as finite dimensional projection of \(\gamma (\mathrm{d}x)\). The Volterra Wishart process (1.6) can then also be written in terms of the forward process of \(X_t\), i.e., \((\mathbb {E}[X_t|{\mathcal {F}}_s])_{s\le t}\), namely
$$\begin{aligned} V_t&= h(t)+ n \int _0^t K(t-s) K(t-s) \mathrm{d}s \\&\quad + \int _0^t K(t-s) \mathrm{d}W^{\top }_s \mathbb {E}[X_t|{\mathcal {F}}_s] + \int _0^t \mathbb {E}[X_t^{\top }|{\mathcal {F}}_s] \mathrm{d}W_s K(t-s). \end{aligned}$$
Note that this is not of standard Volterra form, as, e.g., in Abi Jaber et al. (2019), since \(Y(t,s)\) or \({\mathbb {E}}[X_t|\mathcal {F}_s]\), respectively, cannot be expressed as a function of \(V_t\). By moving to a Brownian field analogous to (1.4), it could, however, be expressed as a path functional of \((V_s)_{s \le t}\). For \(n=d=1\), it also gives rise to a different equation than the Volterra CIR process. We explain the connection between (1.6) and (1.3)–(1.5) in detail in Sect. 5.

Note that by choosing K to be a matrix of fractional kernels, the trajectories of (1.6) become rough, whence V qualifies for rough covariance modeling with potentially different roughness regimes for different assets and their covariances. This is in accordance with econometric observations. In Sect. 6, we show how such models can be defined: We introduce a (rough) multivariate Volterra Heston-type model with jumps and show that it can again be cast in the affine framework. This is particularly relevant for pricing basket or spread options using the Fourier pricing approach.
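To make the link between fractional kernels and measure-valued lifts concrete, the following Python sketch checks numerically that a scalar fractional kernel is a Laplace transform of a measure on \(\mathbb {R}_+\), in the same way as \(Y(t,s)\) is built from \(e^{-x(t-s)}\). We assume here the standard form \(K(t)=t^{\alpha -1}/\Gamma (\alpha )\) with \(\alpha \in (0,1)\) and the measure \(x^{-\alpha }\mathrm{d}x/(\Gamma (\alpha )\Gamma (1-\alpha ))\); neither the kernel normalization nor the function names below are fixed by the text, and the value \(\alpha =0.6\) is purely illustrative.

```python
import math

def fractional_kernel(t, alpha):
    """Assumed scalar fractional kernel K(t) = t^(alpha - 1) / Gamma(alpha)."""
    return t ** (alpha - 1.0) / math.gamma(alpha)

def kernel_from_measure(t, alpha, num_steps=200_000):
    """Approximate int_0^inf exp(-x t) x^(-alpha) dx / (Gamma(alpha) Gamma(1 - alpha)).

    The substitution x = u^p with p = 1 / (1 - alpha) turns x^(-alpha) dx
    into p du, so the integrand exp(-t u^p) is bounded and a plain
    trapezoidal rule on [0, u_max] is sufficient.
    """
    p = 1.0 / (1.0 - alpha)
    u_max = (50.0 / t) ** (1.0 / p)           # integrand equals exp(-50) at u_max
    h = u_max / num_steps
    total = 0.5 * (1.0 + math.exp(-50.0))     # half weights at both endpoints
    for i in range(1, num_steps):
        total += math.exp(-t * (i * h) ** p)
    integral = p * h * total
    return integral / (math.gamma(alpha) * math.gamma(1.0 - alpha))

alpha = 0.6  # illustrative choice
for t in (0.1, 1.0, 2.5):
    print(t, fractional_kernel(t, alpha), kernel_from_measure(t, alpha))
```

The two columns printed for each t should agree up to quadrature error. In a matrix-valued setting one would apply such a representation entrywise, possibly with a different exponent for each entry.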

The remainder of the article is organized as follows: In Sect. 1.1, we introduce some notation and review certain functional analytic concepts. In Sects. 2 and 3, we recall and extend results on generalized Feller processes as outlined in Cuchiero and Teichmann (2018). In particular, Theorem 2.8 provides a result on invariant (sub)spaces for generalized Feller processes that is crucial for the square construction as outlined above. In Sect. 4, we apply the presented theory to SPDEs which are lifts of matrix-valued stochastic Volterra jump processes of type (1.2). Section 5 is devoted to presenting a theory of infinite dimensional Wishart processes which in turn give rise to (rough) Volterra Wishart processes. In Sect. 6, we apply these processes to multivariate (rough) volatility modeling.

1.1 Notation and some functional analytic notions

For the background in functional analysis, we refer to the excellent textbook of Schaefer and Wolff (1999) as main reference and to the equally excellent books of Engel and Nagel (2000) and Pazy (1983) for the background in strongly continuous semigroups.

We shall apply the following notations: Let Y be a Banach space and \( Y^* \) its dual space, i.e., the space of linear continuous functionals with the strong dual norm
$$\begin{aligned} {\Vert \lambda \Vert }_{Y^*} = \sup _{\Vert y\Vert \le 1} | \langle y , \lambda \rangle |, \end{aligned}$$
where \( \langle y , \lambda \rangle := \lambda (y) \) denotes the evaluation of the linear functional \( \lambda \) at the point \( y \in Y \). Since in the case of Eq. (1.2), cones \( {\mathcal {E}} \) of \( Y^* \) will be our state spaces, we denote the polar cones in pre-dual notation, i.e.,
$$\begin{aligned} {\mathcal {E}}_* = \big \{ y \in Y \, | \; \langle y , \lambda \rangle \le 0 \text { for all } \lambda \in {\mathcal {E}} \big \} . \end{aligned}$$
We denote spaces of bounded linear operators from Banach spaces \( Y_1 \) to \( Y_2 \) by \( L(Y_1,Y_2) \) with norm
$$\begin{aligned} {\Vert A \Vert }_{L(Y_1,Y_2)} := \sup _{{\Vert y_1\Vert }_{Y_1}\le 1} {\Vert Ay_1 \Vert }_{Y_2}. \end{aligned}$$
If \(Y_1=Y_2\), we only write \(\Vert \cdot \Vert _{L(Y_1)}\). On \( Y^* \), we shall usually consider, besides the strong topology (induced by the strong dual norm), the weak-\(*\)-topology, which is the weakest locally convex topology making all linear functionals \( \langle y , \cdot \rangle \) on \( Y^* \) continuous. Let us recall the following facts:
  • The weak-\(*\)-topology is metrizable if and only if Y is finite dimensional: This is due to Baire’s category theorem since \(Y^*\) can be written as a countable union of closed sets, whence at least one has to contain an open set, which in turn means that compact neighborhoods exist, i.e., a strictly finite dimensional phenomenon.

  • Norm balls \( K_R \) of any radius R in \( Y^* \) are compact with respect to the weak-\(*\)-topology, which is the Banach–Alaoglu theorem.

  • These balls are metrizable if and only if Y is separable: Indeed, Y can be isometrically embedded into \( C(K_1) \) via \( y \mapsto \langle y,\cdot \rangle \), for \( y \in Y \). If Y is separable, its embedded image is separable, too, which means (by looking at the algebra generated by Y in \( C(K_1) \)) that \( C(K_1) \) is separable, which is the case if and only if \( K_1 \) is metrizable.

Even though some results are more general, in particular, often only compactness of \(K_R\) is used, we shall always assume separability in this article.
Finally, a family of linear operators \( {(P_t)}_{t \ge 0} \) on a Banach space Y with \( P_t P_s = P_{t+s} \) for \( s,t \ge 0 \) and with \( P_0 =I \), where I denotes the identity, is called a strongly continuous semigroup if \( \lim _{t \rightarrow 0} P_t y = y \) holds true for every \( y \in Y \). We usually denote its generator by A, which is defined by \( Ay := \lim _{t \rightarrow 0} \frac{P_t y - y}{t} \) for all \( y \in {\text {dom}}(A) \), i.e., the set of elements for which the limit exists. Notice that \( {\text {dom}}(A) \) is left invariant by the semigroup P and that its restriction to the domain equipped with the graph norm
$$\begin{aligned} {\Vert y \Vert }_{{\text {dom}}(A)} := \sqrt{\Vert y\Vert ^2 + \Vert Ay\Vert ^2} \end{aligned}$$
is again a strongly continuous semigroup.
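In finite dimensions, these notions can be checked directly. The following Python sketch takes \(Y=\mathbb {R}^2\) and \(P_t=e^{tA}\) for a symmetric matrix A, and verifies numerically the semigroup property and the difference quotient defining the generator. The matrix A, the vector y and the helper name `semigroup` are hypothetical choices made only for illustration; in finite dimensions every semigroup of this type is even norm continuous, so this is only a toy instance of the definition.

```python
import numpy as np

def semigroup(a, t):
    """P_t = exp(t A) for a symmetric matrix A, via its eigendecomposition."""
    eigval, eigvec = np.linalg.eigh(a)
    return (eigvec * np.exp(t * eigval)) @ eigvec.T

a = np.array([[-1.0, 0.5], [0.5, -2.0]])   # illustrative symmetric generator
y = np.array([1.0, -1.0])

# Semigroup property P_s P_t = P_{s+t}.
semigroup_gap = np.abs(semigroup(a, 0.3) @ semigroup(a, 0.7) - semigroup(a, 1.0)).max()

# Difference quotient (P_h y - y) / h compared with A y for small h.
h = 1e-6
generator_gap = np.abs((semigroup(a, h) @ y - y) / h - a @ y).max()

# Graph norm of y with respect to A.
graph_norm = np.sqrt(np.linalg.norm(y) ** 2 + np.linalg.norm(a @ y) ** 2)

print(semigroup_gap, generator_gap, graph_norm)
```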

Moreover, as already used in the introduction, \(\mathbb {S}^d\) denotes the vector space of symmetric \(d \times d\) matrices and \(\mathbb {S}^d_+\) the cone of positive semidefinite ones. Furthermore, we denote by \({\text {diag}}(A)\) the vector consisting of the diagonal elements of a matrix A.

2 Generalized Feller semigroups and processes

In the context of Markovian lifts of stochastic Volterra processes, (signed) measure-valued processes appear in a natural way. The generalized Feller framework is tailor-made for such processes, as it allows one to consider non-locally compact state spaces, going beyond the standard theory of Feller processes as provided, e.g., in Ethier and Kurtz (1986). This is explicitly needed in Sect. 5 for Ornstein–Uhlenbeck processes which take values in the space of matrix-valued measures. Beyond that, jump processes with unbounded, but finite activity can be easily constructed in this setting, see Proposition 3.4 and Sect. 4. We shall first collect some results from Cuchiero and Teichmann (2018) and generalize them accordingly for the purposes of this article.

2.1 Definitions and results

First, we introduce weighted spaces and state a central Riesz–Markov–Kakutani representation result. The underlying space X here is a completely regular Hausdorff topological space.

Definition 2.1

A function \(\varrho :X\rightarrow (0,\infty )\) is called an admissible weight function if the sets \(K_R:=\left\{ x\in X:\varrho (x)\le R \right\} \) are compact and separable for all \(R>0\).

An admissible weight function \(\varrho \) is necessarily lower semicontinuous and bounded from below by a positive constant. We call the pair X together with an admissible weight function \(\varrho \) a weighted space. A weighted space is \(\sigma \)-compact. In the following remark, we clarify the question of local compactness of convex subsets \({\mathcal {E}} \subset X\) when X is a locally convex topological space and \(\varrho \) convex.

Remark 2.2

Let X be a separable locally convex topological space and \({\mathcal {E}}\) a convex subset. Moreover, let \(\varrho \) be a convex admissible weight function. Then, \( \varrho \) is continuous on \( {\mathcal {E}} \) if and only if \(\mathcal {E}\) is locally compact. Indeed, if \( \varrho \) is continuous on \(\mathcal {E}\), then of course, the topology on \( {\mathcal {E}} \) is locally compact since every point has a compact neighborhood of type \( \{ \varrho \le R \} \) for some \( R > 0 \). On the other hand, if the topology on \( {\mathcal {E}} \) is locally compact, then for every point \( \lambda _0 \in {\mathcal {E}} \), there is a convex, compact neighborhood \( V \subset \mathcal {E}\) such that \( \varrho (\lambda )-\varrho (\lambda _0) \) is bounded on V by a number \( k > 0 \), whence by convexity \( |\varrho (s(\lambda -\lambda _0)+\lambda _0)-\varrho (\lambda _0)| \le s k \) for \( \lambda - \lambda _0 \in s(V-\lambda _0) \) and \( s \in ]0,1] \). This in turn means that \( \varrho \) is continuous at \( \lambda _0 \).

From now on, \(\varrho \) shall always denote an admissible weight function. For completeness, we start by putting definitions for general Banach space valued functions, although in the sequel, we shall only deal with \(\mathbb {R}\)-valued functions: Let Z be a Banach space with norm \({||\cdot ||}_Z \). The vector space
$$\begin{aligned} \mathrm {B}^\varrho (X;Z):=\left\{ f:X\rightarrow Z :\sup _{x\in X}\varrho (x)^{-1}{||f(x)||}_Z < \infty \right\} \end{aligned}$$
of Z-valued functions f equipped with the norm
$$\begin{aligned} ||f||_{\varrho }:=\sup _{x\in X}\varrho (x)^{-1}{||f(x) ||}_Z, \end{aligned}$$
is a Banach space itself. It is also clear that for Z-valued bounded continuous functions, the continuous embedding \(\mathrm {C}_b(X;Z)\subset \mathrm {B}^\varrho (X;Z)\) holds true, where we consider the supremum norm on bounded continuous functions, i.e., \(\sup _{x \in X}\Vert f(x)\Vert \).

Definition 2.3

We define \(\mathcal {B}^{\varrho }(X;Z)\) as the closure of \(\mathrm {C}_b(X;Z)\) in \(\mathrm {B}^{\varrho }(X;Z)\). The normed space \(\mathcal {B}^{\varrho }(X;Z)\) is a Banach space.

If the range space is \(Z=\mathbb {R}\), which from now on will be the case, we shall write \( \mathcal {B}^\varrho (X) \) for \(\mathcal {B}^{\varrho }(X; \mathbb {R})\) and analogously \(\mathrm {B}^{\varrho }(X)\) for \(\mathrm {B}^{\varrho }(X; \mathbb {R})\).

We consider elements of \( {\mathcal {B}}^\varrho (X) \) as continuous functions whose growth is controlled by \( \varrho \). More precisely, we have by Dörsek and Teichmann (2010, Theorem 2.7) that \(f \in {\mathcal {B}}^{\varrho }(X)\) if and only if \(f|_{K_R}\in {\mathrm {C}}(K_R)\) for all \(R>0\) and
$$\begin{aligned} \lim _{R \rightarrow \infty }\sup _{x \in X \setminus K_R}\varrho (x)^{-1}||f(x)||=0 \, . \end{aligned} \tag{2.3}$$
Additionally, by Dörsek and Teichmann (2010, Theorem 2.8), it holds that for every \(f \in {\mathcal {B}}^\varrho (X)\) with \(\sup _{x\in X}f(x)>0\), there exists \(z\in X\) such that
$$\begin{aligned} \varrho (x)^{-1}f(x) \le \varrho (z)^{-1}f(z) \quad \text {for all } x \in X, \end{aligned}$$
which emphasizes the analogy with spaces of continuous functions vanishing at \( \infty \) on locally compact spaces.
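A one-dimensional toy case illustrates both statements. The following Python sketch takes \(X=\mathbb {R}\), the weight \(\varrho (x)=1+x^2\) (whose sublevel sets are compact intervals) and the unbounded function \(f(x)=x\). It evaluates \(\varrho ^{-1}f\) on a grid, locates the maximizer z, and shows the decay of \(\varrho ^{-1}|f|\) far out. The choice of X, \(\varrho \), f and of the grid is an assumption made for illustration only.

```python
import numpy as np

rho = lambda x: 1.0 + x ** 2      # illustrative admissible weight on R
f = lambda x: x                   # unbounded, but growth is controlled by rho

x = np.linspace(-50.0, 50.0, 200_001)
ratio = f(x) / rho(x)

norm_rho = np.abs(ratio).max()    # grid approximation of the rho-weighted sup norm
z = x[ratio.argmax()]             # point where rho^{-1} f attains its maximum

# Decay of rho^{-1} |f| outside large sublevel sets K_R.
tail = np.abs(f(np.array([1e2, 1e4, 1e6]))) / rho(np.array([1e2, 1e4, 1e6]))

print(norm_rho, z, tail)
```

By elementary calculus, \(x/(1+x^2)\) is maximal at \(x=1\) with value 1/2, so the grid values can be compared against this.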

Let us now state the following crucial representation theorem of Riesz type:

Theorem 2.4

(Riesz representation for \(\mathcal {B}^\varrho (X)\)) For every continuous linear functional \(\ell :\mathcal {B}^\varrho (X)\rightarrow \mathbb {R}\) there exists a finite signed Radon measure \(\mu \) on X such that
$$\begin{aligned} \ell (f)=\int _{X}f(x)\mu (\mathrm{d}x)\quad \text {for all } f \in {\mathcal {B}}^\varrho (X). \end{aligned}$$
Additionally,
$$\begin{aligned} \int _{X}\varrho (x)|\mu |( \mathrm{d}x) = ||\ell ||_{L(\mathcal {B}^\varrho (X),\mathbb {R})}, \end{aligned}$$
where \(|\mu |\) denotes the total variation measure of \(\mu \).

We shall next consider strongly continuous semigroups on \( \mathcal {B}^\varrho (X) \) spaces and recover very similar structures as well known for Feller semigroups on the space of continuous functions vanishing at \( \infty \) on locally compact spaces.

Definition 2.5

A family of bounded linear operators \(P_t:\mathcal {B}^{\varrho }(X)\rightarrow \mathcal {B}^{\varrho }(X)\) for \( t \ge 0 \) is called a generalized Feller semigroup if
  (i) \(P_0=I\), the identity on \(\mathcal {B}^{\varrho }(X)\),

  (ii) \(P_{t+s}=P_tP_s\) for all t, \(s\ge 0\),

  (iii) for all \(f \in {\mathcal {B}}^{\varrho }(X)\) and \(x\in X\), \(\lim _{t\rightarrow 0}P_t f(x)=f(x)\),

  (iv) there exist a constant \(C\in \mathbb {R}\) and \(\varepsilon >0\) such that for all \(t\in [0,\varepsilon ]\), \(||P_t||_{L(\mathcal {B}^{\varrho }(X))}\le C \),

  (v) \(P_t\) is positive for all \(t\ge 0\), that is, for \(f \in {\mathcal {B}}^{\varrho }(X)\), \(f\ge 0\), we have \(P_t f\ge 0\).


We obtain due to the Riesz representation property the following key theorem:

Theorem 2.6

Let \((P_t)_{t\ge 0}\) satisfy (i) to (iv) of Definition 2.5. Then, \((P_t)_{t\ge 0}\) is strongly continuous on \(\mathcal {B}^{\varrho }(X)\), that is,
$$\begin{aligned} \lim _{t \rightarrow 0}||P_t f-f||_{\varrho }=0 \quad \text {for all } f \in {\mathcal {B}}^{\varrho }(X). \end{aligned}$$

One can also establish a positive maximum principle in case that the semigroup \( P_t \) grows around 0 like \( \exp (\omega t) \) for some \(\omega \in \mathbb {R}\) with respect to the operator norm on \( \mathcal {B}^{\varrho }(X) \). Indeed, the following theorem proved in Dörsek and Teichmann (2010, Theorem 3.3) is a reformulation of the Lumer–Phillips theorem for pseudo-contraction semigroups using a generalized positive maximum principle which is formulated in the sequel.

Theorem 2.7

Let A be an operator on \(\mathcal {B}^{\varrho }(X)\) with domain D, and \(\omega \in \mathbb {R}\). A is closable with its closure \(\overline{A}\) generating a generalized Feller semigroup \((P_t)_{t\ge 0}\) with \(||P_t||_{L(\mathcal {B}^{\varrho }(X))}\le \exp (\omega t)\) for all \(t\ge 0\) if and only if
  (i) D is dense,

  (ii) \(A-\omega _0\) has dense image for some \(\omega _0>\omega \), and

  (iii) A satisfies the generalized positive maximum principle, that is, for \(f\in D\) with \((\varrho ^{-1}f)\vee 0\le \varrho (z)^{-1}f(z)\) for some \(z\in X\), \(Af(z)\le \omega f(z)\).


As a new contribution to the general theorems, we shall work out a statement on invariant subspaces which will be crucial for constructing squares of infinite dimensional OU processes.

Theorem 2.8

Let X be a weighted space with weight \( \varrho _1\) and \( q : X \rightarrow q(X) \) be a (surjective) continuous map from \( (X,\varrho _1) \) to the weighted space \( (q(X),\varrho _2) \). Let \( P^{(1)} \) be a generalized Feller semigroup acting on \( \mathcal {B}^{\varrho _1}(X) \). Assume that \( \varrho _2 \circ q \le \varrho _1 \) on X. Let D be a dense subspace of \(\mathcal {B}^{\varrho _2}(q(X)) \). Furthermore, assume that for every \( f \in D \subset \mathcal {B}^{\varrho _2}(q(X)) \) and for every \( t \ge 0 \), there is some \( g \in \mathcal {B}^{\varrho _2}(q(X)) \) such that
$$\begin{aligned} P^{(1)}_t (f \circ q) = g \circ q \, , \end{aligned} \tag{2.8}$$
and additionally, there is a constant \( C \ge 1 \) such that
$$\begin{aligned} P^{(1)}_t (\varrho _2 \circ q) \le C \varrho _2 \circ q \, . \end{aligned} \tag{2.9}$$
Then, there is a generalized Feller semigroup \( P^{(2)} \) acting on \(\mathcal {B}^{\varrho _2}(q(X)) \) such that
$$\begin{aligned} P^{(1)}_t (f \circ q) = (P^{(2)}_t f) \circ q \, . \end{aligned} \tag{2.10}$$


Proof

The continuous map q defines a linear operator M from \( \mathcal {B}^{\varrho _2}(q(X)) \) to \( \mathcal {B}^{\varrho _1}(X) \) via \( f \mapsto f \circ q \). Notice that M is bounded, since
$$\begin{aligned} {\Vert M f \Vert }_{\varrho _1} \le {\Vert f \Vert }_{\varrho _2}, \quad f \in \mathcal {B}^{\varrho _2}(q(X)) \end{aligned}$$
due to the assumption \( \varrho _2 \circ q \le \varrho _1 \). It is also injective, but its image is not necessarily closed. Assumptions (2.8) and (2.9) now mean that
$$\begin{aligned} P^{(1)}_t M f \in {\text {rg}}(M) \end{aligned}$$
for every \( f \in \mathcal {B}^{\varrho _2}(q(X)) \) and not only for \( f \in D \). Hence, we can define
$$\begin{aligned} P^{(2)}_t f := M^{-1} P^{(1)}_t M f \, , \end{aligned}$$
which is by the very construction a semigroup of linear operators on \( \mathcal {B}^{\varrho _2}(q(X)) \). Since M is continuous, its graph is closed, whence \( P^{(2)}_t \) is a bounded linear operator by the closed graph theorem. Moreover, property (iv) of Definition 2.5 holds true due to Assumption (2.9). Positivity is also preserved, since for \(f \ge 0\), we have due to Assumption (2.8) and the fact that \(P^{(1)}\) is a generalized Feller semigroup,
$$\begin{aligned} P^{(2)}_t f = M^{-1} P^{(1)}_t M f=M^{-1}\underbrace{ P^{(1)}_t (f \circ q)}_{\ge 0}=M^{-1}(g \circ q)=g\ge 0. \end{aligned}$$
Here, g is nonnegative due to the positivity of \(P^{(1)}_t (f \circ q)\). By (2.8) and the definition of \(P^{(2)}\), (2.10) clearly holds true. Hence,
$$\begin{aligned} \lim _{t \rightarrow 0} (P^{(2)}_t f) (q(x)) = \lim _{t \rightarrow 0} P^{(1)}_t (f \circ q)(x) = f(q(x)) \end{aligned}$$
for \( x \in X\), and thus property (iii) of Definition 2.5 holds. Hence, all conditions of Definition 2.5 are satisfied and we can conclude that the operators \( (P_t^{(2)}) \) form a generalized Feller semigroup.

\(\square \)
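The simplest instance of this invariance is the square of a one-dimensional Brownian motion. The following Python sketch takes \(X=\mathbb {R}\), \(q(x)=x^2\), \(\varrho _1(x)=1+x^2\), \(\varrho _2(y)=1+y\) (so that \(\varrho _2 \circ q=\varrho _1\)) and lets \(P^{(1)}\) be the heat semigroup \(P^{(1)}_t g(x)=\mathbb {E}[g(x+W_t)]\); we take for granted here that this semigroup fits the framework. For \(f(y)=e^{-y}\), it checks numerically that \(P^{(1)}_t(f\circ q)(x)\) depends on x only through \(x^2\) and agrees with \(g(y)=(1+2t)^{-1/2}e^{-y/(1+2t)}\) evaluated at \(y=q(x)\), which is the classical Laplace transform of a squared Gaussian. All names, the grid and the parameter values are illustrative assumptions.

```python
import numpy as np

GRID = np.linspace(-12.0, 12.0, 4801)   # integration grid for the Gaussian density
STEP = GRID[1] - GRID[0]

def heat_semigroup(g, t, x):
    """P^(1)_t g(x) = E[g(x + W_t)], approximated by a Riemann sum on GRID."""
    density = np.exp(-(GRID - x) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return float(np.sum(g(GRID) * density) * STEP)

q = lambda x: x ** 2                     # the map q: R -> [0, infinity)
f = lambda y: np.exp(-y)                 # test function on q(X)
rho_1 = lambda x: 1.0 + x ** 2           # weight on X
rho_2 = lambda y: 1.0 + y                # weight on q(X), with rho_2(q(x)) = rho_1(x)

def g_closed_form(t, y):
    """Candidate g with P^(1)_t (f o q) = g o q for f(y) = exp(-y)."""
    return np.exp(-y / (1.0 + 2.0 * t)) / np.sqrt(1.0 + 2.0 * t)

t = 0.5
for x in (0.3, 1.7):
    lhs_plus = heat_semigroup(lambda u: f(q(u)), t, x)
    lhs_minus = heat_semigroup(lambda u: f(q(u)), t, -x)
    # Growth of the weight: E[rho_1(x + W_t)] = rho_1(x) + t <= (1 + t) rho_1(x).
    weight = heat_semigroup(rho_1, t, x)
    print(x, lhs_plus, lhs_minus, g_closed_form(t, q(x)), weight, rho_1(x) + t)
```

In this toy case, the semigroup \(P^{(2)}\) obtained on \([0,\infty )\) is that of the square of a Brownian motion, i.e., of a one-dimensional squared Bessel process.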

Remark 2.9

In the setting of general semigroups, it is not clear that restrictions of semigroups to (not even closed) subspaces preserve strong continuity.

Remark 2.10

There are several methods to show that (2.8) is satisfied. In general, it is not sufficient to assume that the generator of \( P^{(1)} \) has this property.

Corollary 2.11

Let the assumptions of Theorem 2.8 except Assumption (2.9) hold true and suppose additionally that
$$\begin{aligned} \varrho _2 \circ q = \varrho _1. \, \end{aligned}$$
Then, the same conclusions hold true. In particular, the range of the operator \( M: \mathcal {B}^{\varrho _2}(q(X)) \rightarrow \mathcal {B}^{\varrho _1}(X),\, f \mapsto f \circ q \) is closed.

We restate from Cuchiero and Teichmann (2018) assertions on existence of generalized Feller processes and path properties. It is remarkable that in this very general context, càg versions exist for countably many test functions.

Theorem 2.12

Let \( (P_t)_{t \ge 0} \) be a generalized Feller semigroup with \( P_t 1 = 1 \) for \( t \ge 0 \). Then, there exists a filtered measurable space \( (\Omega ,(\mathcal {F}_t)_{t \ge 0}) \) with right continuous filtration, and an adapted family of random variables \( {(\lambda _t)}_{t \ge 0} \) such that for any initial value \( \lambda _0 \in X \) there exists a probability measure \( \mathbb {P}^{\lambda _0} \) with
$$\begin{aligned} \mathbb {E}_{\lambda _0}[f(\lambda _t)] := \mathbb {E}_{\mathbb {P}^{\lambda _0}}[f(\lambda _t)] = P_t f(\lambda _0) \end{aligned}$$
for \( t \ge 0 \) and every \( f \in \mathcal {B}^\varrho (X) \). The Markov property holds true, i.e.,
$$\begin{aligned} \mathbb {E}_{\mathbb {P}^{\lambda _0}}[f(\lambda _t) \, | \; \mathcal {F}_s] = P_{t-s} f(\lambda _s) \end{aligned}$$
almost surely with respect to \( \mathbb {P}^{\lambda _0} \).

Theorem 2.13

Let \( (P_t)_{t\ge 0} \) be a generalized Feller semigroup, and let \( (\lambda _t)_{t \ge 0} \) be a generalized Feller process on a filtered probability space. Then, for every countable family \( {(f_n)}_{n \ge 0} \) of functions in \( \mathcal {B}^\varrho (X) \), we can choose a version of the processes \( {\left( \frac{f_n(\lambda _t)}{\varrho (\lambda _t)} \right) }_{t \ge 0} \), such that the trajectories are càglàd for all \( n \ge 0 \). If additionally \( P_t \varrho \le \exp (\omega t) \varrho \) holds true, then \( (\exp (- \omega t) \varrho (\lambda _t))_{t \ge 0} \) is a super-martingale and can be chosen to have càglàd trajectories. In this case, we obtain that the processes \( {\big ( f_n(\lambda _t) \big )}_{t \ge 0} \) can be chosen to have càglàd trajectories.

Remark 2.14

In the general case, when \( P_t \varrho \le M \exp (\omega t) \varrho \) for \(M >1\), we obtain for \( {\big ( f_n(\lambda _t) \big )}_{t \ge 0} \) only càg trajectories. To see this, consider the measurable set of sample events \( \{ \sup _{0 \le t \le 1} \varrho (\lambda _t) \le R \} \). Then, we can construct on the metrizable compact set \( \{ \varrho \le R \} \) a càglàd version of the processes \( {\left( \frac{f_n(\lambda _t)}{\varrho (\lambda _t)} \right) }_{t \le 1} \) and \( \left( {\frac{1}{\varrho (\lambda _t)}}\right) _{t \le 1} \) and in turn also of \( {\big ( f_n(\lambda _t) \big )}_{t \ge 0}\). The limit \( R \rightarrow \infty \), however, only leads to a càg version since we cannot control the right limits.

2.2 Dual spaces of Banach spaces

The most important playground for our theory will be closed subsets of duals of Banach spaces, where the weak-\(*\)-topology appears to be \( \sigma \)-compact due to the Banach–Alaoglu theorem. Assume that \( \mathcal {E} \subset Y^*\) is a closed subset of the dual space \(Y^*\) of some Banach space Y where \(Y^{*}\) is equipped with its weak-\(*\)-topology. Consider a lower semicontinuous function \(\varrho :\mathcal {E} \rightarrow (0,\infty )\) and denote by \((\mathcal {E},\varrho )\) the corresponding weighted space. We have the following approximation result (see Dörsek and Teichmann (2010, Theorem 4.2)) for functions in \(\mathcal {B}^{\varrho }(\mathcal {E})\) by cylindrical functions. Set
$$\begin{aligned} {\text {Cyl}}_N:= \bigl \{ g(\langle \cdot ,y_1\rangle ,\ldots ,\langle \cdot ,y_N\rangle ):g\in \mathrm {C}_b^{\infty }(\mathbb {R}^N) \text { and } y_j \in Y, \ j=1,\ldots ,N \bigr \}, \end{aligned}$$
where \(\langle \cdot , \cdot \rangle \) denotes the pairing between \(Y^*\) and Y. We denote by \({\text {Cyl}}:=\bigcup _{N\in \mathbb {N}}{\text {Cyl}}_N\) the set of bounded smooth continuous cylinder functions on \(\mathcal {E}\).

Theorem 2.15

The closure of \({\text {Cyl}}\) in \(\mathrm {B}^\varrho (\mathcal {E})\) coincides with \(\mathcal {B}^\varrho (\mathcal {E})\), whose elements are precisely the functions \(f\in \mathrm {B}^{\varrho }(\mathcal {E})\) which satisfy (2.3) and for which \(f|_{K_R}\) is weak-\(*\)-continuous for any \(R>0\).


Proof

See Cuchiero and Teichmann (2018).

\(\square \)

Assumption 2.16

Let \((\lambda _t)_{t\ge 0}\) denote a time homogeneous Markov process on some stochastic basis \((\Omega ,\mathcal {F}, (\mathcal {F}_t)_{t\ge 0}, \mathbb {P}^{\lambda _0})\) with values in \(\mathcal {E}\).

Then, we assume that
  (i) there are constants C and \(\varepsilon >0\) such that
    $$\begin{aligned} \mathbb {E}_{\lambda _0}[\varrho (\lambda _t)]\le C\varrho (\lambda _0) \quad \text {for all } \lambda _0\in \mathcal {E} \text { and } t\in [0,\varepsilon ]; \end{aligned} \tag{2.12}$$

  (ii) it holds that
    $$\begin{aligned} \lim _{t\rightarrow 0} \mathbb {E}_{\lambda _0}[f(\lambda _t)] = f(\lambda _0) \quad \text {for any } f\in \mathcal {B}^{\varrho }(\mathcal {E}) \text { and } \lambda _0\in \mathcal {E}; \end{aligned}$$

  (iii) for all f in a dense subset of \( \mathcal {B}^\varrho (\mathcal {E}) \), the map \( \lambda _0 \mapsto \mathbb {E}_{\lambda _0}[f(\lambda _t)] \) lies in \( \mathcal {B}^\varrho (\mathcal {E}) \).


Remark 2.17

Of course, inequality (2.12) implies that \( |\mathbb {E}_{\lambda _0}[f(\lambda _t)]|\le C \Vert f\Vert _{\varrho } \varrho (\lambda _0) \) for all \( f \in \mathcal {B}^{\varrho }(\mathcal {E}) \), \( \lambda _0 \in \mathcal {E} \) and \( t \in [0,\varepsilon ]\), since \(|f| \le \Vert f\Vert _{\varrho } \varrho \).

Theorem 2.18

Suppose that Assumption 2.16 holds true. Then, \((P_t)_{t\ge 0}\) defined by \(P_t f(\lambda _0):=\mathbb {E}_{\lambda _0}[f(\lambda _t)]\) satisfies the generalized Feller property and is therefore a strongly continuous semigroup on \(\mathcal {B}^\varrho (\mathcal {E})\).


This follows from the arguments of Dörsek and Teichmann (2010, Section 5).

\(\square \)

3 Approximation theorems

In order to establish existence of Markovian solutions for general generators A, we have two options, at least in the pseudo-contractive case. Either we directly apply Theorem 2.7, which requires that the generator A satisfies a generalized positive maximum principle on a dense domain D and that the range of \( A - \omega _0 \) is dense for at least one \( \omega _0 > \omega \); or we approximate a general generator A by (finite activity pure jump) generators \(A^n \) and apply the following (well-known) approximation theorems. These theorems also hold in the general case where the constant \(M >1\).

Theorem 3.1

Let \( (P_t^n)_{n \in \mathbb {N}, t\ge 0} \) be a sequence of strongly continuous semigroups on a Banach space Z with generators \( (A^n)_{n \in \mathbb {N}} \) such that there are uniform (in n) growth bounds \( M \ge 1 \) and \( \omega \in \mathbb {R} \) with
$$\begin{aligned} \Vert P^n_t \Vert _{L(Z)} \le M \exp (\omega t) \end{aligned}$$
for \( t \ge 0 \). Let furthermore \( D \subset \cap _n {\text {dom}}(A^n)\) be a dense subspace with the following three properties:
  1. (i)

    D is an invariant subspace for all \( P^n \), i.e., for all \( f \in D \), we have \( P^n_t f \in D \), for \( n \ge 0 \) and \( t \ge 0 \).

  2. (ii)
    There is a norm \( {\Vert .\Vert }_D \) on D such that there are uniform growth bounds with respect to \( {\Vert .\Vert }_D \), i.e., there are \( M_D \ge 1 \) and \( \omega _D \in \mathbb {R} \) with
    $$\begin{aligned} {\Vert P^n_t f \Vert }_D \le M_D \exp (\omega _Dt) {\Vert f\Vert }_D \end{aligned}$$
    for \( t \ge 0 \) and for \( n \ge 0 \).
  3. (iii)
    The sequence \( A^n f \) converges as \( n \rightarrow \infty \) for each \( f \in D \), in the following sense: There exists a sequence of numbers \( a_{nm} \rightarrow 0 \) as \( n,m \rightarrow \infty \) such that
    $$\begin{aligned} \Vert A^n f - A^m f \Vert \le a_{nm} {\Vert f \Vert }_D \end{aligned}$$
    holds true for every \( f \in D \) and for all \( n, m \).
Then, there exists a strongly continuous semigroup \( (P_t^\infty )_{t \ge 0} \) with the same growth bound on Z such that \( \lim _{n \rightarrow \infty } P^n_t f = P^\infty _t f \) for all \( f \in Z \) uniformly on compacts in time and on bounded sets in D. Furthermore on D, the convergence is of order \( O(a_{nm}) \). If in addition for each \(n \in \mathbb {N}\), \((P_t^n)_{t \ge 0}\) is a generalized Feller semigroup, then this property transfers also to the limiting semigroup.


See Cuchiero and Teichmann (2018). \(\square \)

For the purposes of affine processes, a slightly more general version of the approximation theorem is needed, which we state in the sequel:

Theorem 3.2

Let \( (P_t^n)_{n \in \mathbb {N}, t\ge 0} \) be a sequence of strongly continuous semigroups on a Banach space Z with generators \( (A^n)_{n \in \mathbb {N}} \) such that there are uniform (in n) growth bounds \( M \ge 1 \) and \( \omega \in \mathbb {R} \) with
$$\begin{aligned} \Vert P^n_t \Vert _{L(Z)} \le M \exp (\omega t) \end{aligned}$$
for \( t \ge 0 \). Let furthermore \( D \subset \cap _n {\text {dom}}(A^n)\) be a subset with the following two properties:
  1. (i)

    The linear span \({\text {span}}(D)\) is dense.

  2. (ii)
    There is a norm \( {\Vert .\Vert }_D \) on \( {\text {span}}(D) \) such that for each \( f \in D \) and for \( t > 0 \), there exists a sequence \( a^{f,t}_{nm} \), possibly depending on f and t, such that
    $$\begin{aligned} \Vert A^n P^m_u f - A^m P^m_u f \Vert \le a^{f,t}_{nm} {\Vert f \Vert }_D \end{aligned}$$
    holds true for all \( n, m \) and for \( 0 \le u \le t\), with \( a^{f,t}_{nm} \rightarrow 0 \) as \( n,m \rightarrow \infty \).
Then, there exists a strongly continuous semigroup \( (P_t^\infty )_{t \ge 0} \) with the same growth bound on Z such that \( \lim _{n \rightarrow \infty } P^n_t f = P^\infty _t f \) for all \( f \in Z \) uniformly on compacts in time. If in addition for each \(n \in \mathbb {N}\), \((P_t^n)_{t \ge 0}\) is a generalized Feller semigroup, then this property transfers also to the limiting semigroup.


See Cuchiero and Teichmann (2018).\(\square \)

Our first application of Theorem 3.1 is the next proposition that extends well-known results on bounded generators toward unbounded limits.

We repeat here a remark from Cuchiero and Teichmann (2018) since it helps to understand the fourth condition on the measures:

Remark 3.3

Let \( (P_t)_{t \ge 0} \) be a generalized Feller semigroup with \(\Vert P_t\Vert _{L(\mathcal {B}^{\varrho }(X))}\le M \exp (\omega t) \) for some \(M\ge 1 \) and some \(\omega \). Additionally, it is assumed to be of transport type, i.e.,
$$\begin{aligned} P_t f (x) = f ( \psi _t(x)) \end{aligned}$$
for some continuous map \( \psi _t : X \rightarrow X \). Define now a new function
$$\begin{aligned} \tilde{\varrho }(x) := \sup _{t \ge 0} \, \exp (-\omega t) P_t \varrho (x) \end{aligned}$$
for \( x \in X \). Notice that \( \tilde{\varrho }\) is an admissible weight function, since
$$\begin{aligned} \{ \tilde{\varrho }\le R \} = \cap _{t \ge 0} \, \{ P_t \varrho \le \exp (\omega t) R \} \subset \{ \varrho \le R \} \end{aligned}$$
is compact by the definition of \(\varrho \) and the continuity of \(x \mapsto \psi _t(x)\) which leads to an intersection of closed subsets of compacts. Additionally, we have that
$$\begin{aligned} \varrho \le \tilde{\varrho }\le M \varrho \end{aligned}$$
by the growth bound, and therefore, the norm on \( \mathcal {B}^\varrho (X) \) is equivalent to
$$\begin{aligned} {||f ||}_{\tilde{\varrho }} = \sup _{x \in X} \frac{|f(x)|}{\tilde{\varrho }(x)} \, . \end{aligned}$$
Moreover,
$$\begin{aligned} ||P_tf ||_{\tilde{\varrho }} \le \exp (\omega t) ||f ||_{\tilde{\varrho }} \end{aligned}$$
holds for all \(t\ge 0\) and \( f \in \mathcal {B}^\varrho (X) \). Indeed, this is a consequence of the following estimate
$$\begin{aligned} ||P_tf ||_{\tilde{\varrho }}&= \sup _{x}\left| \frac{f(\psi _t(x))}{\sup _s \exp (-\omega s) \varrho (\psi _s(x))}\right| \le \sup _{x}\left| \frac{f(\psi _t(x))}{\sup _s \exp (-\omega (t+s)) \varrho (\psi _{t+s}(x))}\right| \\&\le \exp (\omega t) \sup _{x}\left| \frac{f(\psi _t(x))}{\sup _s \exp (-\omega s) \varrho (\psi _{s}(\psi _t(x)))}\right| \le \exp (\omega t) \Vert f\Vert _{\tilde{\varrho }}. \end{aligned}$$
In particular, we obtain
$$\begin{aligned} |P_tf(x)| \le \exp (\omega t)\tilde{\varrho }(x) \Vert f\Vert _{\tilde{\varrho }}, \end{aligned}$$
which implies
$$\begin{aligned} P_t \tilde{\varrho }\le \exp (\omega t) \tilde{\varrho }, \quad t \ge 0. \end{aligned}$$
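The construction of \( \tilde{\varrho }\) can be checked numerically on a toy transport semigroup. The following is only a sketch under assumptions that are not part of the text: we take \(X=\mathbb {R}^2\), \(\varrho (x)=1+|x|^2\), \(\omega =0\) and the linear flow \(\psi _t(x)=e^{tA}x\) with the non-normal matrix \(A=\bigl ({\begin{matrix} -1 &{} c \\ 0 &{} -1 \end{matrix}}\bigr )\), so that the growth constant M is strictly larger than 1. The names `C_SHEAR`, `psi`, `rho`, `flow_norm_sq` and `rho_tilde` are hypothetical, and the supremum over t is replaced by a maximum over a finite time grid.

```python
import math

C_SHEAR = 5.0  # hypothetical off-diagonal entry c; it makes exp(tA) non-normal


def psi(t, x):
    """Linear flow psi_t(x) = exp(tA) x for A = [[-1, C_SHEAR], [0, -1]]."""
    decay = math.exp(-t)
    return (decay * (x[0] + C_SHEAR * t * x[1]), decay * x[1])


def rho(x):
    """Weight function rho(x) = 1 + |x|^2."""
    return 1.0 + x[0] ** 2 + x[1] ** 2


def flow_norm_sq(t):
    """Squared spectral norm of exp(tA) = e^{-t} [[1, a], [0, 1]] with a = C_SHEAR * t."""
    a = C_SHEAR * t
    s = 2.0 + a * a  # trace of B^T B for B = [[1, a], [0, 1]]; its determinant is 1
    return math.exp(-2.0 * t) * (s + math.sqrt(s * s - 4.0)) / 2.0


TIMES = [0.01 * k for k in range(4001)]  # time grid on [0, 40]

# With omega = 0 the operator norm of P_t on B^rho is
# sup_x rho(psi_t x) / rho(x) = max(1, ||exp(tA)||^2); M is its maximum over the grid.
M = max(max(1.0, flow_norm_sq(t)) for t in TIMES)


def rho_tilde(x, horizon=20.0):
    """Grid approximation of sup_{0 <= t <= horizon} rho(psi_t(x)), i.e. omega = 0."""
    return max(rho(psi(t, x)) for t in TIMES if t <= horizon)
```

On sample points one observes \( \varrho \le \tilde{\varrho }\le M \varrho \) and \( \tilde{\varrho }(\psi _s(x)) \le \tilde{\varrho }(x) \), in line with the two inequalities of the remark for \(\omega = 0\).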

Proposition 3.4

Let \( (X,\varrho ) \) be a weighted space with weight function \( \varrho \ge 1 \). Consider an operator A on \(\mathcal {B}^{\varrho }(X)\) with dense domain \({\text {dom}}(A)\) generating on \( \mathcal {B}^\varrho (X) \) a generalized Feller semigroup \( (P_t)_{t\ge 0} \) of transport type as in (3.2), such that for all \(t \ge 0\), we have \( \Vert P_t\Vert _{L(\mathcal {B}^{\varrho }(X))} \le M_1 \exp (\omega t)\) for some \(M_1 \ge 1\) and \( \omega \in \mathbb {R} \), and such that \( \mathcal {B}^{\sqrt{\varrho }}(X) \subset \mathcal {B}^\varrho (X) \) is left invariant.

Consider furthermore a family of finite measures \( \mu (x,.)\) for \( x \in X \) on X such that the operator B acts on \(\mathcal {B}^{\varrho }(X)\) by
$$\begin{aligned} B f (x) : = \int (f(y) - f(x)) \mu (x, \mathrm{d}y) \end{aligned}$$
for \( x \in X \) yielding continuous functions on \( \{\varrho \le R \} \) for \( R \ge 0 \), and such that the following properties hold true:
  • For all \( x \in X \)
    $$\begin{aligned} \int \varrho (y) \mu (x,\mathrm{d}y) \le M \varrho ^2 (x), \end{aligned}$$
    as well as
    $$\begin{aligned} \int \sqrt{\varrho (y)} \mu (x,\mathrm{d}y) \le M \varrho (x), \end{aligned}$$
    and
    $$\begin{aligned} \int \mu (x,\mathrm{d}y) \le M \sqrt{\varrho (x)}, \end{aligned}$$
    hold true for some constant M.
  • For some constant \( \widetilde{\omega } \in \mathbb {R} \),
    $$\begin{aligned} \int \Big | \frac{\sup _{t \ge 0} \exp (-\omega t) P_t \varrho (y) -\sup _{t \ge 0} \exp (- \omega t ) P_t \varrho (x)}{\sup _{t \ge 0} \exp (-\omega t) P_t \varrho (x)} \Big | \mu (x,\mathrm{d}y) \le \widetilde{\omega } , \end{aligned}$$
    for all \( x \in X \). In particular, \( y \mapsto \sup _{t \ge 0} \exp (-\omega t) P_t \varrho (y) \) should be integrable with respect to \( \mu (x,.) \).
Then, \( A + B \) generates a generalized Feller semigroup \((P_t^{\infty })_{t \ge 0}\) on \( \mathcal {B}^\varrho (X) \) satisfying \(\Vert P^{\infty }_t\Vert _{L(\mathcal {B}^{\varrho }(X))} \le M_1 \exp ((\omega + \widetilde{\omega })t)\).


See Cuchiero and Teichmann (2018).\(\square \)

Remark 3.5

In contrast to classical Feller theory, processes with unbounded jump intensities can also be constructed easily if \( \varrho \) is unbounded on X. The general character of the proposition allows one to build general processes from simple ones by perturbation.

4 Lifting stochastic Volterra jump processes with values in \(\mathbb {S}^d_+ \)

Building on the theory of generalized Feller processes from the above, we shall now treat the following type of matrix measure-valued SPDEs
$$\begin{aligned} \mathrm{d} \lambda _t(\mathrm{d}x)= & {} \mathcal {A}^* \lambda _t(\mathrm{d}x) \mathrm{d}t + \nu (\mathrm{d}x) \mathrm{d}X_t + \mathrm{d}X_t \nu (\mathrm{d}x), \nonumber \\ \lambda _0\in & {} \mathcal {E} . \end{aligned}$$
As shown below, this equation corresponds to a Markovian lift of the Volterra jump process in (1.2).

We consider here the setting of Sect. 2.2. The underlying Banach space \(Y^*\) is the space of finite \(\mathbb {S}^d\)-valued regular Borel measures on the extended real half-line \(\overline{\mathbb {R}}_+:=\mathbb {R}_+ \cup \{\infty \}\), and \(\mathcal {E}\) denotes a (positive definite) subset of \(Y^*\). Moreover, \(\mathcal {A}^*\) is the generator of a strongly continuous semigroup \(\mathcal {S}^*\) on \(Y^*\), and \(\nu \in Y^*\) (or in a slightly larger space denoted by \(Z^*\) in the sequel). The pre-dual space Y is given by the space \(C_{b}(\overline{\mathbb {R}}_+, \mathbb {S}^d)\) of bounded continuous \(\mathbb {S}^d\)-valued functions. Note that since \(\overline{\mathbb {R}}_+\) is compact, \(Y=C_{b}(\overline{\mathbb {R}}_+, \mathbb {S}^d)\) is separable. The driving process X is an \(\mathbb {S}^d\)-valued pure jump Itô-semimartingale, whose differential characteristics depend linearly on \(\lambda \), as specified precisely below. Let us remark that other forms of differential characteristics of X, in particular beyond the linear case, can be easily incorporated in this setting.

The pairing between Y and \(Y^*\), denoted by \(\langle \cdot , \cdot \rangle \), is specified via:
$$\begin{aligned} \langle \cdot , \cdot \rangle : Y \times Y^* \rightarrow \mathbb {R}, \quad (y, \lambda ) \mapsto \langle y, \lambda \rangle ={{\,\mathrm{Tr}\,}}\left( \int _0^{\infty } y(x) \lambda (\mathrm{d}x) \right) , \end{aligned}$$
where \({{\,\mathrm{Tr}\,}}\) denotes the trace. We also define another bilinear map via
$$\begin{aligned} \langle \langle \cdot , \cdot \rangle \rangle : Y \times Y^* \rightarrow \mathbb {S}^d, \quad (y, \lambda ) \mapsto \langle \langle y, \lambda \rangle \rangle =\int _0^{\infty } y(x)\lambda (\mathrm{d}x) + \int _0^{\infty } \lambda (\mathrm{d}x) y(x). \end{aligned}$$
In the following, we summarize the main ingredients of our setting. For the norm on \(\mathbb {S}^d\), we write \(\Vert \cdot \Vert \), which is given by \(\Vert u \Vert = \sqrt{{{\,\mathrm{Tr}\,}}(u^2)}\) for \(u \in \mathbb {S}^d\).
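For measures with finitely many atoms, the two pairings above reduce to finite sums of matrix products. The following minimal sketch assumes \(d=2\) and a measure \(\lambda =\sum _k \Lambda _k \delta _{x_k}\) with symmetric atoms \(\Lambda _k\); the helper names (`mat_mul`, `pairing`, `sym_pairing`, `y_test`, `ATOMS`) and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(a, b):
    """Entrywise sum of two matrices of equal shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def pairing(y, atoms):
    """<y, lambda> = Tr( sum_k y(x_k) Lambda_k ) for lambda = sum_k Lambda_k delta_{x_k}."""
    return sum(trace(mat_mul(y(x), lam)) for x, lam in atoms)


def sym_pairing(y, atoms):
    """<<y, lambda>> = sum_k ( y(x_k) Lambda_k + Lambda_k y(x_k) ), a symmetric matrix."""
    d = len(atoms[0][1])
    total = [[0.0] * d for _ in range(d)]
    for x, lam in atoms:
        total = mat_add(total, mat_add(mat_mul(y(x), lam), mat_mul(lam, y(x))))
    return total


def y_test(x):
    """Hypothetical test function y(x) = exp(-x) * [[1, 0.5], [0.5, 2]]."""
    e = math.exp(-x)
    return [[e, 0.5 * e], [0.5 * e, 2.0 * e]]


# Hypothetical measure with two positive semidefinite atoms at x = 0.5 and x = 3.
ATOMS = [(0.5, [[2.0, 1.0], [1.0, 1.0]]), (3.0, [[1.0, 0.0], [0.0, 4.0]])]
```

By cyclicity of the trace, \({{\,\mathrm{Tr}\,}}\langle \langle y, \lambda \rangle \rangle = 2 \langle y, \lambda \rangle \), which gives a quick consistency check between the two maps.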

Assumption 4.1

Throughout this section, we shall work under the following conditions:
  1. (i)
    We are given an admissible weight function \( \varrho \) on \( Y^* \) (in the sense of Sect. 2) such that
    $$\begin{aligned} \varrho (\lambda ) = 1+ {\Vert \lambda \Vert }_{Y^*}^2, \quad \lambda \in Y^*, \end{aligned}$$
    where \(\Vert \cdot \Vert _{Y^*}\) denotes the norm on \(Y^*\), which is the total variation norm of \( \lambda \).
  2. (ii)

    We are given a closed convex cone \( \mathcal {E} \subset Y^* \) (in the sequel the cone of \(\mathbb {S}^d_+\) valued measures) such that \( (\mathcal {E},\varrho ) \) is a weighted space in the sense of Sect. 2. This will serve as state space of (4.1).

  3. (iii)

    Let \( Z \subset Y \) be a continuously embedded subspace.

  4. (iv)
    We assume that a semigroup \( \mathcal {S}^* \) with generator \( \mathcal {A}^* \) acts in a strongly continuous way on \( Y^* \) and \( Z^* \), with respect to the respective norm topologies. Moreover, we suppose that for any matrix \(A \in \mathbb {S}^d\), it holds that
    $$\begin{aligned} \mathcal {S}^*_t(\lambda (\cdot ) A+ A \lambda (\cdot ))= (\mathcal {S}^*_t\lambda (\cdot )) A+ A (\mathcal {S}^*_t \lambda (\cdot )). \end{aligned}$$
  5. (v)

    We assume that \( \lambda \mapsto \mathcal {S}^*_t\lambda \) is weak-\(*\)-continuous on \( Y^* \) and on \( Z^* \) for every \( t \ge 0 \) (considering the weak-\(*\)-topology on both the domain and the image space).

  6. (vi)

    We suppose that the (pre-)adjoint operator of \( \mathcal {A}^* \), denoted by \(\mathcal {A}\), with domain \( {\text {dom}}(\mathcal {A}) \subset Z \subset Y \), generates a strongly continuous semigroup on Z with respect to the respective norm topology (but not necessarily on Y).

To analyze solvability of (4.1), we first consider the following linear deterministic equation
$$\begin{aligned} \mathrm{d} \lambda _t(\mathrm{d}x) = \mathcal {A}^* \lambda _t(\mathrm{d}x) \mathrm{d}t + \nu (\mathrm{d}x) \beta (\lambda _t(\cdot ) )\mathrm{d}t+ \beta (\lambda _t(\cdot )) \nu (\mathrm{d}x)\mathrm{d}t \end{aligned}$$
for \( \lambda _0 \in Y^* \), \( \nu \in Z^*\) and \(\beta \) a bounded linear operator from \(Y^*\) to \(\mathbb {S}^d\) which satisfies, for \( A \in \mathbb {S}^d\) and \(\lambda \in Y^*\),
$$\begin{aligned} \beta (\lambda (\cdot ) A +A\lambda (\cdot ))= \beta (\lambda (\cdot )) A + A \beta (\lambda (\cdot )). \end{aligned}$$
We denote by \(\beta _*: \mathbb {S}^d \rightarrow Y\) the adjoint operator defined via
$$\begin{aligned} {{\,\mathrm{Tr}\,}}( u \beta (\lambda ))= {{\,\mathrm{Tr}\,}}\left( \int _0^{\infty } \beta _*(u)(x) \lambda (\mathrm{d}x)\right) =\langle \beta _*(u), \lambda \rangle , \quad u \in \mathbb {S}^d, \, \lambda \in Y^*. \end{aligned}$$

Remark 4.2

Notice that drift specifications could be more general here, but for the sake of readability, we leave this direction to the interested reader.

For notational convenience, we shall often omit the \(\mathrm{d}x\) argument when writing an (S)PDE of type (4.4) subsequently. Under the following assumptions on \( \mathcal {S}^* \) and \( \nu \in Z^* \), we can guarantee that (4.4) can be solved on the space \(Y^*\) for all times in the mild sense with respect to the dual norm \(\Vert \cdot \Vert _{Y^*}\) by a standard Picard iteration method.

Assumption 4.3

We assume that
  1. (i)

    \( \mathcal {S}^*_t \nu \in Y^* \) for all \( t > 0 \) even though \( \nu \) does not necessarily lie in \( Y^* \) itself, but only in \( Z^* \);

  2. (ii)

    \( \int _0^t \Vert \mathcal {S}^*_s \nu \Vert ^2_{Y^*} \mathrm{d}s < \infty \) for all \( t > 0 \).

For the linear operator \(\beta \) as of (4.5), we define
$$\begin{aligned} K(t) := \beta (\mathcal {S}_t^{*} \nu ), \end{aligned}$$
which will correspond to a kernel in \(L^2_{\text {loc}}(\mathbb {R}_+, \mathbb {S}^d)\) of a Volterra equation. Define furthermore \(R_K \in L^2_{\text {loc}}(\mathbb {R}_+, \mathbb {S}^d)\) as a symmetrized version of the resolvent of the second kind (see, e.g., Gripenberg et al. 1990, Theorem 3.1) that solves
$$\begin{aligned} K*R_K+R_K * K=K-R_K, \end{aligned}$$
where \(K*R_K\) denotes the convolution, i.e., \(K*R_K= \int _0^{\cdot } K(\cdot -s)R_K(s) \mathrm{d}s\).

Example 4.4

The main examples that we have in mind for \(\beta \) and for \(\mathcal {S}^*\), and thus in turn for the kernel K, are the following specifications:
$$\begin{aligned} \beta (\lambda ) =\int _0^{\infty } \lambda (\mathrm{d}x), \quad \mathcal {S}^*_t \nu (\mathrm{d}x) = e^{-xt}\nu (\mathrm{d}x). \end{aligned}$$
In this case, \(K(t)= \int _0^{\infty } e^{-xt}\nu (\mathrm{d}x)\) and the adjoint operator \(\beta _*\) is given by the constant function
$$\begin{aligned} (\beta _*(u))(x)= u, \quad \text {for all } x \in \mathbb {R}_+. \end{aligned}$$

Remark 4.5

To the semigroup \(\mathcal {S}^*_t= e^{-xt}\) of the above example, we associate our (main) specification of the space Z: Let \(Z \subset Y\) such that for all \(y \in Y\) the map
$$\begin{aligned} h_y : \overline{ \mathbb {R}}_+ \rightarrow \mathbb {S}^d, \quad x \mapsto x y(x) \end{aligned}$$
lies in Z equipped with the operator norm, i.e.,
$$\begin{aligned} \Vert h_y \Vert _Z =\sqrt{\sup _{ x \ge 0} \Vert y(x) \Vert + \sup _{x \ge 0}\Vert x y(x) \Vert } \text{ for } h_y \in Z \, . \end{aligned}$$
The corresponding dual space \(Z^* \supset Y^*\) is the space of regular \(\mathbb {S}^d\)-valued Borel measures \(\nu \) on \(\overline{\mathbb {R}}_+\) that satisfy
$$\begin{aligned} \Vert \int _0^{\infty } \left( \frac{1}{x} \wedge 1\right) \nu (\mathrm{d}x)\Vert <\infty \, . \end{aligned}$$
Note that we can specify the components of \(\nu \) to be measures of the form
$$\begin{aligned} \nu _{ij}(\mathrm{d}x) = x^{-\frac{1}{2}-H_{ij}} \, \mathrm{d}x, \quad H_{ij} \in \left( 0, \frac{1}{2}\right) , \end{aligned}$$
which gives rise to fractional kernels \(K_{ij}(t)=\int _0^{\infty } e^{-xt} \nu _{ij}(\mathrm{d}x) = \Gamma \left( \frac{1}{2}-H_{ij}\right) t^{H_{ij}-\frac{1}{2}}\). These are in turn main ingredients of rough covariance modeling.
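The Laplace transform behind the fractional kernel can be checked numerically. The sketch below is an illustration under our own assumptions: it treats a single component with Hurst parameter H, substitutes \(x=e^u\) to remove the singularity at \(x=0\), and applies the trapezoidal rule on a truncated u-interval. The function names and the truncation bounds `u_min`, `u_max` are hypothetical; for H close to \(\frac{1}{2}\) the lower bound would have to be decreased further, since the integrand then decays slowly as \(u \rightarrow -\infty \).

```python
import math


def fractional_kernel(t, hurst, u_min=-200.0, u_max=10.0, step=0.01):
    """Trapezoidal approximation of int_0^inf exp(-x t) x^(-1/2 - hurst) dx.

    After x = exp(u) the integrand becomes exp((1/2 - hurst) u - t exp(u)).
    """
    power = 0.5 - hurst
    n = int(round((u_max - u_min) / step))
    total = 0.0
    for k in range(n + 1):
        u = u_min + k * step
        weight = 0.5 if k in (0, n) else 1.0  # end points get half weight
        total += weight * math.exp(power * u - t * math.exp(u))
    return total * step


def closed_form(t, hurst):
    """Gamma(1/2 - H) * t^(H - 1/2), the value of the same integral in closed form."""
    return math.gamma(0.5 - hurst) * t ** (hurst - 0.5)
```

For instance, `fractional_kernel(0.5, 0.1)` and `closed_form(0.5, 0.1)` should agree up to the quadrature error, which reflects the power-law blow-up \(t^{H-\frac{1}{2}}\) of the kernel at \(t=0\).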

Remark 4.6

In this article, we choose to work with state spaces of matrix-valued measures using the representation of the kernel K as Laplace transform of a matrix-valued measure \(\nu \) as specified in Example 4.4. We could, however, perform the same analysis on a Hilbert space of forward covariance curves. This then corresponds to a multivariate analogue of Cuchiero and Teichmann (2018, Section 5.2).

Proposition 4.7

Under Assumption 4.3, there exists a unique mild solution of (4.4) with values in \(Y^*\). Additionally, the solution operator is a weak-\(*\)-continuous map \( \lambda _0 \mapsto \lambda _t \), for each \( t > 0 \), and the solution satisfies
$$\begin{aligned} \varrho (\lambda _t) \le C \varrho (\lambda _0), \quad \text {for all } \lambda _0 \in Y^* \text { and } t \in [0, \varepsilon ] \end{aligned}$$
for some positive constants C and \( \varepsilon \).

Remark 4.8

The unique mild solution of Equation (4.4) satisfies by means of (4.3) the variation of constants equation
$$\begin{aligned} \lambda _t = \mathcal {S}^*_t \lambda _0 + \int _0^t ( \mathcal {S}^*_{t-s} \nu \beta (\lambda _s) + \beta (\lambda _s) \mathcal {S}^*_{t-s} \nu )\mathrm{d}s, \end{aligned}$$
for all \( t \ge 0 \). Applying the linear operator \(\beta \) and using property (4.5), we obtain a deterministic linear Volterra equation of the form
$$\begin{aligned} \beta ( \lambda _t )= & {} \beta (\mathcal {S}_t^{*} \lambda _0 )+ \int _0^t \beta \left( \mathcal {S}_{t-s}^{*} \nu \beta (\lambda _s) +\beta (\lambda _s)\mathcal {S}_{t-s}^{*} \nu \right) \mathrm{d}s \nonumber \\= & {} \beta (\mathcal {S}_t^{*} \lambda _0 )+ \int _0^t \left( K(t-s) \beta (\lambda _s) +\beta (\lambda _s)K(t-s) \right) \mathrm{d}s \end{aligned}$$
where we have used (4.6).
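To illustrate, consider the simplest scalar instance. The following choices are illustrative assumptions and do not appear in the text: \(d=1\), \(\beta \) and \(\mathcal {S}^*\) as in Example 4.4, \(\nu = w \delta _{x_0}\) and \(\lambda _0 = m_0 \delta _{x_0}\) for constants \(w, x_0, m_0 \ge 0\). Then \(K(t)= w e^{-x_0 t}\) and \(\beta (\mathcal {S}^*_t \lambda _0) = m_0 e^{-x_0 t}\), so that \(u(t):=\beta (\lambda _t)\) satisfies
$$\begin{aligned} u(t) = m_0 e^{-x_0 t} + 2 w \int _0^t e^{-x_0 (t-s)} u(s) \mathrm{d}s. \end{aligned}$$
Setting \(v(t) := e^{x_0 t} u(t)\) gives \(v(t) = m_0 + 2w \int _0^t v(s) \mathrm{d}s\), hence \(v'(t) = 2 w v(t)\) with \(v(0)=m_0\), and therefore
$$\begin{aligned} \beta (\lambda _t) = u(t) = m_0 e^{(2w - x_0) t}, \quad t \ge 0. \end{aligned}$$
In this special case, the Volterra equation thus reduces to a linear ordinary differential equation with rate \(2w - x_0\).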


We follow the arguments of Cuchiero and Teichmann (2018) and translate the proof to the matrix-valued setting. We first show the standard convergence of the Picard iteration scheme with respect to the dual norm on \( Y^* \). Define
$$\begin{aligned} \lambda ^0_t&= \lambda _0,\\ \lambda ^{n+1}_t&= \mathcal {S}^*_t \lambda _0 + \int _0^t (\mathcal {S}^*_{t-s} \nu )\beta (\lambda ^n_s) \mathrm{d}s+ \int _0^t\beta (\lambda ^n_s) (\mathcal {S}^*_{t-s} \nu )\mathrm{d}s, \quad n \ge 0. \end{aligned}$$
Then, by Assumption 4.3 (i), each \(\lambda ^n_t\) lies in \(Y^*\). Consider now
$$\begin{aligned} \Vert \lambda ^{n+1}_t - \lambda ^{n}_t\Vert _{Y^*}&=\Vert \int _0^t (\mathcal {S}^*_{t-s} \nu )(\beta (\lambda ^{n}_s) - \beta (\lambda ^{n-1}_s) )\mathrm{d}s\\&\quad + \int _0^t(\beta (\lambda ^{n}_s) -\beta (\lambda ^{n-1}_s)) (\mathcal {S}^*_{t-s} \nu )\mathrm{d}s\Vert _{Y^*}\\&\le 2 \Vert \beta \Vert _{\text {op}}\int _0^t \Vert \mathcal {S}^*_{t-s} \nu \Vert _{Y^*} \Vert \lambda ^n_s -\lambda ^{n-1}_s \Vert _{Y^*} \mathrm{d}s, \end{aligned}$$
where \(\Vert \beta \Vert _{\text {op}}\) denotes the operator norm of \(\beta \). Assumption 4.3 (ii) and an extended version of Gronwall’s inequality (see Dalang 1999, Lemma 15) then yield convergence of \((\lambda ^n_t)_{n \in \mathbb {N}}\) to some \(\lambda _t\) with respect to the dual norm \(\Vert \cdot \Vert _{Y^*}\) uniformly in t on compact intervals. For details on strongly continuous semigroups and mild solutions, see Pazy (1983).
Having established the existence of a mild solution of (4.4) in \(Y^*\), consider now the \(\mathbb {S}^d\)-valued process \(\beta (\lambda _t)\):
$$\begin{aligned} \beta (\lambda _t)= & {} \beta (\mathcal {S}_t^{*} \lambda _0 )+ \int _0^t \beta \left( \mathcal {S}_{t-s}^* \nu \beta (\lambda _s) + \beta (\lambda _s) \mathcal {S}_{t-s}^* \nu \right) \mathrm{d}s ,\nonumber \\= & {} \beta (\mathcal {S}_t^{*} \lambda _0)+ \int _0^t \left( \beta (\mathcal { S}_{t-s}^* \nu ) \beta (\lambda _s) + \beta (\lambda _s) \beta (\mathcal {S}_{t-s}^* \nu ) \right) \mathrm{d}s\nonumber \\= & {} \beta (\mathcal {S}_t^{*} \lambda _0)+ \int _0^t \left( R_K(t-s) \beta (\mathcal {S}_s^{*} \lambda _0) + \beta (\mathcal {S}_s^{*} \lambda _0) R_K(t-s)\right) \mathrm{d}s \end{aligned}$$
where we applied property (4.5). Remember that \(R_K\) denotes the resolvent of the second kind of \( K(t)=\beta (\mathcal {S}_{t}^{*} \nu )\) as introduced in (4.7) by means of which we can solve the above equation in terms of integrals of \( t \mapsto \beta (\mathcal {S}_t^* \lambda _0 ) \). Since by assumption, \( \mathcal {S}^* \) is a weak-\(*\)-continuous solution operator, the map \( \lambda _0 \mapsto (t \mapsto \beta (\mathcal {S}^*_t \lambda _0 )) \) is weak-\(*\)-continuous as a map from \( Y^* \) to \( C(\mathbb {R}_{+},\mathbb {S}^d) \) (with the topology of uniform convergence on compacts on \(C(\mathbb {R}_{+},\mathbb {S}^d)\)). From (4.9), we thus infer that \( \beta (\lambda _t ) \) is weak-\(*\)-continuous for every \( t \ge 0 \), which clearly translates to the solution map of Equation (4.4).
Finally, we have to show that the stated inequality for \( \varrho (\lambda _t) \) holds true on small time intervals \( [0,\varepsilon ]\). Observe first that for \(t \in [0,\varepsilon ]\)
$$\begin{aligned} \Vert \mathcal {S}^*_t \lambda \Vert _{Y^*}^2 \le C \Vert \lambda \Vert ^2_{Y^*} \end{aligned}$$
for all \(\lambda \in Y^*\) just by the assumption that \(\mathcal {S}^*_t\) is strongly continuous, for some constant \( C \ge 1 \). Furthermore for \(t \in [0, \varepsilon ]\),
$$\begin{aligned} \Vert \lambda _t\Vert _{Y^*}^2&\le 3 \left( C \Vert \lambda _0\Vert ^2_{Y^*} + t\int _0^t \Vert \mathcal {S}_{t-s}^* \nu \beta (\lambda _s)\Vert _{Y^*}^2 \mathrm{d}s + t\int _0^t \Vert \beta (\lambda _s) \mathcal {S}_{t-s}^* \nu \Vert _{Y^*}^2 \mathrm{d}s\right) \\&\le 3 \left( C \Vert \lambda _0\Vert ^2_{Y^*} + 2\varepsilon \Vert \beta \Vert ^2_{\text {op}} \int _0^t \Vert \mathcal {S}^*_{t-s} \nu \Vert ^2_{Y^*} \Vert \lambda _s \Vert _{Y^*}^2 \mathrm{d}s \right) . \end{aligned}$$
Consider now the kernel \(K'(t,s)=6\varepsilon \Vert \beta \Vert ^2_{\text {op}} \Vert \mathcal {S}^*_{t-s} \nu \Vert ^2_{Y^*} 1_{\{s \le t\}}\) and denote by \(R'\) the resolvent of \(-K'\), which is non-positive. By exactly the same arguments as in Cuchiero and Teichmann (2018), we then have for \(t \in [0,\varepsilon ]\)
$$\begin{aligned} \Vert \lambda _t\Vert _{Y^*}^2 \le \widetilde{C} \Vert \lambda _0\Vert ^2_{Y^*} \left( 1 - \int _0^{\varepsilon } R'(s) \mathrm{d}s\right) , \end{aligned}$$
for some constant \(\widetilde{C}\). This leads to the desired assertion due to the definition of \(\varrho \). From this inequality, also uniqueness follows in a standard way.\(\square \)
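The Picard scheme used in the proof can be sketched numerically in a simplified setting. The assumptions here are ours and purely illustrative: \(d=1\), \(\beta \) and \(\mathcal {S}^*\) as in Example 4.4, a finite measure \(\nu =\sum _k w_k \delta _{x_k}\), and an initial measure \(\lambda _0\) supported on the same atoms, so that \(\lambda _t\) is described by its vector of masses. Time integrals are replaced by the trapezoidal rule on a uniform grid, and the function name `picard_lift` and its parameters are hypothetical.

```python
import math


def picard_lift(x_pts, w, m0, horizon=1.0, n_steps=100, n_iter=25):
    """Picard iterates for the scalar lift with nu = sum_k w[k] * delta_{x_pts[k]}.

    lambda_t is stored as masses m[i][k] at grid time t_i and atom x_k, and
    beta(lambda) is the total mass. Returns the last iterate and the sup-in-time
    total variation distances between successive iterates.
    """
    h = horizon / n_steps
    n_atoms = len(x_pts)
    decay = [math.exp(-x * h) for x in x_pts]  # exp(-x_k h), reused on the grid
    # Free term S*_t lambda_0: masses m0[k] * exp(-x_k t_i).
    free = [[m0[k] * decay[k] ** i for k in range(n_atoms)] for i in range(n_steps + 1)]
    current = [list(m0) for _ in range(n_steps + 1)]  # lambda^0_t = lambda_0
    gaps = []
    for _ in range(n_iter):
        beta = [sum(row) for row in current]  # beta(lambda^n_s) on the grid
        nxt = []
        for i in range(n_steps + 1):
            row = []
            for k in range(n_atoms):
                # Trapezoidal rule for int_0^{t_i} exp(-x_k (t_i - s)) beta(lambda^n_s) ds.
                integral = 0.0
                if i > 0:
                    for j in range(i + 1):
                        weight = 0.5 if j in (0, i) else 1.0
                        integral += weight * decay[k] ** (i - j) * beta[j]
                    integral *= h
                # In the scalar case nu * beta + beta * nu equals 2 * w_k * beta.
                row.append(free[i][k] + 2.0 * w[k] * integral)
            nxt.append(row)
        gaps.append(max(sum(abs(a - b) for a, b in zip(r1, r2))
                        for r1, r2 in zip(nxt, current)))
        current = nxt
    return current, gaps
```

With a single atom, the limit can be compared with the explicit solution \(m_0 e^{(2w - x_0)t}\) of the corresponding linear ordinary differential equation, and the entries of `gaps` display the fast decay of the distances between successive iterates.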
As our goal is to consider \(\mathbb {S}^d_+\)-measure-valued processes, we denote by \(\mathcal {E}\) the following weak-\(*\)-closed convex cone
$$\begin{aligned} \mathcal {E}=\{\lambda _0 \in Y^* \, |\, \lambda _0 \text { is an } \mathbb {S}^d_+ \text { -valued measure on } \overline{\mathbb {R}}_+\}. \end{aligned}$$
The next proposition establishes that the solution of (4.4) leaves \(\mathcal {E}\) invariant, if the following assumption holds true:

Assumption 4.9

We assume that
  1. (i)

    \(\mathcal {S}^*_t (\mathcal {E}) \subseteq \mathcal {E},\quad \forall t \ge 0\);

  2. (ii)

    \(\nu \) is an \(\mathbb {S}^d_+\)-valued measure;

  3. (iii)

    \(\beta (\mathcal {E}) \subseteq \mathbb {S}^d_+\).


Proposition 4.10

Let Assumptions 4.3 and 4.9 be in force. Then, the solution of (4.4) leaves \( \mathcal {E} \) invariant and it defines a generalized Feller semigroup on \( (\mathcal {E},\varrho ) \) by \( P_t f(\lambda _0) := f(\lambda _t) \) for all \( f \in \mathcal {B}^\varrho (\mathcal {E}) \) and \( t \ge 0 \).


Consider first the slightly modified equation
$$\begin{aligned} \mathrm{d} \lambda _t(\mathrm{d}x) = \mathcal {A}^* \lambda _t(\mathrm{d}x) \mathrm{d}t +\mathcal {S}_{\varepsilon }^* \nu (\mathrm{d}x) \beta (\lambda _t(\cdot ) )\mathrm{d}t+ \beta (\lambda _t(\cdot )) \mathcal {S}_{\varepsilon }^*\nu (\mathrm{d}x)\mathrm{d}t \end{aligned}$$
for some \(\varepsilon >0\). Then, the operator \(B=\mathcal {S}_{\varepsilon }^* \nu (\mathrm{d}x) \beta (\cdot ) + \beta (\cdot ) \mathcal {S}_{\varepsilon }^*\nu (\mathrm{d}x)\) is bounded and the associated semigroup is given by \(P_t^{\varepsilon }=e^{Bt}\). Due to the assumptions on \(\mathcal {S}^*\), \(\nu \) and \(\beta \), we have \(B(\mathcal {E}) \subseteq \mathcal {E}\) implying that \(P^{\varepsilon }_t(\mathcal {E}) \subseteq \mathcal {E}\) for all \(t \ge 0\). The Trotter-Kato theorem, see, e.g., Engel and Nagel (2000, Theorem III.5.8), then yields that the semigroup associated with (4.10) maps \(\mathcal {E}\) to itself. This then also holds true for the limit when \(\varepsilon =0\) by Theorem 3.1.

Since by Proposition 4.7, the solution operator is weak-\(*\)-continuous, we can conclude that \(\lambda _0 \mapsto f(\lambda _t)\) lies in \( \mathcal {B}^\varrho (\mathcal {E}) \) for all f in a dense subset of \( \mathcal {B}^\varrho (\mathcal {E}) \) by Theorem 2.15. Moreover, it satisfies the necessary bound (2.12) for \( \varrho \) and (2.13) is satisfied by (norm)-continuity of \(t \mapsto \lambda _t\). Hence, all the conditions of Assumption 2.16 are satisfied and the solution operator therefore defines a generalized Feller semigroup \( (P_t) \) on \( \mathcal {B}^\varrho (\mathcal {E}) \) by Theorem 2.18. This generalized Feller semigroup of course coincides with the previously constructed limit. \(\square \)

By the previous results, we can now construct a generalized Feller process on \( \mathcal {E} \) which jumps up by multiples of \( \mathcal {S}^*_{\varepsilon }\nu \) for some \(\varepsilon \ge 0\) and with an instantaneous intensity of size \( \beta (\lambda _t) \). Recall that \(\mathcal {E}_* \subset Y\) denotes the (pre-)polar cone of \( \mathcal {E}\), that is,
$$\begin{aligned} \mathcal {E}_*=\{y \in Y \, |\, y \in C_b(\overline{\mathbb {R}}_+, \mathbb {S}_-^d)\}. \end{aligned}$$
Recall the notation from (4.2) and define the following set
$$\begin{aligned} \mathcal {D}=\{ y \in Y \, |\, y \in {\text {dom}}(\mathcal {A})~\text { s.t. } \langle \langle y, \nu \rangle \rangle \text { is well defined}\}. \end{aligned}$$

Proposition 4.11

Let Assumptions 4.3 and 4.9 be in force. Moreover, let \(\mu \) be a finite \(\mathbb {S}^d_+\)-valued measure on \(\mathbb {S}^d_+\) such that \(\int _{\Vert \xi \Vert \ge 1}\Vert \xi \Vert ^2 \Vert \mu (\mathrm{d}\xi )\Vert < \infty \). Consider the SPDE
$$\begin{aligned} \mathrm{d} \lambda _t&= \mathcal {A}^* \lambda _t \mathrm{d}t +\nu \beta (\lambda _t) \mathrm{d}t + \beta (\lambda _t) \nu \mathrm{d}t + \mathcal {S}^*_\varepsilon \nu \mathrm{d}N_t + \mathrm{d}N_t \mathcal {S}^*_\varepsilon \nu , \end{aligned}$$
where \((N_t)_{t \ge 0}\) is a pure jump process with jump sizes in \(\mathbb {S}^d_+\) and compensator
$$\begin{aligned} \int _0^{\cdot } \int _{\mathbb {S}_+^d} \xi {{\,\mathrm{Tr}\,}}\left( \beta (\lambda _s) \mu (\mathrm{d}\xi ) \right) \mathrm{d}s. \end{aligned}$$
  1. (i)

    Then, for every \( \lambda _0 \in \mathcal {E} \) and \( \varepsilon > 0 \), the SPDE (4.12) has a solution in \( \mathcal {E} \) given by a generalized Feller process associated with the generator of (4.12).

  2. (ii)
    This generalized Feller process is also a probabilistically weak and analytically mild solution of (4.12), i.e.,
    $$\begin{aligned} \lambda _t&= \mathcal {S}^*_t \lambda _0 +\int _0^t \mathcal {S}^*_{t-s}\nu \beta (\lambda _s) \mathrm{d}s + \int _0^t\beta (\lambda _s) \mathcal {S}_{t-s}^*\nu \mathrm{d}s \\&\quad +\int _0^t\mathcal {S}^*_{t-s+\varepsilon } \nu \mathrm{d}N_s+ \int _0^t \mathrm{d}N_s \mathcal {S}^*_{t-s+\varepsilon } \nu \, , \end{aligned}$$
    which justifies Eq. (4.12). In particular for every initial value the process N can be constructed on an appropriate probabilistic basis. The stochastic integral is defined in a pathwise way along finite variation paths. Moreover, for every family \((f_n)_n \in \mathcal {B}^{\varrho }(\mathcal {E})\), \(t \mapsto f_n(\lambda _t)\) can be chosen to be càglàd for all n.
  3. (iii)
    For every \( \varepsilon > 0 \), the corresponding Riccati equation \(\partial _t y_t=R(y_t)\) with \(R: \mathcal {D} \cap \mathcal {E}_* \rightarrow Y\) given by
    $$\begin{aligned} R(y)= & {} \mathcal {A} y + \beta _*\left( \int _0^{\infty } y(x) \nu (\mathrm{d}x) + \nu (\mathrm{d}x) y(x)\right) \nonumber \\&+ \beta _*\left( \int _{\mathbb {S}^d_+} \left( \exp ( \langle y , \mathcal {S}^*_{\varepsilon } \nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu \rangle )-1 \right) \mu (\mathrm{d}\xi )\right) , \end{aligned}$$
    admits a unique global solution in the mild sense for all initial values \( y_0 \in \mathcal {E}_* \).
  4. (iv)
    The affine transform formula holds true, i.e.,
    $$\begin{aligned} \mathbb {E}_{\lambda _0}\left[ \exp ( \langle y_0, \lambda _t \rangle )\right] =\exp (\langle y_t, \lambda _0 \rangle ), \end{aligned}$$
    where \(y_t\) solves \(\partial _t y_t=R(y_t)\) for all \(y_0 \in \mathcal {E}_*\) in the mild sense with R given by (4.13). Moreover, \(y_t \in \mathcal {E}_*\) for all \(t \ge 0\).
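To make the Riccati equation concrete, the following sketch treats a scalar toy case under assumptions of our own: \(d=1\), \(\beta \) and \(\mathcal {S}^*\) as in Example 4.4 (so that \(\mathcal {A}\) acts as multiplication by \(-x\)), \(\nu = w\delta _{x_0}\), and \(\mu = m \delta _{\xi }\) for a single jump size \(\xi \ge 0\). Only the value of \(y_t\) at the atom \(x_0\) then matters, and (4.13) reduces to the scalar equation \(\partial _t y = (2w - x_0) y + m (e^{2 w e^{-x_0 \varepsilon } \xi y} - 1)\). The function names and parameters are hypothetical, and a classical Runge-Kutta scheme stands in for the mild solution.

```python
import math


def riccati_rhs(y, x0, w, eps, xi, mass):
    """Scalar version of R(y) for nu = w * delta_{x0} and mu = mass * delta_{xi}."""
    # Pairing of the constant 1 with S*_eps nu xi + xi S*_eps nu at the atom x0.
    shift = 2.0 * w * math.exp(-x0 * eps) * xi
    return (2.0 * w - x0) * y + mass * (math.exp(y * shift) - 1.0)


def solve_riccati(y0, t, x0, w, eps, xi, mass, n_steps=2000):
    """Classical fourth-order Runge-Kutta scheme for dy/dt = R(y), y(0) = y0 <= 0."""
    h = t / n_steps
    y = y0
    for _ in range(n_steps):
        k1 = riccati_rhs(y, x0, w, eps, xi, mass)
        k2 = riccati_rhs(y + 0.5 * h * k1, x0, w, eps, xi, mass)
        k3 = riccati_rhs(y + 0.5 * h * k2, x0, w, eps, xi, mass)
        k4 = riccati_rhs(y + h * k3, x0, w, eps, xi, mass)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y
```

For `mass=0.0` the solution is \(y_t = y_0 e^{(2w-x_0)t}\), which matches the deterministic evolution of the mass of \(\lambda _t\) at the atom, so the affine transform formula can be verified by hand in that case. For `mass > 0` and \(y_0 \le 0\), the numerical solution stays nonpositive, consistent with \(y_t \in \mathcal {E}_*\).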


We assume that \( \nu \ne 0 \), otherwise there is nothing to prove. To prove the first assertion, we apply Proposition 3.4. By Propositions 4.7 and 4.10, the deterministic equation (4.4) has a mild solution on \(\mathcal {E}\) which, by Assumption 4.3, defines a generalized Feller semigroup \((P_t)_{t\ge 0}\) on \( \mathcal {B}^\varrho (\mathcal {E}) \). The operator A in Proposition 3.4 then corresponds to the generator of \((P_t)_{t\ge 0}\), i.e., the semigroup associated with the purely deterministic part of (4.12). This is a transport semigroup, and in view of Remark 3.3, we can pass to an equivalent norm with respect to a new weight function \( \tilde{\varrho }\) on \(\mathcal {B}^{\varrho }(\mathcal {E})\), such that \( \Vert P_t \Vert _{L(\mathcal {B}^{\tilde{\varrho }}(\mathcal {E}))} \le \exp (\omega t) \). Therefore, the semigroup assumptions of Proposition 3.4 are satisfied.

Note that by the same arguments as in Proposition 4.10 and by applying Theorem 2.18, we can prove that \((P_t)_{t \ge 0}\) also defines a generalized Feller semigroup on \( \mathcal {B}^{\sqrt{\varrho }}(\mathcal {E}) \). For the detailed proof which translates literally to the present setting, we refer to Cuchiero and Teichmann (2018).

Finally, we need to verify (3.3)–(3.5), which read as follows
$$\begin{aligned} \int \ \varrho (\lambda + \mathcal {S}^*_{\varepsilon } \nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu ) {{\,\mathrm{Tr}\,}}(\beta (\lambda ) \mu (\mathrm{d}\xi ))&\le M \varrho (\lambda )^2, \\ \int \sqrt{ \varrho }(\lambda + \mathcal {S}^*_{\varepsilon } \nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu ) {{\,\mathrm{Tr}\,}}(\beta (\lambda ) \mu (\mathrm{d}\xi ))&\le M \varrho (\lambda ),\\ \int {{\,\mathrm{Tr}\,}}( \beta ( \lambda ) \mu (\mathrm{d}\xi ))&\le M \sqrt{ \varrho (\lambda )} \, , \end{aligned}$$
which hold true by the second-moment condition on \(\mu \). Concerning (3.6), denote as in Remark 3.3
$$\begin{aligned} \tilde{\varrho }(\lambda ) =\sup _{t\ge 0} \exp (-\omega t) P_t \varrho (\lambda ) \, . \end{aligned}$$
In particular, we know that \( \varrho \le \tilde{\varrho }\) and it holds that \(P_tf(x)=f(\psi _t(x))\) where \(\psi \) is the solution of (4.4) which is linear. Using this together with \( | \sup _t c(t) - \sup _t d(t)| \le \sup _t |c(t) - d(t) | \), we obtain for some \(\widetilde{\omega }\)
$$\begin{aligned}&\int \big | \frac{\tilde{\varrho }(\lambda + \mathcal {S}^*_{\varepsilon } \nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu )-\tilde{\varrho }(\lambda )}{\tilde{\varrho }(\lambda )} \big | {{\,\mathrm{Tr}\,}}( \beta (\lambda ) \mu (\mathrm{d}\xi )) \\&\quad \le \int \big | \frac{\sup _{t \ge 0} \exp (-\omega t) |P_t\varrho (\lambda + \mathcal {S}^*_{\varepsilon } \nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu )- P_t \varrho ( \lambda )|}{\tilde{\varrho }(\lambda )} \big | {{\,\mathrm{Tr}\,}}( \beta (\lambda ) \mu (\mathrm{d}\xi ))\\&\quad \le \int \big | \frac{\sup _{t \ge 0} \exp (-\omega t) |\varrho (\psi _t(\lambda + \mathcal {S}^*_{\varepsilon } \nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu )) - \varrho (\psi _t (\lambda ))|}{\tilde{\varrho }(\lambda )} \big | {{\,\mathrm{Tr}\,}}( \beta (\lambda ) \mu (\mathrm{d}\xi ))\\&\quad \le \int \big | \frac{ \sup _{t \ge 0} \exp (-\omega t) ( 2\Vert \psi _t( \lambda ) \Vert _{Y^*} \; \Vert \psi _t( \mathcal {S}^*_{\varepsilon } \nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu )\Vert _{Y^*} + {\Vert \psi _t (\mathcal {S}^*_{\varepsilon } \nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu ) \Vert }_{Y^*}^2 )}{\varrho ( \lambda )} \big |\\&\quad \quad \times {{\,\mathrm{Tr}\,}}( \beta (\lambda ) \mu (\mathrm{d}\xi )) \le \widetilde{\omega } \, . \end{aligned}$$
The last inequality holds by the linearity of \(\psi \) and the second-moment condition on \(\mu \). Proposition 3.4 now allows us to conclude that \(A+B\), where B is given by
$$\begin{aligned} Bf(\lambda )= \int (f(\lambda + \mathcal {S}^*_{\varepsilon } \nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu ) -f(\lambda )) {{\,\mathrm{Tr}\,}}(\beta (\lambda )\mu (\mathrm{d}\xi )), \end{aligned}$$
generates a generalized Feller semigroup \(\widetilde{P}\) as asserted.
For (ii), we now construct the probabilistically weak and analytically mild solution directly from the properties of the generalized Feller process: take \( y \in \mathcal {D}\) where \(\mathcal {D}\) is defined in (4.11) and consider the \(\mathbb {S}^d\)-valued martingale
$$\begin{aligned} M^y_t:= & {} \langle \langle y, \lambda _t \rangle \rangle - \langle \langle y , \lambda _0 \rangle \rangle - \int _0^t \langle \langle \mathcal {A} y, \lambda _s \rangle \rangle + \langle \langle y, \nu \beta (\lambda _s)+ \beta (\lambda _s) \nu \rangle \rangle \mathrm{d}s \nonumber \\&- \int _0^t \int \langle \langle y, \mathcal {S}^*_\varepsilon \nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu \rangle \rangle {{\,\mathrm{Tr}\,}}(\beta ( \lambda _s )\mu (\mathrm{d} \xi ) )\mathrm{d}s \end{aligned}$$
for \( t \ge 0 \) (after an appropriate and possible regularization according to Theorem 2.13).
Let now y be as above with the additional property that \( \langle \langle y , \mathcal {S}^*_{\varepsilon }\nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu \rangle \rangle = \pi \xi + \xi \pi \) for all \(\xi \in \mathbb {S}^d_+\) and some fixed \( \pi \in \mathbb {S}^d_+\). For such y, define
$$\begin{aligned} N^\pi _t = \pi N_t + N_t \pi := M^y_t + \int _0^t \int \langle \langle y, \mathcal {S}^*_\varepsilon \nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu \rangle \rangle {{\,\mathrm{Tr}\,}}(\beta (\lambda _s) \mu (\mathrm{d} \xi )) \mathrm{d}s \end{aligned}$$
for \( t \ge 0 \), which is a càglàd semimartingale. Notice that the left-hand side only defines \( N^\pi \) and not the more suggestive \( \pi N + N \pi \). Then, \(N^\pi \) does not depend on y by construction. Indeed, for all \(y_i\) with \( \langle \langle y_i , \mathcal {S}^*_{\varepsilon }\nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu \rangle \rangle = \pi \xi + \xi \pi \) for all \(\xi \), \(i=1,2\), we clearly have
$$\begin{aligned} \int _0^t \int \langle \langle y_1-y_2, \mathcal {S}^*_\varepsilon \nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu \rangle \rangle {{\,\mathrm{Tr}\,}}( \beta (\lambda _s) \mu (\mathrm{d} \xi )) \mathrm{d}s =0 \end{aligned}$$
and \( M^{y_1} - M^{y_2} =M^{y_1-y_2}= 0 \) as well. The latter follows from the fact that the martingale \( M^y \) is constant if \( \langle \langle y , \mathcal {S}^*_\varepsilon \nu \xi + \xi \mathcal {S}^*_\varepsilon \nu \rangle \rangle = 0 \) for all \(\xi \), since its quadratic variation vanishes in this case.

Moreover, by the definition of \(N^\pi \) in (4.15), its compensator is given by \(\int _0^t \int (\pi \xi + \xi \pi ) {{\,\mathrm{Tr}\,}}(\beta (\lambda _s) \mu (\mathrm{d}\xi ))\mathrm{d}s\). Since it is sufficient to perform the previous construction for finitely many \( \pi \) to obtain all necessary projections, a process N can be defined such that \( N^\pi = \pi N + N \pi \), as suggested by the notation.

By (4.14) and the very definition of (4.15), we obtain that
$$\begin{aligned} \langle \langle y, \lambda _t \rangle \rangle&= \langle \langle y , \lambda _0 \rangle \rangle + \int _0^t \langle \langle \mathcal {A}y , \lambda _s \rangle \rangle \mathrm{d}s + \int _0^t \langle \langle y, \nu \beta (\lambda _s) + \beta (\lambda _s) \nu \rangle \rangle \mathrm{d}s \\&\quad + \langle \langle y , \mathcal {S}^*_\varepsilon \nu N_t\rangle \rangle + \langle \langle y ,N_t \mathcal {S}^*_\varepsilon \nu \rangle \rangle \\ \end{aligned}$$
for \( y \in \mathcal {D} \). This analytically weak form can be translated into a mild form by standard methods. Indeed, notice that the integral is just along a finite variation path, and therefore, we can readily apply variation of constants. The last assertion about the càglàd property is a consequence of Theorem 2.13 by noting that \(\varrho (\lambda )\) does not explode. This proves (ii).
Concerning (iii), note first that we have a unique mild solution to
$$\begin{aligned} \partial _t y_t = \mathcal {A}y_t + \beta _*\left( \int _0^{\infty } y(x) \nu (\mathrm{d}x) + \int _0^{\infty } \nu (\mathrm{d}x) y(x)\right) , \quad y_0 \in Y, \end{aligned}$$
since this is the adjoint equation of (4.4). For the equation with jumps, we proceed as in Proposition via Picard iteration. Denote the semigroup associated with (4.16) by \(\mathcal {S}^{\beta _*}\) and define
$$\begin{aligned} y_t^0&= y_0,\\ y_t^{n}&=\mathcal {S}^{\beta _*}_ty_0+\int _0^t \mathcal {S}^{\beta _*}_{t-s}\beta _*\left( \int _{\mathbb {S}^d_+} \left( \exp ( \langle y_s^{n-1} , \mathcal {S}^*_{\varepsilon } \nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu \rangle )-1 \right) \mu (\mathrm{d}\xi )\right) \mathrm{d}s. \end{aligned}$$
Moreover, for \(t \in [0,\delta ]\) for some \(\delta >0\), we have by local Lipschitz continuity of \(x\mapsto \exp (x)\)
$$\begin{aligned} \Vert y_t^{n+1}-y_t^n\Vert _{Y}&\le \Vert \int _0^t \mathcal {S}^{\beta _*}_{t-s}\beta _* \left( \int _{\mathbb {S}_+^d }( \exp ( \langle y^{n}_s , \mathcal {S}^*_{\varepsilon } \nu \xi \rangle )- \exp ( \langle y^{n-1}_s , \mathcal {S}^*_{\varepsilon } \nu \xi \rangle ) )\mu (\mathrm{d}\xi ) \right) \mathrm{d}s\Vert _{Y}\\&\le \int _0^t C \Vert \mathcal {S}^{\beta _*}_{t-s}\beta _*\Vert _{\text {op}} \Vert y_s^{n}-y_s^{n-1}\Vert _{Y} \left( \int _{\mathbb {S}_+^d} \Vert \mathcal {S}^*_{\varepsilon } \nu \xi \Vert _{Y^*}\mu (\mathrm{d}\xi ) \right) \mathrm{d}s. \end{aligned}$$
By an extension of Gronwall’s inequality, see Dalang (1999, Lemma 15), this yields convergence of \((y_t^n)_{n\in \mathbb {N}}\) with respect to \(\Vert \cdot \Vert _{Y}\) and hence the existence of a unique local mild solution to (4.13) up to some maximal lifetime \(t_+(y_0)\). That \(t_+(y_0)= \infty \) for all \(y_0 \in \mathcal {E}_*\) follows from the subsequent estimate
$$\begin{aligned} \Vert y_t\Vert _{Y}&= \Vert \mathcal {S}^{\beta _*}_t y_0 +\int _0^t \mathcal {S}^{\beta _*}_{t-s} \beta _* \left( \int _{\mathbb {S}_+^d} \left( \exp ( \langle y_s , \mathcal {S}^*_{\varepsilon } \nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu \rangle )-1 \right) \mu (\mathrm{d}\xi ) \right) \mathrm{d}s \Vert _{Y}\\&\le \Vert \mathcal {S}^{\beta _*}_t y_0 \Vert _Y + \int _0^t \Vert \mathcal {S}^{\beta _*}_{t-s} \beta _*\Vert _{\text {op}} \left( \int _{\mathbb {S}_+^d} |\exp (\langle y_s , \mathcal {S}^*_{\varepsilon } \nu \xi +\xi \mathcal {S}^*_{\varepsilon } \nu \rangle )-1| \mu (\mathrm{d}\xi ) \right) \mathrm{d}s\\&\le \Vert \mathcal {S}^{\beta _*}_t y_0 \Vert _Y + t \sup _{s \le t}\Vert \mathcal {S}^{\beta _*}_{s} \beta _*\Vert _{\text {op}} \mu (\mathbb {S}_+^d), \end{aligned}$$
where we used \(|\exp (\langle y , \mathcal {S}^*_{\varepsilon } \nu \xi + \xi \mathcal {S}^*_{\varepsilon } \nu \rangle )-1| \le 1\) for all \(y \in \mathcal {E}_*\) in the last estimate.
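The Picard scheme used above can be illustrated numerically in a toy setting. The following is a minimal sketch under simplifying assumptions that are not part of the proof: the state is scalar, the semigroup \(\mathcal {S}^{\beta _*}_t\) is replaced by multiplication with \(e^{-bt}\), the jump measure is a unit point mass, and the time integral is approximated by a left-point rule. The function name `picard_iterates` and all parameters are hypothetical. For a nonpositive initial value, all iterates stay nonpositive, so the nonlinearity \(x \mapsto e^x - 1\) is bounded by 1 in absolute value, mirroring the estimate used for global existence.

```python
import math


def picard_iterates(y0, b, horizon, n_steps, n_iter):
    """Picard iteration for the toy mild equation
    y_t = exp(-b t) y0 + int_0^t exp(-b (t - s)) (exp(y_s) - 1) ds
    on a uniform grid (left-point quadrature)."""
    h = horizon / n_steps
    grid = [i * h for i in range(n_steps + 1)]
    y = [y0] * (n_steps + 1)  # zeroth iterate: constant in time
    sup_diffs = []            # sup-norm distance between consecutive iterates
    for _ in range(n_iter):
        g = [math.exp(v) - 1.0 for v in y]  # nonlinearity along the previous iterate
        new = []
        for i, t in enumerate(grid):
            integral = h * sum(math.exp(-b * (t - grid[j])) * g[j] for j in range(i))
            new.append(math.exp(-b * t) * y0 + integral)
        sup_diffs.append(max(abs(p - q) for p, q in zip(new, y)))
        y = new
    return grid, y, sup_diffs


if __name__ == "__main__":
    _, y, diffs = picard_iterates(y0=-1.0, b=2.0, horizon=1.0, n_steps=100, n_iter=25)
    print("terminal value:", y[-1])
    print("first and last sup-differences:", diffs[0], diffs[-1])
```

The sup-differences decay factorially in the iteration index, which is the discrete counterpart of the contraction argument on a short interval.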
To prove (iv), just note that by the existence of a generalized Feller semigroup, the abstract Cauchy problem for the initial value \( \exp (\langle y_0,.\rangle ) \) can be solved uniquely for \( y_0 \in \mathcal {E}_* \). Indeed, \(\mathbb {E}_{\lambda } [\exp (\langle y_0,\lambda _t\rangle )]\) uniquely solves
$$\begin{aligned} \partial _t u(t,\lambda )= A u(t, \lambda ), \quad u(0, \lambda ) = \exp (\langle y_0, \lambda \rangle ), \end{aligned}$$
where A denotes the generator associated with (4.12). Setting \(u(t,\lambda )=\exp (\langle y_t, \lambda \rangle )\), we have
$$\begin{aligned} \partial _t u(t,\lambda )=\exp (\langle y_t, \lambda \rangle ) R(y_t), \end{aligned}$$
where the right-hand side is nothing else than \(A\exp (\langle y_t, \lambda \rangle )\); hence, the affine transform formula holds true. This also implies that \(y_t \in \mathcal {E}_*\) for all \(t \ge 0\), simply because \(\mathbb {E}_{\lambda } [\exp (\langle y_0,\lambda _t\rangle )] \le 1\) for all \(\lambda \in \mathcal {E}\). \(\square \)
We are now ready to state the main theorem of this section, namely an existence and uniqueness result for equations of the type
$$\begin{aligned} \mathrm{d} \lambda _t = \mathcal {A}^* \lambda _t \mathrm{d}t + \nu \mathrm{d}X_t + \mathrm{d}X_t \nu , \end{aligned}$$
where \((X_t)_{t \ge 0}\) is a \(\mathbb {S}^d_+\)-valued pure jump Itô semimartingale of the form
$$\begin{aligned} X_t = \int _0^t \beta (\lambda _s)\mathrm{d}s+ \int _0^t \int _{\mathbb {S}^d_+} \xi \mu ^X(\mathrm{d}\xi ,\mathrm{d}s) , \end{aligned}$$
with \(\beta \) specified in (4.5) satisfying Assumption 4.9 and random measure of the jumps \(\mu ^X\). Its compensator satisfies the following condition:

Assumption 4.12

The compensator of \(\mu ^X\) is given by
$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \beta (\lambda _t)\frac{\mu (\mathrm{d}\xi )}{\Vert \xi \Vert \wedge 1}\right) \end{aligned}$$
where \(\mu \) is a \(\mathbb {S}^d_+\)-valued finite measure on \(\mathbb {S}^d_+\) satisfying \(\int _{\Vert \xi \Vert \ge 1}\Vert \xi \Vert ^2 \Vert \mu (\mathrm{d}\xi )\Vert < \infty \).
For the formulation of the subsequent theorem, we shall need the following set of Fourier basis elements
$$\begin{aligned} \mathcal {D}=\{ f_y: \mathcal {E} \rightarrow [0,1]; \lambda \mapsto \exp ( \langle y , \lambda \rangle ) \, | \, y \in \mathcal {E}_* \cap {\text {dom}}(\mathcal {A}) \text { s.t.~} \langle \langle y, \nu \rangle \rangle \text { is well defined}\}. \end{aligned}$$

Theorem 4.13

Let Assumptions 4.3, 4.9 and 4.12 be in force.
  1. (i)
    Then, the stochastic partial differential equation (4.17) admits a unique Markovian solution \((\lambda _t)_{t\ge 0} \) in \( \mathcal {E} \) given by a generalized Feller semigroup on \( \mathcal {B}^\varrho (\mathcal {E}) \) whose generator takes on the set of Fourier elements
    $$\begin{aligned} f_y: \mathcal {E} \rightarrow [0,1]; \lambda \mapsto \exp ( \langle y , \lambda \rangle ) \end{aligned}$$
    for \(y \in \mathcal {D} \cap \mathcal {E}_*\) where \(\mathcal {D}\) is defined in (4.11) the form
    $$\begin{aligned} Af_y(\lambda )=f_y(\lambda ) (\langle \mathcal {A}y, \lambda \rangle + \langle \mathcal {R}(\langle \langle y, \nu \rangle \rangle ), \lambda \rangle ), \end{aligned}$$
    with \(\mathcal {R}: \mathbb {S}^d_- \rightarrow Y\) given by
    $$\begin{aligned} \mathcal {R}(u)= & {} \beta _*(u)+ \beta _*\left( \int _{\mathbb {S}^d_+} \left( \exp ( {{\,\mathrm{Tr}\,}}(u \xi )) -1 \right) \frac{\mu (\mathrm{d}\xi )}{\Vert \xi \Vert \wedge 1} \right) . \end{aligned}$$
  2. (ii)
    This generalized Feller process is also a probabilistically weak and analytically mild solution of (4.17), i.e.,
    $$\begin{aligned} \lambda _t = \mathcal {S}^*_t \lambda _0 +\int _0^t \mathcal {S}^*_{t-s}\nu \mathrm{d}X_s + \int _0^t \mathrm{d}X_s \mathcal {S}_{t-s}^*\nu . \end{aligned}$$
    This justifies Eq. (4.17); in particular, for every initial value, the process X can be constructed on an appropriate probabilistic basis. The stochastic integral is defined in a pathwise way along finite variation paths. Moreover, for every family \((f_n)_n \in \mathcal {B}^{\varrho }(\mathcal {E})\), \(t \mapsto f_n(\lambda _t)\) can be chosen to be càg for all n.
  3. (iii)
    The affine transform formula is satisfied, i.e.,
    $$\begin{aligned} \mathbb {E}_{\lambda _0}\left[ \exp ( \langle y_0, \lambda _t \rangle )\right] =\exp (\langle y_t, \lambda _0 \rangle ), \end{aligned}$$
    where \(y_t\) solves \(\partial _t y_t=R(y_t)\) for all \(y_0 \in \mathcal {E}_*\) and \(t >0\) in the mild sense with \(R: \mathcal {D} \cap \mathcal {E}_* \rightarrow Y\) given by
    $$\begin{aligned} R(y) = \mathcal {A} y + \mathcal {R}( \langle \langle y , \nu \rangle \rangle ) \end{aligned}$$
    with \(\mathcal {R}\) defined in (4.21). Furthermore, \(y_t \in \mathcal {E}_*\) for all \(t \ge 0\).
  4. (iv)
    For all \(\lambda _0 \in \mathcal {E}\), the corresponding stochastic Volterra equation, \(V_t:= \beta (\lambda _t )\), given by
    $$\begin{aligned} V_t = \beta ( \lambda _t )= & {} \beta (\mathcal {S}_t^* \lambda _0) + \int _0^t\beta (\mathcal {S}_{t-s}^*\nu ) \mathrm{d}X_s + \int _0^t \mathrm{d}X_s \beta (\mathcal {S}_{t-s}^*\nu )\nonumber \\= & {} h(t) + \int _0^t K(t-s)\mathrm{d}X_s +\int _0^t \mathrm{d}X_s K(t-s) \end{aligned}$$
    admits a probabilistically weak solution with càg trajectories. Here, \(h(t):=\beta (\mathcal {S}^*_t \lambda _0)\).
  5. (v)
    The Laplace transform of the Volterra equation \(V_t\) is given by
    $$\begin{aligned} \mathbb {E}_{\lambda _0}\left[ \exp \left( {{\,\mathrm{Tr}\,}}(u V_t)\right) \right] =\exp \left( {{\,\mathrm{Tr}\,}}(u h(t))+ \int _0^t {{\,\mathrm{Tr}\,}}(\mathfrak {R}(\psi _s) h(t-s) ) \mathrm{d}s\right) , \end{aligned}$$
    where \(h(t)=\beta (\mathcal {S}_t^*\lambda _0 )\), \(\mathfrak {R}: \mathbb {S}^d_- \rightarrow \mathbb {S}^d_-,\, u \mapsto \mathfrak {R}(u)= u + \int _{\mathbb {S}_+^d} (e^{{{\,\mathrm{Tr}\,}}(u\xi )}-1 ) \frac{\mu (\mathrm{d}\xi )}{\Vert \xi \Vert \wedge 1}\) and \(\psi _t\) solves the matrix Riccati–Volterra equation
    $$\begin{aligned} \psi _t=u K(t)+\int _0^t \mathfrak {R}(\psi _s) K(t-s) \mathrm{d}s, \quad t >0. \end{aligned}$$
    Hence, the solution of the stochastic Volterra equation in (4.23) is unique in law.

Remark 4.14

One essential point here is that we lose the càglàd property as stated in Proposition 4.11 (ii) when we let \( \varepsilon \) of \(\mathcal {S}_{\varepsilon }\) tend to zero. As long as the kernel K has a singularity at \( t = 0 \), it is impossible to preserve finite growth bounds with \( M = 1 \) as \( \varepsilon \rightarrow 0 \), but we get càg versions (compare with the second conclusion in Theorem 2.13 and Remark 2.14).

Remark 4.15

Note that for \(\beta \) as of Example 4.4, the above equations simplify considerably. In particular, \(\beta _*\) in (4.21) is simply the identity.


We apply Theorem 3.2 and consider a sequence of generalized Feller semigroups \((P^n)_{n \in \mathbb {N}}\) with generators \(A^n\) corresponding to the solution \(\lambda ^n\) of (4.12) for \(\varepsilon =\frac{1}{n}\), and compensator
$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \beta (\lambda ^n_t)\frac{1_{\{\Vert \xi \Vert > \frac{1}{n}\}} \mu (\mathrm{d}\xi )}{\Vert \xi \Vert \wedge 1}\right) , \quad n \in \mathbb {N}. \end{aligned}$$
Let us first establish a uniform growth bound for this sequence. To this end, denote
$$\begin{aligned} F^n(\mathrm{d}\xi ):=\frac{1_{\{\Vert \xi \Vert > \frac{1}{n}\}} \mu (\mathrm{d}\xi )}{\Vert \xi \Vert \wedge 1}. \end{aligned}$$
Note that for the solution of (4.12), we have due to Proposition 4.11 (ii) the following estimate for \(t \in [0,T]\) for some fixed \(T>0\)
$$\begin{aligned} \mathbb {E} [\Vert \lambda ^{n}_t\Vert ^2_{Y^*}]&\le 5\Vert \mathcal {S}^*_t \lambda _0\Vert ^2_{Y^*}+ 10t\int _0^t \Vert \mathcal {S}^*_{t-s}\nu \Vert ^2_{Y^*} \Vert \beta \Vert ^2_{\text {op}} \mathbb {E}[\Vert \lambda ^{n}_s\Vert ^2_{Y^*}]\mathrm{d}s \\&\quad + 10\mathbb {E} \left[ \left\| \int _0^t \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \mathrm{d}N_s -\int _0^t \int \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \xi {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) )\mathrm{d}s \right\| _{Y^*}^2 \right] \\&\quad + 10\mathbb {E} \left[ \left\| \int _0^t \mathrm{d}N_s\mathcal {S}^*_{t-s+\frac{1}{n}} \nu -\int _0^t \int \xi \mathcal {S}^*_{t-s+\frac{1}{n}} \nu {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) )\mathrm{d}s \right\| _{Y^*}^2\right] \\&\quad + 10\mathbb {E} \left[ \left\| \int _0^t \int \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \xi {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) ) \mathrm{d}s \right\| _{Y^*}^2\right] \\&\quad + 10\mathbb {E} \left[ \left\| \int _0^t \int \xi \mathcal {S}^*_{t-s+\frac{1}{n}} \nu {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) ) \mathrm{d}s \right\| _{Y^*}^2\right] . \end{aligned}$$
As a consequence of Itô’s isometry, the martingale part can be estimated by
$$\begin{aligned}&\mathbb {E} \left[ \left\| \int _0^t \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \mathrm{d}N_s -\int _0^t \int \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \xi {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) )\mathrm{d}s \right\| _{Y^*}^2\right] \\&\quad \le \mathbb {E}\left[ \int _0^t \int \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \Vert \xi \Vert ^2 {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) )\mathrm{d}s\right] \\&\quad \le \int \Vert \xi \Vert ^2 \Vert F^n(\mathrm{d}\xi )\Vert \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \Vert \beta \Vert _{\text {op}} \mathbb {E} [ \Vert \lambda ^{n}_s \Vert _{Y^*}] \mathrm{d}s\\&\quad \le \left( \int _{\Vert \xi \Vert \le 1}\Vert \mu (\mathrm{d}\xi )\Vert + \int _{\Vert \xi \Vert > 1} \Vert \xi \Vert ^2 \Vert \mu (\mathrm{d}\xi )\Vert \right) \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \Vert \beta \Vert _{\text {op}} \mathbb {E} [ \Vert \lambda ^{n}_s \Vert _{Y^*}] \mathrm{d}s\\&\quad \le \widetilde{C} \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \Vert \beta \Vert _{\text {op}} \mathbb {E} [ \Vert \lambda ^{n}_s \Vert _{Y^*}] \mathrm{d}s\\&\quad \le \widetilde{C} K \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 (1 +\Vert \beta \Vert ^2_{\text {op}} \mathbb {E} [ \Vert \lambda ^{n}_s \Vert ^2_{Y^*}] ) \mathrm{d}s \end{aligned}$$
where \(\widetilde{C}= \left( \int _{\Vert \xi \Vert \le 1}\Vert \mu (\mathrm{d}\xi )\Vert + \int _{\Vert \xi \Vert > 1} \Vert \xi \Vert ^2 \Vert \mu (\mathrm{d}\xi )\Vert \right) \) and K some other constant. Moreover, for the last terms, we have
$$\begin{aligned}&\mathbb {E}\left[ \left\| \int _0^t \int \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \xi {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) ) \mathrm{d}s \right\| _{Y^*}^2 \right] \\&\quad \le t \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \mathbb {E}\left[ \left\| \int \xi {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) ) \right\| ^2\right] \mathrm{d}s \\&\quad \le 2t \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \mathbb {E}\left[ \left\| \int _{\Vert \xi \Vert \le 1} \xi {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) ) \right\| ^2\right. \\&\qquad \left. +\left\| \int _{\Vert \xi \Vert \ge 1} \xi {{\,\mathrm{Tr}\,}}(\beta ( \lambda ^{n}_s ) F^n(\mathrm{d}\xi ) ) \right\| ^2\right] \mathrm{d}s \\&\quad \le 2t \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \Vert \beta \Vert ^2_{\text {op}}\mathbb {E} [ \Vert \lambda ^{n}_s\Vert ^2_{Y^*}] \int \Vert \mu (\mathrm{d}\xi )\Vert \\&\qquad \times \left( \int _{\Vert \xi \Vert \le 1}\Vert \mu (\mathrm{d}\xi )\Vert + \int _{\Vert \xi \Vert > 1} \Vert \xi \Vert ^2 \Vert \mu (\mathrm{d}\xi )\Vert \right) \mathrm{d}s \\&\quad \le 2t {\widehat{C}} \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \Vert \beta \Vert ^2_{\text {op}}\mathbb {E}[ \Vert \lambda ^{n}_s\Vert ^2_{Y^*}] \mathrm{d}s \end{aligned}$$
where \({\widehat{C}}= \int \Vert \mu (\mathrm{d}\xi )\Vert \widetilde{C}\). Putting this together, we obtain
$$\begin{aligned} \mathbb {E}[\Vert \lambda ^{n}_t\Vert ^2_{Y^*}]&\le C_0 \Vert \lambda _0\Vert ^2_{Y^*} + 10t\int _0^t \Vert \mathcal {S}^*_{t-s}\nu \Vert ^2_{Y^*} \Vert \beta \Vert ^2_{\text {op}} \mathbb {E}[\Vert \lambda ^{n}_s\Vert ^2_{Y^*}]\mathrm{d}s\\&\quad +20 \widetilde{C}K \int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \mathrm{d}s\\&\quad + 20 (\widetilde{C}K + 2t {\widehat{C}})\int _0^t \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*}^2 \Vert \beta \Vert ^2_{\text {op}}\mathbb {E}[ \Vert \lambda ^{n}_s\Vert ^2_{Y^*}] \mathrm{d}s \\&\le C_0 \Vert \lambda _0\Vert ^2_{Y^*} + C_1\int _0^t \Vert \mathcal {S}^*_{t-s} \nu \Vert _{Y^*}^2 \mathrm{d}s + C_2 \int _0^t \Vert \mathcal {S}^*_{t-s} \nu \Vert ^2_{Y^*}\mathbb {E}[ \Vert \lambda ^{n}_s\Vert ^2_{Y^*}] \mathrm{d}s \end{aligned}$$
where \(C_0\) and \(C_2\) depend on T. We use \(\Vert \mathcal {S}_t^*\lambda _0\Vert ^2_{Y^*} \le C_0 \Vert \lambda _0\Vert ^2_{Y^*}\) for \(t \in [0,T]\), as well as \( \Vert \mathcal {S}^*_{t-s+\frac{1}{n}} \nu \Vert _{Y^*} \le C \Vert \mathcal {S}^*_{t-s} \nu \Vert _{Y^*}\) for some constant C and all \(n \in \mathbb {N}\) due to strong continuity. Exactly by the same arguments as in the proof of Proposition , we thus obtain for \(t \in [0,T]\) for some fixed T
$$\begin{aligned} \mathbb {E}[\Vert \lambda _t\Vert ^2_{Y^*}] \le \widetilde{C}(\Vert \lambda _0\Vert ^2_{Y^*}+1) \left( 1-\int _0^t R'(s) \, \mathrm{d}s\right) , \end{aligned}$$
where \(R'\) denotes the resolvent of \(-C_2\Vert \mathcal {S}^*_{t-s}\nu \Vert _{Y^*} \). Hence, \(\mathbb {E}[\varrho (\lambda _t)]\le C \varrho (\lambda _0)\) for \(t \in [0,T]\). From this, the desired uniform growth bound \(\Vert P_t\Vert _{L(\mathcal {B}^{\varrho }(\mathcal {E}))} \le M \exp (\omega t)\) for some \(M \ge 1 \) and \(\omega \in \mathbb {R}\) follows.
For the set D as of Theorem 3.2, we here choose Fourier basis elements of the form
$$\begin{aligned} f_y: \mathcal {E} \rightarrow [0,1]; \lambda \mapsto \exp ( \langle y , \lambda \rangle ) \end{aligned}$$
such that \(y \in \mathcal {E}_*\) and \(\lambda \mapsto \exp ( \langle y , \lambda \rangle )\) lies in \(\cap _{n \ge 1} {\text {dom}}(A^n)\), whose span is dense, whence (i) of Theorem 3.2. Here, \(A^n\) denotes the generator corresponding to (4.12) with \(\varepsilon =\frac{1}{n}\) and \(\mu \) replaced by \(F^n\). We now equip \({\text {span}}(D)\) with the uniform norm \(\Vert \cdot \Vert _{\infty }\) and verify Condition (ii), i.e., we check
$$\begin{aligned} \Vert A^n P^m_u f_y -A^mP^m_u f_y\Vert _{\varrho } \le \Vert f_y\Vert _{\infty }a_{nm} \end{aligned}$$
for all \(0 \le u \le t\) with \(a_{nm} \rightarrow 0\) as \(n,m \rightarrow \infty \), and possibly depending on y. Note that
$$\begin{aligned} A^nf_y(\lambda )=\langle R^n(y), \lambda \rangle f_y(\lambda ), \end{aligned}$$
where \(R^n\) corresponds to (4.13) for \(\varepsilon =\frac{1}{n}\) and \(\mu \) replaced by \(F^n\). As \(P^n\) leaves D invariant for all \(n \in \mathbb {N}\) by Proposition 4.11 (iv), we have
$$\begin{aligned}&\frac{|A^n P^m_u f_y(\lambda ) -A^m P^m_u f_y(\lambda )|}{\varrho (\lambda )}\\&\quad \le \frac{f_{y^m_u}(\lambda )}{\varrho (\lambda )}\Bigg (\beta _{*}\Big (\int _{\mathbb {S}^d_+} \underbrace{\exp (\langle y^m_u, \mathcal {S}^*_{\frac{1}{m}} \nu \xi + \xi \mathcal {S}^*_{\frac{1}{m}} \nu \rangle )1_{\{\Vert \xi \Vert \ge \frac{1}{n}\}}}_{:=b_{nm}(\xi )}\\&\qquad \times \underbrace{|\exp (\langle y^m_u, (\mathcal {S}^*_{\frac{1}{n}}\nu -\mathcal {S}^*_{\frac{1}{m}} \nu ) \xi + \xi (\mathcal {S}^*_{\frac{1}{n}}\nu -\mathcal {S}^*_{\frac{1}{m}} \nu ) \rangle ) -1| }_{\widetilde{a}_{nm}^1(\xi )}\frac{\mu (\mathrm{d}\xi )}{\Vert \xi \Vert \wedge 1}\Big )\\&\quad \quad +\beta _*\Big ( \underbrace{\int _{\mathbb {S}^d_+} |\exp (\langle y^m_u, \mathcal {S}^*_{\frac{1}{m}} \nu \xi + \xi \mathcal {S}^*_{\frac{1}{m}} \nu \rangle )-1| \, | 1_{\{\Vert \xi \Vert \ge \frac{1}{n}\}}- 1_{\{\Vert \xi \Vert \ge \frac{1}{m}\}}|\frac{\mu (\mathrm{d}\xi )}{\Vert \xi \Vert \wedge 1}}_{\widetilde{a}^2_{nm}}\Big )\Bigg ). \end{aligned}$$
Here, \(y^m_u\) denotes the solution of \(\partial _t y^m_t=R^m(y^m_t)\) at time u with \(y^m_0=y\). Moreover, \(\widetilde{a}^1_{nm}(\xi )\) and \(\widetilde{a}^2_{nm}\) can be chosen uniformly for all \(u \le t\) and tend to 0 as \(n,m \rightarrow \infty \). This is possible since for the chosen initial values y we obtain that \(y^m_u\) is bounded on compact intervals in time uniformly in m (see Cuchiero and Teichmann 2018 for details). From this, together with dominated convergence for the first term (note that \(b_{nm}(\xi ) \widetilde{a}^1_{nm}(\xi ) \) can be bounded by \(\Vert \xi \Vert \wedge 1\)), we infer (4.26). The conditions of Theorem 3.2 are therefore satisfied, and we obtain a generalized Feller semigroup whose generator is given by (4.20).

For the second assertion, we proceed as in the proof of Proposition 4.11; the proof of the existence of X can be transferred verbatim. However, one loses the existence of càglàd paths of \(f_n(\lambda )\) due to the possible lack of finite mass of \( \nu \). Here, we only obtain càg trajectories (compare with Remarks 2.14 and 4.14).

Concerning the third assertion, the affine transform formula follows simply from the convergence of the semigroups \(P^n\) as asserted in Theorem 3.2 by setting \(y_t= \lim _{n \rightarrow \infty } y_t^{n}\), where \(y_t^{n}\) solves \(\partial _t y_t^{n}=R^{n}(y_t^{n} )\) in the mild sense with \(R^{n}\) given again by (4.13) with \(\varepsilon =\frac{1}{n}\) and \(\mu \) replaced by \(F^n\). Since \(\exp (\langle y_t, \lambda \rangle )\) is then also the unique solution of the abstract Cauchy problem for initial value \( \exp (\langle y_0,\lambda \rangle ) \), i.e., it solves
$$\begin{aligned} \partial _t u(t,\lambda )= A u(t, \lambda ), \quad u(0, \lambda ) = \exp (\langle y_0, \lambda \rangle ), \end{aligned}$$
where A denotes the generator (4.20), we infer that \(y_t\) satisfies \(\partial _t y_t=R(y_t)\) with R given by (4.22). This is because \(A\exp (\langle y_t, \lambda \rangle )=\exp (\langle y_t, \lambda \rangle )R(y_t)\).

The fourth claim follows from statement (ii), property (4.5) and the definition of K in (4.6).

Finally, to prove (v), note that due to (iv) and the definition of the adjoint operator \(\beta _*\), we have
$$\begin{aligned} {{\,\mathrm{Tr}\,}}(u V_t)= {{\,\mathrm{Tr}\,}}(u \beta (\lambda _t))= \langle \beta _*(u), \lambda _t\rangle . \end{aligned}$$
Statement (iii) therefore implies that
$$\begin{aligned} \mathbb {E}[e^{{{\,\mathrm{Tr}\,}}(u V_t)}]= e^{\langle y_t, \lambda _0\rangle }, \end{aligned}$$
where the mild solution of \(y_t\) can be expressed by
$$\begin{aligned} y_t= \mathcal {S}_t \beta _*(u) + \int _0^t \mathcal {S}_{t-s} \mathcal {R}(\langle \langle y_s, \nu \rangle \rangle )\mathrm{d}s. \end{aligned}$$
Hence, by definition of \(\mathcal {R}\), \(\mathfrak {R}\) and h, we find
$$\begin{aligned} \langle y_t, \lambda _0\rangle= & {} \langle \mathcal {S}_t \beta _*(u) + \int _0^t \mathcal {S}_{t-s} \mathcal {R}(\langle \langle y_s, \nu \rangle \rangle )\mathrm{d}s, \lambda _0\rangle \nonumber \\= & {} {{\,\mathrm{Tr}\,}}(u\beta (\mathcal {S}^*_t\lambda _0))+ \int _0^t {{\,\mathrm{Tr}\,}}(\mathfrak {R}(\langle \langle y_s, \nu \rangle \rangle )\beta (\mathcal {S}^*_{t-s}\lambda _0)) \mathrm{d}s \nonumber \\= & {} {{\,\mathrm{Tr}\,}}(u h(t))+ \int _0^t {{\,\mathrm{Tr}\,}}(\mathfrak {R}(\langle \langle y_s, \nu \rangle \rangle )h(t-s))\mathrm{d}s \end{aligned}$$
From this and (4.27), it is easily seen that we can replace \(\langle \langle y_s, \nu \rangle \rangle \) in (4.28) by a solution of the following Volterra–Riccati equation
$$\begin{aligned} \psi _t= u K(t)+ \int _0^t \mathfrak {R}(\psi _s)K(t-s) \mathrm{d}s. \end{aligned}$$
Note that we do not need to symmetrize here since we apply the trace and h is symmetric. This proves the assertion.\(\square \)
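To make the Riccati–Volterra equation more concrete, the following minimal sketch discretizes its scalar analogue \(\psi _t = uK(t) + \int _0^t \mathfrak {R}(\psi _s)K(t-s)\mathrm{d}s\) with an explicit left-point scheme. Everything beyond the displayed equation is an assumption made for illustration: the dimension is one, the measure \(\mu \) has finitely many atoms, and the kernel is bounded at zero (a singular fractional kernel would require a quadrature adapted to the singularity). The name `riccati_volterra_scalar` and its parameters are hypothetical.

```python
import math


def riccati_volterra_scalar(u, kernel, jump_sizes, jump_weights, horizon, n_steps):
    """Explicit left-point scheme for
    psi_t = u K(t) + int_0^t R(psi_s) K(t - s) ds, with
    R(x) = x + sum_k w_k (exp(x xi_k) - 1) / min(xi_k, 1)."""

    def big_r(x):
        # Scalar analogue of the map u -> u + int (e^{u xi} - 1) mu(d xi) / (|xi| ^ 1)
        return x + sum(
            w * (math.exp(x * xi) - 1.0) / min(xi, 1.0)
            for xi, w in zip(jump_sizes, jump_weights)
        )

    h = horizon / n_steps
    grid = [i * h for i in range(n_steps + 1)]
    psi = []
    for i, t in enumerate(grid):
        # Convolution uses only already computed values psi[0..i-1]
        conv = h * sum(big_r(psi[j]) * kernel(t - grid[j]) for j in range(i))
        psi.append(u * kernel(t) + conv)
    return grid, psi


if __name__ == "__main__":
    exp_kernel = lambda t: math.exp(-t)  # assumed bounded kernel, for illustration only
    grid, psi = riccati_volterra_scalar(-1.0, exp_kernel, [1.0], [0.5], 2.0, 200)
    print("psi at terminal time:", psi[-1])
```

For \(u \le 0\) and a nonnegative kernel, the scheme keeps \(\psi \) nonpositive, which is the discrete counterpart of the exponent staying in the admissible set.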

The following example illustrates how a multivariate Hawkes process can easily be defined by means of (4.18).

Example 4.16

Let \(\beta \) and \(\mathcal {S}^*\) be as of Example 4.4. Define \(\mu _{ii}(\mathrm{d}\xi )=\delta _{e_{ii}}(\mathrm{d}\xi )\) and \(\mu _{ij}=0\) for \(i\ne j\). Then, the Volterra equation as of (4.23) is given by
$$\begin{aligned} V_t= & {} \int _0^{\infty }e^{-xt}\lambda _0(\mathrm{d}x) + \int _0^t (K(t-s)V_s + V_s K(t-s)) \mathrm{d}s \\&+\int _0^t K(t-s) \mathrm{d}N_s + \int _0^t \mathrm{d}N_s K(t-s). \end{aligned}$$
Only the diagonal components of the matrix-valued process N jump, and we can define \(\widehat{N}:= {\text {diag}}(N)\) which is a process with values in \(\mathbb {N}_0^d\). Its components jump by one, and the compensator of \(N_{ii}=\widehat{N}_i \) is given by \(\int _0^{\cdot }V_{s,ii} \mathrm{d}s\), which justifies the name multivariate Hawkes process. Note that the components of V are not independent if \(\nu \) and in turn K are not diagonal.
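The dynamics of this example can be simulated exactly in a very reduced setting. The sketch below assumes, purely for illustration, that \(d=1\), \(\nu = c\,\delta _b\) and \(\lambda _0 = v_0\,\delta _b\), so that \(K(t)=c\,e^{-bt}\) and \(h(t)=v_0e^{-bt}\). Under these assumptions, the Volterra equation above reduces to \(\mathrm{d}V_t = -(b-2c)V_t\,\mathrm{d}t + 2c\,\mathrm{d}N_t\), where N has intensity V. We further assume \(b > 2c\), so that the intensity decays between jumps and the next jump time can be drawn by inverting the integrated intensity. The function name and parameters are hypothetical.

```python
import math
import random


def simulate_scalar_hawkes(v0, b, c, horizon, seed=0):
    """Exact simulation of dV = -(b - 2c) V dt + 2c dN on [0, horizon],
    where the counting process N has intensity V (assumes b > 2c)."""
    if not (v0 > 0 and c > 0 and b > 2 * c):
        raise ValueError("need v0 > 0, c > 0 and b > 2c")
    rng = random.Random(seed)
    kappa = b - 2.0 * c  # net decay rate of V between jumps
    t, v = 0.0, v0
    jump_times, v_after = [], []
    while True:
        e = rng.expovariate(1.0)  # Exp(1) level for the integrated intensity
        # Integrated intensity over [t, t + tau] equals v (1 - exp(-kappa tau)) / kappa,
        # which is bounded by v / kappa: if e exceeds it, no further jump occurs.
        if e >= v / kappa:
            break
        tau = -math.log(1.0 - kappa * e / v) / kappa
        if t + tau > horizon:
            break
        t += tau
        v = v * math.exp(-kappa * tau) + 2.0 * c  # decay, then jump of size 2 K(0) = 2c
        jump_times.append(t)
        v_after.append(v)
    return jump_times, v_after


if __name__ == "__main__":
    times, values = simulate_scalar_hawkes(v0=1.0, b=3.0, c=1.0, horizon=10.0, seed=1)
    print("number of jumps:", len(times))
```

Since the intensity decays exponentially between jumps, the process stops jumping with positive probability; this is a feature of the chosen toy parameters and not a statement about the general multivariate case.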

5 Squares of matrix-valued Volterra OU processes

As in the finite dimensional setting, squares of Gaussian processes provide us with important process classes for financial and statistical modeling. In this section, we outline this program in utmost generality from a stochastic and analytic point of view. In particular, we consider continuous affine Volterra-type processes on \(\mathbb {S}_+^d\), which we construct as squares of matrix-valued Volterra Ornstein–Uhlenbeck (OU) processes (see Remark 5.4). Following the finite dimensional analogue (Bru 1991), we start by considering matrix measure-valued OU processes of the form
$$\begin{aligned} \mathrm{d}\gamma _t(\mathrm{d}x)= \mathcal {A}^*\gamma _t(\mathrm{d}x)\mathrm{d}t+ \mathrm{d}W_t \nu (\mathrm{d}x), \quad \gamma _0 \in Y^*(\mathbb {R}^{n \times d}). \end{aligned}$$
The underlying Banach space, denoted by \(Y^*(\mathbb {R}^{n \times d})\), is the space of finite \(\mathbb {R}^{n \times d}\)-valued regular Borel measures on the extended half real line \(\overline{\mathbb {R}}_+:=\mathbb {R}_+ \cup \{\infty \}\). Together with
$$\begin{aligned} \varrho (\gamma ) = 1+ {\Vert \gamma \Vert }_{Y^*(\mathbb {R}^{n \times d})}^2, \quad \gamma \in Y^*(\mathbb {R}^{n \times d}), \end{aligned}$$
where \(\Vert \cdot \Vert _{Y^*(\mathbb {R}^{n \times d})}\) denotes the total variation norm, this becomes a weighted space. Moreover, \(\mathcal {A}^*\) is the generator of a strongly continuous semigroup \(\mathcal {S}^*\) on \(Y^*(\mathbb {R}^{n \times d})\), which satisfies a property analogous to (4.3), i.e., for \(A \in \mathbb {R}^{n \times d}\) it holds that
$$\begin{aligned} \mathcal {S}^*_t(\gamma (\cdot ) A^{\top })= (\mathcal {S}^*_t\gamma (\cdot )) A^{\top } \quad \text { and } \quad \mathcal {S}^*_t(A\gamma ^{\top }(\cdot ) )= A(\mathcal {S}^*_t\gamma (\cdot ))^{\top }. \end{aligned}$$
The process W is an \(n\times d\) matrix of Brownian motions and \(\nu \in Y^*=:Y^*(\mathbb {S}^d)\) or \(Z^*\), as defined in Sect. 4, such that Assumption 4.3 holds true. The pre-dual space, denoted by \(Y(\mathbb {R}^{n \times d})\), is given by the space \(C_{b}(\overline{\mathbb {R}}_+, \mathbb {R}^{n \times d})\), where we fix the pairing \(\langle \cdot , \cdot \rangle \) as follows
$$\begin{aligned} \langle \cdot , \cdot \rangle : Y(\mathbb {R}^{n \times d}) \times Y^*(\mathbb {R}^{n \times d}) \rightarrow \mathbb {R}, \quad (y, \gamma ) \mapsto \langle y, \gamma \rangle ={{\,\mathrm{Tr}\,}}\left( \int _0^{\infty } y^{\top }(x) \gamma (\mathrm{d}x) \right) \, . \end{aligned}$$
Again \({{\,\mathrm{Tr}\,}}\) denotes the trace. We assume that all relevant properties from Assumption 4.1 are translated to the current setting.

Remark 5.1

Observe the analogy to the process \(\gamma \) defined in the introduction. If \(\mathcal {A}^*=0\) and \(\nu \) is supported on a finite space with k points, then (5.1) is exactly the process from the introduction.

Proposition 5.2

For every \( \gamma _0 \in Y^*(\mathbb {R}^{n \times d})\), the SPDE (5.1) has a solution given by a generalized Feller semigroup on \(\mathcal {B}^{\varrho }(Y^*(\mathbb {R}^{n \times d}))\) associated with the generator of (5.1). The mild formulation directly yields a stochastically strong solution
$$\begin{aligned} \gamma _t (\mathrm{d}x) = S_t^* \gamma _0 (\mathrm{d}x) + \int _0^t \mathrm{d}W_s S_{t-s}^* \nu (\mathrm{d}x) \, \end{aligned}$$
where order matters, i.e., the matrix Brownian increment is applied to \( S_{t-s}^* \nu (\mathrm{d}x) \) on the left. The integral is understood in the weak sense, i.e., after pairing with \( y \in Y(\mathbb {R}^{n \times d}) \).


The construction of the generalized Feller process can be done by jump approximation of the Brownian motion similarly as in Cuchiero and Teichmann (2018, Theorem 4.16). Notice here that we consider the process on the whole space \(Y^*(\mathbb {R}^{n \times d})\). So no issues with state space constraints occur.

The right-hand side of the stochastically strong formulation defines, after pairing with \( y \in Y(\mathbb {R}^{n \times d}) \), almost surely a continuous linear functional with value
$$\begin{aligned} \langle y , S_t^* \gamma _0 \rangle + \int _0^t \langle y , \mathrm{d}W_s S_{t-s}^* \nu \rangle \, , \end{aligned}$$
since the integrand of the stochastic integral is deterministic and in \( L^2 \) for each \( t \ge 0 \).

\(\square \)

In order to define the actual process of interest, we need to introduce some further notations: For elements in \(\gamma \in Y^{*}(\mathbb {R}^{n \times d})\), we define
$$\begin{aligned} (\gamma \widehat{\otimes } \gamma )(\cdot , \cdot ):=\gamma ^{\top }(\cdot ) \gamma (\cdot ). \end{aligned}$$
The corresponding contracted algebraic tensor product (contracted meaning that one matrix multiplication is performed) is denoted by \(Y^{*}(\mathbb {R}^{n\times d}) \widehat{\otimes }Y^{*}(\mathbb {R}^{n\times d})\), and we set
$$\begin{aligned} \widehat{\mathcal {E}}:=\big \{ \gamma \widehat{\otimes } \gamma \in Y^{*}(\mathbb {R}^{n\times d}) \widehat{\otimes }Y^{*}(\mathbb {R}^{n\times d}) \big \}. \end{aligned}$$
This corresponds to the space of finite \(\mathbb {S}_+^{d}\)-valued, rank n, product measures on \(\overline{\mathbb {R}}_+ \times \overline{\mathbb {R}}_+\). We shall introduce a particular dual topology on \( \widehat{\mathcal {E}} \), namely \( \sigma ( \widehat{\mathcal {E}},Y \otimes Y) \), where the corresponding pairing is given by
$$\begin{aligned} (y_1 \otimes y_2, \gamma _1\widehat{\otimes } \gamma _2)&\mapsto \langle y_1 \widehat{\otimes } y_2, \gamma _1 \widehat{\otimes } \gamma _2\rangle \\&={{\,\mathrm{Tr}\,}}\left( \int _0^{\infty } y_1^{\top }(x_1)y_2(x_2) \gamma _1^{\top }(\mathrm{d}x_1)\gamma _2(\mathrm{d}x_2) \right) \, . \end{aligned}$$
We denote the pre-dual cone by
$$\begin{aligned} -\widehat{\mathcal {E}}_*= \big \{ y \widehat{\otimes } y \in Y(\mathbb {R}^{n\times d}) \widehat{\otimes } Y(\mathbb {R}^{n\times d}) \big \} \, , \end{aligned}$$
where we use again the contracted algebraic tensor product corresponding to the following matrix multiplication of \(\mathbb {R}^{n\times d}\) valued functions
$$\begin{aligned} (y\widehat{\otimes } y)(\cdot ,\cdot )= y^{\top }(\cdot ) y(\cdot ),\quad y \in Y(\mathbb {R}^{n \times d}) \, . \end{aligned}$$
The minus on the left-hand side of (5.4) is to obtain elements in the polar cone.
Let us now define the actual process of interest, namely
$$\begin{aligned} \lambda _t(\mathrm{d}x_1, \mathrm{d}x_2):= \gamma _t^{\top }(\mathrm{d}x_1)\gamma _t(\mathrm{d}x_2)= \gamma _t(\mathrm{d}x_1) \widehat{\otimes } \gamma _t(\mathrm{d}x_2). \end{aligned}$$
Note again the analogy to the Wishart process \(\lambda \) defined in the introduction. The process (5.5) clearly takes values in \(\widehat{\mathcal {E}}\) as defined in (5.3). We will now show that we can define a Volterra-type process by considering projections on \(\mathbb {S}_+^d\). Applying Itô’s formula, we see that \(\lambda _t(\mathrm{d}x_1, \mathrm{d}x_2)\) satisfies the following equation
$$\begin{aligned} \mathrm{d}\lambda _t(\mathrm{d}x_1, \mathrm{d}x_2)= & {} \left( \mathcal {A}_1^*\lambda _t(\mathrm{d}x_1, \mathrm{d}x_2) + \mathcal {A}_2^*\lambda _t(\mathrm{d}x_1, \mathrm{d}x_2) + n \nu (\mathrm{d}x_1)\nu (\mathrm{d}x_2) \right) \mathrm{d}t \nonumber \\&+ \nu (\mathrm{d}x_1) \mathrm{d}W_t^{\top } \gamma _t(\mathrm{d}x_2) + \gamma _t(\mathrm{d}x_1)^{\top } \mathrm{d}W_t \nu (\mathrm{d}x_2), \end{aligned}$$
where \(\mathcal {A}^*_1 \lambda _t(\mathrm{d}x_1, \mathrm{d}x_2)=\mathcal {A}^*\lambda _t(\cdot , \mathrm{d}x_2)(\mathrm{d}x_1)\) and analogously for \(\mathcal {A}^*_2\). Note that for \(\mathcal {A}^* =0\), this is completely analogous to (1.3).
With considerable abuse of notation, but in parallel with Bru (1991) and Eqs. (1.4)–(1.5), we can also write
$$\begin{aligned} \mathrm{d}\lambda _t(\mathrm{d}x_1, \mathrm{d}x_2)= & {} \left( \mathcal {A}_1^*\lambda _t(\mathrm{d}x_1, \mathrm{d}x_2) + \mathcal {A}_2^*\lambda _t(\mathrm{d}x_1, \mathrm{d}x_2) + n \nu (\mathrm{d}x_1)\nu (\mathrm{d}x_2) \right) \mathrm{d}t \nonumber \\&+ \int _0^{\infty } \int _{0}^{\infty } \sqrt{\nu \widehat{\otimes } \nu }(\mathrm{d}x_1,\mathrm{d}x) \mathrm{d}B_t^\top (\mathrm{d}y,\mathrm{d}x) \sqrt{\lambda _t}(\mathrm{d}y,\mathrm{d}x_2)\nonumber \\&+ \int _0^{\infty } \int _0^{\infty } \sqrt{\lambda _t}(\mathrm{d}x_1,\mathrm{d}x) \mathrm{d}B_t(\mathrm{d}x,\mathrm{d}y) \sqrt{\nu \widehat{\otimes } \nu } (\mathrm{d}y,\mathrm{d}x_2), \end{aligned}$$
where heuristically \(B(\mathrm{d}x,\mathrm{d}y)\) is a \(d \times d\) matrix of Brownian fields. We shall not develop a framework in which this notation makes sense, but continue by proving that \( \lambda \) is actually a generalized Feller process, which should be considered the correct infinite dimensional version of a Wishart process.
By only a slight abuse of notation, we understand \(\mathcal {A}^*\), and in the sequel also \(\mathcal {S}^*\) and other linear operators, as operators acting on both \(\mathbb {S}^{d}\)-valued measures as well as \(\mathbb {R}^{d \times n}\)-valued or \(\mathbb {R}^{n \times d}\)-valued ones as in (5.1). The mild formulation of (5.6), denoting the semigroup generated by \(\mathcal {A}_1^*+ \mathcal {A}_2^*\) by \(\mathcal {S}_t^{*,\widehat{\otimes }}\), then reads as
$$\begin{aligned} \lambda _t(\mathrm{d}x_1, \mathrm{d}x_2)&= \mathcal {S}_t^{*,\widehat{\otimes }}\lambda _0(\mathrm{d}x_1, \mathrm{d}x_2) + n \int _0^t \mathcal {S}^{*,\widehat{\otimes }}_{t-s}\nu (\mathrm{d}x_1) \nu (\mathrm{d}x_2) \mathrm{d}s\\&\quad + \int _0^t \mathcal {S}^{*,\widehat{\otimes }}_{t-s}(\nu (\mathrm{d}x_1)\mathrm{d}W^{\top }_s \gamma _s(\mathrm{d}x_2) + \gamma _s(\mathrm{d}x_1)^{\top } \mathrm{d}W_s \nu (\mathrm{d}x_2))\\&=\mathcal {S}_t^{*, \widehat{\otimes }}\lambda _0(\mathrm{d}x_1, \mathrm{d}x_2) + n \int _0^t (\mathcal {S}^{*}_{t-s}\nu (\mathrm{d}x_1)) (\mathcal {S}^{*}_{t-s}\nu (\mathrm{d}x_2) )\mathrm{d}s\\&\quad + \int _0^t (\mathcal {S}^{*}_{t-s}\nu (\mathrm{d}x_1))\mathrm{d}W^{\top }_s (\mathcal {S}^{*}_{t-s} \gamma _s(\mathrm{d}x_2)) \\&\quad + \int _0^t ( \mathcal {S}^{*}_{t-s}\gamma _s(\mathrm{d}x_1))^{\top } \mathrm{d}W_s (\mathcal {S}^{*}_{t-s}\nu (\mathrm{d}x_2)) \, , \end{aligned}$$
where the second equality follows from property (5.2).
Let now \(\beta \) be a linear operator from \(Y^*(F)\) to F, where F stands for \(\mathbb {R}^{n\times d}\) or \(\mathbb {S}^d\), with the property that for a constant matrix A of appropriate dimensions, we have
$$\begin{aligned} \beta ( A \gamma (\cdot ))=A\beta (\gamma (\cdot )),\quad \beta ( \gamma (\cdot )A)=\beta (\gamma (\cdot ))A. \end{aligned}$$
By means of \(\beta \), define now an operator \(\widehat{\beta }\) acting on \(\mathbb {R}^{d \times d}\) valued product measures as follows
$$\begin{aligned} \widehat{\beta }( \gamma _1^{\top }(\cdot ) \gamma _2(\cdot ))= \beta (\gamma _1 (\cdot ))^{\top } \beta (\gamma _2 (\cdot )), \end{aligned}$$
where \(\gamma _1\) and \(\gamma _2\) are either in \(Y^*(\mathbb {R}^{n\times d})\) or in \(Y^*(\mathbb {S}^d)\). In the latter case, the transpose is not needed. Note that (5.9) implies that \( \widehat{\beta }( \gamma ^{\top }(\cdot ) \gamma (\cdot ))\) is \(\mathbb {S}^d_+\)-valued. Applying \(\widehat{\beta }\) to \(\lambda \), we find
$$\begin{aligned} \widehat{\beta }(\lambda _t)&=\widehat{\beta }(\mathcal {S}_t^{*,\widehat{\otimes }}\lambda _0) + n \int _0^t \beta (\mathcal {S}^{*}_{t-s}\nu )\beta (\mathcal {S}^{*}_{t-s}\nu )\mathrm{d}s\\&\quad + \int _0^t \beta (\mathcal {S}^{*}_{t-s}\nu )\mathrm{d}W^{\top }_s \beta (\mathcal {S}^{*}_{t-s} \gamma _s) + \int _0^t \beta ( \mathcal {S}^{*}_{t-s}\gamma _s)^{\top } \mathrm{d}W_s\beta (\mathcal {S}^{*}_{t-s}\nu ). \end{aligned}$$
Defining as in Eq. (4.6) an \(\mathbb {S}^d\)-valued kernel via
$$\begin{aligned} K(t)=\beta (\mathcal {S}_t^*\nu ), \end{aligned}$$
we obtain the following generalized \(\mathbb {S}^d_+\)-valued Volterra equation
$$\begin{aligned} V_t:= & {} \widehat{\beta }(\lambda _t) =\widehat{\beta }(\mathcal {S}_t^{*,\widehat{\otimes }}\lambda _0) + n \int _0^t K(t-s) K(t-s)\mathrm{d}s \nonumber \\&+ \int _0^t K(t-s)\mathrm{d}W^{\top }_s \beta (\mathcal {S}^{*}_{t-s} \gamma _s) + \int _0^t \beta ( \mathcal {S}^{*}_{t-s}\gamma _s)^{\top } \mathrm{d}W_s K(t-s), \end{aligned}$$
which we call a Volterra Wishart process in the following definition.

Definition 5.3

For \(\beta \), \(\widehat{\beta }\) as given in (5.8)–(5.9) and an \(\mathbb {S}^d\)-valued kernel K(t) defined by \(K(t) =\beta (\mathcal {S}_t^*\nu )\), we call the process defined in (5.10) a Volterra Wishart process.
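
To get a concrete feel for the kernel and the deterministic term in (5.10), consider the purely illustrative case where \(\nu =\sum _k \nu _k \delta _{x_k}\) has finitely many atoms \(x_k>0\) with weights \(\nu _k \in \mathbb {S}^d\), and where, as the exponentials \(e^{-xt}\) appearing with Example 4.4 suggest, \(K(t)=\sum _k e^{-x_k t}\nu _k\). Under these assumptions, \(n\int _0^t K(t-s)K(t-s)\mathrm{d}s\) is available in closed form. The following Python sketch is not part of the paper; the function names `kernel` and `drift_term` and the finite-support discretization are hypothetical choices.

```python
import numpy as np

def kernel(t, x, nu):
    """K(t) = sum_k exp(-x_k t) nu_k for a finitely supported nu (assumed form).

    x  : (k,) support points, assumed strictly positive
    nu : (k, d, d) symmetric matrix weights
    """
    return np.einsum('k,kab->ab', np.exp(-x * t), nu)

def drift_term(t, x, nu, n):
    """Closed form of n * int_0^t K(s) K(s) ds for the kernel above."""
    rate = x[:, None] + x[None, :]              # x_k + x_l > 0
    w = (1.0 - np.exp(-rate * t)) / rate        # int_0^t exp(-(x_k + x_l) s) ds
    return n * np.einsum('kl,kab,lbc->ac', w, nu, nu)
```

Since \(X_t=\int _0^t \mathrm{d}W_s K(t-s)\) for \(\gamma _0=0\), this term equals \(\mathbb {E}[V_t]\) in that case, which gives a simple sanity check for simulations.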

Remark 5.4

  1. (i)
    Note that \(\beta (\gamma _t)\) defines an \(\mathbb {R}^{n\times d}\)-valued Volterra OU process, that is,
    $$\begin{aligned} X_t:=\beta (\gamma _t)=\beta (\mathcal {S}^*_t \gamma _0)+ \int _0^t \mathrm{d}W_s K(t-s). \end{aligned}$$
    By the definition of \(\widehat{\beta }\), the Volterra Wishart process
    $$\begin{aligned} V_t=\widehat{\beta }(\lambda _t)= \beta (\gamma _t (\cdot ))^{\top } \beta (\gamma _t(\cdot ))=X^{\top }_t X_t \end{aligned}$$
    is thus the matrix square of a Volterra OU process, which justifies the terminology.
  2. (ii)
    Note that different lifts of the Volterra OU process given in (5.11) are possible, e.g., the forward process lift \(f_t(x):=\mathbb {E}[X_{t+x}|\mathcal {F}_t]\). Then, \(f_t(0)=X_t\), and similarly as in Cuchiero and Teichmann (2018, Section 5.2), it can be shown that f is an infinite dimensional OU process that solves the following SPDE (in the mild sense)
    $$\begin{aligned} \mathrm{d}f_t(x)=\frac{\mathrm{d}}{\mathrm{d}x} f_t(x) \mathrm{d}t + \mathrm{d}W_t K(x), \quad f_0(x) = \beta (\mathcal {S}^*_x \gamma _0), \end{aligned}$$
    on a Hilbert space H of absolutely continuous functions (AC) with values in \(\mathbb {R}^{n \times d}\), precisely \( H=\left\{ f \in AC(\mathbb {R}_+, \mathbb {R}^{n \times d}) \, | \, \int _0^{\infty } \Vert f'(x)\Vert ^2 \alpha (x) \mathrm{d}x < \infty \right\} \) where \( \alpha > 0 \) denotes a weight function (compare Filipović 2001). We can then set \(\lambda _t(x,y)=f_t^{\top }(x)f_t(y)\) and define the same Volterra Wishart process as in (5.10) by \(V_t:=\lambda _t(0,0)=X_t^{\top }X_t\). By Itô’s formula and variation of constants, its dynamics can then equivalently be expressed via
    $$\begin{aligned} V_t:=\lambda _t(0,0)= & {} f^{\top }_0(t)f_0(t)+ n\int _0^t K(t-s)K(t-s)\mathrm{d}s\nonumber \\&+\int _0^t K(t-s)\mathrm{d}W^{\top }_sf_s(t-s) + \int _0^t f^{\top }_s(t-s)\mathrm{d}W_s K(t-s).\nonumber \\ \end{aligned}$$
    Comparing (5.12) and (5.10) yields
    $$\begin{aligned} \beta (\mathcal {S}_x^*\gamma _t)=f_t(x)=\mathbb {E}[X_{t+x}| \mathcal {F}_t], \quad x,t \ge 0. \end{aligned}$$
  3. (iii)
    In the case when \(\beta \) and \(\mathcal {S}^*\) are as in Example 4.4, (5.10) reads as
    $$\begin{aligned} \int _{\mathbb {R}^2} \lambda _t(\mathrm{d}x_1, \mathrm{d}x_2)&= \int _{\mathbb {R}^2}e^{-(x_1+x_2)t}\lambda _0(\mathrm{d}x_1,\mathrm{d}x_2) + n \int _0^t K(t-s) K(t-s)\mathrm{d}s\\&\quad + \int _0^t\int _0^{\infty } K(t-s)\mathrm{d}W^{\top }_s e^{-x(t-s)} \gamma _s(\mathrm{d}x)\\&\quad + \int _0^t \int _0^{\infty } e^{-x(t-s)} \gamma ^{\top }_s(\mathrm{d}x)\mathrm{d}W_s K(t-s). \end{aligned}$$
    Hence by (5.13), \(\int _0^{\infty } e^{-x(t-s)}\gamma _s(\mathrm{d}x)=\mathbb {E}[X_t | \mathcal {F}_s]\). This yields exactly Eq. (1.6) considered in the introduction. Note that if \(\nu \) and in turn K are chosen as in Remark 4.5, this Volterra Wishart process has exactly the roughness properties desired in rough covariance modeling.
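
The construction of Remark 5.4 can be illustrated numerically. The following minimal Python sketch is not taken from the paper. It assumes that \(\nu \) is supported on finitely many points \(x_1,\ldots ,x_k\) with weights \(\nu _k \in \mathbb {S}^d\), that \(\mathcal {S}^*_t\) acts as multiplication by \(e^{-xt}\), and that \(\beta \) integrates over x, which is the setting suggested by Example 4.4. The function name `simulate_volterra_wishart`, the left-point time discretization, and all parameter values are hypothetical.

```python
import numpy as np

def simulate_volterra_wishart(x, nu, gamma0, T, steps, rng):
    """Simulate V_t = X_t^T X_t on a uniform grid of [0, T].

    x      : (k,) support points of nu (hypothetical discretization)
    nu     : (k, d, d) symmetric matrix weights nu_k
    gamma0 : (k, n, d) initial lifted state gamma_0({x_k})
    Returns an array of shape (steps + 1, d, d).
    """
    h = T / steps
    decay = np.exp(-x * h)[:, None, None]      # e^{-x_k h}, the assumed semigroup
    n, d = gamma0.shape[1:]
    gamma = np.array(gamma0, dtype=float)
    V = np.empty((steps + 1, d, d))
    X = gamma.sum(axis=0)                      # X_t = beta(gamma_t), an n x d matrix
    V[0] = X.T @ X
    for m in range(steps):
        dW = np.sqrt(h) * rng.standard_normal((n, d))
        # Left-point step of d gamma = -x gamma dt + dW nu; dW acts on nu_k from the left.
        gamma = decay * (gamma + dW @ nu)
        X = gamma.sum(axis=0)
        V[m + 1] = X.T @ X
    return V

# Illustrative parameters (all hypothetical): n = 1, d = 2, k = 3 support points.
rng = np.random.default_rng(0)
x = np.array([0.5, 2.0, 10.0])
nu = np.stack([w * np.array([[1.0, 0.3], [0.3, 1.0]]) for w in (0.5, 0.3, 0.2)])
V = simulate_volterra_wishart(x, nu, np.zeros((3, 1, 2)), T=1.0, steps=200, rng=rng)
```

Each \(V_t\) produced this way is a matrix square and hence positive semidefinite by construction; for \(n<d\) it has rank at most n, so it is not positive definite.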

In the following remark, we list several properties of Volterra Wishart processes.

Remark 5.5

  1. (i)

    Note that the marginals of V are Wishart distributed as they arise from squares of Gaussians.

  2. (ii)

    In order to bring (5.6) into a “standard” Wishart form (with the matrix square root) as in (1.1) by replacing \(\gamma (\mathrm{d}x)\) by \(\sqrt{\lambda }(\mathrm{d}x,\mathrm{d}y)\), new notation has to be introduced, compare with (5.7).

  3. (iii)
    Nevertheless, both the drift and the diffusion characteristics of \(\lambda \) depend linearly only on \(\lambda \), e.g.,
    $$\begin{aligned} \frac{\mathrm{d}[\lambda _{ij}(\mathrm{d}x_1,\mathrm{d}x_2), \lambda _{kl}(\mathrm{d}y_1,\mathrm{d}y_2)]_t}{\mathrm{d}t}&= (K(x_1) K(y_1))_{ik} \lambda _{t,jl}(\mathrm{d}x_2,\mathrm{d}y_2)\\&\quad +(K(x_1) K(y_2))_{il} \lambda _{t,jk}(\mathrm{d}x_2,\mathrm{d}y_1)\\&\quad + (K(x_2)K(y_1))_{jk}\lambda _{t,il}(\mathrm{d}x_1,\mathrm{d}y_2)\\&\quad + (K(x_2)K(y_2))_{jl}\lambda _{t,ik}(\mathrm{d}x_1,\mathrm{d}y_1) \, , \end{aligned}$$
    which indicates that \((\lambda _t)_{t\ge 0}\) is Markovian on its own. This is shown rigorously below.
Using Theorem 2.8, we now show that \(\lambda \) is a generalized Feller process on \((\widehat{\mathcal {E}},\widehat{\varrho })\) with weight function \(\widehat{\varrho }\) satisfying
$$\begin{aligned} \widehat{\varrho }(\gamma \widehat{\otimes } \gamma ) = \varrho (\gamma ). \end{aligned}$$
We also prove that this generalized Feller process is affine, in the sense that its Laplace transform is exponentially affine in the initial value. The process \(\lambda \) can therefore be viewed as an infinite dimensional Wishart process on \(\widehat{\mathcal {E}}\) analogously to Bru (1991), Cuchiero et al. (2011).

Theorem 5.6

The process \(\lambda \) defined in (5.5) is Markovian on \(\widehat{\mathcal {E}}\). The corresponding semigroup is a generalized Feller semigroup on \( \mathcal {B}^{\widehat{\varrho }}(\widehat{\mathcal {E}}) \), where \(\widehat{\varrho }\) satisfies (5.14). Moreover, for \(y \in Y(\mathbb {R}^{n\times d})\),
$$\begin{aligned} \mathbb {E}_{\lambda _0} \left[ \exp \left( - \langle y \widehat{\otimes } y , \lambda _t \rangle \right) \right] = \exp (-\phi _t- \langle \psi _t, \lambda _0\rangle ), \end{aligned}$$
where \(\psi \) and \(\phi \) satisfy the following Riccati differential equations, namely \(\psi _0=y\widehat{\otimes } y \) and \(\partial _t \psi _t= R(\psi _t)\) in the mild sense with \(R: \widehat{\mathcal {E}}_* \rightarrow \widehat{\mathcal {E}}_*\) given by
$$\begin{aligned} R(y \widehat{\otimes } y)(x_1,x_2)&= \mathcal {A} y(x_1) \widehat{\otimes } y (x_2)+ y(x_1) \widehat{\otimes } \mathcal {A} y (x_2)\\&\quad - 2 \int _0^{\infty } \int _0^{\infty } y(\mathrm{d}x_1) \widehat{\otimes }y(\mathrm{d}x) \nu \widehat{\otimes } \nu (\mathrm{d}x,\mathrm{d}y) y(\mathrm{d}y)\widehat{\otimes }y(\mathrm{d}x_2) \end{aligned}$$
and \(\phi _0=0\) and \(\partial _t \phi _t= F(\psi _t)\) with \(F: \widehat{\mathcal {E}}_* \rightarrow \mathbb {R}\) given by
$$\begin{aligned} F(y \widehat{\otimes } y) = n \langle y \widehat{\otimes } y, \nu \widehat{\otimes } \nu \rangle . \end{aligned}$$


We apply Theorem 2.8 and Corollary 2.11 with
$$\begin{aligned} q: Y^*(\mathbb {R}^{n\times d}) \rightarrow \widehat{\mathcal {E}}, \, \gamma \mapsto \gamma \widehat{\otimes } \gamma = \gamma (\cdot )^{\top }\gamma (\cdot ). \end{aligned}$$
Observe that this is a continuous map, since we use the dual topology \( \sigma (\widehat{\mathcal {E}}, Y \otimes Y) \) on \(\widehat{\mathcal {E}}\) and the respective polar \(\widehat{\mathcal {E}}_*\) defined by (5.4). Consider now the following set of Fourier basis elements
$$\begin{aligned} \widehat{\mathcal {D}}=\{ f_y: \widehat{\mathcal {E}} \rightarrow [0,1]; \lambda \mapsto \exp ( - \langle y \widehat{\otimes } y , \lambda \rangle ) \, |&\, y \in Y(\mathbb {R}^{n\times d})\} \end{aligned}$$
which is dense in \(\mathcal {B}^{\widehat{\varrho }}(\widehat{\mathcal {E}})\) by the very definition of the dual topology. We now check that the generalized Feller semigroup \(P^{\text {(OU)}}\) corresponding to (5.1) satisfies Assumption (2.8) for \(f \in \widehat{\mathcal {D}}\), i.e., for every \(f \in \widehat{\mathcal {D}}\), there exists some g such that
$$\begin{aligned} P^{\text {(OU)}}_t (f \circ q) = g \circ q \, . \end{aligned}$$
Hence, we need to compute \( \mathbb {E}_{\gamma _0} \left[ \exp \left( - \langle y \widehat{\otimes } y , \gamma _t \widehat{\otimes }\gamma _t \rangle \right) \right] . \) By Lemma 5.7, this expression is given by (5.17). Therefore, (5.16) is clearly satisfied. This proves the first assertion. Concerning the affine property, we can deduce from Lemma 5.7 that \(\psi \) and \(\phi \) are given by
$$\begin{aligned} \psi _t&=(2q_t(y \widehat{\otimes } y )+{\text {Id}}_d)^{-1}(\mathcal {S}_t y \widehat{\otimes } \mathcal {S}_t y), \\ \phi _t&=\frac{n}{2}\log \det (2q_t(y \widehat{\otimes } y )+ {\text {Id}}_d), \end{aligned}$$
with \(q_t\) given in Lemma 5.7. Taking derivatives then leads to the form of the Riccati differential equations.\(\square \)

The following lemma provides an explicit expression for the Laplace transform of \(\gamma _t \widehat{\otimes }\gamma _t\). Not surprisingly, this resembles the Laplace transform of a non-central Wishart distribution with n degrees of freedom.

Lemma 5.7

Let \( \gamma \) be an Ornstein–Uhlenbeck process as defined in (5.1). Then for \(y \in Y(\mathbb {R}^{n\times d})\), the Laplace transform of \(\gamma _t \widehat{\otimes }\gamma _t\) is given by
$$\begin{aligned} \mathbb {E}_{\gamma _0}\left[ \exp (-\langle y \widehat{\otimes } y , \gamma _t \widehat{\otimes }\gamma _t \rangle )\right]= & {} \det (2q_t(y \widehat{\otimes } y )+ {\text {Id}}_d)^{-\frac{n}{2} }\nonumber \\&\times \exp (-\langle (2q_t(y \widehat{\otimes } y )+{\text {Id}}_d)^{-1}(\mathcal {S}_t y \widehat{\otimes } \mathcal {S}_t y) , \gamma _0 \widehat{\otimes } \gamma _0 \rangle ),\nonumber \\ \end{aligned}$$
where \(q_t(y \widehat{\otimes } y ) = \int _0^t \int _0^{\infty } \int _0^{\infty } \mathcal {S}^*_{s}\nu (\mathrm{d}x_1) y^{\top }(x_1) y(x_2)\mathcal {S}^*_{s}\nu (\mathrm{d}x_2) \mathrm{d}s \).


Assume for simplicity first that \(\mathcal {A}^*\) is equal to 0. Then, (5.1) becomes
$$\begin{aligned} \gamma _t(\mathrm{d}x)= \gamma _0(\mathrm{d}x) + W_t\nu (\mathrm{d}x). \end{aligned}$$
Fix \(y \in Y(\mathbb {R}^{n\times d})\) such that \(\int _0^{\infty } y(x) \nu (\mathrm{d}x)\) is well defined. We then have
$$\begin{aligned} \langle y \widehat{\otimes } y , \gamma _t \widehat{\otimes }\gamma _t \rangle&= \langle y \widehat{\otimes } y ,(\gamma _0 + W_t\nu ) \widehat{\otimes } (\gamma _0 + W_t\nu )\rangle \\&= \langle y \widehat{\otimes } y , \gamma _0 \widehat{\otimes } \gamma _0 \rangle + \langle y \widehat{\otimes } y , \gamma _0 \widehat{\otimes } W_t\nu \rangle +\langle y \widehat{\otimes } y , W_t\nu \widehat{\otimes } \gamma _0 \rangle \\&\quad + \langle y \widehat{\otimes } y , W_t\nu \widehat{\otimes } W_t\nu \rangle . \end{aligned}$$
Note now that
$$\begin{aligned} \langle y \widehat{\otimes } y , \gamma _0 \widehat{\otimes } W_t\nu \rangle&= {{\,\mathrm{Tr}\,}}\left( \left( W_t \int _0^{\infty } \int _0^{\infty } \nu (\mathrm{d}x_2) y^{\top }(x_1) y(x_2) \gamma _0^{\top }(\mathrm{d}x_1)\right) \right) \\&=:{{\,\mathrm{Tr}\,}}(W_t a),\\ \langle y \widehat{\otimes } y , W_t\nu \widehat{\otimes } \gamma _0 \rangle&= {{\,\mathrm{Tr}\,}}\left( \left( \int _0^{\infty } \int _0^{\infty }\gamma _0(\mathrm{d}x_2) y^{\top }(x_1) y(x_2) \nu (\mathrm{d}x_1)\right) W^{\top }_t\right) \\&=:{{\,\mathrm{Tr}\,}}(a_1 W^{\top }_t)={{\,\mathrm{Tr}\,}}( W_t a_1^{\top })={{\,\mathrm{Tr}\,}}(W_t a),\\ \langle y \widehat{\otimes } y , W_t\nu \widehat{\otimes } W_t\nu \rangle&= {{\,\mathrm{Tr}\,}}\left( \left( \int _0^{\infty } \int _0^{\infty } \nu (\mathrm{d}x_2) y^{\top }(x_1) y(x_2)\nu (\mathrm{d}x_1)\right) W_t^{\top }W_t\right) \\&=: {{\,\mathrm{Tr}\,}}(b W_t^{\top } W_t), \end{aligned}$$
where \(a\in \mathbb {R}^{d \times n}\), \(a_1 \in \mathbb {R}^{n \times d}\), \(b \in \mathbb {R}^{d\times d}\) and \(a=a_1^{\top }\).
For the following calculation, let \(n=1\). Then, using these expressions, we find
$$\begin{aligned}&\mathbb {E}\left[ \exp (-\langle y \widehat{\otimes } y , \gamma _t \widehat{\otimes }\gamma _t \rangle )\right] \\&\quad =\exp (-\langle y \widehat{\otimes } y , \gamma _0 \widehat{\otimes } \gamma _0 \rangle )\mathbb {E}\left[ \exp ( -2 {{\,\mathrm{Tr}\,}}(W_ta) - {{\,\mathrm{Tr}\,}}(bW_t^{\top } W_t))\right] \\&\quad =\exp (-\langle y \widehat{\otimes } y , \gamma _0 \widehat{\otimes } \gamma _0 \rangle )\frac{1}{(2\pi )^{\frac{d}{2}} t^{\frac{d}{2}}} \int _{\mathbb {R}^{1 \times d}} e^{-2{{\,\mathrm{Tr}\,}}(xa) - {{\,\mathrm{Tr}\,}}(bx^{\top }x) -\frac{1}{2t} x x^{\top }}\mathrm{d}x\\&\quad =\exp (-\langle y \widehat{\otimes } y , \gamma _0 \widehat{\otimes } \gamma _0 \rangle )\\&\quad \quad \times \frac{1}{\det (2b+ \frac{1}{t} {\text {Id}}_d)^\frac{1}{2} t^{\frac{d}{2}}}\frac{1}{(2\pi )^{\frac{d}{2}} } \int _{\mathbb {R}^{1 \times d}} e^{-2 xa - \frac{1}{2}x (2b+ \frac{1}{t} {\text {Id}}_d) x^{\top } }\det (2b+ \frac{1}{t} {\text {Id}}_d)^{\frac{1}{2}}\mathrm{d}x\\&\quad =\frac{1}{\det (2b+ \frac{1}{t} {\text {Id}}_d)^\frac{1}{2} t^{\frac{d}{2}}}\exp (-\langle y \widehat{\otimes } y , \gamma _0 \widehat{\otimes } \gamma _0 \rangle )\exp (2 a^{\top } (2b+ \frac{1}{t} {\text {Id}}_d)^{-1} a), \end{aligned}$$
where in the last line we used the formula for the moment generating function of a Gaussian random variable with covariance \((2b+ \frac{1}{t} {\text {Id}}_d)^{-1}\). Simplifying further yields
$$\begin{aligned}&\mathbb {E}\left[ \exp (-\langle y \widehat{\otimes } y , \gamma _t \widehat{\otimes }\gamma _t \rangle )\right] \nonumber \\&\quad =\frac{1}{\det (2b+ \frac{1}{t} {\text {Id}}_d)^\frac{1}{2} t^{\frac{d}{2}}} \exp (\langle (2 b(2b+ \frac{1}{t} {\text {Id}}_d)^{-1}-{\text {Id}}_d) (y \widehat{\otimes } y), \gamma _0 \widehat{\otimes } \gamma _0 \rangle ) \nonumber \\&\quad =\frac{1}{\det (2bt+ {\text {Id}}_d)^{\frac{1}{2} }} \exp (\langle -({\text {Id}}_d + 2bt)^{-1} (y \widehat{\otimes } y), \gamma _0 \widehat{\otimes } \gamma _0 \rangle ). \end{aligned}$$
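The last equality rests on two elementary matrix identities, recorded here for convenience (they are not spelled out in the original argument). They hold for \(t>0\) and any \(d\times d\) matrix b such that \(2b+\frac{1}{t}{\text {Id}}_d\) is invertible:
$$\begin{aligned} 2b\left( 2b+ \tfrac{1}{t} {\text {Id}}_d\right) ^{-1}-{\text {Id}}_d&= \left( 2b-\left( 2b+ \tfrac{1}{t} {\text {Id}}_d\right) \right) \left( 2b+ \tfrac{1}{t} {\text {Id}}_d\right) ^{-1} = -\tfrac{1}{t}\left( 2b+ \tfrac{1}{t} {\text {Id}}_d\right) ^{-1} = -({\text {Id}}_d+2bt)^{-1},\\ \det \left( 2b+ \tfrac{1}{t} {\text {Id}}_d\right) ^{\frac{1}{2}} t^{\frac{d}{2}}&= \det \left( t\left( 2b+ \tfrac{1}{t} {\text {Id}}_d\right) \right) ^{\frac{1}{2}} = \det (2bt+{\text {Id}}_d)^{\frac{1}{2}}. \end{aligned}$$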
For general n, note that we can write
$$\begin{aligned} W^{\top }_t W_t= \sum _{j=1}^n W^{\top }_{j,t} W_{j,t}, \end{aligned}$$
where the \(W_j\) are the rows of W and thus take values in \(\mathbb {R}^{1 \times d }\). Similarly,
$$\begin{aligned} {{\,\mathrm{Tr}\,}}(W_t a)= {{\,\mathrm{Tr}\,}}\left( \sum _{j=1}^n W_{j,t} \left( \int _0^{\infty } \int _0^{\infty } \nu (\mathrm{d}x_2) y^{\top }(x_1) y(x_2) \gamma _{0,j}^{\top }(\mathrm{d}x_1)\right) \right) \!\! =:\sum _{j=1}^n W_{j,t} a_j, \end{aligned}$$
where \(\gamma _{0,j}\) are the rows of \(\gamma _0\). Using the independence of all \(W_j\) and applying (5.18) then lead to
$$\begin{aligned} \mathbb {E}\left[ \exp (-\langle y \widehat{\otimes } y , \gamma _t \widehat{\otimes }\gamma _t \rangle )\right] = \frac{1}{\det (2bt+ {\text {Id}}_d)^{\frac{n}{2} }} \exp (-\langle ({\text {Id}}_d + 2bt)^{-1} (y \widehat{\otimes } y), \gamma _0 \widehat{\otimes } \gamma _0 \rangle ). \end{aligned}$$
The general case for \(\mathcal {A}^* \ne 0\) can now be traced back to this situation. Indeed, by the variation of constants formula, \(\gamma _t\) is given by
$$\begin{aligned} \gamma _t=\mathcal {S}^*_t \gamma _0 + \int _0^t \mathrm{d}W_s \mathcal {S}^*_{t-s}\nu (\mathrm{d}x). \end{aligned}$$
Therefore, we need to replace bt by
$$\begin{aligned} q_t=\int _0^t\int _0^{\infty } \int _0^{\infty } \mathcal {S}^*_{t-s}\nu (\mathrm{d}x_1) y^{\top }(x_1) y(x_2)\mathcal {S}^*_{t-s}\nu (\mathrm{d}x_2) \mathrm{d}s \end{aligned}$$
and \(\gamma _0\) by \(\mathcal {S}^*_t \gamma _0\). This then yields (5.17). Note that this now holds for general \(y \in Y(\mathbb {R}^{n \times d})\) even if \(\int _0^{\infty } y(x) \nu (\mathrm{d}x)\) is not necessarily well defined.\(\square \)
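
The formula of Lemma 5.7 can be checked numerically in the simplest special case. The following Python sketch is not part of the paper. It assumes \(\mathcal {A}^*=0\), a measure \(\nu ={\text {Id}}_d\,\delta _{x_0}\) with a single atom, and a test function y that is constant near that atom, so that \(b=y^{\top }y=:u\) and \(\langle y \widehat{\otimes } y , \gamma _t \widehat{\otimes }\gamma _t \rangle ={{\,\mathrm{Tr}\,}}(u Z^{\top }Z)\) with \(Z=z_0+W_t \in \mathbb {R}^{n\times d}\). The names `laplace_closed_form` and `laplace_monte_carlo` and all numerical values are hypothetical.

```python
import numpy as np

def laplace_closed_form(u, z0, t):
    """det(Id + 2tu)^{-n/2} * exp(-Tr((Id + 2tu)^{-1} u z0^T z0))."""
    n, d = z0.shape
    M = np.eye(d) + 2.0 * t * u
    exponent = np.trace(np.linalg.solve(M, u) @ z0.T @ z0)
    return np.linalg.det(M) ** (-n / 2) * np.exp(-exponent)

def laplace_monte_carlo(u, z0, t, samples, rng):
    """Sample mean of exp(-Tr(u Z^T Z)) with Z = z0 + W_t."""
    n, d = z0.shape
    Z = z0 + np.sqrt(t) * rng.standard_normal((samples, n, d))
    quad = np.einsum('sij,jl,sil->s', Z, u, Z)   # Tr(Z u Z^T) for each sample
    return np.exp(-quad).mean()

# Illustrative values (hypothetical): n = 2, d = 2, u symmetric positive definite.
u = np.array([[0.4, 0.1], [0.1, 0.3]])
z0 = np.array([[0.5, -0.2], [0.1, 0.3]])
exact = laplace_closed_form(u, z0, 0.7)
estimate = laplace_monte_carlo(u, z0, 0.7, 200_000, np.random.default_rng(0))
```

The two numbers should agree up to Monte Carlo error, which for this sample size is of order \(10^{-3}\).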

6 (Rough) Volterra-type affine covariance models

The goal of this section is to apply the affine covariance models constructed above to multivariate stochastic volatility modeling with d assets. We exemplify this with the Volterra Wishart process of Sect. 5 and define a (rough) multivariate Volterra Heston-type model with possible jumps in the price process. Roughness can be achieved by specifying \(\nu \) and in turn the kernel of the Volterra Wishart process as in Remark 4.5. The log-price process denoted by P and taking values in \(\mathbb {R}^d\) evolves according to
$$\begin{aligned} \mathrm{d}P_t= & {} -\frac{1}{2}{\text {diag}}(V_t)\mathrm{d}t - \int _{\mathbb {R}^d} (e^{\xi }- \mathbf {1} - \xi ) {{\,\mathrm{Tr}\,}}(V_t m(\mathrm{d}\xi )) + X_t^{\top } \mathrm{d}B_t \nonumber \\&+ \int _{\mathbb {R}^d} \xi (\mu ^P(\mathrm{d}\xi ) - {{\,\mathrm{Tr}\,}}(V_t m(\mathrm{d}\xi ))), \end{aligned}$$
where \(X_t\) denotes the Volterra OU process defined in Remark 5.4, \(\mathbf {1}\) the vector in \(\mathbb {R}^d\) with all entries being 1 and \(e^{\xi }\) has to be understood componentwise. Moreover, \(B_t\) is an \(\mathbb {R}^n\)-valued Brownian motion, which can be correlated with the matrix Brownian motion W appearing in (5.1) as follows
$$\begin{aligned} B_t= W_t\varrho + \sqrt{(1 -\varrho ^{\top } \varrho )} \widetilde{B}_t, \end{aligned}$$
where \(\widetilde{B}_t\) is an \(\mathbb {R}^n\)-valued Brownian motion independent of W and \(\varrho \in \mathbb {R}^d\). Moreover, \(\mu ^P\) denotes the random measure of the jumps with compensator \({{\,\mathrm{Tr}\,}}(V m(\mathrm{d}\xi ))\), where V is the Volterra Wishart process of (5.10) and m a positive semidefinite measure supported on \(\mathbb {R}^d\).
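
As a quick consistency check, which is not spelled out in the text, assume that the entries of W are independent standard Brownian motions and that \(\varrho ^{\top }\varrho \le 1\). Writing \(\langle \cdot , \cdot \rangle _t\) for the quadratic covariation, the components of B then satisfy
$$\begin{aligned} \mathrm{d}\langle B_i, B_j\rangle _t&= \sum _{k,l=1}^d \varrho _k \varrho _l \, \mathrm{d}\langle W_{ik}, W_{jl}\rangle _t + (1-\varrho ^{\top }\varrho )\, \mathrm{d}\langle \widetilde{B}_i, \widetilde{B}_j\rangle _t = \delta _{ij}\left( \varrho ^{\top }\varrho + 1-\varrho ^{\top }\varrho \right) \mathrm{d}t = \delta _{ij}\, \mathrm{d}t,\\ \mathrm{d}\langle B_i, W_{jk}\rangle _t&= \delta _{ij}\, \varrho _k \, \mathrm{d}t. \end{aligned}$$
Hence, by Lévy's characterization, B is again a standard \(\mathbb {R}^n\)-valued Brownian motion, and \(\varrho _k\) is the instantaneous correlation between \(B_i\) and \(W_{ik}\).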

As a corollary of Section 5 and Cuchiero (2011, Section 5), we obtain the following result, namely that the log-price process together with the infinite dimensional Wishart process \(\lambda \) given in (5.5) is an affine Markov process.

Before formulating the precise statement, note that the continuous covariation\(^{2}\) \(\langle P_{i}, \lambda _{kl}(\mathrm{d}x_1, \mathrm{d}x_2)\rangle _t \) is given by
$$\begin{aligned} \frac{\mathrm{d}\langle P_{i}, \lambda _{kl}(\mathrm{d}x_1, \mathrm{d}x_2)\rangle _t}{\mathrm{d}t}&= (\beta ^{\top }(\gamma _t)\gamma _t(\mathrm{d}x_1))_{il}(\nu (\mathrm{d}x_2)\varrho )_k\\&\quad +(\beta ^{\top }(\gamma _t)\gamma _t(\mathrm{d}x_1))_{ik}(\nu (\mathrm{d}x_2)\varrho )_l, \end{aligned}$$
where \(\gamma \) is the infinite dimensional OU process of (5.1). Note that \(\beta ^{\top }(\gamma _t)\gamma _t(\mathrm{d}x_1)\) can also be written as linear map from \(\widehat{\mathcal {E}} \rightarrow Y^*(\mathbb {S}^d)\) which we denote by \(\widetilde{\beta }\), i.e.,
$$\begin{aligned} \widetilde{\beta }(\lambda _t)(\mathrm{d}x_1)=\beta ^{\top }(\gamma _t)\gamma _t(\mathrm{d}x_1). \end{aligned}$$
In the standard setting of Example 4.4, we have \( \widetilde{\beta }(\lambda )(\mathrm{d}x_1) = \int _{x_2} \lambda (\mathrm{d}x_1, \mathrm{d}x_2)\). The adjoint operator of \(\widetilde{\beta }\) from \(Y(\mathbb {S}^d)\) to \(Y(\mathbb {R}^{n\times d}) \widehat{\otimes } Y(\mathbb {R}^{n\times d})\) is denoted by \( \widetilde{\beta }_*\) and given by
$$\begin{aligned} \langle \widetilde{\beta }(\lambda ), y \rangle = \langle \lambda , \widetilde{\beta }_*(y)\rangle , \quad y \in Y(\mathbb {S}^d), \end{aligned}$$
where the brackets are the pairings in the respective spaces. With this notation, we are now ready to state the result. Its proof is a combination of the results of Section 5 and Cuchiero (2011, Section 5).

Corollary 6.1

The joint process \((\lambda , P)\) with \(\lambda \) defined in (5.5) and P defined in (6.1) is Markovian with state space \((\widehat{\mathcal {E}}, \mathbb {R}^d)\). It is affine in the sense that for \((y, v) \in Y(\mathbb {R}^{n \times d}) \times \mathbb {R}^d\), we have
$$\begin{aligned} \mathbb {E}_{\lambda _0, P_0} \left[ \exp \left( - \langle y \widehat{\otimes } y , \lambda _t \rangle + \text {i } v^{\top } P_t\right) \right] = \exp (-\phi _t- \langle \psi _t, \lambda _0\rangle + \text {i } v^{\top } P_0). \end{aligned}$$
The function \(\psi \) satisfies the following Riccati differential equations, namely \(\psi _0=y\widehat{\otimes } y \) and \(\partial _t \psi _t= R(\psi _t,\text {i } v )\), in the mild sense with \(R: \widehat{\mathcal {E}}_* \times \text {i } \mathbb {R}^d \rightarrow \widehat{\mathcal {E}}_*\) given by
$$\begin{aligned} R(y \widehat{\otimes } y, \text {i } v)(x_1,x_2)&= \mathcal {A} y(x_1) \widehat{\otimes } y (x_2)+ y(x_1) \widehat{\otimes } \mathcal {A} y (x_2)\\&\quad - 2 \int _0^{\infty } \int _0^{\infty } y(\mathrm{d}x_1) \widehat{\otimes }y(\mathrm{d}x) \nu \widehat{\otimes } \nu (\mathrm{d}x,\mathrm{d}y) y(\mathrm{d}y)\widehat{\otimes }y(\mathrm{d}x_2) \\&\quad + \frac{1}{2} \sum _{i=1}^d \text {i } v_i \widehat{\beta }_* (e_i e_i^{\top })(x_1, x_2)\\&\quad + \widehat{\beta }_*(\int _{\mathbb {R}^d}(\text {i } v^{\top } (e^{\xi }- \mathbf {1} - \xi ) )m(\mathrm{d}\xi ))(x_1,x_2)\\&\quad + \frac{1}{2} \widehat{\beta }_*(v v^{\top })(x_1, x_2)\\&\quad + \widetilde{\beta }_* (\int _0^{\infty } y(\cdot ) \widehat{\otimes }y(x)\nu (\mathrm{d}x))(x_1,x_2) \varrho \text {i } v^{\top }\\&\quad + \text {i } v \varrho ^{\top } \widetilde{\beta }_* (\int _0^{\infty } \nu (\mathrm{d}x) y(\cdot ) \widehat{\otimes }y(x))(x_1,x_2) \\&\quad -\widehat{\beta }_*( \int _{\mathbb {R}^d}( \exp (\text {i } v^{\top } \xi ) -1 - \text {i } v^{\top } \xi ) m(\mathrm{d}\xi ))(x_1, x_2), \end{aligned}$$
where \(\widehat{\beta }_*\) and \(\widetilde{\beta }_*\) are the adjoint operators of \(\widehat{\beta }\) given in (5.9) and \(\widetilde{\beta }\) given in (6.2), respectively. The function \(\phi \) satisfies \(\phi _0=0\) and \(\partial _t \phi _t= F(\psi _t)\) with \(F: \widehat{\mathcal {E}}_* \rightarrow \mathbb {R}\) given by
$$\begin{aligned} F(y \widehat{\otimes } y) = n \langle y \widehat{\otimes } y, \nu \widehat{\otimes } \nu \rangle . \end{aligned}$$
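
To make the structure of these equations concrete, the following minimal sketch treats a scalar toy analogue and is not the setting of the theorem: we assume that the lift collapses to a single real state, drop the terms involving \(\mathcal {A}\), set \(v=0\) and normalize \(\nu \) to 1. Under these assumptions, the system reduces to \(\partial _t \psi _t = -2\psi _t^2\), \(\partial _t \phi _t = n \psi _t\), which is the classical Riccati system of a squared Bessel process of dimension n with solution \(\psi _t = u/(1+2ut)\), \(\phi _t = \frac{n}{2}\log (1+2ut)\). All function names below are hypothetical and chosen for illustration only.

```python
import math


def riccati_rhs(psi, n):
    """Toy right-hand sides: R(psi) = -2 psi^2 and F(psi) = n psi.

    Assumption: scalar state, no drift operator, v = 0, nu normalized to 1.
    """
    return -2.0 * psi * psi, n * psi


def solve_toy_riccati(u, n, t, steps=1000):
    """Integrate (psi, phi) with classical RK4 from psi_0 = u, phi_0 = 0."""
    h = t / steps
    psi, phi = u, 0.0
    for _ in range(steps):
        k1p, k1f = riccati_rhs(psi, n)
        k2p, k2f = riccati_rhs(psi + 0.5 * h * k1p, n)
        k3p, k3f = riccati_rhs(psi + 0.5 * h * k2p, n)
        k4p, k4f = riccati_rhs(psi + h * k3p, n)
        psi += h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        phi += h * (k1f + 2.0 * k2f + 2.0 * k3f + k4f) / 6.0
    return psi, phi


def closed_form(u, n, t):
    """Closed-form solution of the toy system (squared Bessel case)."""
    return u / (1.0 + 2.0 * u * t), 0.5 * n * math.log(1.0 + 2.0 * u * t)


def laplace_transform(u, n, t, x0):
    """E[exp(-u X_t)] = exp(-phi_t - psi_t * x0) in the toy model."""
    psi, phi = solve_toy_riccati(u, n, t)
    return math.exp(-phi - psi * x0)
```

For instance, `solve_toy_riccati(0.7, 3, 1.5)` can be compared with `closed_form(0.7, 3, 1.5)`; in the infinite dimensional setting of the theorem, the same two-step pattern (solve for \(\psi \), then integrate F along it) applies, but \(\psi \) is function-valued and the equation has to be understood in the mild sense.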

Remark 6.2

In a similar spirit, one can define multivariate affine covariance models with the affine Volterra jump process V given in (4.23). The log-price process (under some risk neutral measure) evolves then according to
$$\begin{aligned} \mathrm{d}P_t&=-\frac{1}{2}{\text {diag}}(V_t)\mathrm{d}t - \int _{\mathbb {R}^d} (e^{\xi }- \mathbf {1} - \xi ) {{\,\mathrm{Tr}\,}}(V_t m(\mathrm{d}\xi )) + \sqrt{V}_t \mathrm{d}B_t \\&\quad + \int _{\mathbb {R}^d} \xi \left( \mu ^P(\mathrm{d}\xi ) - {{\,\mathrm{Tr}\,}}(V_t m(\mathrm{d}\xi ))\right) , \end{aligned}$$
where B is a d-dimensional Brownian motion. The jump measure m of P and the measure \(\mu \) of the Markovian lift \(\lambda \) given in (4.17) can be chosen as the marginals of a common measure supported on \(\mathbb {S}^d_+ \times \mathbb {R}^d \).


  1. We refer to the original paper of Hawkes (1971) for the one-dimensional case.

  2. Here, the brackets stand for the covariation and not for the pairing.



Open access funding provided by Vienna University of Economics and Business (WU).


  1. Abi Jaber, E., El Euch, O.: Markovian structure of the Volterra Heston model. Stat. Probab. Lett. 149, 63–72 (2019)
  2. Abi Jaber, E., Larsson, M., Pulido, S.: Affine Volterra processes. Ann. Appl. Probab. (2019) (to appear)
  3. Alòs, E., León, J., Vives, J.: On the short-time behavior of the implied volatility for jump-diffusion models with stochastic volatility. Finance Stoch. 11(4), 571–589 (2007)
  4. Alòs, E., Yang, Y.: A fractional Heston model with \(H>1/2\). Stochastics 89(1), 384–399 (2017)
  5. Bayer, C., Friz, P., Gatheral, J.: Pricing under rough volatility. Quant. Finance 16(6), 887–904 (2016)
  6. Bru, M.F.: Wishart processes. J. Theor. Probab. 4(4), 725–751 (1991)
  7. Cuchiero, C.: Affine and polynomial processes. Ph.D. thesis, ETH Zürich (2011)
  8. Cuchiero, C., Filipović, D., Mayerhofer, E., Teichmann, J.: Affine processes on positive semidefinite matrices. Ann. Appl. Probab. 21(2), 397–463 (2011)
  9. Cuchiero, C., Teichmann, J.: Generalized Feller processes and Markovian lifts of stochastic Volterra processes: the affine case. arXiv:1804.10450 (2018)
  10. Dalang, R.C.: Extending the martingale measure stochastic integral with applications to spatially homogeneous SPDEs. Electron. J. Probab. 4(6), 29 (1999)
  11. Dörsek, P., Teichmann, J.: A semigroup point of view on splitting schemes for stochastic (partial) differential equations. arXiv:1011.2651 (2010)
  12. El Euch, O., Rosenbaum, M.: The characteristic function of rough Heston models. Math. Finance 29(1), 3–38 (2019)
  13. Engel, K.J., Nagel, R.: One-Parameter Semigroups for Linear Evolution Equations. Volume 194 of Graduate Texts in Mathematics. Springer, New York (2000). (With contributions by S. Brendle, M. Campiti, T. Hahn, G. Metafune, G. Nickel, D. Pallara, C. Perazzoli, A. Rhandi, S. Romanelli and R. Schnaubelt)
  14. Ethier, S.N., Kurtz, T.G.: Markov Processes: Characterization and Convergence. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley, New York (1986)
  15. Filipović, D.: Consistency Problems for Heath–Jarrow–Morton Interest Rate Models. Lecture Notes in Mathematics, vol. 1760. Springer, Berlin (2001)
  16. Gatheral, J., Jaisson, T., Rosenbaum, M.: Volatility is rough. Quant. Finance 18(6), 933–949 (2018)
  17. Gripenberg, G., Londen, S.-O., Staffans, O.: Volterra Integral and Functional Equations. Volume 34 of Encyclopedia of Mathematics and Its Applications. Cambridge University Press, Cambridge (1990)
  18. Hawkes, A.G.: Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83–90 (1971)
  19. Mayerhofer, E.: Affine processes on positive semidefinite \(d\times d\) matrices have jumps of finite variation in dimension \(d>1\). Stoch. Process. Appl. 122(10), 3445–3459 (2012)
  20. Pazy, A.: Semigroups of Linear Operators and Applications to Partial Differential Equations. Volume 44 of Applied Mathematical Sciences. Springer, New York (1983)
  21. Schaefer, H.H., Wolff, M.P.: Topological Vector Spaces. Volume 3 of Graduate Texts in Mathematics, 2nd edn. Springer, New York (1999)

Copyright information

© The Author(s) 2019

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Vienna University of Economics and Business, Vienna, Austria
  2. ETH Zürich, Zurich, Switzerland
