Markovian lifts of positive semidefinite affine Volterra type processes

We consider stochastic partial differential equations appearing as Markovian lifts of matrix-valued (affine) Volterra type processes from the point of view of the generalized Feller property (see e.g., \cite{doetei:10}). In particular, we introduce Volterra Wishart processes with fractional kernels and values in the cone of positive semidefinite matrices. They are constructed from matrix products of infinite dimensional Ornstein-Uhlenbeck processes whose state space consists of matrix-valued measures. In parallel, we also consider positive definite Volterra pure jump processes, giving rise to multivariate Hawkes type processes. We apply these affine covariance processes to multivariate (rough) volatility modeling and introduce a (rough) multivariate Volterra Heston type model.


Introduction
It is the goal of this article to investigate the results of [9] on infinite dimensional Markovian lifts of stochastic Volterra processes in a multivariate setup: we are mainly interested in the case where the stochastic Volterra processes take values in the cone of positive semidefinite matrices $\mathbb{S}^d_+$. We shall concentrate on the affine case due to its relevance for tractable rough covariance modeling, extending rough volatility (see e.g., [3,16,5]) to a setting of $d$ "roughly correlated" assets.
Viewing stochastic Volterra processes from an infinite dimensional perspective allows one to resolve the generic non-Markovianity of the volatility process, which at first sight appears naturally low dimensional. Indeed, this approach makes it possible to go beyond the univariate case considered so far and to treat multivariate rough covariance models for more than one asset. Moreover, the considered Markovian lifts allow one to apply the full machinery of affine processes. We refer to the introduction of [9] for an overview of theoretical and practical advantages of Markovian lifts in the context of Volterra type processes.
Let us start by explaining why the matrix-valued positive definite case is more involved than the scalar one in $\mathbb{R}_+$, where for instance the Volterra Cox-Ingersoll-Ross process takes values (see e.g., [14,1,4], where it appears as variance process in a rough Heston model): consider a standard Wishart process on $\mathbb{S}^d_+$, as defined in [6,8], of the form

Here $\sqrt{\cdot}$ denotes the matrix square root, $\mathrm{Id}_d$ the identity matrix and $W$ a $d \times d$ matrix of Brownian motions. The (necessary) presence of the dimension $d$ in the drift is an obvious obstruction to infinite dimensional versions of this equation, which could be projected to obtain Volterra type equations by the variation of constants formula (see [9] for such a projection on $\mathbb{R}_+$). In order to circumvent this difficulty we present two approaches in this paper:
• We develop a theory of infinite dimensional affine Markovian lifts of pure jump positive semidefinite Volterra processes.
• We develop a theory of squares of Gaussian processes in a general setting to construct infinite dimensional analogs of Wishart processes. Their finite dimensional projections, however, look different from naively conjectured Volterra Wishart processes following the role model of Volterra Cox-Ingersoll-Ross processes. They are also different in dimension one, as outlined below.
The jump part appears natural and comes without any further probabilistic problem when constrained to finite variation jumps. Note that in the (non-Volterra) case of affine processes on positive semidefinite matrices, quadratic variation jumps are not possible either (see [19]). With the generalized Feller approach from [11,9] we obtain a new class of stochastic Volterra processes taking values in $\mathbb{S}^d_+$ of the form (1.2), where $h : \mathbb{R}_+ \to \mathbb{S}^d_+$ is some deterministic function, $K$ a (potentially fractional) kernel in $L^2(\mathbb{R}_+, \mathbb{S}^d_+)$ and $N$ a pure jump process of finite variation with jump sizes in $\mathbb{S}^d_+$, whose compensator is a linear function of $V$.
This allows one, for instance, to define a multivariate Hawkes process $\widehat{N}$ (see [18] for the one-dimensional case) with values in $\mathbb{N}^d_0$ given by the diagonal entries of $N$, i.e., $\operatorname{diag}(N) = \widehat{N}$, where the compensator of $\widehat{N}_i$ is given by $\int_0^{\cdot} V_{s,ii}\,ds$ (see Example 4.16). By means of the affine transform formula for the infinite dimensional lift of (1.2), we are able to derive an expression for the Laplace transform of $V_t$, which can be computed by means of matrix Riccati Volterra equations.
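To make the Hawkes-type structure concrete, the following minimal sketch simulates a one-dimensional Hawkes process by Ogata's thinning algorithm. It is not the construction used in this article: the exponential kernel $\alpha e^{-\beta t}$, the constant baseline `mu`, and all function and parameter names are assumptions chosen for illustration. The intensity $V_t = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$ plays the role of the process whose time integral compensates the counting process.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of a univariate Hawkes process on [0, horizon].

    Hypothetical illustration: intensity
        V_t = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)),
    sampled with Ogata's thinning algorithm.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excess = 0.0  # V_t - mu just after time t (decays at rate beta)
    while True:
        # Between events the intensity only decays, so its current value
        # is an upper bound until the next accepted event.
        bound = mu + excess
        wait = rng.expovariate(bound)
        t_candidate = t + wait
        if t_candidate > horizon:
            return events
        excess_candidate = excess * math.exp(-beta * wait)
        # Accept the candidate with probability V_{t_candidate} / bound.
        if rng.random() * bound <= mu + excess_candidate:
            events.append(t_candidate)
            excess = excess_candidate + alpha  # jump of the intensity
        else:
            excess = excess_candidate
        t = t_candidate


def intensity(t, events, mu, alpha, beta):
    """Evaluate V_t from the past events (left limit at event times)."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.5, horizon=10.0)
```

The exponential kernel keeps the intensity Markovian, which is what makes the running variable `excess` sufficient; for a fractional kernel this shortcut is not available, which is the motivation for the infinite dimensional lift.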
The difficulty of the continuous part arises from geometric constraints, which can, however, be circumvented by building squares of unconstrained processes. Let us illustrate the idea in a finite dimensional setting: let $W$ be an $n \times d$ matrix of Brownian motions and let $\nu$ be a matrix in $\mathbb{R}^{d \times dk}$ consisting of $k$ submatrices $\nu_i \in \mathbb{R}^{d \times d}$, $i = 1, \ldots, k$, i.e., $\nu = (\nu_1, \ldots, \nu_k)$.
Define now a Gaussian process with values in $\mathbb{R}^{n \times dk}$ by $\gamma := W\nu$. Then, by Itô's product formula the $\mathbb{R}^{dk \times dk}$-valued process $\gamma_t^\top \gamma_t$ satisfies the following equation

Following Marie-France Bru [6, Subsection 5.2] and setting $\lambda_t := \gamma_t^\top \gamma_t$, this can however also be written via a $kd \times kd$ matrix of independent Brownian motions $B$ satisfying

in the more familiar form

Our article is devoted to analyzing the situation where the index variable of $\nu$ becomes continuous, which is the only possible form of an infinite dimensional Wishart process. We believe that generalized Feller processes are the right arena to achieve this purpose. In this article we choose measure spaces, but an analogous analysis can be done in the setting of function spaces, as for instance the Hilbert space setting of [15] (see [9, Section 5.2]). In the measure-valued setting we proceed as follows: let $\gamma$ be an infinite dimensional Ornstein-Uhlenbeck process taking values in $\mathbb{R}^{n \times d}$-valued regular Borel measures on $\mathbb{R}_+$. Then Volterra Wishart processes arise as finite dimensional projections of $\gamma^\top(dx_1)\gamma(dx_2)$ on $\mathbb{S}^d_+$ and can be written as

where $h$ and $K$ are as in (1.2), $W$ is an $n \times d$ matrix of Brownian motions and $Y(t, s) = \int_0^\infty e^{-x(t-s)} \gamma_s(dx)$. As explained in Remark 5.4, $V_t$ corresponds to the matrix square of a Volterra Ornstein-Uhlenbeck process $X_t$, obtained as finite dimensional projection of $\gamma(dx)$. The Volterra Wishart process (1.6) can then also be written in terms of the forward process of $X_t$, i.e. $(E[X_t \mid \mathcal{F}_s])_{s \le t}$, namely

Note that this is not of standard Volterra form, as e.g. in [2], since $Y(t, s)$ or $E[X_t \mid \mathcal{F}_s]$, respectively, cannot be expressed as a function of $V_t$. By moving to a Brownian field analogous to (1.4) it could however be expressed as a path functional of $(V_s)_{s \le t}$. For $n = d = 1$ it also gives rise to a different equation than the Volterra CIR process. We explain the connection between (1.6) and (1.3)-(1.5) in detail in Section 5.
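The product-formula step for $\gamma = W\nu$ in the finite dimensional illustration can be spelled out directly. The following short derivation is a sketch that uses only the definitions above, assuming that the entries of the $n \times d$ matrix $W$ are independent standard Brownian motions started at zero.

```latex
\begin{align*}
d\gamma_t &= dW_t\,\nu, \\
d(\gamma_t^\top \gamma_t)
  &= d\gamma_t^\top\,\gamma_t + \gamma_t^\top\,d\gamma_t + d\gamma_t^\top\,d\gamma_t \\
  &= \nu^\top dW_t^\top \gamma_t + \gamma_t^\top dW_t\,\nu
     + \nu^\top (dW_t^\top dW_t)\,\nu \\
  &= \nu^\top dW_t^\top \gamma_t + \gamma_t^\top dW_t\,\nu + n\,\nu^\top \nu\,dt,
\end{align*}
% since (dW_t^\top dW_t)_{ij} = \sum_{k=1}^n dW_{t,ki}\,dW_{t,kj} = n\,\delta_{ij}\,dt.
```

In particular $E[\gamma_t^\top \gamma_t] = n\,t\,\nu^\top \nu$, so under these assumptions the parameter $n$ enters only through the drift.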
Note that by choosing K to be a matrix of fractional kernels the trajectories of (1.6) become rough, whence V qualifies for rough covariance modeling with potentially different roughness regimes for different assets and their covariances. This is in accordance with econometric observations. In Section 6 we show how such models can be defined: we introduce a (rough) multivariate Volterra Heston type model with jumps and show that it can again be cast in the affine framework. This is particularly relevant for pricing basket or spread options using the Fourier pricing approach.
The remainder of the article is organized as follows: in Section 1.1 we introduce some notation and review certain functional analytic concepts. In Sections 2 and 3, we recall and extend results on generalized Feller processes as outlined in [9]. In particular, Theorem 2.8 provides a result on invariant (sub)spaces for generalized Feller processes that is crucial for the square construction as outlined above. In Section 4 we apply the presented theory to SPDEs which are lifts of matrix-valued stochastic Volterra jump processes of type (1.2). Section 5 is devoted to presenting a theory of infinite dimensional Wishart processes which in turn give rise to (rough) Volterra Wishart processes. In Section 6 we apply these processes to multivariate (rough) volatility modeling.
1.1. Notation and some functional analytic notions. For the background in functional analysis we refer to the excellent textbook [21] as main reference and to the equally excellent books [12,20] for the background in strongly continuous semigroups.
We shall apply the following notations: let $Y$ be a Banach space and $Y^*$ its dual space, i.e. the space of linear continuous functionals with the strong dual norm $\|\lambda\|_{Y^*} := \sup_{\|y\| \le 1} |\langle y, \lambda \rangle|$, where $\langle y, \lambda \rangle := \lambda(y)$ denotes the evaluation of the linear functional $\lambda$ at the point $y \in Y$. Since in the case of equation (1.2), cones $E$ of $Y^*$ will be our state spaces, we denote the polar cones in pre-dual notation, i.e.
We denote spaces of bounded linear operators from Banach spaces . On $Y^*$ we shall usually consider, besides the strong topology (induced by the strong dual norm), the weak-$*$-topology, which is the weakest locally convex topology making all linear functionals $\langle y, \cdot \rangle$ on $Y^*$ continuous. Let us recall the following facts:
• The weak-$*$-topology is metrizable if and only if $Y$ is finite dimensional: this is due to Baire's category theorem since $Y^*$ can be written as a countable union of closed sets, whence at least one has to contain an open set, which in turn means that compact neighborhoods exist, i.e. a strictly finite dimensional phenomenon.
• Norm balls $K_R$ of any radius $R$ in $Y^*$ are compact with respect to the weak-$*$-topology, which is the Banach-Alaoglu theorem.
• These balls are metrizable if and only if $Y$ is separable: this is true since $Y$ can be isometrically embedded into $C(K_1)$, where $y \mapsto \langle y, \cdot \rangle$, for $y \in Y$. Since $Y$ is separable, its embedded image is separable, too, which means (by looking at the algebra generated by $Y$ in $C(K_1)$) that $C(K_1)$ is separable, which is the case if and only if $K_1$ is metrizable.
Even though some results are more general, in particular often only compactness of $K_R$ is used, we shall always assume separability in this article. Finally, a family of linear operators $(P_t)_{t \ge 0}$ on a Banach space $Y$ with $P_t P_s = P_{t+s}$ for $s, t \ge 0$ and with $P_0 = I$, where $I$ denotes the identity, is called a strongly continuous semigroup if $\lim_{t \to 0} P_t y = y$ holds true for every $y \in Y$. We usually denote its generator by $A$, which is defined as $Ay := \lim_{t \to 0} \frac{P_t y - y}{t}$ for all $y \in \operatorname{dom}(A)$, i.e. the set of elements where the limit exists. Notice that $\operatorname{dom}(A)$ is left invariant by the semigroup $P$ and that its restriction to the domain equipped with the graph norm $\|y\|_{\operatorname{dom}(A)} := \sqrt{\|y\|^2 + \|Ay\|^2}$ is again a strongly continuous semigroup. Moreover, as already used in the introduction, $\mathbb{S}^d$ denotes the vector space of symmetric $d \times d$ matrices and $\mathbb{S}^d_+$ the cone of positive semidefinite ones. Furthermore, we denote by $\operatorname{diag}(A)$ the vector consisting of the diagonal elements of a matrix $A$.

Generalized Feller semigroups and processes
In the context of Markovian lifts of stochastic Volterra processes, (signed) measure-valued processes appear in a natural way. The generalized Feller framework is tailor-made for such processes, as it allows one to consider non-locally compact state spaces. We need this explicitly in Section 5 for Ornstein-Uhlenbeck processes whose state space consists of matrix-valued measures. Beyond that, jump processes with unbounded but finite activity can easily be constructed in this setting, see Proposition 3.4 and Section 4. We shall first collect some results from [9] and generalize them for the purposes of this article.

2.1. Definitions and results. First we introduce weighted spaces and state a central Riesz-Markov-Kakutani representation result. The underlying space $X$ here is a completely regular Hausdorff topological space.
Definition 2.1. A function $\varrho : X \to (0, \infty)$ is called an admissible weight function if the sets $K_R := \{x \in X : \varrho(x) \le R\}$ are compact and separable for all $R > 0$.
An admissible weight function ̺ is necessarily lower semicontinuous and bounded from below by a positive constant. We call the pair X together with an admissible weight function ̺ a weighted space. A weighted space is σ-compact. In the following remark we clarify the question of local compactness of convex subsets E ⊂ X when X is a locally convex topological space and ̺ convex.
Remark 2.2. Let $X$ be a separable locally convex topological space and $E$ a convex subset. Moreover, let $\varrho$ be a convex admissible weight function. Then $\varrho$ is continuous on $E$ if and only if $E$ is locally compact. Indeed, if $\varrho$ is continuous on $E$, then of course the topology on $E$ is locally compact since every point has a compact neighborhood of type $\{\varrho \le R\}$ for some $R > 0$. On the other hand, if the topology on $E$ is locally compact, then for every point $\lambda_0 \in E$ there is a convex, compact

$s \in\, ]0, 1]$. This in turn means that $\varrho$ is continuous at $\lambda_0$.
From now on $\varrho$ shall always denote an admissible weight function. For completeness we start by putting definitions for general Banach space valued functions, although in the sequel we shall only deal with $\mathbb{R}$-valued functions: let $Z$ be a Banach space with norm $\|\cdot\|_Z$. The vector space

is a Banach space itself. It is also clear that for $Z$-valued bounded continuous functions the continuous embedding $C_b(X; Z) \subset B^\varrho(X; Z)$ holds true, where we consider the supremum norm on bounded continuous functions, i.e. $\sup_{x \in X} \|f(x)\|_Z$.
Definition 2.3. We define $\mathcal{B}^\varrho(X; Z)$ as the closure of $C_b(X; Z)$ in $B^\varrho(X; Z)$. The normed space $\mathcal{B}^\varrho(X; Z)$ is a Banach space.
If the range space $Z = \mathbb{R}$, which from now on will be the case, we shall write $\mathcal{B}^\varrho(X)$ for $\mathcal{B}^\varrho(X; \mathbb{R})$ and analogously $B^\varrho(X)$.
We consider elements of $\mathcal{B}^\varrho(X)$ as continuous functions whose growth is controlled by $\varrho$. More precisely we have by [11, Theorem 2.7]

Additionally, by [11, Theorem 2.8] it holds that for every $f \in \mathcal{B}^\varrho(X)$ with $\sup_{x \in X} f(x) > 0$, there exists $z \in X$ such that

which emphasizes the analogy with spaces of continuous functions vanishing at $\infty$ on locally compact spaces. Let us now state the following crucial representation theorem of Riesz type:
Theorem 2.4 (Riesz representation for $\mathcal{B}^\varrho(X)$). For every continuous linear functional $\ell : \mathcal{B}^\varrho(X) \to \mathbb{R}$ there exists a finite signed Radon measure $\mu$ on $X$ such that

Additionally

where $|\mu|$ denotes the total variation measure of $\mu$.
We shall next consider strongly continuous semigroups on B ̺ (X) spaces and recover very similar structures as well known for Feller semigroups on the space of continuous functions vanishing at ∞ on locally compact spaces.
We obtain due to the Riesz representation property the following key theorem:
Theorem 2.6. Let $(P_t)_{t \ge 0}$ satisfy (i) to (iv) of Definition 2.5. Then $(P_t)_{t \ge 0}$ is strongly continuous on $\mathcal{B}^\varrho(X)$, that is, $\lim_{t \to 0+} \|P_t f - f\|_\varrho = 0$ for all $f \in \mathcal{B}^\varrho(X)$.
One can also establish a positive maximum principle in case that the semigroup $P_t$ grows around $0$ like $\exp(\omega t)$ for some $\omega \in \mathbb{R}$ with respect to the operator norm on $\mathcal{B}^\varrho(X)$. Indeed, the following theorem proved in [11, Theorem 3.3] is a reformulation of the Lumer-Phillips theorem for pseudo-contraction semigroups using a generalized positive maximum principle which is formulated in the sequel.
Theorem 2.7. Let $A$ be an operator on $\mathcal{B}^\varrho(X)$ with domain $D$, and $\omega \in \mathbb{R}$. $A$ is closable with its closure $\overline{A}$ generating a generalized Feller semigroup $(P_t)_{t \ge 0}$ with $\|P_t\|_{L(\mathcal{B}^\varrho(X))} \le \exp(\omega t)$ for all $t \ge 0$ if and only if (i) $D$ is dense, (ii) $A - \omega_0$ has dense image for some $\omega_0 > \omega$, and (iii) $A$ satisfies the generalized positive maximum principle, that is,

As a new contribution to the general theorems we shall work out a statement on invariant subspaces which will be crucial for constructing squares of infinite dimensional OU-processes.
Theorem 2.8. Let X be a weighted space with weight ̺ 1 , and q : X → q(X) be a (surjective) continuous map from (X, ̺ 1 ) to the weighted space (q(X), ̺ 2 ). Let P (1) be a generalized Feller semigroup acting on B ̺1 (X). Assume that ̺ 2 • q ≤ ̺ 1 on X. Let D be a dense subspace of B ̺2 (q(X)). Furthermore, for every f ∈ D ⊂ B ̺2 (q(X)) and for every t ≥ 0, there is some g ∈ B ̺2 (q(X)) such that

(2.8)
and additionally there is a constant $C \ge 1$ such that

Then there is a generalized Feller semigroup $P^{(2)}$ acting on $\mathcal{B}^{\varrho_2}(q(X))$ such that

due to the assumption $\varrho_2 \circ q \le \varrho_1$. It is also injective, but its image is not necessarily closed. Assumptions (2.8) and (2.9) now mean that

for every $f \in \mathcal{B}^{\varrho_2}(q(X))$ and not only for $f \in D$. Hence we can define

which is by the very construction a semigroup of linear operators on $\mathcal{B}^{\varrho_2}(q(X))$. Since $M$ is continuous, its graph is closed, whence $P^{(2)}_t$ is a bounded linear operator by the closed graph theorem. Moreover, property (iv) of Definition 2.5 holds true due to Assumption (2.9). Positivity is also preserved, since for $f \ge 0$ we have due to Assumption (2.8) and the fact that $P^{(1)}$ is a generalized Feller semigroup,

Here, $g$ is nonnegative due to the positivity of $P$

Remark 2.9. In the setting of general semigroups it is not clear that restrictions of semigroups to (not even closed) subspaces preserve strong continuity.
Remark 2.10. There are several methods to show that (2.8) is satisfied. In general it is not sufficient to assume that the generator of P (1) has this property.
Corollary 2.11. Let the assumptions of Theorem 2.8 except Assumption (2.9) hold true and suppose additionally that Then the same conclusions hold true. In particular the range of the operator M : We restate from [9] assertions on existence of generalized Feller processes and path properties. It is remarkable that in this very general context càg versions exist for countably many test functions.
Theorem 2.12. Let (P t ) t≥0 be a generalized Feller semigroup with P t 1 = 1 for t ≥ 0. Then there exists a filtered measurable space (Ω, (F t ) t≥0 ) with right continuous filtration, and an adapted family of random variables (λ t ) t≥0 such that for any initial value λ 0 ∈ X there exists a probability measure P λ0 with for t ≥ 0 and every f ∈ B ̺ (X). The Markov property holds true, i.e.
almost surely with respect to P λ0 . Theorem 2.13. Let (P t ) t≥0 be a generalized Feller semigroup and let (λ t ) t≥0 be a generalized Feller process on a filtered probability space. Then for every countable family (f n ) n≥0 of functions in B ̺ (X) we can choose a version of the processes , such that the trajectories are càglàd for all n ≥ 0. If additionally P t ̺ ≤ exp(ωt)̺ holds true, then (exp(−ωt)̺(λ t )) t≥0 is a super-martingale and can be chosen to have càglàd trajectories. In this case we obtain that the processes f n (λ t ) t≥0 can be chosen to have càglàd trajectories.
Remark 2.14. In the general case, when P t ̺ ≤ M exp(ωt)̺ for M > 1, we obtain for f n (λ t ) t≥0 only càg trajectories. To see this, consider the measurable set of sample events {sup 0≤t≤1 ̺(λ t ) ≤ R}. Then we can construct on the metrizable compact set {̺ ≤ R} a càglàd version of the processes fn(λt) and in turn also of f n (λ t ) t≥0 . The limit R → ∞, however, only leads to a càg version since we cannot control the right limits.

2.2. Dual spaces of Banach spaces. The most important playground for our theory will be closed subsets of duals of Banach spaces, where the weak-$*$-topology appears to be $\sigma$-compact due to the Banach-Alaoglu theorem. Assume that $E \subset Y^*$ is a closed subset of the dual space $Y^*$ of some Banach space $Y$, where $Y^*$ is equipped with its weak-$*$-topology. Consider a lower semicontinuous function $\varrho : E \to (0, \infty)$ and denote by $(E, \varrho)$ the corresponding weighted space. We have the following approximation result (see [11, Theorem 4.2]) for functions in $\mathcal{B}^\varrho(E)$ by cylindrical functions. Set

where $\langle \cdot, \cdot \rangle$ denotes the pairing between $Y^*$ and $Y$. We denote by $\mathrm{Cyl} := \bigcup_{N \in \mathbb{N}} \mathrm{Cyl}_N$ the set of bounded smooth continuous cylinder functions on $E$.
Theorem 2.15. The closure of Cyl in B ̺ (E) coincides with B ̺ (E), whose elements appear to be precisely the functions f ∈ B ̺ (E) which satisfy (2.3) and that f | KR is weak- * -continuous for any R > 0.
Then we assume that (i) there are constants C and ε > 0 such that Theorem 2.18. Suppose Assumptions 2.16 hold true.
satisfies the generalized Feller property and is therefore a strongly continuous semigroup on B ̺ (E).
Proof. This follows from the arguments of [11, Section 5].

Approximation theorems
In order to establish existence of Markovian solutions for general generators $A$ we could, at least in the pseudo-contractive case, either directly apply Theorem 2.7, where we have to assume that the generator $A$ satisfies on a dense domain $D$ a generalized positive maximum principle and that for at least one $\omega_0 > \omega$ the range of $A - \omega_0$ is dense, or we approximate a general generator $A$ by (finite activity pure jump) generators $A_n$ and apply the following (well-known) approximation theorems. They also work in the general context when the constant $M > 1$.
Theorem 3.1. Let $(P^n_t)_{n \in \mathbb{N}, t \ge 0}$ be a sequence of strongly continuous semigroups on a Banach space $Z$ with generators $(A_n)_{n \in \mathbb{N}}$ such that there are uniform (in $n$) growth bounds $M \ge 1$ and $\omega \in \mathbb{R}$ with $\|P^n_t\|_{L(Z)} \le M \exp(\omega t)$ for $t \ge 0$. Let furthermore $D \subset \cap_n \operatorname{dom}(A_n)$ be a dense subspace with the following three properties: The sequence $A_n f$ converges as $n \to \infty$ for each $f \in D$, in the following sense: there exists a sequence of numbers $a_{nm} \to 0$ as $n, m \to \infty$ such that

holds true for every $f \in D$ and for all $n, m$.
Then there exists a strongly continuous semigroup (P ∞ t ) t≥0 with the same growth bound on Z such that lim n→∞ P n t f = P ∞ t f for all f ∈ Z uniformly on compacts in time and on bounded sets in D. Furthermore on D the convergence is of order O(a nm ). If in addition for each n ∈ N, (P n t ) t≥0 is a generalized Feller semigroup, then this property transfers also to the limiting semigroup.
For the purposes of affine processes a slightly more general version of the approximation theorem is needed, which we state in the sequel: Theorem 3.2. Let (P n t ) n∈N,t≥0 be a sequence of strongly continuous semigroups on a Banach space Z with generators (A n ) n∈N such that there are uniform (in n) growth bounds M ≥ 1 and ω ∈ R with P n t L(Z) ≤ M exp(ωt) for t ≥ 0. Let furthermore D ⊂ ∩ n dom(A n ) be a subset with the following two properties: (i) The linear span span(D) is dense.
(ii) There is a norm . D on span(D) such that for each f ∈ D and for t > 0 there exists a sequence a f,t nm , possibly depending on f and t, ,t nm f D holds true for n, m and for 0 ≤ u ≤ t, with a f,t nm → 0 as n, m → ∞. Then there exists a strongly continuous semigroup (P ∞ t ) t≥0 with the same growth bound on Z such that lim n→∞ P n t f = P ∞ t f for all f ∈ Z uniformly on compacts in time. If in addition for each n ∈ N, (P n t ) t≥0 is a generalized Feller semigroup, then this property transfers also to the limiting semigroup.
Our first application of Theorem 3.1 is the next proposition, which extends well-known results on bounded generators towards unbounded limits.
We repeat here a remark from [9] since it helps to understand the fourth condition on the measures: Remark 3.3. Let (P t ) t≥0 be a generalized Feller semigroup with P t L(B ̺ (X)) ≤ M exp(ωt) for some M ≥ 1 and some ω. Additionally it is assumed to be of transport type, i.e.
Notice that̺ is an admissible weight function, since is compact by the definition of ̺ and the continuity of x → ψ t (x) which leads to an intersection of closed subsets of compacts. Additionally we have that ̺ ≤̺ ≤ M ̺ by the growth bound and therefore the norm on B ̺ (X) is equivalent to Furthermore, P t f ̺ ≤ exp(ωt) f ̺ holds for all t ≥ 0 and f ∈ B ̺ (X). Indeed, this is a consequence of the following estimate Hence, Proposition 3.4. Let (X, ̺) be a weighted space with weight function ̺ ≥ 1. Consider an operator A on B ̺ (X) with dense domain dom(A) generating on B ̺ (X) a generalized Feller semigroup (P t ) t≥0 of transport type as in (3.2), such that for all t ≥ 0 we have P t L(B ̺ (X)) ≤ M 1 exp(ωt) for some M 1 and ω and such that B Consider furthermore a family of finite measures µ(x, .) for x ∈ X on X such that the operator B acts on B ̺ (X) by for x ∈ X yielding continuous functions on {̺ ≤ R} for R ≥ 0, and such that the following properties hold true: as well as for all x ∈ X. In particular y → sup t≥0 exp(−ωt)P t ̺(y) should be integrable with respect to µ(x, .) Then A + B generates a generalized Feller semigroup (P ∞ t ) t≥0 on B ̺ (X) satisfying P ∞ Remark 3.5. In contrast to classical Feller theory also processes with unbounded jump intensities can be constructed easily if ̺ is unbounded on X. The general character of the proposition allows to build general processes from simple ones by perturbation.

+
Building on the theory of generalized Feller processes from above, we shall now treat the following type of matrix-measure valued SPDEs

As shown below this equation corresponds to a Markovian lift of the Volterra jump process in (1.2).
We consider here the setting of Section 2.2. The underlying Banach space $Y^*$ is here the space of finite $\mathbb{S}^d$-valued regular Borel measures on the extended half real line $\overline{\mathbb{R}}_+ := \mathbb{R}_+ \cup \{\infty\}$ and $E$ denotes a (positive definite) subset of $Y^*$. Moreover, $A^*$ is the generator of a strongly continuous semigroup $S^*$ on $Y^*$, and $\nu \in Y^*$ (or in a slightly larger space denoted by $Z^*$ in the sequel). The predual space $Y$ is given by $C_b(\overline{\mathbb{R}}_+, \mathbb{S}^d)$, which is separable. The driving process $X$ is an $\mathbb{S}^d$-valued pure jump Itô-semimartingale, whose differential characteristics depend linearly on $\lambda$, as precisely specified below. Let us remark that other forms of differential characteristics of $X$, in particular beyond the linear case, can easily be incorporated in this setting.
The pairing between $Y$ and $Y^*$, denoted by $\langle \cdot, \cdot \rangle$, is specified via:

where $\operatorname{Tr}$ denotes the trace. We also define another bilinear map via

In the following we summarize the main ingredients of our setting. For the norm on $\mathbb{S}^d$ we write $\|\cdot\|$, which is given by $\|u\| = \sqrt{\operatorname{Tr}(u^2)}$ for $u \in \mathbb{S}^d$.
Assumption 4.1. Throughout this section we shall work under the following conditions:
(i) We are given an admissible weight function $\varrho$ on $Y^*$ (in the sense of Section 2) such that

is a weighted space in the sense of Section 2. This will serve as state space of (4.1).
(iii) Let $Z \subset Y$ be a continuously embedded subspace.
(iv) We assume that a semigroup S * with generator A * acts in a strongly continuous way on Y * and Z * , with respect to the respective norm topologies. Moreover, we suppose that for any matrix A ∈ S d it holds that (v) We assume that λ → S * t λ is weak- * -continuous on Y * and on Z * for every t ≥ 0 (considering the weak- * -topology on both the domain and the image space).
(vi) We suppose that the (pre-) adjoint operator of A * , denoted by A and domain dom(A) ⊂ Z ⊂ Y , generates a strongly continuous semigroup on Z with respect to the respective norm topology (but not necessarily on Y ).
To analyze solvability of (4.1) we first consider the following linear deterministic equation

We denote by $\beta^* : \mathbb{S}^d \to Y$ the adjoint operator defined via

Remark 4.2. Notice that drift specifications could be more general here, but for the sake of readability we leave this direction to the interested reader.
For notational convenience we shall often omit the $dx$ argument when writing an (S)PDE of type (4.4) subsequently. Under the following assumptions on $S^*$ and $\nu \in Z^*$ we can guarantee that (4.4) can be solved on the space $Y^*$ for all times in the mild sense with respect to the dual norm $\|\cdot\|_{Y^*}$ by a standard Picard iteration method.
For the linear operator $\beta$ as of (4.5), we define $K(t) := \beta(S^*_t \nu)$, which will correspond to a kernel in $L^2_{\mathrm{loc}}(\mathbb{R}_+, \mathbb{S}^d)$ of a Volterra equation. Define furthermore $R_K \in L^2_{\mathrm{loc}}(\mathbb{R}_+, \mathbb{S}^d)$ as a symmetrized version of the resolvent of the second kind (see e.g. [17, Theorem 3.1]) that solves

where $K * R_K$ denotes the convolution, i.e. $K * R_K = \int_0^{\cdot} K(\cdot - s) R_K(s)\,ds$.
Example 4.4. The main example that we have in mind for $\beta$ and for $S^*$, and thus in turn for the kernel $K$, are the following specifications:

In this case $K(t) = \int_0^\infty e^{-xt}\,\nu(dx)$ and the adjoint operator $\beta^*$ is given by the constant function $(\beta^*(u))(x) = u$, for all $x \in \mathbb{R}_+$.
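For orientation, the classical scalar resolvent of the second kind can be computed by hand for an exponential kernel. The following worked example is only an illustration under stated assumptions: it uses the standard (unsymmetrized) scalar resolvent equation $K * R = K - R$ rather than the symmetrized matrix version used in this article, and the constants $b, c > 0$ are hypothetical.

```latex
\begin{align*}
K(t) &= c\,e^{-bt}, \qquad R(t) = c\,e^{-(b+c)t}, \\
(K * R)(t) &= \int_0^t c\,e^{-b(t-s)}\,c\,e^{-(b+c)s}\,ds
            = c^2 e^{-bt} \int_0^t e^{-cs}\,ds \\
           &= c\,e^{-bt}\bigl(1 - e^{-ct}\bigr)
            = K(t) - R(t).
\end{align*}
```

In the scalar case such a kernel is of the Laplace transform form $\int_0^\infty e^{-xt}\,\nu(dx)$ with the point mass $\nu = c\,\delta_b$.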
Remark 4.5. To the semigroup $S^*_t = e^{-xt}$ of the above example, we associate our (main) specification of the space $Z$: let $Z \subset Y$ such that for all $y \in Y$ the map

The corresponding dual space $Z^* \supset Y^*$ is the space of regular $\mathbb{S}^d$-valued Borel measures $\nu$ on $\mathbb{R}_+$ that satisfy

Note that we can specify the components of $\nu$ to be measures of the form

which gives rise to fractional kernels $K_{ij}(t) = \int_0^\infty e^{-xt}\,\nu_{ij}(dx) \approx t^{H_{ij} - \frac{1}{2}}$. These are in turn main ingredients of rough covariance modeling.
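The link between power-law measures and fractional kernels can be checked numerically. The sketch below assumes the scalar density $\nu(dx) = x^{-H-1/2} / (\Gamma(H+\tfrac12)\,\Gamma(\tfrac12-H))\,dx$ with $H \in (0, \tfrac12)$, for which the Laplace transform is exactly $K(t) = t^{H-1/2}/\Gamma(H+\tfrac12)$. The normalization, the function names, and the quadrature grid are choices made for this illustration and are not taken from the article.

```python
import math


def fractional_kernel(t, hurst):
    """Closed form K(t) = t^(H - 1/2) / Gamma(H + 1/2)."""
    return t ** (hurst - 0.5) / math.gamma(hurst + 0.5)


def laplace_kernel(t, hurst, u_min=-80.0, u_max=12.0, step=0.01):
    """Approximate K(t) = int_0^inf exp(-x t) nu(dx) by the trapezoidal rule.

    Assumed density: nu(dx) = x^(-H - 1/2) / (Gamma(H + 1/2) Gamma(1/2 - H)) dx.
    The substitution x = exp(u) turns the integrand into a smooth,
    rapidly decaying function of u.
    """
    norm = math.gamma(hurst + 0.5) * math.gamma(0.5 - hurst)
    n_steps = int(round((u_max - u_min) / step))
    total = 0.0
    for i in range(n_steps + 1):
        u = u_min + i * step
        x = math.exp(u)
        # exp(-x t) * x^(-H - 1/2) * dx, with dx = x du
        value = math.exp(-x * t) * x ** (0.5 - hurst)
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * value
    return total * step / norm


hurst = 0.1
for t in (0.1, 0.5, 2.0):
    exact = fractional_kernel(t, hurst)
    approx = laplace_kernel(t, hurst)
    print(f"t={t}: closed form {exact:.6f}, quadrature {approx:.6f}")
```

Choosing a different Hurst parameter per component of a matrix-valued measure would, in the same way, give a different roughness for each entry of the kernel.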
for all t ≥ 0. Applying the linear operator β and using property (4.5), we obtain a deterministic linear Volterra equation of the form where we have used (4.6).
Proof. We follow the arguments of [9] and translate the proof to the matrix-valued setting. We show first the completely standard convergence of the Picard iteration scheme with respect to the dual norm on $Y^*$. Define

Then, by Assumption 4.3 (i) each $\lambda^n_t$ lies in $Y^*$. Consider now

where $\|\beta\|_{\mathrm{op}}$ denotes the operator norm of $\beta$. Assumption 4.3 (ii) and an extended version of Gronwall's inequality (see [10, Lemma 15]) then yield convergence of $(\lambda^n_t)_{n \in \mathbb{N}}$ to some $\lambda_t$ with respect to the dual norm $\|\cdot\|_{Y^*}$ uniformly in $t$ on compact intervals. For details on strongly continuous semigroups and mild solutions see [20].
Having established the existence of a mild solution of (4.4) in Y * , consider now the S d -valued process β(λ t ): where we applied property (4.5). Remember that R K denotes the resolvent of the second kind of K(t) = β(S * t ν) as introduced in (4.7) by means of which we can solve the above equation in terms of integrals of t → β(S * t λ 0 ). Since by assumption S * is a weak- * -continuous solution operator, the map λ 0 → (t → β(S * t λ 0 )) is weak- * -continuous as a map from Y * to C(R + , S d ) (with the topology of uniform convergence on compacts on C(R + , S d )). From (4.9) we thus infer that β(λ t ) is weak- * -continuous for every t ≥ 0, which clearly translates to the solution map of Equation (4.4).
Finally we have to show that the stated inequality for ̺(λ t ) holds true on small time intervals [0, ε]. Observe first that for t ∈ [0, ε] for all λ ∈ Y * just by the assumption that S * t is strongly continuous, for some Consider now the kernel K ′ (t, s) = 6ε β 2 op S * t−s ν 2 Y * 1 {s≤t} and denote by R ′ the resolvent of −K ′ , which is nonpositive. By exactly the same arguments as in [9], we then have for for some constant C. This leads to the desired assertion due to the definition of ̺. From this inequality also uniqueness follows in a standard way.
As our goal is to consider S d + -measure valued processes, we denote by E the following weak- * -closed convex cone The next proposition establishes that the solution of (4.4) leaves E invariant, if the following assumption holds true: Assumption 4.9. We assume that Proof. Consider first the slightly modified equation for some ε > 0. Then the operator B = S * ε ν(dx)β(·) + β(·)S * ε ν(dx) is bounded and the associated semigroup is given by P ε t = e Bt . Due to the assumptions on S * , ν and β, we have B(E) ⊆ E implying that P ε t (E) ⊆ E for all t ≥ 0. The Trotter-Kato Theorem (see, e.g., [12,Theorem III.5.8]) then yields that the semigroup associated to (4.10) maps E to itself. This then also holds true for the limit when ε = 0 by Theorem 3.1.
Since by Proposition 4.7 the solution operator is weak- * -continuous, we can conclude that λ 0 → f (λ t ) lies in B ̺ (E) for a dense set of B ̺ (E) by Theorem 2.15. Moreover, it satisfies the necessary bound (2.12) for ̺ and (2.13) is satisfied by (norm)-continuity of t → λ t . Hence all the conditions of Assumption 2.16 are satisfied and the solution operator therefore defines a generalized Feller semigroup (P t ) on B ̺ (E) by Theorem 2.18. This generalized Feller semigroup of course coincides with the previously constructed limit.
By the previous results we can now construct a generalized Feller process on $E$ which jumps up by multiples of $S^*_\varepsilon \nu$ for some $\varepsilon \ge 0$ and with an instantaneous intensity of size $\beta(\lambda_t)$. Recall that $E_* \subset Y$ denotes the (pre-)polar cone of $E$, that is $E_* = \{y \in Y \mid y \in C_b(\overline{\mathbb{R}}_+, \mathbb{S}^d_-)\}$. Recall the notation from (4.2) and define the following set

Proposition 4.11. Let Assumptions 4.3 and 4.9 be in force. Moreover, let $\mu$ be a finite $\mathbb{S}^d_+$-valued measure on $\mathbb{S}^d_+$ such that $\int_{\|\xi\| \ge 1} \|\xi\|^2\,\mu(d\xi) < \infty$. Consider the SPDE

where $(N_t)_{t \ge 0}$ is a pure jump process with jump sizes in $\mathbb{S}^d_+$ and compensator $\int_0^{\cdot} \int_{\mathbb{S}^d_+} \xi \operatorname{Tr}(\beta(\lambda_s)\,\mu(d\xi))\,ds$.
(i) Then for every $\lambda_0 \in E$ and $\varepsilon > 0$, the SPDE (4.12) has a solution in $E$ given by a generalized Feller process associated to the generator of (4.12). (ii) This generalized Feller process is also a probabilistically weak and analytically mild solution of (4.12), i.e.
which justifies Equation (4.12). In particular for every initial value the process N can be constructed on an appropriate probabilistic basis. The stochastic integral is defined in a pathwise way along finite variation paths. Moreover, for every family (f n ) n ∈ B ̺ (E), t → f n (λ t ) can be chosen to be càglàd for all n. (iii) For every ε > 0, the corresponding Riccati equation ∂ t y t = R(y t ) with R : D ∩ E * → Y given by admits a unique global solution in the mild sense for all initial values y 0 ∈ E * . (iv) The affine transform formula holds true, i.e.
where $y_t$ solves $\partial_t y_t = R(y_t)$ for all $y_0 \in E_*$ in the mild sense with $R$ given by (4.13). Moreover, $y_t \in E_*$ for all $t \ge 0$.
Proof. We assume that $\nu \neq 0$, otherwise there is nothing to prove. To prove the first assertion we apply Proposition 3.4. By Proposition 4.7 and Proposition 4.10, the deterministic equation (4.4) has a mild solution on $E$ which, by Assumption 4.3, defines a generalized Feller semigroup $(P_t)_{t \ge 0}$ on $B^{\varrho}(E)$. The operator $A$ in Proposition 3.4 then corresponds to the generator of $(P_t)_{t \ge 0}$, i.e. the semigroup associated to the purely deterministic part of (4.12). This is a transport semigroup and, in view of Remark 3.3, we can pass to an equivalent norm with respect to a new weight function $\tilde{\varrho}$ on $B^{\varrho}(E)$ such that $\|P_t\|_{L(B^{\tilde{\varrho}}(E))} \le \exp(\omega t)$. Therefore the conditions of Proposition 3.4 are satisfied.
Note that by the same arguments as in Proposition 4.10 and by applying Theorem 2.18, we can prove that $(P_t)_{t \ge 0}$ also defines a generalized Feller semigroup on $B^{\sqrt{\varrho}}(E)$. For the detailed proof, which translates literally to the present setting, we refer to [9].
In particular we know that ̺ ≤̺ and it holds that P t f (x) = f (ψ t (x)) where ψ is the solution of (4.4) which is linear. Using this together with The last inequality holds by the linearity of ψ and the second moment condition on µ. Proposition 3.4 now allows to conclude that A + B, where B is given by generates a generalized Feller semigroup P as asserted. For (ii), we now construct the probabilistically weak and analytically mild solution directly from the properties of the generalized Feller process: take y ∈ D where D is defined in (4.11) and consider the S d -valued martingale Ay, λ s + y, νβ(λ s ) + β(λ s )ν ds − t 0 y, S * ε νξ + ξS * ε ν Tr(β(λ s )µ(dξ))ds (4.14) for t ≥ 0 (after an appropriate and possible regularization according to Theorem 2.13). Let now y be as above with the additional property that y, S * ε νξ + ξS * ε ν = πξ + ξπ for all ξ ∈ S d + and some fixed π ∈ S d + . For such y define N π t = πN t + N t π := M y t + t 0 y, S * ε νξ + ξS * ε ν Tr(β(λ s )µ(dξ))ds (4.15) for t ≥ 0, which is a càglàd semimartingale. Notice that the left hand side only defines N π and not the more suggestive πN + N π. Then N π does not depend on y by construction. Indeed, for all y i with y i , S * ε νξ + ξS * ε ν = πξ + ξπ for all ξ, i = 1, 2, we clearly have t 0 y 1 − y 2 , S * ε νξ + ξS * ε ν Tr(β(λ s )µ(dξ))ds = 0 and M y1 − M y2 = M y1−y2 = 0 as well. The latter follows from the fact that the martingale M y is constant if y, S * ε νξ + ξS * ε ν = 0 for all ξ, since its quadratic variation vanishes in this case.
Moreover, by the definition of $N^{\pi}$ in (4.15) its compensator is given by $\int_0^t \int (\pi\xi + \xi\pi)\,\operatorname{Tr}(\beta(\lambda_s)\mu(d\xi))\,ds$. Since it is sufficient to perform the previous construction for finitely many $\pi$ to obtain all necessary projections, a process $N$ can be defined such that $N^{\pi} = \pi N + N\pi$, as suggested by the notation.
By (4.14) and the very definition of (4.15) we obtain that for y ∈ D. This analytically weak form can be translated into a mild form by standard methods. Indeed, notice that the integral is just along a finite variation path and therefore we can readily apply variation of constants. The last assertion about the càglàd property is a consequence of Theorem 2.13 by noting that ̺(λ) does not explode. This proves (ii). Concerning (iii), note first that we have a unique mild solution to (4.16) since this is the adjoint equation of (4.4). For the equation with jumps we proceed as in Proposition 4.7 via Picard iteration. Denote the semigroup associated to (4.16) by S β * and define Moreover, for t ∈ [0, δ] for some δ > 0 we have by local Lipschitz continuity of By an extension of Gronwall's inequality (see [10,Lemma 15]) this yields convergence of (y n t ) n∈N with respect to · Y and hence the existence of a unique local mild solution to (4.13) up to some maximal life time t + (y 0 ). That t + (y 0 ) = ∞ for all y 0 ∈ E * follows from the subsequent estimate where we used | exp( y, S * ε νξ + ξS * ε ν ) − 1| ≤ 1 for all y ∈ E * in the last estimate. To prove (iv), just note that by the existence of a generalized Feller semigroup the abstract Cauchy problem for the initial value exp( y 0 , . ) can be solved uniquely for y 0 ∈ E * . Indeed, E λ [exp( y 0 , λ t )] uniquely solves where A denotes the generator associated to (4.12). Setting u(t, λ) = exp( y t , λ ), where the right hand side is nothing else than A exp( y t , λ ), hence the affine transform formula holds true. This also implies that y t ∈ E * for all t ≥ 0, simply because We are now ready to state the main theorem of this section, namely an existence and uniqueness result for equations of the type where (X t ) t≥0 is an S d + -valued pure jump Itô semimartingale of the form with β specified in (4.5) satisfying Assumption 4.9 and random measure of the jumps µ X . 
Its compensator satisfies the following condition: Assumption 4.12. The compensator of µ X is given by For the formulation of the subsequent theorem we shall need the following set of Fourier basis elements  (i) Then the stochastic partial differential equation (4.17) admits a unique Markovian solution (λ t ) t≥0 in E given by a generalized Feller semigroup on B ̺ (E) whose generator takes on the set of Fourier elements for y ∈ D ∩ E * where D is defined in (4.11) the form Af y (λ) = f y (λ)( Ay, λ + R( y, ν ), λ ), (4.20) (ii) This generalized Feller process is also a probabilistically weak and analytically mild solution of (4.17), i.e.
This justifies Equation (4.17), in particular for every initial value the process X can be constructed on an appropriate probabilistic basis. The stochastic integral is defined in a pathwise way along finite variation paths. Moreover, for every family (f n ) n ∈ B ̺ (E), t → f n (λ t ) can be chosen to be càg for all n. (iii) The affine transform formula is satisfied, i.e.
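where $y_t$ solves $\partial_t y_t = R(y_t)$ for all $y_0 \in E_*$ and $t > 0$ in the mild sense, with $R : D \cap E_* \to Y$ given by (4.22). (iv) The process $V_t := \beta(\lambda_t)$, given by the stochastic Volterra equation (4.23) with $h(t) = \beta(S^*_t \lambda_0)$, admits a probabilistically weak solution with càg trajectories.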
(v) The Laplace transform of the solution $V_t$ of the Volterra equation is given by $\mathbb{E}_{\lambda_0}[\exp(\operatorname{Tr}(uV_t))] = \exp\big(\operatorname{Tr}(uh(t)) + \int_0^t \operatorname{Tr}(R(\psi_s)h(t-s))\,ds\big)$, (4.24) where and $\psi_t$ solves the matrix Riccati Volterra equation Hence the solution of the stochastic Volterra equation in (4.23) is unique in law.
Remark 4.14. One essential point here is that we lose the càglàd property stated in Proposition 4.11 (ii) when we let the parameter $\varepsilon$ of $S^*_{\varepsilon}$ tend to zero. As long as the kernel $K$ has a singularity at $t = 0$, it is impossible to preserve finite growth bounds with $M = 1$ as $\varepsilon \to 0$, but we get càg versions (compare with the second conclusion in Theorem 2.13 and Remark 2.14).
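The role of the shift $\varepsilon$ can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from this section: a scalar fractional kernel $K(t) = t^{H-1/2}/\Gamma(H+1/2)$ with $H < 1/2$, written as the Laplace transform of the measure $\nu(dx) = x^{-\alpha}/(\Gamma(\alpha)\Gamma(1-\alpha))\,dx$ with $\alpha = H + 1/2$. The function names and the quadrature grid are hypothetical choices. Shifting by $\varepsilon > 0$ amounts to evaluating the kernel at $t + \varepsilon$, which is bounded at $t = 0$, whereas the unshifted kernel blows up there.

```python
import math
import numpy as np

def fractional_kernel(t, H):
    """Closed form K(t) = t^(H - 1/2) / Gamma(H + 1/2); singular at t = 0 if H < 1/2."""
    alpha = H + 0.5
    return t ** (alpha - 1.0) / math.gamma(alpha)

def laplace_kernel(t, H, eps=0.0, s_min=-60.0, s_max=10.0, n=14001):
    """Approximate int_0^inf exp(-x (t + eps)) nu(dx) for the assumed measure
    nu(dx) = x^(-alpha) / (Gamma(alpha) Gamma(1 - alpha)) dx, alpha = H + 1/2.
    Uses the substitution x = exp(s) and the trapezoidal rule on [s_min, s_max]."""
    alpha = H + 0.5
    s = np.linspace(s_min, s_max, n)
    integrand = np.exp(-(t + eps) * np.exp(s)) * np.exp((1.0 - alpha) * s)
    h = s[1] - s[0]
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (math.gamma(alpha) * math.gamma(1.0 - alpha))

if __name__ == "__main__":
    H = 0.1
    for t in (0.1, 0.5, 2.0):
        print(t, laplace_kernel(t, H), fractional_kernel(t, H))
    # The shifted kernel stays finite at t = 0 for every eps > 0.
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, laplace_kernel(0.0, H, eps=eps))
```

The second loop shows the value at $t = 0$ growing as $\varepsilon$ decreases, which is the numerical counterpart of the loss of uniform bounds described in the remark.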
Remark 4.15. Note that for β as of Example 4.4 the above equations simplify considerably. In particular β * in (4.21) is simply the identity.
Proof. We apply Theorem 3.2 and consider a sequence of generalized Feller semigroups (P n ) n∈N with generators A n corresponding to the solution λ n of (4.12) for ε = 1 n , and compensator Tr β(λ n t ) Let us first establish a uniform growth bound for this sequence. To this end denote Note that for the solution of (4.12), we have due to Proposition 4.11 (ii) the following estimate for t ∈ [0, T ] for some fixed T > 0 As a consequence of Itô's isometry the martingale part can be estimated by and K some other constant. Moreover, for the last terms we have where C = µ(dξ) C. Putting this together, we obtain where C 0 and C 2 depend on T . We use S * t λ 0 2 ≤ C 0 λ 0 2 for t ∈ [0, T ], as well as S * t−s+ 1 n ν Y * ≤ C S * t−s ν Y * for some constant C and all n ∈ N due to strong continuity. Exactly by the same arguments as in the proof of Proposition 4.7 , we thus obtain for t ∈ [0, T ] for some fixed T From this the desired uniform growth bound P t L(B ̺ (E)) ≤ M exp(ωt) for some M ≥ 1 and ω ∈ R follows.
For the set D as of Theorem 3.2 we here choose Fourier basis elements of the form such that y ∈ E * and λ → exp( y, λ ) lies in ∩ n≥1 dom(A n ), whose span is dense, whence (i) of Theorem 3.2. Here, A n denotes the generator corresponding to (4.12) with ε = 1 n and µ replaced by F n . We now equip span(D) with the uniform norm · ∞ and verify Condition (ii), i.e. we check for all 0 ≤ u ≤ t with a nm → 0 as n, m → ∞, and possibly depending on y. Note that A n f y (λ) = R n (y), λ f y (λ), where R n corresponds to (4.13) for ε = 1 n and µ replaced by F n . As P n leaves D invariant for all n ∈ N by Proposition 4.11 (iv), we have Here, y m u denotes the solution of ∂ t y m u = R m (y m t ) at time u with y 0 = y. Moreover a 1 nm (ξ) and a 2 nm can be chosen uniformly for all u ≤ t and tend to 0 as n, m → ∞. This is possible since for the chosen initial values y we obtain that y m u is bounded on compact intervals in time uniformly in m (see [9] for details). This together with dominated convergence for the first term (note that b nm (ξ) a 1 nm (ξ) can be bounded by ξ ∧ 1) we thus infer (4.26). The conditions of Theorem 3.2 are therefore satisfied and we obtain a generalized Feller semigroup whose generator is given by (4.20).
For the second assertion we proceed as in the proof of Proposition 4.11; the proof of the existence of $X$ can be transferred verbatim. However, one loses the existence of càglàd paths of $f_n(\lambda)$ due to the possible lack of finite mass of $\nu$. Here, we only obtain càg trajectories (compare with Remark 2.14 and Remark 4.14).
Concerning the third assertion, the affine transform formula follows simply from the convergence of the semigroups P n as asserted in Theorem 3.2 by setting y t = lim n→∞ y n t , where y n t solves ∂ t y n t = R n (y n t ) in the mild sense with R n given again by (4.13) with ε = 1 n and µ replaced by F n . Since exp( y t , λ ) is then also the unique solution of the abstract Cauchy problem for initial value exp( y 0 , λ ), i.e. it solves ∂ t u(t, λ) = Au(t, λ), u(0, λ) = exp( y 0 , λ ), where A denotes the generator (4.20), we infer that y t satisfies ∂ t y t = R(y t ) with R given by (4.22). This is because A exp( y t , λ ) = exp( y t , λ )R(y t ).
The fourth claim follows from statement (ii), property (4.5) and the definition of K in (4.6).
Finally to prove (v), note that due to (iv) and the definition of the adjoint operator β * we have Tr(uV t ) = Tr(uβ(λ t )) = β * (u), λ t . Hence, by definition of R, R and h, we find

Statement (iii) therefore implies that
Tr(R( y s , ν )h(t − s))ds (4.28) From this and (4.27) it is easily seen that we can replace y s , ν in (4.28) by a solution of the following Volterra Riccati equation Note that we do not need to symmetrize here since we apply the trace and h is symmetric. This proves the assertion.
The following example illustrates how a multivariate Hawkes process can easily be defined by means of (4.18).
Example 4.16. Let $\beta$ and $S^*$ be as of Example 4.4. Define $\mu_{ii}(d\xi) = \delta_{e_{ii}}(d\xi)$ and $\mu_{ij} = 0$ for $i \neq j$. Then the Volterra equation as of (4.23) is given by Only the diagonal components of the matrix valued process $N$ jump and we can define $\widehat{N} := \operatorname{diag}(N)$, which is a process with values in $\mathbb{N}_0^d$. Its components jump by one and the compensator of $N_{ii} = \widehat{N}_i$ is given by $\int_0^{\cdot} V_{s,ii}\,ds$, which justifies the name multivariate Hawkes process. Note that the components of $V$ are not independent if $\nu$, and in turn $K$, is not diagonal.
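To make the Hawkes interpretation concrete, the following sketch simulates a $d$-variate counting process whose components jump by one with state-dependent intensities. It is a simplified illustration and not the matrix-valued construction of the example: it assumes an exponential kernel (corresponding to a measure $\nu$ with a single atom), constant baseline intensities, and it uses Ogata's thinning algorithm, which is a swapped-in standard technique. The function name `simulate_hawkes` and the parameters `mu`, `alpha`, `decay` are hypothetical.

```python
import numpy as np

def simulate_hawkes(mu, alpha, decay, horizon, seed=0):
    """Ogata thinning for a d-variate Hawkes process with intensities
    V_i(t) = mu_i + sum_j alpha[i, j] * sum_{t_k^j < t} exp(-decay * (t - t_k^j)).
    Assumes mu > 0 and alpha >= 0 entrywise (illustrative choices)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    d = mu.size
    excite = np.zeros(d)   # excitation part of the intensity at the current time
    t = 0.0
    events = []            # list of (jump time, component index)
    while True:
        # Between jumps the intensity only decays, so its current value is an upper bound.
        bound = (mu + excite).sum()
        wait = rng.exponential(1.0 / bound)
        t += wait
        if t > horizon:
            break
        excite *= np.exp(-decay * wait)
        intensity = mu + excite
        # Accept the candidate time with probability (total intensity) / bound.
        if rng.uniform() * bound <= intensity.sum():
            i = int(rng.choice(d, p=intensity / intensity.sum()))
            events.append((t, i))
            excite += alpha[:, i]   # a jump of component i raises all intensities
    counts = np.bincount(np.array([i for _, i in events], dtype=int), minlength=d)
    return events, counts

if __name__ == "__main__":
    events, counts = simulate_hawkes([0.5, 0.3], [[0.4, 0.2], [0.1, 0.3]], 1.5, 50.0, seed=1)
    print(len(events), counts)
```

The off-diagonal entries of `alpha` play the role of a non-diagonal kernel: a jump in one component raises the intensity of the others, so the components are not independent.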

Squares of matrix valued Volterra OU processes
As in the finite dimensional setting, squares of Gaussian processes provide us with important process classes for financial and statistical modeling. In this section we outline this program in utmost generality from a stochastic and analytic point of view. In particular we consider continuous affine Volterra type processes on $S_d^+$, which we construct as squares of matrix-valued Volterra Ornstein-Uhlenbeck (OU) processes (see Remark 5.4). Following the finite dimensional analogue [6], we start by considering matrix measure-valued OU-processes of the form The underlying Banach space, denoted by $Y^*(\mathbb{R}^{n\times d})$, is the space of finite $\mathbb{R}^{n\times d}$-valued regular Borel measures on the extended half real line where $\|\cdot\|_{Y^*(\mathbb{R}^{n\times d})}$ denotes the total variation norm, this becomes a weighted space. Moreover, $A^*$ is the generator of a strongly continuous semigroup $S^*$ on $Y^*(\mathbb{R}^{n\times d})$, which satisfies a property analogous to (4.3), i.e., for elements $A \in \mathbb{R}^{n\times d}$ it holds that $S^*$ The process $W$ is an $n \times d$ matrix of Brownian motions and $\nu \in Y^* =: Y^*(S_d)$ or $Z^*$, as defined in Section 4, such that Assumption 4.3 holds true. The predual space, denoted by $Y(\mathbb{R}^{n\times d})$, is given by $C_b(\mathbb{R}_+, \mathbb{R}^{n\times d})$ functions, where we fix the pairing $\langle \cdot, \cdot \rangle$ as follows Again $\operatorname{Tr}$ denotes the trace. We assume that all relevant properties from Assumption 4.1 are translated to the current setting.
Remark 5.1. Observe the analogy to the process γ defined in the introduction. If A * = 0 and ν is supported on a finite space with k points, then (5.1) is exactly the process from the introduction.
Proposition 5.2. For every $\gamma_0 \in Y^*(\mathbb{R}^{n\times d})$ the SPDE (5.1) has a solution given by a generalized Feller semigroup on $B^{\varrho}(Y^*(\mathbb{R}^{n\times d}))$ associated to the generator of (5.1). The mild formulation directly yields a stochastically strong solution where order matters, i.e. the matrix Brownian increment is applied to $S^*_{t-s}\nu(dx)$ on the left. The integral is understood in the weak sense, i.e. after pairing with $y \in Y(\mathbb{R}^{n\times d})$.
Proof. The construction of the generalized Feller process can be done by jump approximation of the Brownian motion similarly as in [9,Theorem 4.16]. Notice here that we consider the process on the whole space Y * (R n×d ). So no issues with state space constraints occur.
The right hand side of the stochastically strong formulation defines, after pairing with $y \in Y(\mathbb{R}^{n\times d})$, almost surely a continuous linear functional with value $\langle y, S^*_t \gamma_0 \rangle + \int_0^t \langle y, dW_s\, S^*_{t-s}\nu \rangle$, since the integrand of the stochastic integral is deterministic and in $L^2$ for each $t \ge 0$.
The corresponding contracted, i.e. one matrix multiplication is performed, algebraic tensor product is denoted by Y * (R n×d ) ⊗Y * (R n×d ) and we set This corresponds to the space of finite S d + -valued, rank n, product measures on R + × R + . We shall introduce a particular dual topology on E, namely σ( E, Y ⊗ Y ), where the corresponding pairing is given by We denote the pre-dual cone by where we use again the contracted algebraic tensor product corresponding to the following matrix multiplication of R n×d valued functions (y ⊗y)(·, ·) = y ⊤ (·)y(·), y ∈ Y (R n×d ) .
The minus on the left hand side of (5.4) is to obtain elements in the polar cone.
Let us now define the actual process of interest, namely Note again the analogy to the Wishart process λ defined in the introduction. The process (5.5) clearly takes values in E as defined in (5.3). We will now show that we can define a Volterra type process by considering projections on S d + . Applying Itô's formula, we see that λ t (dx 1 , dx 2 ) satisfies the following equation where A * 1 λ t (dx 1 , dx 2 ) = A * λ t (·, dx 2 )(dx 1 ) and analogously for A * 2 . Note that for A * = 0 this is completely analogous to (1.3).
By a lot of abuse of notation, but parallel with [6] and Equation (1.4)-(1.5), we can also write where heuristically B(dx, dy) is d×d matrix of Brownian fields. We shall not develop a framework where this notation makes sense, but continue with proving that λ is actually a generalized Feller process, which should be considered the correct infinite dimensional version of a Wishart process. By only a slight abuse of notation, we understand A * , and in the sequel also S * and other linear operators, as operators acting on both S d -valued measures as well as R d×n -valued or R n×d -valued ones as in (5.1). The mild formulation of (5.6), denoting the semigroup generated by A * 1 + A * 2 by S * , ⊗ t , then reads as where the second equality follows from property (5.2). Let now β be a linear operator from Y * (F ) to F where F stands here for R n×d , or S d with the property that for a constant matrix A with appropriate matrix dimensions we have β(Aγ(·)) = Aβ(γ(·)), β(γ(·)A) = β(γ(·))A. (5.8) By means of β, define now an operator β acting on R d×d valued product measures as follows β(γ ⊤ 1 (·)γ 2 (·)) = β(γ 1 (·)) ⊤ β(γ 2 (·)), (5.9) where γ 1 and γ 2 are either in Y * (R n×d ) or in Y * (S d ) (in the latter case the transpose is not needed). Note that (5.9) implies that β(γ ⊤ (·)γ(·)) is S d + -valued. Applying β to λ we find Defining as in Equation (4.6) an S d -valued kernel via , we obtain the following generalized S d + -valued Volterra equation which we call Volterra Wishart process in the following definition.
(i) Note that β(γ t ) defines an R n×d -valued Volterra OU process, that is, By the definition of β, the Volterra Wishart process V t = β(λ t ) = β(γ t (·)) ⊤ β(γ t (·)) = X ⊤ t X t is thus the matrix square of a Volterra OU process, which justifies the terminology. (ii) Note that different lifts of the Volterra OU process given in (5.11) are possible, e.g. the forward process lift f t (x) := E[X t+x |F t ]. Then, f t (0) = X t and similarly as in [9, Section 5.2] it can be shown that f is an infinite dimensional OU process that solves the following SPDE (in the mild sense) where α > 0 denotes a weight function (compare [15]). We can then set λ t (x, y) = f ⊤ t (x)f t (y) and define the same Volterra Wishart process as in (5.10) by V t := λ t (0, 0) = X ⊤ t X t . By Itô's formula and variation of constants its dynamics can then equivalently be expressed via (5.12) Comparing (5.12) and (5.10) yields x, t ≥ 0. (5.13) (iii) In the case when β and S * are as in Example 4.4, (5.10) reads as Hence by (5.13), This yields exactly equation (1.6) considered in the introduction. Note that if ν and in turn K is chosen as in Remark 4.5, this Volterra Wishart process has exactly the roughness properties desired in rough covariance modeling.
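The identity $V_t = X_t^{\top} X_t$ suggests a direct simulation recipe. The sketch below is an illustration under assumptions not fixed by the text: the kernel is a scalar fractional kernel $k(t) = t^{H-1/2}/\Gamma(H+1/2)$ times the identity, the initial curve is zero, and the stochastic convolution $X_t = \int_0^t dW_s\, K(t-s)$ is discretized by a left-point Riemann sum. The function name `simulate_volterra_wishart` and its parameters are hypothetical.

```python
import math
import numpy as np

def simulate_volterra_wishart(n, d, H, horizon, steps, seed=0):
    """Simulate an n x d Volterra OU process X and its matrix square V = X^T X
    on a uniform grid, assuming K(t) = t^(H - 1/2) / Gamma(H + 1/2) * I_d and X_0 = 0."""
    rng = np.random.default_rng(seed)
    dt = horizon / steps
    alpha = H + 0.5
    # Brownian increments: one n x d Gaussian matrix per time step.
    dW = rng.normal(scale=math.sqrt(dt), size=(steps, n, d))
    X = np.zeros((steps + 1, n, d))
    for m in range(1, steps + 1):
        # Lags t_m - t_k for k = 0, ..., m - 1; all are >= dt, so the
        # singularity of the kernel at zero is never evaluated.
        lags = dt * (m - np.arange(m))
        weights = lags ** (alpha - 1.0) / math.gamma(alpha)
        # X_{t_m} is approximated by sum_k K(t_m - t_k) dW_k.
        X[m] = np.tensordot(weights, dW[:m], axes=(0, 0))
    # Matrix square at every grid point: V_t = X_t^T X_t, a d x d matrix.
    V = np.einsum("tki,tkj->tij", X, X)
    return X, V

if __name__ == "__main__":
    X, V = simulate_volterra_wishart(n=3, d=2, H=0.1, horizon=1.0, steps=50)
    print(V[-1])
    print(np.linalg.eigvalsh(V[-1]))
```

By construction every `V[m]` is symmetric and positive semidefinite, with rank at most `n`, which is the discrete analogue of the state space constraint discussed above.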
In the following remark we list several properties of Volterra Wishart processes.
(i) Note that the marginals of V are Wishart distributed as they arise from squares of Gaussians.
(ii) In order to bring (5.6) in a "standard" Wishart form (with the matrix square root) as in (1.1) by replacing γ(dx) by √ λ(dx, dy) new notation has to be introduced (compare with (5.7)). (iii) Nevertheless, both the drift and the diffusion characteristic of λ depend linearly only on λ, e.g.
Using Theorem 2.8 we now show that λ is a generalized Feller process on ( E, ̺) with weight function ̺ satisfying ̺(γ ⊗γ) = ̺(γ). (5.14) We also prove that this generalized Feller process is affine, in the sense that its Laplace transform is exponentially affine in the initial value. The process λ can therefore be viewed as an infinite dimensional Wishart process on E analogously to [6,8].
Proof. We apply Theorem 2.8 and Corollary 2.11 with Observe that this is a continuous map, since we use the dual topology σ( E, Y ⊗ Y ) on E and the respective polar E * defined by (5.4). Consider now the following set of Fourier basis elements by the very definition of the dual topology. We check now that the generalized Feller semigroup P (OU) corresponding to (5.1) satisfies Assumption (2.8) for f ∈ D , i.e. for every f ∈ D there exists some g such that Hence we need to compute E γ0 exp − y ⊗y, γ t ⊗γ t . By Lemma 5.7 this expression is given by (5.17). Therefore (5.16) is clearly satisfied. This proves the first assertion. Concerning the affine property, we can deduce from Lemma 5.7 that ψ and φ are given by with q t given in Lemma 5.7. Taking derivatives then leads to the form of the Riccati differential equations.
The following lemma provides an explicit expression for the Laplace transform of $\gamma_t \otimes \gamma_t$. Not surprisingly, this resembles the Laplace transform of a non-central Wishart distribution with $n$ degrees of freedom.
Note now that $\langle y \otimes y, \gamma_0 \otimes W_t \nu \rangle = \operatorname{Tr}\, W_t$ For general $n$, note that we can write where the $W^j$ are the rows of $W$ and thus take values in $\mathbb{R}^{1\times d}$. Similarly The general case for $A^* \neq 0$ can now be traced back to this situation. Indeed, by the variation of constants formula, $\gamma_t$ is given by Therefore we need to replace $bt$ by $S^*_{t-s}\nu(dx_1)\, y^{\top}(x_1) y(x_2)\, S^*_{t-s}\nu(dx_2)\,ds$ and $\gamma_0$ by $S^*_t \gamma_0$. This then yields (5.17). Note that this now holds for general $y \in Y(\mathbb{R}^{n\times d})$ even if $\int_0^{\infty} y(x)\nu(dx)$ is not necessarily well defined.
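The finite dimensional fact behind this computation can be checked numerically. The sketch below assumes a Gaussian matrix $X \in \mathbb{R}^{n\times d}$ with mean $M$ and independent rows of covariance $\Sigma$, for which the non-central Wishart Laplace transform reads $\mathbb{E}[\exp(-\operatorname{Tr}(u X^{\top} X))] = \det(I + 2\Sigma u)^{-n/2} \exp(-\operatorname{Tr}(u (I + 2\Sigma u)^{-1} M^{\top} M))$ for symmetric positive semidefinite $u$. This is the standard finite dimensional formula and not the expression (5.17) itself; the function names and test parameters are hypothetical.

```python
import numpy as np

def wishart_laplace_closed_form(u, M, sigma):
    """det(I + 2 sigma u)^(-n/2) * exp(-Tr(u (I + 2 sigma u)^(-1) M^T M))."""
    n, d = M.shape
    A = np.eye(d) + 2.0 * sigma @ u
    det_part = np.linalg.det(A) ** (-n / 2.0)
    exp_part = np.exp(-np.trace(u @ np.linalg.solve(A, M.T @ M)))
    return det_part * exp_part

def wishart_laplace_monte_carlo(u, M, sigma, samples=200_000, seed=0):
    """Monte Carlo estimate of E[exp(-Tr(u X^T X))] with rows of X ~ N(M_j, sigma)."""
    rng = np.random.default_rng(seed)
    n, d = M.shape
    L = np.linalg.cholesky(sigma)
    Z = rng.standard_normal((samples, n, d))
    X = M + Z @ L.T                                  # each row has covariance L L^T = sigma
    quad = np.einsum("sni,ij,snj->s", X, u, X)       # Tr(u X^T X) for every sample
    return np.exp(-quad).mean()

if __name__ == "__main__":
    u = np.array([[0.3, 0.1], [0.1, 0.2]])
    M = np.array([[0.5, -0.2], [0.1, 0.4]])
    sigma = 0.7 * np.eye(2)
    print(wishart_laplace_closed_form(u, M, sigma))
    print(wishart_laplace_monte_carlo(u, M, sigma))
```

The exponent $-n/2$ on the determinant reflects the $n$ independent rows, which is the source of the "$n$ degrees of freedom" mentioned above.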

(Rough) Volterra type affine covariance models
The goal of this section is to apply the affine covariance models constructed above to multivariate stochastic volatility models with $d$ assets. We exemplify this with the Volterra Wishart process of Section 5 and define a (rough) multivariate Volterra Heston type model with possible jumps in the price process. Roughness can be achieved by specifying $\nu$, and in turn the kernel of the Volterra Wishart process, as in Remark 4.5. The log-price process, denoted by $P$ and taking values in $\mathbb{R}^d$, evolves according to where $X_t$ denotes the Volterra OU process defined in Remark 5.4, $\mathbf{1}$ the vector in $\mathbb{R}^d$ with all entries being 1, and $e^{\xi}$ has to be understood componentwise. Moreover, $B_t$ is an $\mathbb{R}^n$-valued Brownian motion, which can be correlated with the matrix Brownian motion $W$ appearing in (5.1) as follows Here, $\widetilde{B}_t$ is an $\mathbb{R}^n$-valued Brownian motion independent of $W$ and $\varrho \in \mathbb{R}^d$. Moreover, $\mu^P$ denotes the random measure of the jumps with compensator $\operatorname{Tr}(V m(d\xi))$, where $V$ is the Volterra Wishart process of (5.10) and $m$ a positive semidefinite measure supported on $\mathbb{R}^d$. As a corollary of Section 5 and [7, Section 5] we obtain the following result, namely that the log-price process together with the infinite dimensional Wishart process $\lambda$ given in (5.5) is an affine Markov process.
Before formulating the precise statement, note that the continuous covariation $\langle P_i, \lambda_{kl}(dx_1, dx_2) \rangle_t$ is given by where $\gamma$ is the infinite dimensional OU-process of (5.1). Note that $\beta^{\top}(\gamma_t)\gamma_t(dx_1)$ can also be written as a linear map from $E \to Y^*(S_d)$, which we denote by $\beta$, i.e.
Remark 6.2. In a similar spirit one can define multivariate affine covariance models with the affine Volterra jump process V given in (4.23). The log-price process (under some risk neutral measure) evolves then according to where B is a d-dimensional Brownian motion and the jump measure m of P and µ of the Markovian lift λ as given in (4.17) can be the marginals of some common measure supported on S d + × R d .