Introduction

The signature of a path \(X :[0,T] \rightarrow {\mathbb {R}}^d\),

$$\begin{aligned} \mathcal {S}(X)_{0T} {:}{=}\sum _{n = 0}^\infty \int _{0< u_1< \ldots< u_n < T} \text {d}X_{u_1} \otimes \cdots \otimes \text {d}X_{u_n} \quad \in T(\!({\mathbb {R}}^d)\!)\text {,} \end{aligned}$$
(1)

is a series of tensors which, up to “retracings”, determines the image of X [6, 22]. The probabilistic counterpart to this result states that, in many cases of interest, the law of a stochastic process is determined by its expected signature [13], which is therefore seen to play a role for processes analogous to that of moments for random variables.

The best-known example of an explicit formula for the expected signature of a stochastic process occurs in the case of Brownian motion: calling \(\{e_1,\ldots ,e_d\}\) the canonical basis of \({\mathbb {R}}^d\), we have

$$\begin{aligned} {\mathbb {E}} \mathcal {S}(X)_{st} = \exp \bigg ( \frac{t-s}{2} \sum _{\gamma = 1}^d e_\gamma ^{\otimes 2}\bigg ) = \sum _{n = 0}^\infty \frac{(t-s)^n}{2^n n!} \sum _{\gamma _1,\ldots ,\gamma _n = 1}^d e_{\gamma _1}^{\otimes 2} \otimes \cdots \otimes e_{\gamma _n}^{\otimes 2}. \end{aligned}$$
(2)

This identity was first shown by [16, 31], and later proved in a variety of different ways [2, 20]. The expected signature of Brownian motion has also been studied in the case in which the process is stopped upon hitting the boundary of a domain [5, 27, 29].
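As a quick numerical sanity check of (2) at level 2 (our own sketch, not part of any argument below), one can average the level-2 iterated sums of piecewise-linear interpolations over sampled Brownian paths:

```python
import numpy as np

# Monte Carlo check of (2) at level 2 for d-dimensional Brownian motion on [0, t].
rng = np.random.default_rng(0)
d, steps, paths, t = 2, 100, 10_000, 1.0
dW = rng.normal(0.0, np.sqrt(t / steps), size=(paths, steps, d))
W = np.cumsum(dW, axis=1)

# level-2 signature of the piecewise-linear interpolation:
# S^{ab} = sum_{i<j} dW_i^a dW_j^b + (1/2) sum_i dW_i^a dW_i^b
S2 = (np.einsum('pia,pib->pab', W - dW, dW)
      + 0.5 * np.einsum('pia,pib->pab', dW, dW))

print(S2.mean(axis=0))   # approximately (t/2) * identity, as in (2)
```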

In [3] the authors derive an integral expression for the expected signature of fractional Brownian motion (fBm) with Hurst parameter \(H \in (1/2,1)\). This result was extended in [4, 7] to a more general class of Gaussian Volterra processes with sample paths that are more regular than Brownian motion, with the formula for the expected signature written in terms of the Volterra kernel. The method used involves a piecewise-linear interpolation of the paths of the process X, which reduces the calculation to that of a sum of mixed Gaussian moments, to which Wick’s theorem applies, followed by a convergence argument. The expression in [3] does not, however, yield the correct prediction for the case of Brownian motion \(H = 1/2\). When \(H < 1/2\) it involves integrals that do not converge at all, and new ideas are needed to obtain a formula. On a technical level, the reason for these differences can be seen by considering the expression for the expected signature of a scalar \(1/2 < H\)-fBm X at level 2: calling \(R(s,t) {:}{=}{\mathbb {E}}[X_sX_t]\) the covariance function of X, the formula states that

$$\begin{aligned} {\mathbb {E}} \mathcal {S}(X)^{(2)}_{st} = \int _{s<u<v<t} R(\text {d}u, \text {d}v) = H(2H-1) \int _{s<u<v<t} (v-u)^{2H-2} \text {d}u \text {d}v. \end{aligned}$$
(3)

Integrating either of the two variables generates an evaluation \((v-u)^{2H-1}|_{u = v}\), which is only finite when \(H > 1/2\) and indeterminate when \(H = 1/2\). In fact, approximating X with a sequence of piecewise linear processes \((X^\ell )_{\ell \in {\mathbb {N}}}\) one obtains a sequence of integrals (actually finite sums) \(\int _{s< u< v < t} {\mathbb {E}}[\dot{X}_u^\ell \dot{X}^\ell _v] \text {d}u \text {d}v\) which converges to the above double integral when \(H > 1/2\), to \((t-s)/2\) when \(H = 1/2\) (as predicted by (2)), and continues to converge to \((t-s)^{2H}/2\) for \(1/4 < H \le 1/2\). When \(H \le 1/4\) the iterated integrals (in particular the Lévy area) of smooth approximations of X do not converge in mean square, and other techniques (e.g. [36]) must be relied upon to define a rough path, and hence a signature. These rough paths present a number of differences with the canonical one defined for \(H > 1/4\), and are therefore not considered in this paper.

Fig. 1

Here we compare the two behaviours, corresponding to \(H >1/2\) and \(H<1/2\), of \(\int _{0< u< v < 1} {\mathbb {E}}[\dot{X}_u^\ell \dot{X}^\ell _v] \text {d}u \text {d}v\) with \(X^\ell \) the sequence of piecewise-linear interpolations of X on a partition. On the left we have chosen \(H = 2/3\), and the sequence of integrals converges to a finite improper integral, whereas on the right \(H = 1/3\) and the on- and off-diagonal contributions diverge to opposite infinities. (The plots are oriented in different ways and the z-axis is rescaled, both for improved visibility.) This graphic has been created using Wolfram Mathematica
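The dichotomy of Fig. 1 is easy to reproduce numerically. For the piecewise-linear interpolation the total integral equals \(\tfrac{1}{2}{\mathbb {E}}[X_1^2] = 1/2\) at every mesh size (the interpolation agrees with X at partition points), while the on- and off-diagonal contributions behave very differently in the two regimes. The following is a minimal sketch (function names are our own), assuming the fBm covariance recalled in (38) below:

```python
import numpy as np

def fbm_cov(s, t, H):
    """Covariance R(s,t) of scalar H-fBm, as in (38)."""
    return 0.5 * (s**(2*H) + t**(2*H) - np.abs(t - s)**(2*H))

def level2_parts(H, ell):
    """Split int_{0<u<v<1} E[Xdot_u Xdot_v] du dv for the piecewise-linear
    interpolation on ell uniform intervals into on- and off-diagonal parts."""
    t = np.linspace(0.0, 1.0, ell + 1)
    s0, s1 = t[:-1], t[1:]
    C = (fbm_cov(s1[:, None], s1[None, :], H)
         - fbm_cov(s0[:, None], s1[None, :], H)
         - fbm_cov(s1[:, None], s0[None, :], H)
         + fbm_cov(s0[:, None], s0[None, :], H))   # C[i,j] = E[dX_i dX_j]
    return 0.5 * np.trace(C), np.triu(C, k=1).sum()

for H in (2/3, 1/3):
    for ell in (10, 100, 1000):
        on, off = level2_parts(H, ell)
        print(f"H={H:.2f} ell={ell:4d}  on={on:9.4f}  off={off:9.4f}  total={on+off:.4f}")
```

For \(H = 2/3\) the on-diagonal part vanishes and the off-diagonal part converges to the improper integral; for \(H = 1/3\) the two parts diverge to opposite infinities while their sum remains \(1/2\).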

What is needed to obtain a formula for the expected signature that also works in the case of negatively-correlated increments \(1/4< H < 1/2\) is a way of expressing the indeterminacy “\(\infty - \infty \)” explained in Fig. 1. The trick for doing this is simple to describe: integrate out the first variable in (3) and, calling \(R(t) {:}{=}R(t,t)\) the variance function of X, note that for \(H > 1/2\) we have

$$\begin{aligned} \int _{s<u<v<t} \partial _{12} R(u, v) \text {d}u \text {d}v = \int _s^t \big [ \partial _2 R(v, v) - \partial _2 R(s,v) \big ] \text {d}v = \int _s^t \big [ \tfrac{1}{2} R'(v) - \partial _2 R(s,v) \big ] \text {d}v . \end{aligned}$$
(4)

We have replaced \(\partial _2 R(v, v)\) with \(\frac{1}{2} R'(v)\), which can be done by symmetry of R:

$$\begin{aligned} R'(v) = \frac{\text {d}}{\text {d}v} R(v,v) = \partial _1 R(v,v) + \partial _2 R(v,v) = 2\partial _2 R(v,v) . \end{aligned}$$
(5)

This is relevant to the case of \((1/4,1/2) \ni H\)-fBm since, while \(\partial _2 R(v,v)\) or \(\partial _1 R(v,v)\) is the infinite evaluation discussed earlier, the last integral in (4) is perfectly well defined. These integrands can be chained together on simplices, e.g. \(\int _{s< u< v < t} [\tfrac{1}{2} R'(u) - \partial _2 R(s,u)][\tfrac{1}{2} R'(v) - \partial _2 R(u,v)] \text {d}u \text {d}v\), and combined with the other types of integrand \(\partial _{12} R(w, z)\), to yield a formula that is very similar to that of [3], but continues to be convergent for \(1/4< H < 1/2\) and agrees with (2) for \(H = 1/2\).

Showing that the formula obtained by such substitution actually coincides with the expected signature for X in a broad class of Gaussian processes—essentially those Gaussian rough paths introduced in [15, 19, 30] with the imposition of a few additional smoothness and regularity requirements on the (co)variance function—is the main focus of this paper. In fact, our main result will prove a formula for the full Wiener chaos expansion of \(\mathcal {S}(X)\), the 0th level of which is the expectation. As far as we know, the expression for the positive chaos projections of the signature is not to be found in the literature even in the classical case of Brownian motion. While the expression of the positive levels of Wiener chaos is very similar in spirit to that of the 0th, it requires us to use some Malliavin calculus in the setting of 1-parameter Gaussian processes, and results in technical complications in the proof of convergence. The main additional ingredients needed are Stroock’s formula for the m-th Wiener chaos projection and a novel definition of multiple Wiener integral of a function. For the latter, it should be noted that while multivariate, deterministic integrands for Gaussian noise naturally live in a certain Hilbert space (which for fBm can be identified with a Sobolev space), we are interested in integrating functions of multiple times, i.e. \(\int _{[0,T]^m} f(t_1,\ldots ,t_m) \text {d}X^{\gamma _1}_{t_1} \cdots \text {d}X^{\gamma _m}_{t_m}\) in a Skorokhod-type sense: this is achieved by approximating f with elementary integrands, and showing independence of the approximation. Computing the Wiener chaos projections of the signature of a Gaussian process X has the benefit of expressing \(\mathcal {S}(X)\) as a sum of terms that are orthogonal in \(L^2\), something that has the potential to be used for various types of numerical calculations, e.g. estimates of Euler expansions for Gaussian rough differential equations. It should be mentioned that, while (in the cases considered) the expected signature already determines the law of X and therefore that of the Wiener chaos projections of \(\mathcal {S}(X)\), it does not appear obvious how one may obtain the latter from the former directly. While fBm is the main example of a process for which our calculation is novel, we briefly also consider centred, continuous Gaussian semimartingales, such as the Brownian bridge returning to the origin and centred Ornstein–Uhlenbeck processes with deterministic initial condition.

As in the main reference article [3], the technique that underlies our proof is piecewise-linear approximation of X. The arguments needed to prove the result are however much more involved, for three essential reasons. First is the fact that we must perform and justify the substitution (4), which requires novel arguments for convergence; even proving finiteness of the integrals in the main formula requires more sophisticated bounds in the \(1/4< H < 1/2\) case than it does in the \(H > 1/2\) case (see Fig. 2 for the simplest example of an observation that must be made when \(H < 1/2\)). Second is that Malliavin derivatives are involved for positive levels of the Wiener chaos, and third is that our arguments must accommodate a wider class of Gaussian processes.

Fig. 2

A graphic representing the contour plot of \((t-s)^{2H-2}\) on \(\{0< s< t < 1\}\) (on the left) and \(\{0< s< u< t < 1\}\) with \(u \in [0,1]\) fixed (on the right): the integral of the former is improper on the whole diagonal, while that of the latter only at a point; when \(0< H < 1/2\), only the latter converges. This graphic has been created using Wolfram Mathematica

While the substitution (4) may seem very natural, it does not emerge obviously from the proof that we have given here, and must instead be guessed in advance. Indeed, it is worth mentioning that the way in which we first derived the statement of the main result involved an entirely different approach, which made use of the Skorokhod-rough integral conversion formula [10, 11], applied recursively to the RDE for the signature. The outline of this proof can be found in the second named author’s PhD thesis [17, Ch. 5]. While this approach has the drawback of generating further technical problems, which is the reason it is not the one presented here, it has the advantage of leading constructively to the main formula.

This paper is organised as follows: in Sect. 1 we briefly introduce the class of Gaussian processes considered and the Malliavin calculus framework for them; we then use this language to identify functions as multiple Wiener integrands. In Sect. 2 we state the main result, Theorem 2.3, and discuss a few consequences and examples that follow; in Sect. 3 we prove the main result; in “Conclusions and further directions” we outline some aspects that could be tackled in further research. Finally, it should be mentioned that in [3], in addition to the expected signature of \(1/2 < H\)-fBm, the authors also compute the expected signature at levels 2 and 4 for \(1/4 < H\)-fBm in a manner that does not obviously generalise to different processes or higher levels; while not necessary in our proofs, it is sensible to verify that our main result agrees with this calculation: this check is performed in “Appendix A”.

1 Background on Malliavin calculus for Gaussian processes

In this section we introduce the class of Gaussian processes to which this paper applies, establish some notation, and give a brief overview of the tools of Malliavin calculus that are necessary in the proof of the main result. We follow [34, 35] for the general Malliavin calculus framework, [23] for its aspects that pertain to Gaussian processes indexed by a time parameter, and [9,10,11] for aspects regarding the rough path lifts of such processes.

Throughout this paper we will be working with a Gaussian process with i.i.d. components \(X :\Omega \times [0,T] \rightarrow {\mathbb {R}}^d\) where \(\Omega = C([0,T],{\mathbb {R}}^d)\), \(X_t(\omega ) {:}{=}\omega (t)\), \({\mathcal {F}}_t {:}{=}\sigma (X_{s}: 0 \le s \le t)\). We assume X to be centred, i.e. \({\mathbb {E}} X \equiv 0\), and to have deterministic initial condition \(X_0 = 0\). We will write \(X_{st} {:}{=}X_t - X_s\) for the increments of X. By Gaussianity, the probability measure \({\mathbb {P}}\) on \(\Omega \) is characterised by the covariance function of X

$$\begin{aligned} R :[0,T]^2 \rightarrow {\mathbb {R}}^d \otimes {\mathbb {R}}^d, \quad R(s,t) {:}{=}{\mathbb {E}}[X_s \otimes X_t]. \end{aligned}$$
(6)

We will denote \(R(\,\cdot \,)\) the variance function of X, i.e. \(R(t) {:}{=}R(t,t)\). The independence hypothesis implies that R is a diagonal matrix, \(R^{\alpha \beta } = \updelta ^{\alpha \beta } R^{\alpha \alpha }\), and the fact that the components are identically distributed implies \(R^{\alpha \beta } = \updelta ^{\alpha \beta }R^{11}\): R is therefore determined by a single scalar function, which by abuse of notation we will also call R. Although our results can be conjectured to continue to hold in the case in which the components are not identically distributed, our proof will make essential use of this assumption. We define

$$\begin{aligned} \begin{aligned} R(\Delta (s,t))&{:}{=}R(t) - R(s) \\ R(\Delta (s,t),v)&{:}{=}R(t,v) - R(s,v) = {\mathbb {E}}[X_{st} \otimes X_v] \\ R(\Delta (s,t),\Delta (u,v))&{:}{=}R(t,v) + R(s,u) - R(t,u) - R(s,v)= {\mathbb {E}}[X_{st} \otimes X_{uv}] \end{aligned} \end{aligned}$$
(7)

for \(u,v,s,t \in [0,T]\). Note that \(R(\Delta (s,t)) \ne R(\Delta (s,t),\Delta (s,t))\).

We assume X and R satisfy the conditions that make it possible to consider the signature of X, \(\mathcal {S}(X)\), defined by the limit in \(L^2\) of Stieltjes iterated integrals of smooth or piecewise-linear approximations of X, and carry out Malliavin calculus: these are existence of rough path lift and complementary Cameron-Martin regularity [9, Conditions 2] and non-degeneracy of R [9, Conditions 3]. More elementary conditions that imply these may be found, for instance, in [10, 11]. The expected signatures of such processes characterise their law, i.e. if Y is any other process with a well-defined signature \(\mathcal {S}(Y)_{0T}\) (as a \(\mathcal {G}({\mathbb {R}}^d)\)-valued random variable) and \({\mathbb {E}} \mathcal {S}(X)_{0T} = {\mathbb {E}}\mathcal {S}(Y)_{0T}\), then X and Y are equal in law: see [13, Example 6.7], a consequence, among other things, of the greedy estimate [12]. We refer the reader to [14] for a treatment of the theory in the case of more general processes, whose expected signatures may not directly characterise the law of the process.

We will denote \(\mathcal {S}^N(X)\) the signature of X truncated at level N (i.e. its projection onto \(\bigoplus _{n = 0}^N ({\mathbb {R}}^d)^{\otimes n}\)) and \(\mathcal {S}(X)^{(n)}\) the n-th level of the signature (i.e. its projection onto \(({\mathbb {R}}^d)^{\otimes n}\)). The signature of a process, as that of a path, satisfies two important algebraic relations. The first is the Chen identity, namely that \(\mathcal {S}(X)_{su} \otimes \mathcal {S}(X)_{ut} = \mathcal {S}(X)_{st}\). The second is the shuffle identity: letting \(\{e_1,\ldots ,e_d\}\) denote the canonical basis of \({\mathbb {R}}^d\), and using coordinate notation, i.e. \(S^{\gamma _1\ldots \gamma _n} {:}{=}\langle e_{\gamma _1} \otimes \cdots \otimes e_{\gamma _n}, S\rangle \) for \(S \in T(\!({\mathbb {R}}^d)\!)\) and \(\gamma _1,\ldots ,\gamma _n \in [d] {:}{=}\{1,\ldots ,d\}\) (and extending linearly), for \(0 \le s \le t \le T\) it holds that

$$\begin{aligned} \mathcal {S}(X)^{\alpha _1\ldots \alpha _m}_{st} \, \mathcal {S}(X)^{\beta _1\ldots \beta _n}_{st} = \mathcal {S}(X)^{\alpha _1\ldots \alpha _m \shuffle \beta _1\ldots \beta _n}_{st} \end{aligned}$$
(8)

where \(\shuffle \) denotes “shuffling” the tuples \(\alpha _1\ldots \alpha _m\) and \(\beta _1\ldots \beta _n\), i.e. summing over all ways of permuting their concatenation \(\alpha _1\ldots \alpha _m\beta _1\ldots \beta _n\) whilst preserving the order of each. For further details see, for example, [28].
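Both identities are easy to check numerically for piecewise-linear paths, for which the truncated signature is a finite product of tensor exponentials of the increments (by the Chen identity); the following is a minimal sketch (helper names are our own):

```python
import numpy as np

N = 4  # truncation level

def tensor_mul(a, b):
    """Chen product in T^N((R^d)); a, b are lists of levels [rank-0, rank-1, ...]."""
    return [sum(np.tensordot(a[i], b[n - i], axes=0) for i in range(n + 1))
            for n in range(N + 1)]

def tensor_exp(x):
    """Tensor exponential of a level-1 element: signature of one linear segment."""
    out, fact = [np.array(1.0)], 1.0
    for n in range(1, N + 1):
        fact *= n
        t = x
        for _ in range(n - 1):
            t = np.tensordot(t, x, axes=0)   # n-fold tensor power of x
        out.append(t / fact)
    return out

def signature(points):
    """S(X)_{0T} of the piecewise-linear path through the given points."""
    sig = tensor_exp(np.zeros(points.shape[1]))
    for p, q in zip(points[:-1], points[1:]):
        sig = tensor_mul(sig, tensor_exp(q - p))   # Chen identity
    return sig

S = signature(np.array([[0., 0.], [1., 0.5], [0.5, 2.], [2., 1.]]))
# shuffle identity at lowest order: S^a S^b = S^{ab} + S^{ba}
a, b = 0, 1
print(S[1][a] * S[1][b], S[2][a, b] + S[2][b, a])   # equal
```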

In addition to the standard conditions on R, we will have to assume a certain amount of smoothness of R together with bounds on its derivatives; the reasons for such hypotheses will be made clear in due course. We assume \(R(\,\cdot ,\,\cdot \,)\) is \(C^2\) on the open simplex \(\Delta [0,T] {:}{=}\{(s,t) \mid 0< s< t < T\}\) and continuous on \([0,T]^2\), and that \(R(\,\cdot \,)\) is \(C^1\) on (0, T). The lack of smoothness assumptions on \(R(\,\cdot ,\,\cdot \,)\) on the diagonal \(\{s = t\}\) is crucial for the inclusion of \((1/4,1/2] \ni H\)-fBm, which does not even have first partial derivatives on it. Furthermore, we assume there exists an \(H \in (0,1)\) with the property that the sample paths of X are either H-Hölder, or are K-Hölder for all \(K < H\); for fBm H will coincide with the Hurst parameter, but the letter H will be used for more general processes to denote the Hölder exponent/supremum of exponents. Thus the rough path above X will be of finite 1/H-variation or of finite p-variation for all \(p> 1/H\).

We also need some quantitative estimates on the derivatives of R. Here and throughout the paper, the constant of proportionality implied by the use of \(\lesssim \) may only depend on T, H and other general characteristics of the process X. We require

$$\begin{aligned} |\partial _{12}R(s,t)|&\lesssim (t-s)^{2H-2} \qquad \text {on } \Delta [0,T] \end{aligned}$$
(9)
$$\begin{aligned} \big |\tfrac{1}{2} R'(t) - \partial _2R(s,t)\big |&\lesssim (t-s)^{2H-1} \qquad \text {on } [0,T]^2 \setminus \{s = t\} \end{aligned}$$
(10)
$$\begin{aligned} |R'(t)|&\lesssim t^{2H-1} \qquad \qquad \text {on } (0,T] \end{aligned}$$
(11)

where \(\partial _2\) denotes partial differentiation w.r.t. the second component and \(\partial _{12}\) denotes second-order mixed partial differentiation. Since R is not smooth on the diagonal, the following estimate for on-diagonal square increments of the covariance function, which already appeared in [15], must be required separately:

$$\begin{aligned} R(\Delta (s,t), \Delta (s,t)) \lesssim (t-s)^{2H}. \end{aligned}$$
(12)

We move on to the treatment of Malliavin calculus for X. We let \({\mathcal {H}}\) be the Hilbert space given by the completion of the following \({\mathbb {R}}\)-linear span of elementary functions \([0,T] \rightarrow {\mathbb {R}}^d\), or equivalently \([0,T] \times [d] \rightarrow {\mathbb {R}}\):

$$\begin{aligned} {\mathcal {E}} {:}{=}\, \text {span}_{\mathbb {R}}\{ {\mathbb {1}}^\gamma _{[0,t)} \mid t \in [0,T], \ \gamma = 1,\ldots , d\} \end{aligned}$$
(13)

w.r.t. the inner product

$$\begin{aligned} \langle \mathbb {1}^\alpha _{[0,s)}, {\mathbb {1}}^\beta _{[0,t)} \rangle _\mathcal {H} {:}{=}R^{\alpha \beta }(s,t). \end{aligned}$$
(14)

Because of independence of components, \({\mathcal {H}}\) is equal to an orthogonal direct sum \({\mathcal {H}}^1 \oplus \ldots \oplus {\mathcal {H}}^d\), and because of equal distribution the direct summands are all equal. Elements of \({\mathcal {H}}\) should be viewed as admissible deterministic integrands for \(\text {d}X\), which are represented as Cauchy sequences of elementary integrands in \({\mathcal {E}}\). This framework allows us to view the process as an isometry

$$\begin{aligned} X :{\mathcal {H}} \rightarrow L^2\Omega ,\quad {\mathbb {1}}^\gamma _{[0,t)} \mapsto X^\gamma _t \end{aligned}$$
(15)

often called an isonormal Gaussian process.
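To make the definition of \({\mathcal {H}}\) concrete, inner products of elementary integrands are computed directly from R; the following minimal sketch (helper names are our own) treats the scalar case, using the fBm covariance (38) below as a stand-in for a general admissible R:

```python
import numpy as np

def R(s, t, H):
    """fBm covariance, standing in for a general admissible covariance."""
    return 0.5 * (s**(2*H) + t**(2*H) - np.abs(t - s)**(2*H))

def inner_H(f, g, H):
    """<f,g>_H for scalar elementary integrands f = [(a_i, t_i)], i.e.
    f = sum_i a_i 1_[0,t_i), via <1_[0,s), 1_[0,t)>_H = R(s,t) as in (14)."""
    return sum(a * b * R(ti, sj, H) for a, ti in f for b, sj in g)

# 1_[s,t) = 1_[0,t) - 1_[0,s) has squared H-norm E[X_{st}^2] = (t-s)^{2H}:
s, t, H = 0.25, 0.75, 0.4
ind = [(1.0, t), (-1.0, s)]
print(inner_H(ind, ind, H), (t - s)**(2*H))   # both equal 0.5**0.8
```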

The multiple Wiener integral

$$\begin{aligned} \delta ^m :{\mathcal {H}}^{\odot m} \rightarrow L^2\Omega \end{aligned}$$
(16)

is the operator defined by the adjoint property (which, more generally, characterises the divergence operator when random arguments f are allowed)

$$\begin{aligned} \forall Z \in {\mathbb {D}}^{m,2} \quad {\mathbb {E}}[ Z \delta ^m(f) ] = {\mathbb {E}}[\langle \mathcal {D}^m Z, f \rangle _{{\mathcal {H}}^{\otimes m}}] \end{aligned}$$
(17)

where

$$\begin{aligned} \mathcal {D}^m :{\mathbb {D}}^{m,2} \rightarrow L^2(\Omega ,{\mathcal {H}}^{\odot m}) \end{aligned}$$
(18)

is the mth Malliavin derivative, defined as

$$\begin{aligned} \mathcal {D}^m f(X^{\gamma _1}_{t_1}, \ldots , X^{\gamma _n}_{t_n}) {:}{=}\sum _{k_1,\ldots ,k_m = 1}^n \partial _{k_1, \ldots ,k_m} f(X^{\gamma _1}_{t_1}, \ldots , X^{\gamma _n}_{t_n}) \mathbb {1}_{[0,t_{k_{1}})}^{\gamma _{k_{1}}} \otimes \cdots \otimes \mathbb {1}_{[0,t_{k_{m}})}^{\gamma _{k_{m}}} \end{aligned}$$
(19)

for \(f \in C^\infty ({\mathbb {R}}^n)\) with derivatives (including the 0th) of polynomial growth, and extended as a closed operator to a certain domain \({\mathbb {D}}^{m,2}\). \({\mathcal {H}}^{\odot m}\) denotes the subspace of \({\mathcal {H}}^{\otimes m}\) (the tensor product taken in the category of Hilbert spaces) of symmetric tensors. \(\mathcal {D}^m\) takes a square-integrable random variable and returns a random element of \({\mathcal {H}}^{\odot m}\), which, in the case of membership of \({\mathcal {E}}^{\odot m}\) (or more generally when it is represented by a function in the sense of Definition 1.1 below), will be a function of m (time, index) pairs. Note that, while \(\delta \) is symmetric in the sense that it is left invariant by permuting (time, index) pairs jointly, it is not symmetric if only time variables or indices are permuted (e.g. it is possible to use \(\delta \) to define a Lévy area—see Example 2.6 below). When \(\mathcal {D}^mZ\) is a function, as in (19), we denote its evaluation on m (time, index) pairs \(\mathcal {D}_{(u_1,\gamma _1),\ldots ,(u_m,\gamma _m)}Z\); occasionally it may make more sense to suppress the indices in the notation, in which case we will just write \(\mathcal {D}_{u_1,\ldots ,u_m}Z\). We may extend \(\delta ^m\) to a map \(\delta ^m :{\mathcal {H}}^{\otimes m} \rightarrow L^2\Omega \) by pre-composing with symmetrisation, and we have for \(f,g \in {\mathcal {H}}^{\otimes m}\)

$$\begin{aligned} {\mathbb {E}}[\delta ^m(f) \delta ^n(g)]= \updelta ^{mn} \sum _{\sigma \in {\mathfrak {S}}_m} \langle f, \sigma _*g \rangle _{{\mathcal {H}}^{\otimes m}}. \end{aligned}$$
(20)

This implies that multiple Wiener integration defines an isometry

$$\begin{aligned} \delta ^{\scriptscriptstyle \bullet } :\bigoplus _{m = 0}^\infty {\mathcal {H}}^{\odot m} \xrightarrow {\cong } L^2 \Omega \end{aligned}$$
(21)

where the source is given the degree-wise rescaled inner product \((f,g) \mapsto m!\langle f,g \rangle _{{\mathcal {H}}^{\otimes m}}\) for f, g of the same degree and zero otherwise, and \(\Omega \) is endowed with the sigma-algebra generated by the process \((X_t)_{t \in [0,T]}\). The image of the m-th Wiener integral operator, the space of the random variables \(\delta ^m(f)\) with f ranging in \({\mathcal {H}}^{\odot m}\), is called the m-th Wiener chaos of X. We denote it \({\mathscr {W}}^m\) and the m-th Wiener chaos projection \({\mathcalligra{w}}^m :L^2 \Omega \twoheadrightarrow {\mathscr {W}}^m\). Note that \({\mathcalligra{w}}^0 = {\mathbb {E}}\) with values in \({\mathscr {W}}^0 = {\mathbb {R}}\), while \({\mathscr {W}}^1\) is given by linear functionals of X. We thus have the Wiener chaos decomposition \(L^2 \Omega = \bigoplus _{m = 0}^\infty {\mathscr {W}}^m\), which means it is possible to represent any random variable in \(L^2\Omega \) (measurable w.r.t. the sigma-algebra generated by X) as an \(L^2\)-absolutely convergent series

$$\begin{aligned} L^2\Omega \ni Z = \sum _{m = 0}^\infty {\mathcalligra{w}}^m Z,\qquad \Vert Z \Vert _{L^2}^2 = \sum _{m = 0}^\infty \Vert {\mathcalligra{w}}^m Z \Vert _{L^2}^2 = \sum _{m = 0}^\infty m! \Vert f^m \Vert _{{\mathcal {H}}^{\otimes m}}^2 \end{aligned}$$
(22)

where \(f^m = (\delta ^m)^{-1} \circ {\mathcalligra{w}}^m (Z)\). The map \((\delta ^m)^{-1} \circ {\mathcalligra{w}}^m\) admits an expression in terms of the Malliavin derivative: this is Stroock’s formula, which states that for \(Z \in {\mathbb {D}}^{m,2}\)

$$\begin{aligned} (\delta ^m)^{-1} \circ {\mathcalligra{w}}^m (Z) = \frac{1}{m!} {\mathbb {E}}[\mathcal {D}^m Z] . \end{aligned}$$
(23)

As a consequence, if \(Z \in {\mathbb {D}}^{\infty ,2} {:}{=}\bigcap _{m = 0}^\infty {\mathbb {D}}^{m,2}\) we can write its Wiener chaos decomposition as the series

$$\begin{aligned} Z = \sum _{m = 0}^\infty \frac{1}{m!} \delta ^m {\mathbb {E}}[ \mathcal {D}^m Z]. \end{aligned}$$
(24)
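As a simple illustration of (23) and (24) (a standard check, with notation as above): take \(Z = (X^\gamma _t)^2\). Then \({\mathbb {E}}[Z] = R(t)\), \({\mathbb {E}}[\mathcal {D} Z] = 2\,{\mathbb {E}}[X^\gamma _t]\,\mathbb {1}^\gamma _{[0,t)} = 0\) and \({\mathbb {E}}[\mathcal {D}^2 Z] = 2\, \mathbb {1}^\gamma _{[0,t)} \otimes \mathbb {1}^\gamma _{[0,t)}\), so the series (24) terminates and reads

$$\begin{aligned} (X^\gamma _t)^2 = R(t) + \delta ^2\big ( \mathbb {1}^\gamma _{[0,t)} \otimes \mathbb {1}^\gamma _{[0,t)} \big ), \end{aligned}$$

consistent with the explicit formula for \(\delta ^2\) on elementary tensors given at the end of this section.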

We continue calling elements of \({\mathcal {E}}^{\otimes m}\) elementary functions, in light of the fact that they can be identified with functions \(([0,T] \times [d])^m \rightarrow {\mathbb {R}}\) by the mapping

$$\begin{aligned} \mathbb {1}^{\gamma _1}_{[0,t_1)} \otimes \cdots \otimes \mathbb {1}^{\gamma _m}_{[0,t_m)} \mapsto \mathbb {1}^{\gamma _1,\ldots ,\gamma _m}_{[0,t_1) \times \cdots \times [0,t_m)}. \end{aligned}$$
(25)

This is the map given by the product of the Kronecker deltas \(\updelta ^{\gamma _1}_\cdot \cdots \updelta ^{\gamma _m}_\cdot \) and the indicator function on the m-cube \([0,t_1) \times \cdots \times [0,t_m)\), each \(\updelta \) paired with the respective time variable. Since \({\mathcal {E}}^{\otimes m}\) is dense in \({\mathcal {H}}^{\otimes m}\), elements of the latter may be identified as equivalence classes of Cauchy sequences in \({\mathcal {E}}^{\otimes m}\). While \({\mathcal {H}}^{\otimes m}\) is not, in general, a space of functions, it is possible to uniquely associate elements of \({\mathcal {H}}^{\otimes m}\) to certain measurable functions \(([0,T] \times [d])^m \rightarrow {\mathbb {R}}\) as follows:

Definition 1.1

(Functions as elements of \({\mathcal {H}}^{\otimes m}\)). For a function \(f :([0,T] \times [d])^m \rightarrow {\mathbb {R}}\) we will write \(f \in {\mathcal {H}}^{\otimes m}\) if there exists a Cauchy sequence \((f_n)_n \subset {\mathcal {E}}^{\otimes m}\), uniformly bounded as a sequence of functions (according to the identification (25)), with \(f_n \rightarrow f\) a.e. In this case we will say that f represents \(\lim f_n \in {\mathcal {H}}^{\otimes m}\). If f represents \(\phi , \psi \in {\mathcal {H}}^{\otimes m}\) then \(\phi = \psi \): this is an immediate consequence of the following

Lemma 1.2

Let \((f_n)_n\) be as in the above definition with \(f = 0\). Then \(f_n \rightarrow 0\) in \({\mathcal {H}}^{\otimes m}\).

Proof

Let

$$\begin{aligned} f_n = \sum _{\gamma _1,\ldots ,\gamma _m = 1}^d f_{n;\gamma _1,\ldots ,\gamma _m} \mathbb {1}^{\gamma _1,\ldots ,\gamma _m} \end{aligned}$$

with \(f_{n;\gamma _1,\ldots ,\gamma _m} :[0,T]^m \rightarrow {\mathbb {R}}\). Then \(f_n \rightarrow 0\) a.e. if and only if \(f_{n;\gamma _1,\ldots ,\gamma _m} \rightarrow 0\) a.e. for each \((\gamma _1,\ldots ,\gamma _m) \in [d]^m\). Keeping in mind that \({\mathcal {H}} \cong ({\mathcal {H}}^1)^{\bigoplus d}\) we may therefore assume \(d = 1\) and suppress indices. Following [23, p. 588], we test the sequence with elementary functions: letting \(f_n = \sum _{s^n_1,\ldots ,s^n_m} f_n^{s^n_1,\ldots ,s^n_m}\mathbb {1}_{[0,s^n_1)\times \ldots \times [0,s^n_m)}\) and \({\mathcal {E}}^{\otimes m} \ni g = \sum _{t_1,\ldots ,t_m} g^{t_1,\ldots ,t_m}\mathbb {1}_{[0,t_1)\times \ldots \times [0,t_m)}\) with \(f_n^{s^n_1,\ldots ,s^n_m}, g^{t_1,\ldots ,t_m} \in {\mathbb {R}}\) uniformly bounded (and the sums finite) we have that

$$\begin{aligned}&\langle f_n, g \rangle _{{\mathcal {H}}^{\otimes m}} \\&\quad = \sum _{\begin{array}{c} s^n_1,\ldots ,s^n_m \\ t_1,\ldots ,t_m \end{array}}f_n^{s^n_1,\ldots ,s^n_m} g^{t_1,\ldots ,t_m} R(s^n_1,t_1) \cdots R(s^n_m,t_m) \\&\quad =\sum _{t_1,\ldots ,t_m} g^{t_1,\ldots ,t_m} \int _{((0,T] \setminus \{t_1\}) \times \cdots \times ((0,T] \setminus \{t_m\})}\\&\qquad f_n(s_1,\ldots ,s_m) \partial _1 R(s_1,t_1) \cdots \partial _1 R(s_m,t_m) \text {d}s_1 \cdots \text {d}s_m. \end{aligned}$$

(10) and (11) imply that the integrands are absolutely and uniformly bounded by \( [|t_1-s_1|^{2H-1} \vee s_1^{2H-1}] \cdots [|t_m-s_m|^{2H-1} \vee s_m^{2H-1}]\) (up to a constant), which is integrable on \(((0,T] {\setminus } \{t_1\}) \times \cdots \times ((0,T] {\setminus } \{t_m\})\). By dominated convergence \(\langle \phi , g \rangle _{{\mathcal {H}}^{\otimes m}} = \lim \langle f_n, g \rangle _{{\mathcal {H}}^{\otimes m}} = 0\), where \(\phi {:}{=}\lim f_n\) in \({\mathcal {H}}^{\otimes m}\), and \(\phi = 0\) follows from the fact that g ranges in a dense set. \(\square \)

In light of the aforementioned non-degeneracy condition on X, we also expect the converse to hold: if \(\phi \in {\mathcal {H}}^{\otimes m}\) is represented by the functions f, g in the above sense, then \(f = g\) a.e. An example of a degenerate stochastic process, for which this property would not hold, is given by taking any process X and concatenating it with itself path by path; the resulting covariance function R would be invariant under transposing the intervals [0, T) and [T, 2T). We also note that, in specific cases, it is possible to describe \({\mathcal {H}}\) explicitly: if X is a fractional Brownian motion with Hurst parameter \(H \in (0,1)\), the identity on \({\mathcal {E}}\) induces an isomorphism between \({\mathcal {H}}\) and the Sobolev space \(W^{1/2-H, 2}\) [24], which is a space of functions for \(H \in (0,1/2]\) but not for \(H \in (1/2,1)\).

We will mostly be considering Wiener integrals on simplices, which has the effect of quotienting out symmetry of the operator \(\delta ^m\). We will often resort to integral notation, e.g. if \(\mathbb {1}^{\alpha \beta }_{\Delta [s,t]} \in {\mathcal {H}}^{\otimes 2}\) (the function that maps \(((u,\gamma ),(v,\delta )) \mapsto \updelta ^{\alpha \gamma }\updelta ^{\beta \delta } \mathbb {1}_{s< u< v < t}\)) in the sense of Definition 1.1, we will write \(\delta ^2(\mathbb {1}^{\alpha \beta }_{\Delta [s,t]}) {=}{:}\int _{s< u< v < t} \delta X_u^\alpha \delta X^\beta _v\) to be the limit in \(L^2\) of \(\delta ^2(f_n)\). Wiener integrals of elements of \({\mathcal {E}}^{\otimes m}\), on the other hand, can be computed explicitly by using the adjoint property (17): for example, it can be checked that

$$\begin{aligned} \delta ^2(\mathbb {1}^{\alpha }_{[s,t)} \otimes \mathbb {1}^{\beta }_{[u,v)}) = X^{\alpha }_{st} X^{\beta }_{uv} - R^{\alpha \beta }(\Delta (s,t),\Delta (u,v)). \end{aligned}$$

The more general formula involves multivariate analogues of the Hermite polynomials (see [34, §2.7.2] and [17, p.244]). When X is a Gaussian martingale (but not necessarily if it is only a semimartingale), multiple Wiener integration on the simplex coincides with iterated Wiener-Itô integration.

2 The main result, some consequences

We begin this section with some more notation. We denote \([n] {:}{=}\{1,\ldots ,n\}\) the set with n elements. We will be concerned with iterated integrals on the n-simplex \(\Delta ^n[s,t] {:}{=}\{(u_1,\ldots ,u_n) \mid s< u_1< \ldots< u_n < t \}\). Because such integrals will involve the covariance function, integration variables will sometimes come in pairs. For \(m, n \in {\mathbb {N}}\) we denote \({\mathcal {P}}^n_m\) the collection of partitions of subsets of [n] of cardinality \(n-m\) into sets of cardinality 2. Note that this means \({\mathcal {P}}^n_m = \varnothing \) whenever \(n \ne m \ (\text {mod} \ 2)\) or \(m > n\), but \({\mathcal {P}}^n_n\) has precisely one element, \(\varnothing \): the empty set admits the empty collection of subsets as a partition, which vacuously belongs to \({\mathcal {P}}^n_n\). For example, \(Q {:}{=}\{\{1,4\},\{3,8\},\{5,6\}\} \in {\mathcal {P}}^8_2\) viewed as a partition of the set \(\{1,3,4,5,6,8\} \subseteq [8]\). For \(P \in {\mathcal {P}}^n_m\) we will denote \({\overline{P}} {:}{=}[n] {\setminus } \cup P\) (in the partition of the above example, \({\overline{Q}} = \{2,7\}\)).
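The combinatorics of \({\mathcal {P}}^n_m\) is straightforward to enumerate by machine, which can be useful when implementing the formulae below; here is a minimal sketch (helper names are our own):

```python
from itertools import combinations

def pair_partitions(elems):
    """All partitions of the tuple elems into unordered pairs."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for k, partner in enumerate(rest):
        for tail in pair_partitions(rest[:k] + rest[k+1:]):
            yield [(first, partner)] + tail

def P(n, m):
    """Enumerate P^n_m: pairings of an (n-m)-subset of [n] into 2-sets."""
    if m > n or (n - m) % 2:
        return []
    return [p for subset in combinations(range(1, n + 1), n - m)
            for p in pair_partitions(subset)]

print(len(P(4, 0)), len(P(4, 2)), len(P(4, 4)))   # 3, 6, 1
print([(1, 4), (3, 8), (5, 6)] in P(8, 2))        # the example Q above: True
```

In particular \(|{\mathcal {P}}^{2n}_0| = (2n-1)!!\), the number of Wick pairings of [2n].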

It will be convenient to use graphical notation to denote such objects, and for reasons that will become apparent shortly, for a pair \(\{i,j\}\) with \(i < j\) we will distinguish between the consecutive case \(j = i+1\) and the non-consecutive one \(j > i+1\). The partition \(Q \in {\mathcal {P}}^8_2\) above is represented by

(26)

We will refer to such graphics as diagrams. We have drawn one node for each \(i \in [n]\) that is not paired with a consecutive integer, and one node for each consecutive pair (in this case only \(\{5,6\}\)); when counting nodes, a node corresponding to such a pair should be thought of as having double weight. In our example, the 5th node actually counts for positions 5 and 6. With this convention, for each non-consecutive pair \(\{i,j\}\) we have drawn an arc connecting the two nodes of positions i and j, and for each node corresponding to a consecutive pair we have drawn a line going upwards. Nodes that do not have a line or arc entering them correspond to elements of \({\overline{P}}\), and we will call them single. Note that, by construction, there is never an arc between two consecutive nodes: this will be critical for convergence of the associated integrals described below. In the next section, we will be particularly concerned with maximal sequences of consecutive pairings, i.e. collections of pairings \(\{k,k+1\},\ldots ,\{k+l,k+l+1\} \in P\) with \(l \ge 0\) and s.t. \(\{k-2,k-1\},\{k+l+2,k+l+3\} \not \in P\).

Now, given \(P \in {\mathcal {P}}^n_m\), \(0 \le s \le t \le T\) and \(\gamma _1,\ldots ,\gamma _n \in [d]\) we associate to it a continuous function \(P_{st}^{\gamma _1,\ldots ,\gamma _n} :\Delta ^m[s,t] \times [d]^m \rightarrow {\mathbb {R}}\) by integrating over as many variables as there are non-single nodes in the diagram that represents P: call this number, which equals twice the number of non-consecutive pairs in P plus the number of consecutive ones, \(\#P\). This explains our choice for the above notation: each node either corresponds to an integration variable or to a free variable, i.e. a variable of which \(P_{st}^{\gamma _1,\ldots ,\gamma _n}\) is a function. We use the shorthands

$$\begin{aligned} \begin{aligned} R(\text {d}u_i, \text {d}u_j)&{:}{=}\partial _{12}R(u_i,u_j) \text {d}u_i \text {d}u_j \\ \tfrac{1}{2} R(\text {d}u_{h+1}) - R(u_{h-1},\text {d}u_{h+1})&{:}{=}\big [\tfrac{1}{2} R'(u_{h+1}) - \partial _{2}R(u_{h-1},u_{h+1})\big ] \text {d}u_{h+1}\text {,} \end{aligned} \end{aligned}$$
(27)

and the former will only be used when \(j > i+1\). Crucially, we are defining the second case as \(\frac{1}{2} R(\text {d}u_{h+1}) - R(u_{h-1},\text {d}u_{h+1})\), not as \(R(u_{h+1},\text {d}u_{h+1}) - R(u_{h-1},\text {d}u_{h+1})\), since this would be ill-defined in many cases (including \(1/2>H\)-fBm) because \(R(\,\cdot , \,\cdot \,)\) may not admit partial derivatives on the diagonal. On the other hand, we are assuming that the variance function \(R(\,\cdot \,)\) is differentiable.

Definition 2.1

(\(P^{\gamma _1,\ldots ,\gamma _n}_{st}\)) For \(\gamma _1,\ldots ,\gamma _n \in [d]\), \(0 \le s \le t \le T\) and \(P \in {\mathcal {P}}^n_m\) define

$$\begin{aligned} \begin{aligned} P_{st}^{\gamma _1,\ldots ,\gamma _n}(u_k \mid k \in {\overline{P}})&{:}{=}\prod _{k \in {\overline{P}}} \mathbb {1}^{\gamma _k} \cdot \int _{\Delta ^{\#P}[s,t]} \prod _{\begin{array}{c} \{i,j\} \in P \\ |j-i|>1 \end{array}} R^{\gamma _i \gamma _j}(\text {d}u_i,\text {d}u_j) \\ {}&\quad \cdot \prod _{\{h,h+1\} \in P} \big [ \tfrac{1}{2} R^{\gamma _h \gamma _{h+1}}(\text {d}u_{h+1}) - R^{\gamma _h \gamma _{h+1}}(u_{h-1},\text {d}u_{h+1}) \big ] \end{aligned} \end{aligned}$$
(28)

as a function \(([0,T] \times [d])^m \rightarrow {\mathbb {R}}\) extended with the value 0 outside \(\Delta ^m[s,t]\).

The variables \(u_k\) with \(k \in {\overline{P}}\) are supplied as arguments, so in fact this is an integral over a disjoint union of up to \(m+1\) simplices (fewer if some of the elements of \({\overline{P}}\) are consecutive). The index argument paired with each \(u_k\) is fed to \(\mathbb {1}^{\gamma _k}\) as a Kronecker delta: this means that \(P_{st}^{\gamma _1,\ldots ,\gamma _n}\) vanishes on all but one element of \([d]^m\). The reason why we still consider \(P_{st}^{\gamma _1,\ldots ,\gamma _n}\) as a function on \([d]^m\) is that this is necessary to view it as an element of \({\mathcal {H}}^{\otimes m}\); nevertheless, when the indices are fixed it will sometimes be convenient to just think of it as a function of m times. If \(m = 0\), \(P^{\gamma _1,\ldots ,\gamma _n}_{st}\) is just a real number.
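For instance, when \(n = 2\) and \(m = 0\) the only element of \({\mathcal {P}}^2_0\) is the single consecutive pair \(\{1,2\}\): with \(h = 1\) and the convention \(u_0 {:}{=}s\) (as in the chained integrals of the introduction), (28) reduces to

$$\begin{aligned} P^{\gamma _1 \gamma _2}_{st} = \int _s^t \big [ \tfrac{1}{2} (R^{\gamma _1 \gamma _2})'(v) - \partial _2 R^{\gamma _1 \gamma _2}(s,v) \big ] \text {d}v = \frac{R^{\gamma _1 \gamma _2}(s) + R^{\gamma _1 \gamma _2}(t)}{2} - R^{\gamma _1 \gamma _2}(s,t), \end{aligned}$$

anticipating the expression (31) for the level-2 expected signature below.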

Remark 2.2

The presence of the second type of integrand in Definition 2.1 is the reason for the smoothness assumptions on the variance and covariance functions, which are not to be found in most of the literature on these topics: this is because it would be difficult to define integrals such as \(\int _{s< u< v < t} \big [\tfrac{1}{2} R(\text {d}u) - R(s,\text {d}u)\big ]\big [\tfrac{1}{2} R(\text {d}v) - R(u,\text {d}v)\big ]\) as iterated Young integrals, without taking derivatives, since the variable u in its undifferentiated form appears after the integrator \(\frac{1}{2} R(\text {d}u) - R(s,\text {d}u)\); this of course is no longer an issue under our smoothness hypotheses, thanks to which the above integral is defined as the Lebesgue integral on the simplex \(\int _{s< u< v < t} \big [\tfrac{1}{2} R'(u) - \partial _2 R(s,u)\big ]\big [\tfrac{1}{2} R'(v) - \partial _2 R(u,v)\big ] \text {d}u \text {d}v\).

When P is represented by a diagram, we will decorate the nodes with labels. For example, the integral associated to (26) with labelling \(\alpha ,\ldots ,\vartheta \) is given by

This is viewed as a function of the variables \(u_2,u_7\) ranging on the simplex \(\Delta ^2[s,t]\), each paired with an index variable, which must respectively be equal to \(\beta \), \(\eta \) for the expression not to vanish. The variable \(u_5\) has been skipped, since it is the first term in the consecutive pair \(\{5,6\}\). We will show that integrals defined in this fashion are a.e. limits of Cauchy sequences in \({\mathcal {E}}^{\otimes m}\), which therefore uniquely represent elements of \({\mathcal {H}}^{\otimes m}\) according to Definition 1.1. When taking multiple Wiener integrals of them, the indices corresponding to the nodes that represent free variables will become the coordinate processes that are being integrated against, e.g.

We are now ready to state the main theorem.

Theorem 2.3

(Wiener chaos expansion of the signature of a Gaussian process). Given \(m,n \in {\mathbb {N}}\), \(P \in {\mathcal {P}}^n_m\), \(\gamma _1,\ldots ,\gamma _n \in [d]\), \(0 \le s \le t \le T\), it holds that \(P^{\gamma _1,\ldots ,\gamma _n}_{st} \in {\mathcal {H}}^{\otimes m}\) in the sense of Definition 1.1, and the mth Wiener chaos projection of the signature of X is given by

$$\begin{aligned} {\mathcalligra{w}}^m \mathcal {S}(X)^{\gamma _1,\ldots ,\gamma _n}_{st} = \sum _{P \in {\mathcal {P}}^n_m} \delta ^m P^{\gamma _1,\ldots ,\gamma _n}_{st}. \end{aligned}$$
(29)

In particular, notice that \({\mathcalligra{w}}^m \mathcal {S}(X)^{\gamma _1,\ldots ,\gamma _n}_{st}\) can only be non-zero when \(m \le n\) and \(m \equiv n \ (\text {mod} \ 2)\). The most important case of this result is when \(m = 0\):

Corollary 2.4

(Expected signature of a Gaussian process). With notation as above, we have

$$\begin{aligned} {\mathbb {E}} \mathcal {S}(X)^{\gamma _1,\ldots ,\gamma _n}_{st} = \sum _{P \in {\mathcal {P}}^n_0} P^{\gamma _1,\ldots ,\gamma _n}_{st}. \end{aligned}$$
(30)

Remark 2.5

(Eliminating variables). While convergence rules out always considering integrands of the first type in (27) (which would mean allowing diagrams with arcs between consecutive nodes), one may wonder whether it is possible to only consider integrands of the second type, i.e. by integrating out one variable per pair and thus simplifying the presentation of the formula. This, however, is not possible in general, because of the additional constraint that requires two consecutive variables not to be both integrated out (for the expression to make sense as an integral). It is not difficult to see, for example, that in the following diagram

at most two variables can be integrated out (unless the remaining integral can be solved or simplified analytically). Luckily, the only case in which it is necessary for convergence to integrate out certain variables (as specified in the second case of (27)) is when there are consecutive pairs: this is always possible, even when more than one pair in a row is consecutive, since we may always pick the first variable to integrate out (as done here—one could equivalently have chosen the second). Of course, there is always some number of additional variables that can be eliminated, but we do not immediately see a way of doing this in a maximal way that is canonical.

Example 2.6

(The Wiener chaos decomposition of \(\mathcal {S}^3(X)_{st}\)). We give the explicit expression for the Wiener chaos expansion of the signature truncated at level 3. These terms are especially significant, considering that they are the ones that define the rough path when \(1/4 < H \le 1/3\): higher signature terms can be derived in a pathwise fashion by Lyons’s extension theorem without involving probability. We represent each signature term as a sum of their Wiener chaos projections in ascending order; in particular the sum of all non-random terms constitutes the expectation of the left hand side.

In particular, notice how the expected signature of level 2 is given by the difference between the average of the variances and the covariance:

$$\begin{aligned} {\mathbb {E}} \mathcal {S}(X)^{\alpha \beta }_{st} = \frac{R^{\alpha \beta }(s) + R^{\alpha \beta }(t)}{2} - R^{\alpha \beta }(s,t) \end{aligned}$$
(31)

and that the statement that “the Itô and Stratonovich Lévy areas are equal” carries over to the Gaussian Wiener-rough setting, in the sense that

$$\begin{aligned} \frac{1}{2} \big (\mathcal {S}(X)^{\alpha \beta }_{st}-\mathcal {S}(X)^{\beta \alpha }_{st} \big ) = \frac{1}{2} \int _{s< u< v < t} \delta X^\alpha _u \delta X^\beta _v - \delta X^\beta _u \delta X^\alpha _v \end{aligned}$$
(32)

by symmetry of the covariance function.

Example 2.7

(\({\mathbb {E}}\mathcal {S}(X)^{(4)}\)). Corollary 2.4 at level 4 is given by

(33)

Using a clever transformation, [3, Theorem 34] are able to compute \({\mathbb {E}}\mathcal {S}(X)^{(2)}_{01}\) and \({\mathbb {E}}\mathcal {S}(X)^{(4)}_{01}\) for \(1/4<H\)-fBm. Their formulae are specific to the cases \(n = 2,4\) and X a fBm, and are quite different to those given by Theorem 2.3. That the two coincide is immediate at level 2 by (31), and in “Appendix A” we perform this check at level 4.

The following example shows how Theorem 2.3 has the potential to generate insight into numerical schemes for rough differential equations driven by Gaussian signals.

Example 2.8

(Itô–Taylor expansions for solutions to RDEs driven by Gaussian signals). Assume

$$\begin{aligned} \text {d}Y = V(Y) \text {d}\varvec{X}, \qquad Y_0 = y_0 \end{aligned}$$

is an RDE (rough differential equation) driven by the Gaussian rough path \(\varvec{X}\) (defined by the first 1, 2 or 3 levels of \(\mathcal {S}(X)\), depending on how rough X is). Proceeding formally, and denoting by \(V_{\gamma _1} \cdots V_{\gamma _n}\) composition of vector fields (and using Einstein notation), we can then expand the solution Y as

$$\begin{aligned} Y_t&= \sum _{n = 0}^\infty V_{\gamma _1} \cdots V_{\gamma _n}(y_0) \mathcal {S}(X)_{0t}^{\gamma _1,\ldots ,\gamma _n} \\&= \sum _{n = 0}^\infty V_{\gamma _1} \cdots V_{\gamma _n}(y_0) \sum _{\begin{array}{c} 0 \le m \le n \\ m \equiv n \ \text {mod} \ 2 \end{array}} {\mathcalligra{w}}^m \mathcal {S}(X)_{0t}^{\gamma _1,\ldots ,\gamma _n} \\&= \sum _{n = 0}^\infty V_{\gamma _1} \cdots V_{\gamma _n}(y_0) \sum _{\begin{array}{c} 0 \le m \le n \\ m \equiv n \ \text {mod} \ 2 \end{array}} \sum _{P \in {\mathcal {P}}^n_m} \delta ^m P^{\gamma _1,\ldots ,\gamma _n}_{0t} \\&= \sum _{m = 0}^\infty \sum _{\begin{array}{c} n \ge m \\ n \equiv m \ \text {mod} \ 2 \end{array}}^\infty V_{\gamma _1} \cdots V_{\gamma _n}(y_0) \sum _{P \in {\mathcal {P}}^n_m} \delta ^m P^{\gamma _1,\ldots ,\gamma _n}_{0t}. \end{aligned}$$

The expansion on the first line can be viewed as the extension to the Gaussian case of the Stratonovich–Taylor series, while the one on the last line can be viewed as that of the Itô–Taylor series [25]. The latter has the advantage that its terms fit in well with the Wiener chaos decomposition of \(Y_t\), although it should be observed that \({\mathcalligra{w}}^m Y_t\) is represented as an infinite series, namely the second sum in the last line above. Also, this expansion cannot be expected to coincide with the Wiener chaos decomposition of \(Y_t\) if it is performed at times other than 0, with \(Y_0 = y_0\) deterministic. This is because, unless X is a martingale, the Wiener chaos isometries will not hold conditionally on \({\mathcal {F}}_s\).

Remark 2.9

(Stationarity and joint stationarity of increments). X is stationary if and only if we may write

$$\begin{aligned} R(s,t) = {\overline{R}}(t-s) \end{aligned}$$
(34)

for some function \({\overline{R}} :[0,T] \rightarrow {\mathbb {R}}^{d\times d}\). In this case we have

$$\begin{aligned} \begin{aligned}&\partial _{12} R( s, t) = -{\overline{R}}{}''(t-s), \quad R'( t) = 0, \quad \partial _2R(s, t) = {\overline{R}}{}'(t-s) \\ \implies \quad&\tfrac{1}{2} R'(t) - \partial _2R(s,t) = - {\overline{R}}'(t-s). \end{aligned} \end{aligned}$$
(35)

An example of a centred stationary Gaussian process is the stationary Ornstein–Uhlenbeck process \(e^{-t/2}W_{e^t}\), where W is a Brownian motion and \(t \in [0,T]\): its covariance function is \(R(s,t) = e^{-(t-s)/2}\) for \(s \le t\). Strictly speaking, however, this process is not among those considered here, as it has a random initial condition.

There is a much weaker property that results in a similar simplification. We will say that a stochastic process X has jointly stationary increments if for all \(s_1 \le t_1, \ldots , s_n \le t_n \) the distribution of the random vector of increments \((X_{s_1t_1},\ldots ,X_{s_nt_n})\) only depends on the differences \(t_1-s_1,\ldots ,t_n-s_n\) and \(s_2-s_1,\ldots , s_n-s_{n-1}\) (if \(n = 1\) the latter condition vanishes, and ordinary stationarity of increments is recovered). If X is Gaussian this need only be required for \(n = 2\), and if it holds we may write

$$\begin{aligned} R(\Delta (s,u), \Delta (t,v)) = {\mathbb {E}}[X_{su} \otimes X_{tv}] = {\widehat{R}}(u-s,v-t,t-s) \end{aligned}$$
(36)

for some function \({\widehat{R}} :[0,T]^3 \rightarrow {\mathbb {R}}^{d\times d}\). This property is satisfied by fBm, since if H is the Hurst parameter we have

$$\begin{aligned}&R(\Delta (s,u), \Delta (t,v))\\&\quad = \frac{1}{2} \big [ (t-u)^{2H} + (v-s)^{2H} - (t-s)^{2H} - (v-u)^{2H}\big ] \\&\quad = \frac{1}{2} \big [ \big ((t-s)- (u-s)\big )^{2H} + \big ((v-t) + (t-s)\big )^{2H}\\&\qquad - \big (t-s\big )^{2H} - \big ((v-t) + (t-s) - (u-s)\big )^{2H}\big ]. \end{aligned}$$

If X has jointly stationary increments

$$\begin{aligned} \partial _{12}R( s, t) = \lim _{\begin{array}{c} u \rightarrow s \\ v \rightarrow t \end{array}} \frac{ R(\Delta (s,u), \Delta (t,v))}{(v-t)(u-s)} = \partial _{12}{\widehat{R}}(0,0,t-s) . \end{aligned}$$
(37)

Although similar simplifications are not available for \(\partial _2 R(s, t)\) and \(R'( t)\) individually (as they are in the stationary case), they are for their difference: indeed, using that \(R(\,\cdot ,0) \equiv 0\), we have

$$\begin{aligned}&\tfrac{1}{2} R(t+h) - \tfrac{1}{2} R(t) - \big ( R(s,t+h) - R(s,t) \big )\\&\quad = \tfrac{1}{2} \big [ R(\Delta (s,t), \Delta (t,t+h)) + R(\Delta (s,t+h),\Delta (t,t+h)) \big ] \end{aligned}$$

which implies

$$\begin{aligned} \tfrac{1}{2} R'(t) - \partial _2R(s,t)&= \tfrac{1}{2} \partial _h|_{h = 0} \big [{\widehat{R}}(t-s,h,t-s) + {\widehat{R}}(t+h-s,h,t-s) \big ] \\&= \tfrac{1}{2}\partial _1{\widehat{R}}(t-s,0,t-s) + \partial _2{\widehat{R}}(t-s,0,t-s) . \end{aligned}$$

We therefore conclude that joint stationarity of increments, though a much more general property than stationarity, results in the same simplifications that are of relevance to Theorem 2.3, namely that \(\partial _{12}R( s, t)\) and \(\tfrac{1}{2} R'(t) - \partial _2R(s,t)\) only depend on \(t-s\). This can be of aid in simplifying the expression of the integrals in the formula for \({\mathcalligra{w}}^m \mathcal {S}(X)\), since it is possible to perform substitutions of the form \(v_{ij} = u_j - u_i\). It does not, however, guarantee that these integrals become analytically solvable, as simple examples show (e.g. the integral \(\int _0^1 v^{2H-1}(1-v)^{2H-1} \text {d}v\) appearing in “Appendix A”).
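(That integral is in fact the Beta-function value \(B(2H,2H) = \Gamma (2H)^2/\Gamma (4H)\), a special rather than elementary function of H; a quick numerical cross-check, our own sketch:)

```python
from scipy import integrate, special

H = 1/3
val, _ = integrate.quad(lambda v: v**(2*H - 1) * (1 - v)**(2*H - 1), 0, 1)
print(val, special.beta(2*H, 2*H))   # both approx 2.0533 for H = 1/3
```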

We now consider a few examples of Gaussian processes to which our results apply; in all cases, X will have i.i.d. components, and we will use R to denote the scalar covariance function of each component. Arguably the most important example of a stochastic process for which the signature has not yet been computed is fractional Brownian motion in the regime of negatively-correlated increments:

Example 2.10

(\((1/4,1/2) \ni H\)-fBm). Fractional Brownian motion with Hurst parameter \(H \in (0,1)\) (H-fBm), introduced in [32], is a scalar centred Gaussian process with covariance function

$$\begin{aligned} R(s,t) = \frac{1}{2} (t^{2H} + s^{2H} - (t-s)^{2H}), \quad s \le t. \end{aligned}$$
(38)

It is not a semimartingale unless \(H = 1/2\), in which case it is Brownian motion. Here we consider the case \(H \in (1/4,1/2)\): this is well known to satisfy the preliminary hypotheses required in Sect. 1, and the smoothness conditions and bounds are simple to verify. Indeed, the integrands of interest for the formula of Theorem 2.3 are given by (\(s \le t\))

$$\begin{aligned} \begin{aligned} \partial _{12}R( s, t)&= H(2H - 1)(t-s)^{2H - 2} \\ \tfrac{1}{2} R'(t) - \partial _2R(s,t)&= H (t-s)^{2H - 1}. \end{aligned} \end{aligned}$$
(39)

As predicted by Remark 2.9, these both are functions of \(t-s\).
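The derivatives (39) are easily verified symbolically from (38); a throwaway check (our own sketch):

```python
import sympy as sp

s, t, H = sp.symbols('s t H', positive=True)
R = (t**(2*H) + s**(2*H) - (t - s)**(2*H)) / 2   # covariance (38) on s < t
Rprime = sp.diff(t**(2*H), t)                     # variance R(t) = t^{2H}

# both differences simplify to 0, confirming (39)
print(sp.simplify(sp.diff(R, s, t) - H*(2*H - 1)*(t - s)**(2*H - 2)))
print(sp.simplify(Rprime/2 - sp.diff(R, t) - H*(t - s)**(2*H - 1)))
```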

Remark 2.11

(\((1/2,1) \ni H\)-fBm, [3]). If \(R(\,\cdot ,\,\cdot \,)\) is once differentiable on the diagonal, then

$$\begin{aligned} R'(t) = \frac{\text {d}}{\text {d}t} R(t,t) = 2 \partial _2 R(t,t) \end{aligned}$$

and we have

$$\begin{aligned} \int _s^t \big [ \tfrac{1}{2} R'(v) - \partial _2R(s,v) \big ] \text {d}v = \int _s^t \big [ \partial _2R(v,v) - \partial _2R(s,v) \big ] \text {d}v = \int _{s< u< v < t} \partial _{12}R(u,v) \text {d}u \text {d}v. \end{aligned}$$

By performing this substitution in Corollary 2.4 for the case of \(1/2 < H\)-fBm (this means always applying the first case in (27), i.e. allowing arcs between consecutive nodes, which replace lines), we recover the formula of [3, Theorem 31] (note that the symmetry factor—meant to factor out permutations of pairings and transpositions within each pair—is not present in our case, since we are summing over pairings and not permutations). Other examples of processes in a similar regularity regime are those Gaussian Volterra processes with strictly regular kernels considered in [7].

The following is another example of a fractional, non-semimartingale process.

Example 2.12

(The Riemann–Liouville process). The Riemann–Liouville process with Hurst parameter \(H \in (0,1)\) (sometimes called “type-II fBm”), originally introduced in [26] and subsequently studied in [32], is another centred continuous Gaussian process; its covariance function is [33, pp. 116–117]

$$\begin{aligned} \begin{aligned} R(s,t)&\underset{s<t}{=}\ \frac{1}{2}\bigg [t^{2H} + s^{2H} - \underbrace{2H(t-s)^{2H} \bigg ( \frac{1}{2H} + \int _0^{s/(t-s)} \big ((1+u)^{H-1/2} - u^{H-1/2}\big )^2\text {d}u \bigg )}_{= R(\Delta (s,t),\Delta (s,t))}\bigg ]. \end{aligned} \end{aligned}$$
(40)

Like fBm, this process specialises to Brownian motion when \(H = 1/2\) and is otherwise not a semimartingale. The main difference between the two is that fBm has jointly stationary increments, while for the Riemann–Liouville process not even single increments are stationary. We were not able to find a satisfactory expression for the derivatives of the covariance function of this process, and thus were not able to determine whether (for \(H > 1/4\)) it satisfies the conditions necessary for applying Theorem 2.3. However, we believe that examples such as this provide strong motivation for not confining our study to fBm and for allowing more general processes.

Another important restriction of the main result is the following case:

Remark 2.13

(Gaussian martingales, [16]). When X is a continuous Gaussian martingale, its quadratic variation coincides with its variance function (as can be seen by the fact that \(X_t^2 - R(t)\) is a martingale). The Dubins-Schwarz theorem then implies that X can be represented as the deterministically-reparametrised Brownian motion \(W_{R(t)}\). Assuming equal distribution of components, we can use this and the formula for the expected signature of Brownian motion (2) to compute

$$\begin{aligned} {\mathbb {E}} \mathcal {S}(X)_{st}^{\gamma _1,\ldots ,\gamma _{2n}} = \frac{R(\Delta (s,t))^n}{2^n n!} \updelta ^{\gamma _1 \gamma _2} \cdots \updelta ^{\gamma _{2n-1} \gamma _{2n}}. \end{aligned}$$
(41)

Since by martingality \(\partial _{12}R(s,t) = 0 = \partial _2R(s,t)\) on \(s < t\), Theorem 2.3 reduces to a sum of iterated integrals that only involve \(\frac{1}{2} R'\), which coincides with the above formula.
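At level 2, for instance, martingality gives \(R(s,t) = R(s)\) for \(s \le t\), so the level-2 expected signature (31) becomes

$$\begin{aligned} {\mathbb {E}} \mathcal {S}(X)^{\alpha \beta }_{st} = \frac{R^{\alpha \beta }(s) + R^{\alpha \beta }(t)}{2} - R^{\alpha \beta }(s) = \frac{R^{\alpha \beta }(\Delta (s,t))}{2}, \end{aligned}$$

which is (41) with \(n = 1\).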

We conclude with two examples of centred, continuous Gaussian semimartingales which are not martingales and do not have stationary increments.

Example 2.14

(Brownian bridge returning to the origin). The Brownian bridge returning to the origin at time T is a process whose law is given by disintegrating the Wiener measure on the event \(W_T = 0\), where W is a d-dimensional Brownian motion starting at the origin. It can be written either as

$$\begin{aligned} X_t = W_t - \frac{t}{T} W_T, \qquad t \in [0,T] \end{aligned}$$

or adaptedly as

$$\begin{aligned} X_t = (T-t) \int _0^t \frac{\text {d}W_s}{T-s}, \qquad t \in [0,T) \end{aligned}$$

(and \(X_T = 0\)). Its covariance function is given by

$$\begin{aligned} R(s,t) = s\Big ( 1-\frac{t}{T}\Big ), \qquad s \le t \end{aligned}$$
(42)

and the integrands of interest are thus

$$\begin{aligned} \begin{aligned} \partial _{12}R( s, t)&= -\frac{1}{T} \\ \tfrac{1}{2} R'(t) - \partial _2R(s,t)&= \frac{1}{2} - \frac{t-s}{T}. \end{aligned} \end{aligned}$$
(43)

It should be mentioned that X, as a process defined on [0, T], fails the non-degeneracy condition [9, p. 2125]. This is, however, not a problem, as we can view it as defined on the interval \([0,T-\varepsilon ]\) and obtain the signature terms \(\mathcal {S}(X)_{sT}\) through a limiting argument. The bounds (9)–(11), which in this example and the one below only involve linear terms, are easily checked (and indeed the first is not even sharp). Note that the iterated integrals of (43) can all be solved explicitly as polynomials.
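For instance, at level 2, (31) (equivalently, integrating the second line of (43)) gives

$$\begin{aligned} {\mathbb {E}} \mathcal {S}(X)^{\alpha \alpha }_{st} = \int _s^t \Big [ \frac{1}{2} - \frac{v-s}{T} \Big ] \text {d}v = \frac{t-s}{2} - \frac{(t-s)^2}{2T}, \end{aligned}$$

which recovers the Brownian value \((t-s)/2\) as \(T \rightarrow \infty \).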

Example 2.15

(Centred Ornstein–Uhlenbeck processes started at 0). We consider an Ornstein–Uhlenbeck process with zero mean and deterministic initial condition, given by the Wiener-Itô integral

$$\begin{aligned} X_t = \sigma \int _0^t e^{-\theta (t-u)} \text {d}W_u \end{aligned}$$

with \(\sigma ,\theta \in (0,+\infty )\). Its covariance function is given by

$$\begin{aligned} R(s,t) = \frac{\sigma ^2}{2\theta } \big ( e^{-\theta (t-s)} - e^{-\theta (s+t)} \big ), \qquad s \le t \end{aligned}$$
(44)

and \(\partial _{12}R(s,t)\), \(\tfrac{1}{2} R'(t) - \partial _2R(s,t)\) can be computed directly. Once again, all conditions are satisfied (see [9, p. 2138]).
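Concretely, differentiating (44) (a computation we record for convenience; the reader may verify it) gives, for \(s < t\),

$$\begin{aligned} \begin{aligned} \partial _{12}R(s,t)&= -\frac{\sigma ^2 \theta }{2} \big ( e^{-\theta (t-s)} + e^{-\theta (s+t)} \big ) \\ \tfrac{1}{2} R'(t) - \partial _2R(s,t)&= \frac{\sigma ^2}{2} \big ( e^{-2\theta t} - e^{-\theta (s+t)} + e^{-\theta (t-s)} \big ), \end{aligned} \end{aligned}$$

the latter recovering the Brownian value \(\sigma ^2/2\) in the limit \(\theta \rightarrow 0\).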

3 Proof of the main result

Recall that we are using \(\lesssim \) to denote inequalities whose constant of proportionality may only depend on T, H and other properties of a fixed process X. Since most of the arguments presented in this section only concern bounds and convergence, we will suppress indices (i.e. treat the scalar case) most of the time, so as not to clutter the notation. Given \(P \in {\mathcal {P}}^n_m\), denote \(|P|_{st}\) the function \(\Delta ^m[s,t] \rightarrow {\mathbb {R}}\) defined analogously to Definition 2.1, but replacing each integrand \(\partial _{12}R(u,v)\) with \((v-u)^{2H-2}\) and each integrand \(\frac{1}{2} R'(v) - \partial _2 R(u,v)\) with \((v-u)^{2H-1}\). For example, if Q is the diagram of (26)

$$\begin{aligned} |Q|_{st} = \mathbb {1}_{\Delta ^2[s,t]}(u_2,u_7) \int _{\begin{array}{c} s< u_1< u_2 \\ u_2< u_3< u_4< u_6< u_7 \\ u_7< u_8 < t \end{array}} (u_4 - u_1)^{2H-2} (u_8 - u_3)^{2H-2} (u_6 - u_4)^{2H-1}\text {d}u_1 \text {d}u_3 \text {d}u_4 \text {d}u_6 \text {d}u_8. \end{aligned}$$

The following proposition guarantees that all the integrals considered in the main theorem are convergent.

Proposition 3.1

(Finite improper integrals). For \(m \le n\) and \(P \in {\mathcal {P}}^n_m\)

$$\begin{aligned} |P|_{st} \lesssim (t-s)^{(n-m)H} \end{aligned}$$
(45)

uniformly over \(\Delta ^m[s,t]\).

Proof

We proceed by induction on \(n-m\). When P only has single nodes (\(m = n\)) the statement is trivial. We consider several cases for the last node in P; the simplest of these occurs when it is single: the statement follows immediately from the inductive hypothesis. For the next case, we will need the following bound:

$$\begin{aligned} \begin{aligned}&\int _{\Delta ^n[s,t]} (u_1 - s)^{2H - 1} \cdots (u_n - u_{n-1})^{2H - 1} \text {d}u_1 \cdots \text {d}u_n \\&\quad \lesssim \int _{\Delta ^{n-1}[s,t]} (u_1 - s)^{2H - 1} \cdots (u_{n-1} - u_{n-2})^{2H - 1} (t - u_{n-1})^{2H} \text {d}u_1 \cdots \text {d}u_{n-1}\\&\quad \le (t-s)^{2H}\int _{\Delta ^{n-1}[s,t]} (u_1 - s)^{2H - 1} \cdots (u_{n-1} - u_{n-2})^{2H - 1} \text {d}u_1 \cdots \text {d}u_{n-1} \\&\quad \lesssim \cdots \lesssim (t-s)^{2nH}. \end{aligned} \end{aligned}$$
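
As an aside, not needed for the proof: the first and last expressions in this chain scale exactly, since substituting \(u_i = s + (t-s)x_i\) extracts a factor \((t-s)^{2nH}\) from the leftmost integral. The following Monte Carlo sketch (our own illustration; n, H and the sample size are arbitrary choices) confirms that its ratio to \((t-s)^{2nH}\) is constant in t:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def simplex_integral(n, s, t, H, n_samples=500_000):
    """Monte Carlo value of the integral of prod_i (u_i - u_{i-1})^(2H-1)
    over the simplex s < u_1 < ... < u_n < t, with u_0 = s."""
    u = np.sort(rng.uniform(s, t, size=(n_samples, n)), axis=1)
    gaps = np.diff(u, axis=1, prepend=s)                # u_1 - s, u_2 - u_1, ...
    vol = (t - s) ** n / math.factorial(n)              # simplex volume
    return vol * float(np.mean(np.prod(gaps ** (2 * H - 1), axis=1)))

n, H, s = 3, 0.35, 0.0
for t in (0.5, 1.0, 2.0):
    ratio = simplex_integral(n, s, t, H) / (t - s) ** (2 * n * H)
    print(f"t = {t}: I / (t - s)^(2nH) = {ratio:.4f}")   # approximately constant
```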

For a diagram C whose last node is the right endpoint of an arc, using the bound above we have

where \(|C|_{su_0}'\) equals the integral representing \(|C|_{su_0}\) with the only difference that we are not integrating w.r.t. the variable \(u_0\) in \((u_0-r)^{2H-2}\), which represents the arc that terminates at the last node of C. Similarly, if the last node in C is single, we have

where C is not differentiated since it terminates in a node representing a free variable, \(u_0\). We now consider arcs: assume there are i arcs/lines within A, j within B, and that there are k arcs between nodes in A and nodes in B (collectively represented below by the dashed arc). Let \(A^\circ \) and \(B^\circ \) denote the diagrams given by eliminating such arcs from A and B: the nodes that have become single as a result now represent free variables, which we call \(w_1,\ldots ,w_k\), \(z_1,\ldots ,z_k\). We first consider the case in which \(j>0\):

where we have used \(2H(j+1)-1 \ge 4H-1 >0\) since \(H >1/4\). Note that the absolute values in the third-last expression can be removed by separately considering the cases \(H >1/2\) and \(H < 1/2\). Assume instead \(j=0\): this means B must contain at least one node that is either single or paired with a node in A; it cannot be that \(B = \varnothing \) or the diagram would contain an arc between two consecutive nodes, which is ruled out. The case in which there is a node in B which is single (see Fig. 2) does not require \(H > 1/4\): letting r denote the free variable represented by such a node, and proceeding similarly to the above, we have

Finally, consider the case in which \(j = 0\) and \(k > 0\) (and B may have no single nodes):

Once again, the absolute values distinguish between \(H \lessgtr 1/2\). Expanding the product, we observe that three of the integrals feature products of different terms, each to the power of \(2H-1\): in these, at least one of \(z_k\) or u only appears once, which means this variable may be integrated out and the resulting term bounded (up to a constant) by \((t-s)^{2H}\), with the remaining integral solved similarly. The fourth integral instead is \(\int _{s< u< z < t} (z-u)^{4H-2} \text {d}u \text {d}z\), which is finite again thanks to \(H > 1/4\). This shows that we have \(\lesssim (t-s)^{2(i+k+1)H}\) in the above expression and concludes the proof. \(\square \)
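
For the fourth integral, an elementary computation (ours) gives the closed form \((t-s)^{4H}/(4H(4H-1))\), which can be confirmed numerically; the sketch below (assuming scipy) integrates the inner variable analytically first:

```python
from scipy import integrate

H, s, t = 0.3, 0.0, 1.0   # any H > 1/4 gives an integrable exponent 4H - 2 > -1

# Integrate z out analytically first:
# int_u^t (z - u)^(4H-2) dz = (t - u)^(4H-1) / (4H - 1).
val, err = integrate.quad(lambda u: (t - u) ** (4 * H - 1) / (4 * H - 1), s, t)
closed_form = (t - s) ** (4 * H) / (4 * H * (4 * H - 1))
print(val, closed_form)   # both equal 25/6 = 4.1666... for these values
```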

Remark 3.2

(Modified |P|). We have stated the previous proposition in the most natural manner; in particular, note how, in the prototypical case of fBm, the integrals \(|P|_{st}\) are multiples of \(P_{st}\). We will, however, additionally need a slightly modified version of this result, in which the definition of |P| is changed as follows: maximal sequences

$$\begin{aligned} \int _{\Delta ^k[u,v]} (w_1 - u)^{2H - 1} \cdots (w_k - w_{k-1})^{2H - 1} \text {d}w_1 \cdots \text {d}w_k \end{aligned}$$

occurring in the middle of the expression for |P| are replaced with their bound \((v-u)^{2kH}\), and each integrand \((v-u)^{2H-2}\) is replaced with \(((v-u) \wedge 1/2)^{2H-2}\). That the statement continues to hold despite these modifications is obvious for the first; for the second, it follows from the facts that all integrals are still convergent (by the same proof) and that the 1/2 can be replaced with \(1/2 \wedge T\) and absorbed into the constant of proportionality.

Just like in [3], we approximate X piecewise linearly. Let \(X^\ell \) be a sequence of piecewise linear approximations of X along partitions \(\pi _\ell \) on [0, T] with step size that vanishes as \(\ell \rightarrow \infty \). It will be helpful to assume that the intervals in the mesh \(\pi _\ell \) all have the same length \(\varrho _\ell \); this simplifying assumption can be made because it is only necessary to show convergence along a sequence of such approximations, since it is known that the limit does not depend on the particular choice of \(\pi _\ell \) (or indeed on the type of piecewise smooth approximation in a broad class of these) [21, Ch. 15]. For \(t \in [0,T]\) we will write \(t_\ell ^-\) and \(t_\ell ^+\) to respectively denote the endpoints a and b of the interval of \(\pi _\ell \) s.t. \(t \in [a,b)\). Explicitly, \(X^\ell \) and its piecewise-defined derivative are given by

$$\begin{aligned} \begin{aligned} X^\ell _t&= X_{t^-_\ell } + \varrho ^{-1}_\ell (t-t^-_\ell ) X_{t^-_\ell t^+_\ell } \\ \dot{X}^\ell _t&= \varrho ^{-1}_\ell X_{t^-_\ell t^+_\ell } \end{aligned} \end{aligned}$$
(46)

where, as usual, \(X_{ab} {:}{=}X_b - X_a\) denotes the increment. In order to use Stroock’s formula (23), we will be considering Malliavin derivatives of the signature of the piecewise-linear interpolations of X,

$$\begin{aligned} \mathcal {S}(X^\ell )^{\gamma _1,\ldots ,\gamma _n}_{st} = \int _{\Delta ^n[s,t]} \dot{X}^{\ell ;\gamma _1}_{u_1} \cdots \dot{X}^{\ell ;\gamma _n}_{u_n} \text {d}u_1 \cdots \text {d}u_n \text {,} \end{aligned}$$

which in turn requires us to consider those of the single factors:

$$\begin{aligned} \mathcal {D}_{v} \dot{X}_u^{\ell ;\gamma } = \varrho ^{-1}_\ell \mathbb {1}^\gamma _{[u^-_\ell ,u^+_\ell )}(v) = \varrho ^{-1}_\ell \mathbb {1}^\gamma _{[v^-_\ell ,v^+_\ell )}(u). \end{aligned}$$
(47)
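
For concreteness, here is a minimal sketch (ours, assuming numpy) of the interpolation (46) and its piecewise derivative on a uniform grid; the test path and mesh size are arbitrary choices:

```python
import numpy as np

def interp_and_derivative(grid_vals, rho, t):
    """X^l_t and its derivative, per (46), for the piecewise-linear interpolation
    of a path sampled on the uniform grid 0, rho, 2 rho, ...: grid_vals[k] = X_{k rho}."""
    k = min(int(t / rho), len(grid_vals) - 2)   # cell [t^-, t^+) containing t
    incr = grid_vals[k + 1] - grid_vals[k]      # the increment X_{t^- t^+}
    X_ell = grid_vals[k] + (t - k * rho) / rho * incr
    X_dot = incr / rho
    return X_ell, X_dot

# Usage on a coarse sample of a deterministic test path.
rho = 0.1
grid_vals = np.sin(np.arange(0.0, 1.0 + rho / 2, rho))
print(interp_and_derivative(grid_vals, rho, 0.234))
```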

For \(P \in {\mathcal {P}}^n_m\), we provide a discretised analogue to Definition 2.1:

Definition 3.3

(\(P^{\ell ;\gamma _1,\ldots ,\gamma _n}_{st}\)). For \(\gamma _1,\ldots ,\gamma _n \in [d]\), \(0 \le s \le t \le T\) and \(P \in {\mathcal {P}}^n_m\) define

$$\begin{aligned} P_{st}^{\ell ;\gamma _1,\ldots ,\gamma _n}(v_k \mid k \in {\overline{P}}) {:}{=}\int _{\Delta ^{n}[s,t]} \prod _{\{i,j\} \in P} {\mathbb {E}}[\dot{X}^{\ell ;\gamma _i}_{u_i} \dot{X}^{\ell ;\gamma _j}_{u_j}] \text {d}u_i \text {d}u_j \cdot \prod _{k \in {\overline{P}}} \varrho _\ell ^{-1} \mathbb {1}^{\gamma _k}_{[v^-_{k;\ell },v^+_{k;\ell })}(u_k)\,\text {d}u_k \end{aligned}$$
(48)

as an element of \({\mathcal {E}}^{\otimes m}\), whose arguments \(v_k\), \(k \in {\overline{P}}\), are given to the indicator functions \(\mathbb {1}^{\gamma _k}_{[v^-_{k;\ell },v^+_{k;\ell })}\).

Note how the above definition, unlike Definition 2.1, does not distinguish between consecutive and non-consecutive pairings: this will only become important in the limit. Moreover, we are integrating over all n variables, including the \(u_k\) with \(k \in {\overline{P}}\): this is because the time arguments of the function, \(v_k\), are supplied separately, with the respective integration variables \(u_k\) appearing as arguments of the indicators \(\mathbb {1}^{\gamma _k}\), \(k \in {\overline{P}}\). The functions \(P^\ell _{st}\) are summands in the expression of which we want to compute the limit:

Lemma 3.4

(Expected Malliavin derivatives of signature approximations).

$$\begin{aligned} {\mathbb {E}}\mathcal {D}^m \mathcal {S}(X^\ell )_{st}^{\gamma _1,\ldots ,\gamma _n} = m!\sum _{P \in {\mathcal {P}}^n_m} P_{st}^{\ell ;\gamma _1,\ldots ,\gamma _n} \ \in {\mathcal {E}}^{\odot m} \end{aligned}$$
(49)

Proof

This is a consequence of (46), (47), the (iterated) Leibniz rule for the Malliavin derivative and Wick’s formula for the mixed moments of a Gaussian vector (as already used in [3, Theorem 31]). The details are a matter of simple combinatorics; in particular, note how, instead of summing over m! terms corresponding to the ways of permuting the m derivatives (for a fixed \(P \in {\mathcal {P}}^n_m\)), we are only including the term corresponding to the identity permutation and multiplying by m!, which identifies the same element of \({\mathcal {E}}^{\otimes m}\) up to symmetry. \(\square \)

In order to prove convergence, it is unfortunately not possible to argue by dominated convergence applied to Definition 3.3: this is because the factors in the integrand given by consecutive pairings \({\mathbb {E}}[\dot{X}^{\ell ;\gamma _i}_{u_i} \dot{X}^{\ell ;\gamma _{i+1}}_{u_{i+1}}]\) converge to non-integrable functions (e.g. \((v-u)^{2H-2}\) on \(\Delta ^2[s,t]\) for fBm), and the ones corresponding to Malliavin derivatives \(\varrho _\ell ^{-1} \mathbb {1}^{\gamma _k}_{[v^-_{k;\ell },v^+_{k;\ell })}(u_k)\) do not converge as functions (though they do converge, as distributions, to Dirac deltas \(\updelta _{v_k}\)). The reason that convergence holds is that all these quantities are integrated. To successfully exploit this, we will write each integral \(P^\ell _{st}\) as a nested integral, distinguishing between the three types of integrands:

$$\begin{aligned} \int (\text {non-consecutive pairings}) \int (\text {Malliavin derivatives}) \prod _{\begin{array}{c} \text {maximal} \\ \text {sequences} \end{array}} \int (\text {consecutive pairings}). \end{aligned}$$
(50)

The outer integral contains the product of all terms \({\mathbb {E}}[\dot{X}^{\ell ;\gamma _i}_{u_i} \dot{X}^{\ell ;\gamma _j}_{u_j}]\) with \(|j-i| > 1\). These are multiplied with the second integral, which integrates all factors coming from Malliavin derivatives. Finally, we partition the remaining integrands \({\mathbb {E}}[\dot{X}^{\ell ;\gamma _h}_{u_h} \dot{X}^{\ell ;\gamma _{h+1}}_{u_{h+1}}]\) into maximal sequences and integrate each individually: these integrals are integrands in the second integral, alongside the Malliavin derivatives. The operations of exchanging the order of integrals are all justified by Fubini’s theorem, considering that all integrals are actually finite sums. We illustrate all of this with a simple example: consider the diagram (suppressing indices)

According to Definition 3.3, we have

$$\begin{aligned} P^\ell _{st}(v_1,v_2) = \int _{\Delta ^6[s,t]} \varrho _\ell ^{-2} \mathbb {1}_{[v^-_{2;\ell },v^+_{2;\ell })}(u_2) \mathbb {1}_{[v^-_{3;\ell },v^+_{3;\ell })}(u_3) {\mathbb {E}}[\dot{X}^\ell _{u_1} \dot{X}^\ell _{u_6}] {\mathbb {E}}[\dot{X}^\ell _{u_4} \dot{X}^\ell _{u_5}] \text {d}u_1 \cdots \text {d}u_6. \end{aligned}$$

Re-organising this expression as described in (50) we obtain

$$\begin{aligned}&\int _{s< u_1< u_6< t} {\mathbb {E}}[\dot{X}^\ell _{u_1} \dot{X}^\ell _{u_6}] \\&\qquad \bigg [\int _{u_1< u_2< u_3<u_6} \varrho _\ell ^{-2} \mathbb {1}_{[v^-_{2;\ell },v^+_{2;\ell })}(u_2) \mathbb {1}_{[v^-_{3;\ell },v^+_{3;\ell })}(u_3) \\&\qquad \qquad \bigg [\int _{u_3< u_4< u_5 < u_6} {\mathbb {E}}[\dot{X}^\ell _{u_4} \dot{X}^\ell _{u_5}]\text {d}u_4 \text {d}u_5 \bigg ] \text {d}u_2 \text {d}u_3 \bigg ] \text {d}u_1 \text {d}u_6. \end{aligned}$$

Note that the domain of integration of the innermost integral can be described in terms of variables of the two outer integrals: this extends to the case in which there is more than one maximal sequence, by maximality, and is crucial for the factorisation into integrals over maximal sequences to be possible.

The reason for the nested rewriting of (50) is that it will be possible to show convergence of the integrals over maximal sequences, then by a separate argument infer the convergence of the middle integral, and finally by dominated convergence conclude that the outer integrals converge. We preface the proof of convergence with a few lemmas; the first of these considers the case of a single consecutive pairing, and will form the base case of an induction that handles maximal sequences of arbitrary length.

Lemma 3.5

(One consecutive pairing).

$$\begin{aligned} \lim _{\ell \rightarrow \infty } \int _{s< u< v < t} {\mathbb {E}}[\dot{X}_u^\ell \dot{X}_v^\ell ] \text {d}u\text {d}v = \frac{1}{2} {\mathbb {E}}[X^2_{st}] = \int _s^t \big [\tfrac{1}{2} R'(v) - \partial _2R(s,v) \big ] \text {d}v \end{aligned}$$

and the convergents are uniformly bounded by a constant multiple of \((t-s)^{2H}\).

Proof

Considering that \(\dot{X}^\ell \) is piecewise constant, and that the integral is therefore a finite sum, we can write

$$\begin{aligned} \int _{s< u< v< t} {\mathbb {E}}[\dot{X}^\ell _u \dot{X}^\ell _v] \text {d}u \text {d}v&= {\mathbb {E}} \int _{s< u< v < t} \dot{X}^\ell _u \dot{X}^\ell _v \text {d}u \text {d}v\\&=\frac{1}{2} {\mathbb {E}}[(X_{st}^\ell )^2] \\&\xrightarrow {\ell \rightarrow \infty }\frac{1}{2} {\mathbb {E}}[X_{st}^2] \\&=\frac{1}{2} R(\Delta (s,t), \Delta (s,t)) \\&=\frac{R(s) + R(t)}{2} - R(s,t) \\&=\int _s^t \big [\tfrac{1}{2} R'(v) - \partial _2R(s,v) \big ] \text {d}v \end{aligned}$$

where we have used that \((X^\ell _{st})^2 \xrightarrow {\ell \rightarrow \infty } X_{st}^2\) in \(L^2\). For the second statement, we rely on the first two identities above and distinguish between the cases \(s^-_\ell = t^-_\ell \) and \(s^-_\ell < t^-_\ell \): in the former we have, using (12)

$$\begin{aligned} |{\mathbb {E}}[(X^\ell _{st})^2]| = |{\mathbb {E}}[\varrho _\ell ^{-2} (t-s)^2 X_{s^-_\ell s^+_\ell }^2]| \lesssim \varrho _\ell ^{2H-2} (t-s)^2 \le (t-s)^{2H} \end{aligned}$$

since

$$\begin{aligned} \Big (\frac{t-s}{\varrho _\ell } \Big )^{2-2H} \le 1 \end{aligned}$$

by \(H < 1\) and \(t-s \le \varrho _\ell \). Let now \(s^-_\ell < t^-_\ell \):

$$\begin{aligned} |{\mathbb {E}}[(X^\ell _{st})^2]|&= |{\mathbb {E}}[(X^\ell _{ss^+_\ell } + X^\ell _{s^+_\ell t^-_\ell } + X^\ell _{t^-_\ell t})^2]| \\&= \Big |{\mathbb {E}}\Big [ \big (\varrho ^{-1}_\ell (s^+_\ell - s)X_{s^-_\ell s^+_\ell } + X_{s^+_\ell t^-_\ell } + \varrho ^{-1}_\ell (t-t^-_\ell ) X_{t^-_\ell t^+_\ell }\big )^2 \Big ]\Big | \\&\lesssim \varrho ^{-2}_\ell (s^+_\ell - s)^2 {\mathbb {E}}[X_{s^-_\ell s^+_\ell }^2] + {\mathbb {E}}[X_{s^+_\ell t^-_\ell }^2] + \varrho ^{-2}_\ell (t-t^-_\ell )^2 {\mathbb {E}}[X_{t^-_\ell t^+_\ell }^2] \\&\lesssim (s^+_\ell - s)^{2H} + (t^-_\ell - s^+_\ell )^{2H} + (t-t^-_\ell )^{2H} \\&\lesssim (t-s)^{2H} \end{aligned}$$

by \(\ell ^2\)-Jensen’s inequality, the previous case, and again (12). \(\square \)
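
As the first two identities in the proof show, the convergents equal \(\tfrac{1}{2} {\mathbb {E}}[(X^\ell _{st})^2]\), which can be computed exactly from the covariance function. The following sketch (our own illustration, assuming numpy) does this for fBm with \(H = 0.3\), where the limit integrand \(\partial _{12}R\) is not integrable, and exhibits the convergence to \(\tfrac{1}{2} {\mathbb {E}}[X_{st}^2] = \tfrac{1}{2}(t-s)^{2H}\); all numerical choices are arbitrary:

```python
import numpy as np

def R(u, v, H):
    """Covariance of scalar fBm: (u^(2H) + v^(2H) - |u - v|^(2H)) / 2."""
    return 0.5 * (u ** (2 * H) + v ** (2 * H) - np.abs(u - v) ** (2 * H))

def half_second_moment(s, t, H, n_cells):
    """(1/2) E[(X^l_{st})^2] for the piecewise-linear interpolation of fBm
    on a uniform grid of n_cells cells over [0, 1], computed exactly from R."""
    rho = 1.0 / n_cells

    def weights(x):                       # X^l_x as a combination of grid values
        k = min(int(x / rho), n_cells - 1)
        lam = (x - k * rho) / rho
        return {k: 1.0 - lam, k + 1: lam}

    w = weights(t)
    for k, c in weights(s).items():       # coefficients of X^l_t - X^l_s
        w[k] = w.get(k, 0.0) - c
    idx, coef = (np.array(a) for a in zip(*w.items()))
    pts = idx * rho
    return 0.5 * float(coef @ R(pts[:, None], pts[None, :], H) @ coef)

H, s, t = 0.3, 0.1234, 0.8321
for n in (4, 16, 64, 256, 1024):
    print(n, half_second_moment(s, t, H, n))
print("limit:", 0.5 * (t - s) ** (2 * H))
```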

The case of several consecutive pairings is more difficult to handle, and in Proposition 3.9 convergence of these terms will be bootstrapped from terms that only contain shorter sequences of consecutive pairings, and the above single case, by means of an inductive argument. It is worth remarking that the plausible strategy of handling these integrands together with the others by integrating only one of the variables fails:

Remark 3.6

(Lack of convergence of \({\mathbb {E}}[X_{uv}^\ell \dot{X}_v^\ell ]\)). One way of dealing with sequences of consecutive pairings is by rewriting them as

$$\begin{aligned} \begin{aligned}&\int _{s< u_1< v_1< \ldots< u_n< v_n < t} {\mathbb {E}}[\dot{X}^\ell _{u_1} \dot{X}^\ell _{v_1}]\cdots {\mathbb {E}}[\dot{X}^\ell _{u_n} \dot{X}^\ell _{v_n}] \text {d}u_1 \text {d}v_1 \cdots \text {d}u_n \text {d}v_n \\&\quad = \int _{\Delta ^n[s,t]} {\mathbb {E}}[X^\ell _{sv_1} \dot{X}^\ell _{v_1}] {\mathbb {E}}[X^\ell _{v_1 v_2} \dot{X}^\ell _{v_2}] \cdots {\mathbb {E}}[X^\ell _{v_{n-1} v_n} \dot{X}^\ell _{v_n}] \text {d}v_1 \cdots \text {d}v_n. \end{aligned} \end{aligned}$$
(51)

This has the benefit of expressing the convergents as integrals over n, and not 2n, variables. The problem with this strategy is that it does not hold that \({\mathbb {E}}[X_{uv}^\ell \dot{X}_v^\ell ] \xrightarrow {\ell \rightarrow \infty } \frac{1}{2} R'(v) - \partial _2R(u,v)\): a simple calculation reveals

$$\begin{aligned}&{\mathbb {E}}[X_{uv}^\ell \dot{X}_v^\ell ] \\&\quad = \varrho ^{-1}_\ell \Big [ \big (1-\varrho ^{-1}_\ell (v-v^-_\ell )\big )R(v^-_\ell , \Delta (v^-_\ell , v^+_\ell )) + \varrho ^{-1}_\ell (v-v^-_\ell ) R(v^+_\ell ,\Delta (v^-_\ell ,v^+_\ell )) \Big ] \\&\qquad -\varrho ^{-1}_\ell \Big [ \big (1-\varrho ^{-1}_\ell (u-u^-_\ell )\big )R(u^-_\ell , \Delta (v^-_\ell , v^+_\ell )) + \varrho ^{-1}_\ell (u-u^-_\ell ) R(u^+_\ell ,\Delta (v^-_\ell ,v^+_\ell )) \Big ]. \end{aligned}$$

While the second term converges to \(\partial _2R(u,v)\) (e.g. by the intermediate value theorem applied on the interval \([v^-_\ell ,v^+_\ell ]\)), the first does not converge in general. To see why, it suffices to take X to be Brownian motion and \(\pi _\ell \) to be a dyadic sequence: the first term on the right above is then equal to \(\varrho ^{-1}_\ell (v-v^-_\ell )\), which is indeterminate in view of the fact that for v in a set of full Lebesgue measure its binary expansion contains infinitely many 0’s and 1’s. The fractional case with \(H < 1/2\) appears even worse behaved, i.e. divergent in a possibly indeterminate fashion.
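
The oscillation is easy to observe numerically: with Brownian motion and dyadic partitions the term in question equals the fractional part of \(2^\ell v\), as in the following sketch (ours) for \(v = 1/3\):

```python
import math

# For Brownian motion and dyadic partitions the term equals frac(2^l * v),
# which oscillates in l instead of converging; shown here for v = 1/3.
v = 1.0 / 3.0
for level in range(1, 13):
    rho = 2.0 ** (-level)
    v_minus = math.floor(v / rho) * rho
    print(level, round((v - v_minus) / rho, 6))   # alternates ~2/3, ~1/3, ~2/3, ...
```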

We now move outward in (50) and prove a lemma that will guarantee convergence of the middle integral, conditional on the convergence of the inner ones.

Lemma 3.7

Let \(f_\ell :[0,T]^m \rightarrow {\mathbb {R}}\) be a uniformly bounded sequence of functions that are continuous and piecewise smooth on the mesh \(\pi _\ell \). Assume that \(f_\ell \) converges to \(f :[0,T]^m \rightarrow {\mathbb {R}}\) uniformly. Then

$$\begin{aligned} \begin{aligned}&\int _{\Delta ^m[s,t]} f_\ell (u_1,\ldots ,u_m) \varrho ^{-m}_\ell \\&\quad \prod _{k = 1}^m \mathbb {1}_{[v^-_{k;\ell },v^+_{k;\ell })}(u_k) \text {d}u_1 \cdots \text {d}u_m \xrightarrow {\ell \rightarrow \infty }\mathbb {1}_{\Delta ^m[s,t]}(v_1,\ldots ,v_m) f(v_1,\ldots ,v_m) \end{aligned} \end{aligned}$$

where the convergence is a.e. in the variables \((v_1,\ldots ,v_m) \in [0,T]^m\). Moreover, the convergents are uniformly bounded by \(\sup _\ell \Vert f_\ell \Vert _\infty \).

Proof

The second statement holds by uniform boundedness of \(f_\ell \) and the fact that

$$\begin{aligned} \int _{\Delta ^m[s,t]} \varrho ^{-m}_\ell \prod _{k = 1}^m \mathbb {1}_{[v^-_{k;\ell },v^+_{k;\ell })}(u_k) \text {d}u_1 \cdots \text {d}u_m \le 1. \end{aligned}$$

We will prove pointwise convergence on the subset

$$\begin{aligned} [0,T]^m_* {:}{=}\{(v_1,\ldots ,v_m) \in [0,T]^m \mid v_i \ne v_j \text { for } i \ne j\} \end{aligned}$$

of \([0,T]^m\) of full Lebesgue measure. For \((v_1,\ldots ,v_m) \in [0,T]^m_*\), we may, without loss of generality, start the sequence when \(\ell \) is already large enough so that \([v^-_{i;\ell },v^+_{i;\ell }) \cap [v^-_{j;\ell },v^+_{j;\ell }) = \varnothing \) for \(i \ne j\), where we are including \(v_0 {:}{=}s\) and \(v_{m+1} {:}{=}t\) in this requirement. By the mean value theorem applied individually to each \(u_k\), there exist \(w_{k;\ell } \in (v^-_{k;\ell },v^+_{k;\ell })\) s.t.

$$\begin{aligned} \int _{\Delta ^m[s,t]} f_\ell (u_1,\ldots ,u_m) \varrho ^{-m}_\ell \prod _{k = 1}^m \mathbb {1}_{[v^-_{k;\ell },v^+_{k;\ell })}(u_k) \text {d}u_1 \cdots \text {d}u_m \\ = \mathbb {1}_{\Delta ^m[s,t]}(v_1,\ldots ,v_m) f_\ell (w_{1;\ell },\ldots ,w_{m;\ell }) \end{aligned}$$

and

$$\begin{aligned}&|\mathbb {1}_{\Delta ^m[s,t]}(v_1,\ldots ,v_m) f_\ell (w_{1;\ell },\ldots ,w_{m;\ell }) - \mathbb {1}_{\Delta ^m[s,t]}(v_1,\ldots ,v_m)f(v_1,\ldots ,v_m)|\\&\quad \le \mathbb {1}_{\Delta ^m[s,t]}(v_1,\ldots ,v_m) \big [|f_\ell (w_{1;\ell },\ldots ,w_{m;\ell }) - f(w_{1;\ell },\ldots ,w_{m;\ell })| \\&\qquad +|f(w_{1;\ell },\ldots ,w_{m;\ell }) - f(v_1,\ldots ,v_m)|\big ] \\&\quad \le \mathbb {1}_{\Delta ^m[s,t]}(v_1,\ldots ,v_m) \big [\Vert f_\ell - f \Vert _\infty + \omega ^f_{(v_1,\ldots ,v_m)}(\varrho _\ell ) \big ] \end{aligned}$$

where \(\omega ^f_{(v_1,\ldots ,v_m)}\) is the modulus of continuity of f at the point \((v_1,\ldots ,v_m)\). Both summands on the right hand side above vanish in the limit of \(\ell \rightarrow \infty \), the first by uniform convergence and the second by continuity of the uniform limit of continuous functions. \(\square \)
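
For intuition, the following sketch (ours, assuming numpy) illustrates the statement in the simplest case \(m = 1\), \(f_\ell \equiv f\): the normalised local average of f over the mesh cell containing v converges to f(v); the choices of f and v are arbitrary:

```python
import numpy as np

# m = 1 and f_l = f: the normalised local average of f over the mesh cell
# containing v converges to f(v); f and v are arbitrary choices.
f, v = np.cos, 0.37
for n_cells in (10, 100, 1_000, 10_000):
    rho = 1.0 / n_cells
    v_minus = np.floor(v / rho) * rho
    u = np.linspace(v_minus, v_minus + rho, 1_001)
    print(n_cells, float(np.mean(f(u))), "->", float(f(v)))
```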

The next two results constitute the core of our argument. They both rely on the same induction used to reduce the length of consecutive pairings, the base case of which is provided by Lemma 3.5. To illustrate it at level 4, letting Y be a stochastic process (which below will be taken to be \(X^\ell \) and X), we have for \(\alpha \ne \beta \)

$$\begin{aligned} 2 {\mathbb {E}} \mathcal {S}(Y)^{\alpha \alpha \beta \beta }_{st} = {\mathbb {E}}\mathcal {S}(Y)^{\alpha \alpha }_{st} \cdot {\mathbb {E}} \mathcal {S}(Y)^{\beta \beta }_{st} - 2 {\mathbb {E}} \mathcal {S}(Y)^{\alpha \beta \alpha \beta }_{st} - 2{\mathbb {E}} \mathcal {S}(Y)^{\alpha \beta \beta \alpha }_{st} \end{aligned}$$

by the shuffle property (8), using identical distribution of components to group together \(2{\mathbb {E}} \mathcal {S}(Y)^{\alpha \alpha \beta \beta }_{st} = {\mathbb {E}} \mathcal {S}(Y)^{\alpha \alpha \beta \beta }_{st} + {\mathbb {E}} \mathcal {S}(Y)^{\beta \beta \alpha \alpha }_{st}\) (and similar on the right hand side), and using independence of components to write \({\mathbb {E}}[\mathcal {S}(Y)_{st}^{\alpha \alpha }\mathcal {S}(Y)_{st}^{\beta \beta }] = {\mathbb {E}}\mathcal {S}(Y)^{\alpha \alpha }_{st} \cdot {\mathbb {E}} \mathcal {S}(Y)^{\beta \beta }_{st}\). While the left hand side contains a sequence of two consecutive pairs, only sequences of consecutive pairs of length one appear on the right.
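
The relation above rests on the pathwise shuffle identity, by which \(\mathcal {S}(Y)^{\alpha \alpha }_{st} \mathcal {S}(Y)^{\beta \beta }_{st}\) equals the sum of \(\mathcal {S}(Y)^w_{st}\) over the six shuffles w of \(\alpha \alpha \) and \(\beta \beta \). This can be checked numerically; the sketch below (ours, assuming numpy) approximates the iterated integrals of a smooth two-dimensional test path by cumulative Riemann sums, for which right-endpoint sums are adequate:

```python
import numpy as np

N = 200_000
t = np.linspace(0.0, 1.0, N + 1)
dX = np.diff(np.stack([np.sin(t), t ** 2]), axis=1)   # increments of a smooth 2-d path

def sig(word):
    """Riemann-sum approximation of the iterated integral S(X)_{01}^word."""
    level = np.ones(N)
    for letter in word:
        level = np.cumsum(level * dX[letter])   # I_k(t) = int_0^t I_{k-1} dX^letter
    return float(level[-1])

a, b = 0, 1
lhs = sig([a, a]) * sig([b, b])
shuffles = [[a, a, b, b], [a, b, a, b], [a, b, b, a],
            [b, a, a, b], [b, a, b, a], [b, b, a, a]]
rhs = sum(sig(w) for w in shuffles)
print(lhs, rhs)   # agree up to the discretisation error
```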

Lemma 3.8

(Dominating function). For \(P \in {\mathcal {P}}^n_m\) it holds that the integrand of the outermost integral of \(P^\ell _{st}\), expressed in the nested form (50), is absolutely bounded by an integrable function, uniformly in \(\ell \) and on \(\Delta ^m[s,t]\), so that \(|P^\ell _{st}|\lesssim (t-s)^{(n-m)H'}\) for any \(1/4< H' < H\).

Proof

We begin by bounding expectations corresponding to non-consecutive pairings. As done in the proof of [3, Theorem 31], we now consider the terms

$$\begin{aligned} {\mathbb {E}}[\dot{X}_u^\ell \dot{X}^\ell _v] = \varrho ^{-2}_\ell R(\Delta (u^-_\ell , u^+_\ell ),\Delta (v^-_\ell , v^+_\ell )) \end{aligned}$$

in three different cases: for \(u^-_\ell = v^-_\ell \)

$$\begin{aligned} |{\mathbb {E}}[\dot{X}^\ell _u \dot{X}^\ell _v]| \lesssim \varrho _\ell ^{2H-2} \le (v-u)^{2H-2}. \end{aligned}$$

By Cauchy-Schwarz the same estimate as above holds in the case \(u^+_\ell = v^-_\ell \), with a constant in the second inequality given by the fact that \(v-u \le 2\varrho _\ell \). Let \(u^+_\ell < v^-_\ell \): we have, by (9) and for any \(H'\) as in the statement

$$\begin{aligned} |{\mathbb {E}}[\dot{X}_u^\ell \dot{X}^\ell _v]|&= \varrho ^{-2}_\ell \bigg |\int _{[u^-_\ell , u^+_\ell ] \times [v^-_\ell , v^+_\ell ]} \partial _{12}R(x,y) \text {d}x \text {d}y \bigg | \\&\lesssim \varrho ^{-2}_\ell \int _{[u^-_\ell , u^+_\ell ] \times [v^-_\ell , v^+_\ell ]} (y-x)^{2H-2} \text {d}x \text {d}y \\&\le (v_\ell ^- - u_\ell ^+)^{2H-2} \\&\lesssim ((v^+_\ell -u^-_\ell ) \wedge 1/2)^{2H'-2} \\&\le ((v-u)\wedge 1/2)^{2H'-2}. \end{aligned}$$

In the second-last inequality we have used that there exists some L s.t. for all \(\ell \ge L\)

$$\begin{aligned} \vartheta ^{2H-2} \le (\vartheta + 2\varrho _\ell )^{2H'-2} \end{aligned}$$

for all \(\vartheta \in [\varrho _\ell , 1/2]\).
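
The threshold L can be large when \(H'\) is close to H: at \(\vartheta = \varrho _\ell \) the requirement amounts to \(\varrho _\ell \lesssim 3^{-(1-H')/(H-H')}\) (our own elementary computation). A quick numerical check (ours, assuming numpy; the parameter values are arbitrary):

```python
import numpy as np

# With the arbitrary choices H = 0.4, H' = 0.3, the inequality holds on
# [rho, 1/2] only once rho falls below roughly 3**(-(1 - H')/(H - H')) ~ 4.6e-4.
H, Hp = 0.4, 0.3
for rho in (1e-2, 1e-3, 1e-4, 1e-5):
    theta = np.linspace(rho, 0.5, 100_000)
    holds = bool(np.all(theta ** (2 * H - 2) <= (theta + 2 * rho) ** (2 * Hp - 2)))
    print(f"rho = {rho:g}: {holds}")   # False, False, True, True
```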

We now consider terms corresponding to maximal sequences of consecutive pairings, i.e.

$$\begin{aligned} \int _{s< u_1< v_1< \ldots< u_k< v_k < t} {\mathbb {E}}[\dot{X}^\ell _{u_1} \dot{X}^\ell _{v_1}]\cdots {\mathbb {E}}[\dot{X}^\ell _{u_k} \dot{X}^\ell _{v_k}] \text {d}u_1 \text {d}v_1 \cdots \text {d}u_k \text {d}v_k. \end{aligned}$$
(52)

It is always possible (e.g. by Kolmogorov’s extension theorem) to add independent components to X. With this in mind, by Wick’s theorem we may write the above integral as \({\mathbb {E}}\mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _k \alpha _k}\) with \(\alpha _i \ne \alpha _j\) for all \(i \ne j\). By the shuffle identity (8) we have

$$\begin{aligned} \begin{aligned}&\sum _{h = 0}^{k} \mathcal {S}(X^\ell )_{st}^{\alpha _1\alpha _1 \ldots \alpha _h \alpha _h \beta \beta \alpha _{h+1} \alpha _{h+1} \ldots \alpha _k\alpha _k} \\&\quad =\mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _k \alpha _k} \mathcal {S}(X^\ell )_{st}^{\beta \beta } - \sum _{0 \le i< j \le k} \mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _i \alpha _i \beta \alpha _{i+1} \alpha _{i+1} \ldots \alpha _j \alpha _j \beta \alpha _{j+1} \alpha _{j+1} \ldots \alpha _k \alpha _k} \\&\qquad -\sum _{0 \le i< j \le k} \mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _i \alpha _i \beta \alpha _{i+1} \alpha _{i+1} \ldots \alpha _{j+1} \beta \alpha _{j+1} \ldots \alpha _k \alpha _k} \\&\qquad -\sum _{0 \le i< j \le k} \mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _i \beta \alpha _i \ldots \alpha _j \alpha _j \beta \alpha _{j+1} \alpha _{j+1} \ldots \alpha _k \alpha _k} \\&\qquad - \sum _{0 \le i < j \le k}\mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _i \beta \alpha _i \ldots \alpha _j \beta \alpha _j \ldots \alpha _k \alpha _k} \\&\qquad - \sum _{h = 0}^k \Big (\mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _h \beta \alpha _h \beta \alpha _{h+1}\alpha _{h+1} \ldots \alpha _k \alpha _k} + \mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _{h-1}\alpha _{h-1} \beta \alpha _h\beta \alpha _h \ldots \alpha _k \alpha _k} \Big ). \end{aligned} \end{aligned}$$
(53)

When shuffling we have separated the cases in which all \(\alpha _h \alpha _h\) and \(\beta \beta \) occur as consecutive pairs, from those in which at least one such pair is separated. We now take expectations: note that both independence and equal distribution of components are used.

$$\begin{aligned} (k+1){\mathbb {E}}\mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _k \alpha _k \beta \beta } = {\mathbb {E}}\mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _k \alpha _k} \cdot \tfrac{1}{2}{\mathbb {E}}[(X^\ell _{st})^2] - \sum _Q Q^\ell _{st} \end{aligned}$$
(54)

where we are summing over a finite number of diagrams Q whose longest sequence of consecutive pairings contains k pairs or fewer.

We now prove the statement in the case in which P has no single nodes, by induction on n. For \(n = 0\), \(P^\ell _{st} = \varnothing ^\ell _{st} \equiv 1\) and there is nothing to show. Let \(n \ge 1\), and assume we have rewritten the integral according to (50) (where the middle integral may be skipped, since there are no Malliavin derivatives). If P is not given by a sequence of n/2 consecutive pairs, all maximal sequences of consecutive pairs in P consist of fewer than n/2 pairs, and thus the inductive hypothesis applies to them: this means that for each such sequence Q with k pairs, \(|Q^\ell _{uv}| \lesssim (v-u)^{2kH'}\). Using the bounds for the first two types of integrand derived in the first part of this proof, the statement for P then follows from Proposition 3.1 applied in the modified case of Remark 3.2 and with exponent \(H'\). Assume now \(n = 2(k+1)\) and let P be given by the diagram consisting of \(k+1\) consecutive pairs: the only thing needed to conclude the induction is the bound. This follows from (54) thanks to the inductive hypothesis and the boundedness statement of Lemma 3.5.

Finally, we consider the general case in which P may have single nodes. This follows again by writing \(P^\ell _{st}\) in nested form, bounding terms corresponding to non-consecutive pairings as done above, and bounding the middle integral in (50) thanks to the boundedness statement of Lemma 3.7. When invoking this lemma, \(f_\ell \) is going to be a product of terms of the form (52) (with the extrema s and t replaced with variables \(u_i\) and \(u_j\) already integrated in the outer or middle integral), which, as already proved, is bounded by a constant multiple of \((t-s)^{2H'k}\): this yields the required bound overall. \(\square \)

Proposition 3.9

(Convergence). The functions \([0,T]^m \rightarrow {\mathbb {R}}\) of Definition 3.3 individually converge a.e. to those of Definition 2.1: for \(P \in {\mathcal {P}}^n_m\) it holds that

$$\begin{aligned} P^\ell _{st} \xrightarrow {\ell \rightarrow \infty } P_{st}. \end{aligned}$$
(55)

Moreover \(|P_{st}| \lesssim |P|_{st}\) (the integrals of Proposition 3.1) uniformly on \(\Delta ^m[s,t]\).

Proof

The inequality is an absolute estimate of \(P_{st}\) using (9) and (10). The structure of the proof of the first statement closely mirrors that of the previous lemma: we first consider the case in which P does not have single nodes. For \(u^-_\ell < v^-_\ell \)

$$\begin{aligned} {\mathbb {E}}[\dot{X}_u^\ell \dot{X}^\ell _v] = \varrho ^{-2}_\ell R(\Delta (u^-_\ell ,u^+_\ell ),\Delta (v^-_\ell ,v^+_\ell )) = \partial _{12}R({\overline{u}}, {\overline{v}}) \end{aligned}$$

for some \({\overline{u}} \in (u^-_\ell ,u^+_\ell )\), \({\overline{v}} \in (v^-_\ell ,v^+_\ell )\), by the intermediate value theorem applied twice. Pointwise convergence \({\mathbb {E}}[\dot{X}_u^\ell \dot{X}^\ell _v] \rightarrow \partial _{12}R(u,v)\) then holds by continuity of \(\partial _{12}R\) and thanks to the fact that for any \(u < v\) there exists L s.t. \(u^-_\ell < v^-_\ell \) for all \(\ell \ge L\). This takes care of convergence of terms corresponding to non-consecutive pairings (of course, the same holds for consecutive pairings, but is not useful since \(\partial _{12}R(u,v)\) may not be integrable in this case).

We now proceed by induction on n. For \(n = 0\) there is nothing to prove, so let \(n \ge 1\) and first consider the case in which P is not given by a sequence of n/2 consecutive pairs: the statement follows by dominated convergence applied to the outer integral in (50), by the above and the inductive hypothesis applied to sequences of consecutive nodes of length less than n, in conjunction with Lemma 3.8. Let now \(n = 2(k+1)\) and let P be given by the diagram consisting of \(k+1\) consecutive pairs: recalling the argument (and indexing notation) of the previous proof, we have \(P_{st}^\ell = {\mathbb {E}}\mathcal {S}(X^\ell )_{st}^{\alpha _1 \alpha _1 \ldots \alpha _k \alpha _k \beta \beta }\), which is convergent since \(\mathcal {S}(X^\ell )_{st} \rightarrow \mathcal {S}(X)_{st}\) in \(L^2\). By the same calculation of (53) applied to X instead of to \(X^\ell \), and taking expectations

$$\begin{aligned} \begin{aligned}&(k+1)\lim _{\ell \rightarrow \infty } P_{st}^\ell \\&\quad =(k+1){\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _k \alpha _k \beta \beta } \\&\quad = {\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _k \alpha _k} \cdot \tfrac{1}{2}{\mathbb {E}}[X_{st}^2] \\&\qquad - \sum _{0 \le i< j \le k} {\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _i \alpha _i \beta \alpha _{i+1} \alpha _{i+1} \ldots \alpha _j \alpha _j \beta \alpha _{j+1} \alpha _{j+1} \ldots \alpha _k \alpha _k} \\&\qquad -\sum _{0 \le i< j \le k} {\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _i \alpha _i \beta \alpha _{i+1} \alpha _{i+1} \ldots \alpha _{j+1} \beta \alpha _{j+1} \ldots \alpha _k \alpha _k} \\&\qquad -\sum _{0 \le i< j \le k} {\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _i \beta \alpha _i \ldots \alpha _j \alpha _j \beta \alpha _{j+1} \alpha _{j+1} \ldots \alpha _k \alpha _k} \\&\qquad - \sum _{0 \le i < j \le k}{\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _i \beta \alpha _i \ldots \alpha _j \beta \alpha _j \ldots \alpha _k \alpha _k} \\&\qquad - \sum _{h = 0}^k \Big ({\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _h \beta \alpha _h \beta \alpha _{h+1}\alpha _{h+1} \ldots \alpha _k \alpha _k} + {\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _{h-1}\alpha _{h-1} \beta \alpha _h\beta \alpha _h \ldots \alpha _k \alpha _k} \Big ). \end{aligned} \end{aligned}$$
(56)

We now expand the product: by the inductive hypothesis and Lemma 3.5, and using Fubini’s theorem we have (setting \(u_0 =s = w_0\))

$$\begin{aligned}&{\mathbb {E}}\mathcal {S}(X)_{st}^{\alpha _1 \alpha _1 \ldots \alpha _k \alpha _k} \cdot \tfrac{1}{2}{\mathbb {E}}[X^2_{st}] \\&\quad = \int _{s< u_1< \ldots< u_k< t} \big [ \tfrac{1}{2} R'(u_1) - \partial _2R(u_0,u_1) \big ] \cdots \big [ \tfrac{1}{2} R'(u_k) - \partial _2R(u_{k-1},u_k) \big ] \text {d}u_1 \cdots \text {d}u_k \\&\qquad \cdot \int _s^t \big [\tfrac{1}{2} R'(v) - \partial _2R(s,v) \big ] \text {d}v \\&\quad = \int _{\begin{array}{c} s< u_1< \ldots< u_k<t \\ s< v< t \end{array}} \big [ \tfrac{1}{2} R'(u_1) - \partial _2R(u_0,u_1) \big ] \cdots \big [ \tfrac{1}{2} R'(u_k) - \partial _2R(u_{k-1},u_k) \big ] \\&\qquad \cdot \big [\tfrac{1}{2} R'(v) - \partial _2R(s,v) \big ] \text {d}u_1 \cdots \text {d}u_k \text {d}v \\&\quad =\sum _{j = 0}^k\int _{s< u_1< \ldots< u_j< v< u_{j+1}< \ldots< u_k < t}\big [ \tfrac{1}{2} R'(u_1) - \partial _2R(u_0,u_1) \big ] \cdots \big [ \tfrac{1}{2} R'(u_k) - \partial _2R(u_{k-1},u_k) \big ] \\&\qquad \cdot \big [\tfrac{1}{2} R'(v) - \partial _2R(s,v) \big ] \text {d}u_1 \cdots \text {d}u_k \text {d}v. \end{aligned}$$

Note that the use of Fubini’s theorem is justified in view of (10) applied to absolutely bound each integral above, and Proposition 3.1. Writing

$$\begin{aligned} \partial _2R(\Delta (x,y),z) {:}{=}\partial _2R(y,z) - \partial _2R(x,z) = \int _x^y \partial _{12}R(w,z) \text {d}w \text {,} \end{aligned}$$

we expand each summand:

It now follows by substitution into the sum \(\sum _{j = 0}^k\) and simplifying in (56) that \(\lim _\ell P^\ell _{st} = P_{st}\).

Finally, we consider diagrams that contain single nodes. In order to invoke Lemma 3.7 we must argue that \(f_\ell \rightarrow f\) uniformly (uniform boundedness holds by the previous lemma). This again follows from the fact that \(f_\ell \) can be written as a product of expected signatures of \(X^\ell \), each of which converges as \(\ell \rightarrow \infty \), uniformly as a function of its extrema: recalling the notations for truncation and projection introduced in Sect. 1 and the definition of inhomogeneous p-variation distance [21, §8.1.2], we have

for \(p > (1/H) \vee n\), where we have used [18, Theorem 1]. The statement now follows once again by dominated convergence and Fubini’s theorem. \(\square \)

We are ready to put it all together:

Proof of Theorem 2.3

$$\begin{aligned} {\mathcalligra{w}}^m \mathcal {S}(X)_{st}^{\gamma _1,\ldots ,\gamma _n}&= \frac{1}{m!}\delta ^m \big ( {\mathbb {E}}\mathcal {D}^m \mathcal {S}(X)_{st}^{\gamma _1,\ldots ,\gamma _n} \big ) \end{aligned}$$
(57)
$$\begin{aligned}&= \frac{1}{m!}\delta ^m \lim _{\ell \rightarrow \infty } \big ( {\mathbb {E}}\mathcal {D}^m \mathcal {S}(X^\ell )_{st}^{\gamma _1,\ldots ,\gamma _n} \big ) \end{aligned}$$
(58)
$$\begin{aligned}&= \delta ^m \sum _{P \in {\mathcal {P}}^n_m} \lim _{\ell \rightarrow \infty }P_{st}^{\ell ;\gamma _1,\ldots ,\gamma _n} \end{aligned}$$
(59)
$$\begin{aligned}&= \sum _{P \in {\mathcal {P}}^n_m} \delta ^m P^{\gamma _1,\ldots ,\gamma _n}_{st} \end{aligned}$$
(60)

In (57) we have used Stroock’s formula (23), which is possible since \(\mathcal {S}(X)_{st}^{\gamma _1,\ldots ,\gamma _n} \in {\mathbb {D}}^{\infty ,2}\): this is because \(\mathcal {S}(X^\ell )_{st} \rightarrow \mathcal {S}(X)_{st}\) in \(\bigoplus _{k \le n}{\mathscr {W}}^k\), which is closed in \(L^2(\Omega )\). In (58) we have used that convergence of \(\mathcal {S}(X^\ell )_{st}\) actually holds in \({\mathbb {D}}^{\infty ,2}\), since the norm of \({\mathbb {D}}^{\infty ,2}\) is dominated by the \(L^2\) norm in bounded Wiener chaos [35, Proposition 1.2.2]. (59) uses Lemma 3.4, and (60) is just the statement (required by our definition of membership of a function in \({\mathcal {H}}^{\otimes m}\), Definition 1.1) that \(P_{st}^{\ell ;\gamma _1,\ldots ,\gamma _n}\) converges a.e. boundedly to \(P^{\gamma _1,\ldots ,\gamma _n}_{st}\), which holds by Proposition 3.9 and Lemma 3.8. As argued in the previous two proofs, \(P_{st}^{\gamma _1,\ldots ,\gamma _n}\) can always be expressed as the expected signature evaluated on a word, up to augmenting X with independent copies of itself: this can be used to infer that each \(P_{st}^{\gamma _1,\ldots ,\gamma _n}\)—not just their sum—belongs to \({\mathbb {D}}^{\infty ,2}({\mathcal {H}}^{\otimes m})\). This concludes the proof of the main result. \(\square \)

4 Conclusions and further directions

By providing a single formula for the expected signature of fractional Brownian motion that holds for any Hurst parameter \(H \in (1/4,1)\), this article closes a gap in the literature left open by [3]. Along the way, we have had the opportunity to consider numerous other aspects of our computation, such as similar formulae for higher levels of the Wiener chaos expansion of the signature, and other examples of Gaussian processes.

We believe this work suggests a variety of applications and further investigations. First and foremost, it would be interesting to write stochastic Taylor expansions as suggested by Example 2.8, under precise conditions on the vector fields, and by providing bounds on the mean square error. Making this calculation rigorous and providing precise asymptotic estimates such as those in [37] would be an interesting result, which could be applied to approximation problems for Gaussian RDEs on manifolds such as those considered in [1] for SDEs (although for this precise problem, the joint signature \(\mathcal {S}(X,t)\) would have to be considered). A further step would involve proving conditional versions of the results in this paper, which would make it possible to estimate the error generated by multiple steps in an Euler scheme.

The fact that (e.g. for fBm) the integral \({\mathbb {E}}\mathcal {S}(X)^{\alpha _1\alpha _1 \cdots \alpha _k\alpha _k}_{st}\) with \(\alpha _i \ne \alpha _j\) is actually convergent for any \(H > 0\) raises the question of whether something can be said about the sequence \(\mathcal {S}(X^\ell )^{\alpha _1\alpha _1 \cdots \alpha _k\alpha _k}_{st}\), i.e. about the evaluation at this particular word of \(\mathcal {S}(X^\ell )\), which is not convergent in probability for \(H \le 1/4\).

It would be interesting to express the expected signature of a Gaussian process as the exponential of a formal series of tensors, thus computing its signature cumulants [8]: this is how the expected signature of Brownian motion (2) is usually presented (with the series a finite sum), but the analogous formulation for Gaussian processes that are not martingales appears more difficult to write down.

A more computational goal, though not one that appears trivial, is to evaluate the formula of Theorem 2.3 explicitly for certain semimartingales, such as the Brownian bridge, for which the integrals are all analytically solvable. An interesting question is how the relationship between Brownian motion and the Brownian bridge is reflected in their expected signatures. It would also be helpful to see whether similar formulae to ours can be made available for non-centred Gaussian processes, e.g. general Ornstein–Uhlenbeck processes. Finally, it would be interesting to try to apply the main theorem to the Riemann–Liouville process of Example 2.12.