On the Wiener Chaos Expansion of the Signature of a Gaussian Process

We compute the Wiener chaos decomposition of the signature for a class of Gaussian processes, which contains fractional Brownian motion (fBm) with Hurst parameter H ∈ (1 / 4 , 1) . At level 0 , our result yields an expression for the expected signature of such processes, which determines their law [CL16CL16]. In particular, this formula simultaneously extends both the one for 1 / 2 < H -fBm [BC07BC07] and the one for Brownian motion ( H = 1 / 2 ) [Faw03Faw03], to the general case H > 1 / 4 , thereby resolving an established open problem. Other processes studied include continuous and centred Gaussian semimartingales


Introduction
The signature of a path X : is a series of tensors which, up to "retracings", determines the image of X [22,6].The probabilistic counterpart to this result states that, in many cases of interest, the law of a stochastic process is determined by its expected signature [13], which is therefore seen to play a role for processes analogous to that of moments for random variables.The best-known example of an explicit formula for the expected signature of a stochastic process occurs in the case of Brownian motion: calling {e 1 , . . ., e d } the canonical basis of R d , we have This identity was first shown by [16,31], and later proved in a variety of different ways [2,20].The expected signature of Brownian motion has also been studied in the case in which the process is stopped upon hitting the boundary of a domain [29,5,27].
In [3] the authors derive an integral expression for the expected signature of fractional Brownian motion (fBm) with Hurst parameter H ∈ (1/2, 1).This result was extended in [7,4] to a more general class of Gaussian Volterra processes with sample paths that are more regular than Brownian motion, with the formula for the expected signature written in terms of the Volterra kernel.The method used involves a piecewise-linear interpolation of the paths of the process X, which reduces the calculation to that of a sum of mixed Gaussian moments, to which Wick's theorem applies, followed by a convergence argument.The expression in [3] does not, however, yield the correct prediction for the case of Brownian motion H = 1/2.When H < 1/2 it involves integrals that do not converge at all, and new ideas are needed to obtain a formula.On a technical level, the reason for these differences can be seen by considering the expression for the expected signature of a scalar 1/2 < H-fBm X at level 2: calling R(s, t) := E[X s X t ] the covariance function of X, the formula states that

ES(X)
(2) Integrating either of the two variables generates an evaluation (v − u) 2H−1 | u=v , which is only finite when H > 1/2 and indeterminate when H = 1/2.In fact, approximating X with a sequence of piecewise linear processes (X ℓ ) ℓ∈N one obtains a sequence of integrals (actually finite sums) s<u<v<t E[ Ẋℓ u Ẋℓ v ]dudv which converges to the above double integral when H > 1/2, to (t − s)/2 when H = 1/2 (as predicted by ( 2)), and continues to converge to (t−s) 2H /2 for 1/4 < H ≤ 1/2.When H ≤ 1/4 the iterated integrals (in particular the Lévy area) of smooth approximations of X do not converge in mean square, and other techniques (e.g.[36]) must be relied upon to define a rough path, and hence a signature.These rough paths present a number of differences with the canonical one defined for H > 1/4, and are therefore not considered in this paper.v ]dudv with X ℓ the sequence piecewise linear interpolations of X on a partition.On the left we have chosen H = 2/3, and the sequence of integrals converges to a finite improper integral, whereas on the right H = 1/3 and the on-and off-diagonal contributions diverge to opposite infinities.(The plots are oriented in different ways and the z-axis is rescaled, both for improved visibility.)This graphic has been created using Wolfram Mathematica.
What is needed to obtain a formula for the expected signature that also works in the case of negativelycorrelated increments 1/4 < H < 1/2 is a way of expressing the indeterminacy "∞−∞" explained in Figure 1.The trick for doing this is simple to describe: integrate out the first variable in (3) and, calling R(t) := R(t, t) the variance function of X, note that for H > 1/2 we have s<u<v<t We have replaced ∂ 2 R(v, v) with 1  2 R ′ (v), which can be done by symmetry of R: This is relevant to the case of (1/4, 1/2) ∋ H-fBm since, while ) is the infinite evaluation discussed earlier, the last integral in ( 4) is perfectly well defined.These integrands can be chained together on simplices, e.g.s<u<v<t [ ]dudv, and combined with the other types of integrand ∂ 12 R(w, z), to yield a formula that is very similar to that of [3], but continues to be convergent for 1/4 < H < 1/2 and agrees with (2) for H = 1/2.
Showing that the formula obtained by such substitution actually coincides with the expected signature for X in a broad class of Gaussian processes -essentially those Gaussian rough paths introduced in [15,30,19] with the imposition of a few additional smoothness and regularity requirements on the (co)variance function -is the main focus of this paper.In fact, our main result will prove a formula for the full Wiener chaos expansion of S(X), the 0 th level of which is the expectation.As far as we know, the expression for the positive chaos projections of the signature is not to be found in the literature even in the classical case of Brownian motion.While the expression of the positive levels of Wiener chaos is very similar in spirit to that of the 0 th , it requires us to use some Malliavin calculus in the setting of 1-parameter Gaussian processes, and results in technical complications in the proof of convergence.The main additional ingredients needed are Stroock's formula for the m-th Wiener chaos projection and a novel definition of multiple Wiener integral of a function.For the latter, it should be noted that while multivariate, deterministic integrands for Gaussian noise naturally live in a certain Hilbert space (which for fBm can be identified with a Sobolev space), we are interested in integrating functions of multiple times, i.e. [0,T ] m f (t 1 , . . ., t m )dX γ 1 t 1 • • • dX γm tm in a Skorokhod-type sense: this is achieved by approximating f with elementary integrands, and showing independence of the approximation.Computing the Wiener chaos projections of the signature of a Gaussian process X has the benefit of expressing S(X) as a sum of terms that are orthogonal in L 2 , something that has the potential to be used for various types of numerical calculations, e.g.estimates of Euler expansions for Gaussian rough differential equations.It should be mentioned that, while (in the cases considered) the expected signature already determines the law of X and therefore that of the Wiener chaos projections of S(X), it does not appear obvious how one may obtain the latter from the former directly.While fBm is the main example of a process for which our calculation is novel, we briefly also consider centred, continuous Gaussian semimartingales, such as the Brownian bridge returning to the origin and centred Ornstein-Uhlenbeck processes with deterministic initial condition.
As in the main reference article [3], the technique that underlies our proof is piecewise-linear approximation of X.The arguments needed to prove the result are however much more involved, for three essential reasons.First is the fact that we must perform and justify the substitution (4), which requires novel arguments for convergence; even proving finiteness of the integrals in the main formula requires more sophisticated bounds in the 1/4 < H < 1/2 case than it does in the H > 1/2 case (see Figure 2 for the simplest example of an observation that must be made when H < 1/2).Second is that Malliavin derivatives are involved for positive levels of the Wiener chaos and third is that our arguments must accommodate a wider class of Gaussian processes.While the substitution (4) may seem very natural, it does not emerge obviously from the proof that we have given here, and must instead be guessed in advance.Indeed, it is worth mentioning that the way in which we first derived the statement of the main result involved an entirely different approach, which made use of the Skorokhod-rough integral conversion formula [10,11], applied recursively to the RDE for the signature.Figure 2: A graphic representing the contour plot of (t − s) 2H−2 on {0 < s < t < 1} (on the left) and {0 < s < u < t < 1} with u ∈ [0, 1] fixed (on the right): the integral of the former is improper on the whole diagonal, while that of the later only at a point: when 0 < H < 1/2, only the latter converges.This graphic has been created using Wolfram Mathematica.
The outline of this proof can be found in the second named author's PhD thesis [17,Ch. 5].While this approach has the drawback of generating further technical problems, reason for which it is not the one presented here, it has the advantage of leading up constructively to the main formula.
This paper is organised as follows: in Section 1 we briefly introduce the class of Gaussian processes considered and the Malliavin calculus framework for them; we then use this language to identify functions as multiple Wiener integrands.In Section 2 we state the main result Theorem 2.3 and discuss a few consequences and examples that follow; in Section 3 we prove the main result; in Conclusions and further directions we outline some aspects that could be tackled in further research.Finally, it should be mentioned that in [3], in addition to the expected signature of 1/2 < H-fBm, the authors also compute the expected signature at levels 2 and 4 for 1/4 < H-fBm in a manner that does not obviously generalise to different processes or higher levels; while not necessary in our proofs, it is sensible to verify that our main result agrees with this calculation: this check is performed in Appendix A.

Background on Malliavin calculus for Gaussian processes
In this section we introduce the class of Gaussian processes to which this paper applies, establish some notation, and give a brief overview of the tools of Malliavin calculus that are necessary in the proof of the main result.We follow [35,34] for the general Malliavin calculus framework, [23] for its aspects that pertain to Gaussian processes indexed by a time parameter, and [9,10,11] for aspects regarding the rough path lifts of such processes.
Throughout this paper we will be working with a Gaussian process with i.i.d.components We assume X to be centred, i.e.EX ≡ 0, and for it to have deterministic initial condition X 0 = 0. We will write X st := X t − X s for the increments of X.By Gaussianity, the probability measure We will denote R( • ) the variance function of X, i.e.R(t) := R(t, t).The independence hypothesis implies that R is a diagonal matrix R αβ = δ αβ R αα , and the fact that they are identically distributed, R αβ = δ αβ R 11 will be determined by a single scalar function, which by abuse of notation we will also call R.Although our results can be conjectured to continue to hold in the case in which the components are not identically distributed, our proof will make essential use of this assumption.We define We assume X and R satisfy the conditions that make it possible to consider the signature of X, S(X), defined by the limit in L 2 of Stieltjes iterated integrals of smooth or piecewise-linear approximations of X, and carry out Malliavin calculus: these are existence of rough path lift and complementary Cameron-Martin regularity [9, Conditions 2] and non-degeneracy of R [9, Conditions 3].More elementary conditions that imply these may be found, for instance, in [10,11].The expected signatures of such processes characterise their law, i.e. if Y is any other process with a well-defined signature S(Y ) 0T (as a G(R d )-valued random variable) and ES(X) 0T = ES(Y ) 0T , then X and Y are equal in law: see [13,Example 6.7], a consequence, among other things, of the greedy estimate [12].We refer the reader to [14] for a treatment of the theory in the case of more general processes, whose expected signatures may not directly characterise the law of the process.
In addition to the standard conditions on R, we will have to assume a certain amount of smoothness of R together with bounds on its derivatives; the reasons for such hypotheses will be made clear in due course.We assume R( • , • ) is C 2 on the open simplex ∆[s, t] := {0 < s < t < T } and continuous on [0, T ] 2 , and that R( • ) is C 1 on (0, T ).The lack of smoothness assumptions of R( • , • ) on the diagonal {s = t} is crucial for the inclusion of (1/4, 1/2] ∋ H-fBm, which does not even have first partial derivatives on it.Furthermore, we assume there exists an H ∈ (0, 1) with the property that the sample paths of X are either H-Hölder, or are K-Hölder for all K < H; for fBm H will coincide with the Hurst parameter, but the letter H will be used for more general processes to denote the Hölder exponent/supremum of exponents.This the rough path above X will be of finite 1/H-variation or of finite p-variation for all p > 1/H.
We also need some quantitative estimates on the derivatives of R.Here and throughout the paper, the constant of proportionality implied by the use of ≲ may only depend on T, H and other general characteristics of the process X.We require where ∂ 2 denotes partial differentiation w.r.t. the second component and ∂ 12 denotes second-order mixed partial differentiation.Since R is not smooth on the diagonal, the following estimate for on-diagonal square increments of the covariance function, which already appeared in [15], must be required separately: We move onto the treatment of Malliavin calculus for X.We let H be the Hilbert space given by the completion of the following R-linear span of elementary functions w.r.t. the inner product Because of independence of components, H is equal to an orthogonal direct sum H 1 ⊕ . . .⊕ H d , and because of equal distribution the direct summands are all equal.Elements of H should be viewed as admissible deterministic integrands for dX, which are represented as Cauchy sequences of elementary integrands in E. This framework allows us to view the process as an isometry often called an isonormal Gaussian process.
The multiple Wiener integral is the operator defined by the adjoint property (which more generally characterises the divergence operator, when required on random arguments f ) where is the m th Malliavin derivative, defined as for f ∈ C ∞ (R n ) with derivatives (including the 0 th ) of polynomial growth, and extended as a closed operator to a certain domain D m,2 .H ⊙m denotes the subspace of H ⊗m (the tensor product taken in the category of Hilbert spaces) of symmetric tensors.D m takes a square-integrable random variable and returns a random element of H ⊙m , which in case of membership to E ⊙m (or otherwise a function member of H ⊙m in the sense of Definition 1.1 below) will be a function of m (time,index) pairs.Note that, while δ is symmetric in the sense that it is left invariant by permuting (time,index) jointly, it is not symmetric if only time variables or indices are permuted (e.g. it is possible to use δ to define a Lévy area -see Example 2.6 below).When D m Z is a function, as in the case (19), we denote its evaluation on m (time,index) pairs D (u 1 ,γ 1 ),...,(um,γm) Z; occasionally it may make more sense to suppress the indices in the notation, in which case we can just write D u 1 ,...,um Z.We may extend δ to a map δ m := H ⊗m → L 2 Ω by pre-composing with symmetrisation, and we have for This implies that multiple Wiener integration defines an isometry where the source is given the degree-wise rescaled inner product (f, g) → m!⟨f, g⟩ H ⊗m for f, g of the same degree and zero otherwise, and Ω is endowed with the sigma algebra generated by the process X t∈[0,T ] .The image of the m-th Wiener integral operator, the space of the random variables δ m (f ) with f ranging in H ⊙m , is called the m-th Wiener chaos of X.We denote it W m and the m-th Wiener chaos projection w m : L 2 Ω ↠ W m .Note that w 0 = E with values in W 0 = R, while W 1 is given by linear functionals of X.We thus have the Wiener chaos decomposition L 2 Ω = ∞ m=0 W m which means it is possible to represent any random variable in L 2 Ω (measurable w.r.t. to the sigma-algebra generated by X) as an L 2 -absolutely convergent series where The map (δ m ) −1 • w m admits an expression in terms of the Malliavin derivative: this is Stroock's formula, which states that for Z ∈ D m,2 we can write its Wiener chaos decomposition as the series We continue calling elements of E ⊗m elementary functions, in light of the fact that they can be identified with functions This is the map given by the product of the Kronecker deltas , each δ paired with the respective time variable.Since E ⊗m is dense in H ⊗m , elements of the latter may be identified as equivalence classes of Cauchy sequences in E ⊗m .While H ⊗m is not, in general, a space of functions, it is possible to uniquely associate elements of H ⊗m to certain measurable functions Definition 1.1 (Functions as elements of H ⊗m ).For a function f : ([0, T ] × [d]) m → R we will write f ∈ H ⊗m if there exist a Cauchy sequence (f n ) n ⊂ E ⊗m , uniformly bounded as a sequence of functions (according to the identification (25)), with f n → f a.e.In this case we will say that f represents lim f n ∈ H ⊗m .If f represents ϕ, ψ ∈ H ⊗m then ϕ = ψ: this is an immediate consequence of the following Lemma 1.2.Let (f n ) n be as in the above definition with f = 0. Then . Keeping in mind that H ∼ = (H 1 ) d we may therefore assume d = 1 and suppress indices.Following [23, p.588], we test the sequence with elementary functions: letting , g t 1 ,...,tm ∈ R uniformly bounded (and the sums finite) we have that (10) and (11) imply that the integrands are absolutely and uniformly bounded by . By dominated convergence ⟨ϕ, g⟩ H ⊗m = lim⟨f n , g⟩ H ⊗m = 0, where ϕ := lim f n in H ⊗m , and ϕ = 0 follows from the fact that g ranges in a dense set.■ In light of the aforementioned non-degeneracy condition on X, we also expect the converse to hold: if ϕ ∈ H ⊗m is represented by the functions f, g in the above sense, then f = g a.e.An example of a degenerate stochastic process, for which this property would not hold, is given by taking any process X and concatenating it with itself path by path; the resulting covariance function R would be invariant under transposing the intervals [0, T ) and [T, 2T ).We also note that, in specific cases, it is possible to describe H explicitly: if X is a fractional Brownian motion with Hurst parameter H ∈ (0, 1), the identity on E induces an isomorphism between H and the Sobolev space W 1/2−H,2 [24], which is a space of functions for H ∈ (0, 1/2] but not for H ∈ (1/2, 1).
We will mostly be considering Wiener integrals on simplices, which has the effect of quotienting out symmetry of the operator δ m .We will often resort to integral notation, e.g. if Wiener integrals of elements of E ⊗m , on the other hand, can be computed explicitly by using the adjoint property (17): for example, it can be checked that ) .The more general formula involves multivariate analogues of the Hermite polynomials (see [34, §2.7.2] and [17, p.244]).When X is a Gaussian martingale (but not necessarily if it is only a semimartingale), multiple Wiener integration on the simplex coincides with iterated Wiener-Itô integration.

The main result, some consequences
We begin this section with some more notation.We denote [n] := {1, . . ., n} the set with n elements.We will be concerned with iterated integrals on the n-simplex Because such integrals will involve the covariance function, integration variables will sometimes come in pairs.For m, n ∈ N we denote P n m the collection of partitions of subsets of [n] of cardinality n − m into sets of cardinality 2. Note that this means P n m = ∅ whenever n ̸ = m (mod 2) or m > n, but P n n has precisely one element, ∅: the empty set admits the empty collection of subsets as a partition, which vacuously belongs to P n n .For example, Q := {{1, 4}, {3, 8}, {5, 6}} ∈ P 8 2 viewed as a partition of the set {1, 3, 4, 5, 6, 8} ⊆ [8].For P ∈ P n m we will denote P := [n] \ ∪P (in the partition of the above example, Q = {2, 7}).It will convenient to use graphical notation to denote such objects, and for reasons that will become apparent shortly, for a pair {i, j} with i ≤ j we will distinguish between the consecutive case j = i + 1 and the non-consecutive one j > i + 1.The partition Q ∈ P 8  2 above is represented by We will refer to such graphics as diagrams.We have drawn one node for each i ∈ [n] that is not paired with a consecutive integer, and one node for each consecutive pair (in this case only {5, 6}); when counting nodes, a node corresponding to such a pair should be thought as having double weight.In our example, the 5 th node actually counts for positions 5 and 6.With this convention, for each non-consecutive pair {i, j} we have drawn an arc connecting the two nodes of positions i and j, and for each node corresponding to a consecutive pair we have drawn a line going upwards.Nodes that do not have a line or arc entering them correspond to elements of P , and we will call them single.Note that, by construction, there is never an arc between two consecutive nodes: this will be critical for convergence of the associated integrals described below.In the next sec- R by integrating over as many variables as there are non-single nodes in the diagram that represents P : call this number, which equals twice the number of non-consecutive pairs in P plus the number of consecutive ones, #P .This explains our choice for the above notation: each node either corresponds to an integration variable or to a free variable, i.e. a variable of which P γ 1 ,...,γn st is a function.We use the shorthands and the former will only be used when j > i + 1. Crucially, we are defining the second case as as a function The variables u k with k ∈ P are supplied as arguments, so in fact this is an integral over a disjoint union of up to m + 1 simplices (fewer if some of the elements of P are consecutive).The k th index in [d] n is given as argument to 1 γ k as a Kronecker delta: this means that P γ 1 ,...,γn st vanishes on all but one element of [d] m .The reason why we still consider P γ 1 ,...,γn st as a function on [d] m is that this is necessary to view it as an element of H ⊗m ; nevertheless, when the indices are fixed it will sometimes be convenient to just think of it as a function of m times.If m = 0, P γ 1 ,...,γn st is just a real number.
Remark 2.2.The presence of the second type of integrand in Definition 2.1 is the reason for the smoothness assumptions on the variance and covariance functions, which are not to be found in most of the literature on these topics: this is because it would be difficult to define integrals such as s<u<v<t 1 2 R(du) − R(s, du) 1  2 R(dv) − R(u, dv) as iterated Young integrals, without taking derivatives, since the variable u in its undifferentiated form appears after the integrator 1  2 R(du) − R(s, du); this of course is no longer an issue under our smoothness hypotheses, thanks to which the above integral is defined as the Lebesgue integral on the simplex s<u<v<t When P is represented by a diagram, we will decorate the nodes with labels.For example, the integral associated to (26) with labelling α, . . ., ϑ is given by This is viewed as a function of the variables u 2 , u 6 ranging on the simplex ∆ 2 [s, t], each paired with an index variable, which must respectively be equal to β, η for the expression not to vanish.The variable u 5 has been skipped, since it is the first term in the consecutive pair {5, 6}.We will show that integrals defined in this fashion are a.e.limits of Cauchy sequences in E ⊗m , which therefore uniquely represent elements of H ⊗m according to Definition 1.1.When taking multiple Wiener integrals of them, the indices corresponding to the nodes that represent free variables will become the coordinate processes that are being integrated against, e.g.
We are now ready to state the main theorem.
In particular, notice that w m S(X) γ 1 ,...,γn st can only be non-zero when m ≤ n and m ≡ n (mod 2).The most important case of this result is when m = 0: Corollary 2.4 (Expected signature of a Gaussian process).With notation as above, we have Remark 2.5 (Eliminating variables).While convergence rules out always considering integrands of the first type in (27) (which would mean allowing diagrams with arcs between consecutive nodes), one may wonder whether it is possible to only consider integrands of the second type, i.e. by integrating out one variable per pair and thus simplifying the presentation of the formula.This, however, is not possible in general, because of the additional constraint that requires two consecutive variables not to be both integrated out (for the expression to make sense as an integral).It is not difficult to see, for example, that in the following diagram at most two variables can be integrated out (unless the remaining integral can be solved or simplified analytically).Luckily, the only case in which it is necessary for convergence to integrate out certain variables (as specified in the second case of ( 27)), is when there are consecutive pairs: this is always possible, even when more than one pair in a row is consecutive, since we may always pick the first variable to integrate out (as done here -one could equivalently have chosen the second).Of course, there is always some number of additional variables that can be eliminated, but we do not immediately see a way of doing this in a maximal way that is canonical.
Example 2.6 (The Wiener chaos decomposition of S 3 (X) st ).We give the explicit expression for the Wiener chaos expansion of the signature truncated at level 3.These terms are especially significant, considering that they are the ones that define the rough path when 1/4 < H ≤ 1/3: higher signature terms can be derived in a pathwise fashion by Lyons's extension theorem without involving probability.We represent each signature term as a sum of their Wiener chaos projections in ascending order; in particular the sum of all non-random terms constitutes the expectation of the left hand side.

S(X)
In particular, notice how the expected signature of level 2 is given by the difference between the average of the variances and the covariance: and that the statement that "the Itô and Stratonovich Lévy areas are equal" carries over to the Gaussian Wienerrough setting, in the sense that by symmetry of the covariance function.
Example 2.7 (ES(X) (4) ).Corollary 2.4 at level 4 is given by Using a clever transformation, [3,Theorem 34] are able to compute ES(X) (2) 01 and ES(X) 01 for 1/4 < H-fBm.Their formulae are specific to the cases n = 2, 4 and X a fBm, and are quite different to those given by Theorem 2.3.That the two coincide is immediate at level 2 by (31), and in Appendix A we perform this check at level 4.
The following example shows how Theorem 2.3 has the potential to generate insight into numerics of numerical schemes for rough differential equations driven by Gaussian signals. is an RDE (rough differential equation) driven by the Gaussian rough path X (defined by the first 1, 2 or 3 levels of S(X), depending on how rough X is).Proceeding formally, and denoting by V γ 1 • • • V γn composition of vector fields (and using Einstein notation), we can then expand the solution Y as The expansion on the first line can be viewed as the extension to the Gaussian case of Stratonovich-Taylor series, the one on the last line can be viewed as that of Itô-Taylor series [25].The latter has the advantage that its terms fit in well with the Wiener chaos decomposition of Y t , although it should be observed that w m Y t is represented as an infinite series, namely the second sum in the last line above.Also, this expansion cannot be expected to coincide with the Wiener chaos decomposition of Y t if it is performed at times other than 0, with Y 0 = y 0 deterministic.This is because, unless X is a martingale, the Wiener chaos isometries will not hold conditionally on F s .Remark 2.9 (Stationarity and joint stationarity of increments).X is stationary if and only if we may write for some function R : [0, T ] → R d×d .In this case we have An example of a centred stationary Gaussian process is the stationary Ornstein-Uhlenbeck process e −t/2 W e t where W is a Brownian motion and t ∈ [0, T ]: its covariance function is R(s, t) = e −(t−s)/2 for s ≤ t.This process however, strictly speaking, is not among those considered here, as it has random initial condition.
There is a much weaker property that results in a similar simplification.We will say that a stochastic process X has jointly stationary increments if for all s 1 ≤ t 1 , . . ., s n ≤ t n the distribution of the random vector of increments (X s 1 t 1 , . . ., X sntn ) only depends on the differences t 1 −s 1 , . . ., t n −s n and s 2 −s 1 , . . ., s n −s n−1 (if n = 1 the latter condition vanishes, and ordinary stationarity of increments is recovered).If X is Gaussian this need only be required for n = 2, and if it holds we may write for some function R : [0, T ] 3 → R d×d .This property is satisfied by fBm, since if H is the Hurst parameter we have If X has jointly stationary increments Although similar simplifications are not available for ∂ 2 R(s, t) and R ′ (t) individually (as they are in the stationary case), they are for their difference: indeed, using that R( • , 0) ≡ 0, we have which implies We therefore conclude that joint stationarity of increments, though a much more general property than stationarity, results in the same simplifications that are of relevance to Theorem 2.3, namely that ∂ 12 R(s, t) and This can be of aid in simplifying the expression of the integrals in the formula for w m S(X), since it is possible to perform substitutions of the form v ij = u j − u i .It does not, however, guarantee that these integrals become analytically solvable, as simple examples show (e.g. the integral We now consider a few examples of Gaussian processes to which our results apply; in all cases, X will have i.i.d.components, and we will use R to denote the scalar covariance function of each component.Arguably the most important example of a stochastic process for which the signature has not yet been computed is fractional Brownian motion in the regime of negatively-correlated increments: Example 2.10 ((1/4, 1/2) ∋ H-fBm).Fractional Brownian motion with Hurst parameter H ∈ (0, 1) (H-fBm), introduced in [32], is a scalar centred Gaussian process with covariance function It is not a semimartingale unless H = 1/2, in which case it is Brownian motion.Here we consider the case H ∈ (1/4, 1/2): this is well known to satisfy the preliminary hypotheses required in Section 1, and the smoothness conditions and bounds are simple to verify.Indeed, the integrands of interest for the formula of Theorem 2.3 are given by (s ≤ t) As predicted by Remark 2.9, these both are functions of t − s.
and we have By performing this substitution in Corollary 2.4 for the case of 1/2 < H-fBm (this means always applying the first case in (27), i.e. allowing arcs between consecutive nodes, which replace lines), we recover the formula of [3,Theorem 31] (note that the symmetry factor -meant to factor out permutations of pairings and transpositions within each pair -is not present in our case, since we are summing over pairings and not permutations).Other examples of processes in a similar regularity regime are those Gaussian Volterra processes with strictly regular kernels considered in [7].
The following is another example of a fractional, non-semimartingale process.
Example 2.12 (The Riemann-Liouville process).Another centred continuous Gaussian process, originally introduced in [26] and subsequently in [32], is the Riemann-Liouville process with Hurst parameter H ∈ (0, 1) (sometimes called "type-II fBm"), is a centred Gaussian process with covariance function [33, p.116-117] . ( 40) Like fBm, this process specifies to Brownian motion when H = 1/2 and is otherwise not a semimartingale.Their main difference between the two is that fBm has jointly stationary increments while for the Riemann-Liouville process not even single increments are stationary.We were not able to find a satisfactory expression for the derivatives of the covariance function of this process, and thus were not able to determine whether (for H > 1/4) it satisfies the conditions necessary for applying Theorem 2.3.However, we believe that examples such as this provide strong motivation for not confining our study to fBm and to allow for more general processes.
Another important restriction of the main result is the following case: Remark 2.13 (Gaussian martingales, [16]).When X is a continuous Gaussian martingale, its quadratic variation coincides with its variance function (as can be seen by the fact that X 2 t − R(t) is a martingale).The Dubins-Schwarz theorem then implies that X can be represented as the deterministically-reparametrised Brownian motion W R(t) .Assuming equal distribution of components, we can use this and the formula for the expected signature of Brownian motion (2) to compute Since by martingality ∂ 12 R(s, t) = 0 = ∂ 2 R(s, t) on s < t, Theorem 2.3 reduces to a sum of iterated integrals that only involve 1 2 R ′ , which coincides with the above formula.We conclude with two examples of centred, continuous Gaussian semimartingales which are not martingales and do not have stationary increments.

Example 2.14 (Brownian bridge returning to the origin). The Brownian Bridge returning to the origin at time
T is a process whose law is given by disintegrating the Wiener measure on the event W T = 0, where W is a d-dimensional Brownian motion starting at the origin.It can be written either as or adaptedly as (and X T = 0).Its covariance function is given by and the integrands of interest are thus It should be mentioned that X, as a process defined on [0, T ], fails the non-degeneracy condition [9, p.2125].This is, however, not a problem, as we can view it as defined on the interval [0, T − ε] and obtain the signature terms S(X) sT through a limiting argument.The bounds of ( 9), which in this example and the one below only involve linear terms, are easily checked (and indeed the first is not even sharp).Note that the iterated integrals of (43) can all be solved explicitly as polynomials.

Proof of the main result
Recall that we are using ≲ to denote inequalities whose constant of proportionality may only depend on T, H and other properties of a fixed process X.Since most of the arguments presented in this section only concern bounds and convergence, we will suppress indices (i.e.treat the scalar case) most of the time, so as not to clutter the notation.Given P ∈ P n m , denote |P | st the function ∆ m [s, t] → R defined analogously to Definition 2.1, but replacing each integrand 26) The following proposition guarantees that all the integrals considered in the main theorem are convergent.Proof.We proceed by induction on n − m.When P only has single nodes (m = n) the statement is trivial.We will proceed by considering several cases for the last node in P ; the simplest of these occurs when it is single: the statement follows immediately from the inductive hypothesis.For the next case, we will need the following bound: For a diagram C whose last node is the right endpoint of an arc, using the bound above we have where |C| ′ su 0 equals the integral representing |C| su 0 with the only difference that we are not integrating w.r.t. the variable u 0 in (u 0 − r) 2H−2 , which represents the arc that terminates at the last node of C. Similarly, if the last node in C is single, we have where C is not differentiated since it terminates in a node representing a free variable, u 0 .We now consider arcs: assume there are i arcs/lines within A, j within B, and that there are k arcs between nodes in A and nodes in B (collectively represented below by the dashed arc).Let A • and B • denote the diagrams given by eliminating such arcs from A and B: the nodes that have become single as a result now represent free variables, which we call w 1 , . . ., w k , z 1 , . . ., z k .We first consider the case in which j > 0: where we have used 2H(j + 1) − 1 ≥ 4H − 1 > 0 since H > 1/4.Note that the absolute values in the third-last expression can be removed by separately considering the cases H > 1/2 and H < 1/2.Assume instead j = 0: this means B must contain at least one node that is either single or paired with a node in A; it cannot be that B = ∅ or the diagram would contain an arc between two consecutive nodes, which is ruled out.The case in which there is a node in B which is single (see Figure 2) does not require H > 1/4: letting r denote the free variable represented by such a node, and proceeding similarly to the above, we have Finally, consider the case in which j = 0 and k > 0 (and B may have no single nodes): Once again, the absolute values distinguish between H ≶ 1/2.Expanding the product, we observe that three of the integrals feature products of different terms, each to the power of 2H − 1: in these, at least one of z k or u only appears once, which means this variable may be integrated out and the resulting term bounded (up to a constant) by (t − s) 2H , with the remaining integral solved similarly.The fourth integral instead is s<u<z<t (z − u) 4H−2 dudz which is finite again thanks to H > 1/4.This shows that we have ≲ (t − s) 2(i+k+1)H in the above expression and concludes the proof.■ Remark 3.2 (Modified |P |).We have stated the previous proposition under in the most natural manner; in particular note how, in the prototypical case of fBm, the integrals |P | st are multiples of P st .We will, however, additionally need a slightly modified version of this result, in which the definition of |P | is changed as follows: maximal sequences occurring in the middle of the expression for |P |, are replaced with their bound (v−u) 2kH , and each integrand That the statement continues despite these modifications to hold is obvious for the first, and for the second it follows from the facts that all integrals are still convergent (by the same proof) and the 1/2 can be replaced with 1/2 ∧ T and absorbed in the constant of proportionality.
Just like in [3], we approximate X piecewise linearly.Let X ℓ be a sequence of piecewise linear approximations of X along partitions π ℓ on [0, T ] with step size that vanishes as ℓ → ∞.It will be helpful to assume that the intervals in the mesh π ℓ all have the same length ϱ ℓ ; this simplifying assumption can be made because it is only necessary to show convergence along a sequence of such approximations, since it is known that the limit does not depend on the particular choice of π ℓ (or indeed on the type of piecewise smooth approximation in a broad class of these) [21,Ch. 15].For t ∈ [0, T ] we will write t − ℓ and t + ℓ to respectively denote the endpoints a and b of the interval of π ℓ s.t.t ∈ [a, b).Explicitly, X ℓ and its piecewise-defined derivative are given by where, as usual, X ab := X b − X a denotes the increment.In order to use Stroock's formula (23), we will be considering Malliavin derivatives of the signature of the piecewise-linear interpolations of X, which in turn requires us to consider those of the single factors: For P ∈ P n m , we provide a discretised analogue to Definition 2.1: as an element of E ⊗m , whose arguments are given to the functions Note how the above definition, unlike Definition 2.1 does not distinguish between consecutive and nonconsecutive pairings: this will only become important in the limit.Moreover, we are integrating over all n variables, including the u k with k ∈ P : this is because the time arguments of the function, v k , are supplied separately, with the respective index variables supplied as arguments to δ γ k , k ∈ P .The functions P ℓ st are summands in the expression of which we want to compute the limit: Lemma 3.4 (Expected Malliavin derivatives of signature approximations).
Proof.This is a consequence of ( 46), (47), the (iterated) Leibniz rule for the Malliavin derivative and Wick's formula for the mixed moments of a Gaussian vector (as it was already used in [2, Theorem 31]).The details are a matter of simple combinatorics; in particular note how, instead of summing over m! terms corresponding to the ways of permuting the m derivatives (for a fixed P ∈ P n m ), we are only including the term corresponding to the identity permutation and multiplying by m!, which identifies the same element of E ⊗m up to symmetry.■ In order to prove convergence, it is unfortunately not possible to argue by dominated convergence applied to Definition 3.3: this is because the factors in the integrand given by consecutive pairings E[ Ẋℓ;γ i u i Ẋℓ;γ i+1 u i+1 ] converge to non-integrable functions (e.g.(v − u) 2H−2 on ∆ 2 [s, t] for fBm) and the ones corresponding to Malliavin derivatives do not converge at all (in fact they converge, as distributions, to Dirac deltas δ v k ).The reason that convergence holds is that all these quantities are integrated.To successfully exploit this, we will write each integral P ℓ st as a nested integral, distinguishing between the three types of integrands: The outer integral contains the product of all terms E[ Ẋℓ;γ i u i Ẋℓ;γ j u j ] with |j − i| > 1.These are multiplied with the second integral, which integrates all factors coming from Malliavin derivatives.Finally, we partition the remaining integrands E[ Ẋℓ;γ h u h Ẋℓ;γ h+1 u h+1 ] into maximal sequences and integrate each individually: these integrals are integrands in the second integral, alongside the Malliavin derivatives.The operations of exchanging the order of integrals are all justified by Fubini's theorem, considering that all integrals are actually finite sums.We illustrate all of this with a simple example: consider the diagram (suppressing indices) According to Definition 3.3, we have Re-organising this expression as described in (50) we obtain Note that the domain of integration of the innermost integral can be described in terms of variables of the two outer integrals: this extends to the case in which there is more than one maximal sequence, by maximality, and is crucial for the factorisation into integrals over maximal sequences to be possible.The reason for the nested rewriting of (50) is that it will be possible to show convergence of the integrals over maximal sequences, then by a separate argument infer the convergence of the middle integral, and finally by dominated convergence conclude that the outer integrals converge.We preface the proof of convergence with a few lemmas; the first of these considers the case of a single consecutive pairing, and will form the base case of an induction that handles maximal sequences of arbitrary length.Lemma 3.5 (One consecutive pairing).
and the convergents are uniformly bounded by ≲ (t − s) 2H .
Proof.Considering that Ẋℓ is a piecewise-constant, and that the integral on the right is therefore a finite sum, we can write where we have used that (X ℓ st ) 2 ℓ→∞ − −− → X 2 st in L 2 .For the second statement, we rely on the first two identities above and distinguish between the cases s − ℓ = t − ℓ and s − ℓ < t − ℓ : in the former we have, using ( 12) by l 2 -Jensen's inequality, the previous case, and again (12).■ The case of several consecutive pairings is more difficult to handle, and in Proposition 3.9 convergence of these terms will be bootstrapped from terms that only contain shorter sequences of consecutive pairings, and the above single case, by means of an inductive argument.It is worth remarking that the plausible strategy of handling these integrands together with the others by integrating only one of the variables fails: ).One way of dealing with sequences of consecutive pairings is by rewriting them as This has the benefit of expressing the convergents as integrals over n, and not 2n, variables.The problem with this strategy is that it does not hold that While the second term converges to ∂ 2 R(u, v) (e.g. by the intermediate value theorem applied on the interval ), the first does not converge in general.To see why, it suffices to take X to be Brownian motion and π ℓ to by a diadic sequence: the first term on the right above is then equal to ϱ −1 ℓ (v − v − ℓ ) which is indeterminate in view of the fact that for v in a set of full Lebesgue measure its decimal expansion contains infinitely many 00's and 11's.The fractional case with H < 1/2 appears even worse behaved, i.e. divergent in a possibly indeterminate fashion.
We now move outward in (50) and prove a lemma that will guarantee convergence of the middle integral, conditional on the convergence of the inner ones.Lemma 3.7.Let f ℓ : [0, T ] m → R be a uniformly bounded sequence of functions that are continuous and piecewise smooth on the mesh π ℓ .Assume that f ℓ converges to f : [0, T ] m → R uniformly.Then where the convergence is a.e. in the variables (v 1 , . . ., v m ) ∈ [0, T ] m .Moreover, the convergents are uniformly bounded by sup ℓ ∥f ℓ ∥ ∞ .
Proof.The second statement holds by uniform boundedness of f ℓ and the fact that We will prove pointwise convergence on the subset where ω f (v 1 ,...,vm) is the modulus of continuity of f at the point (v 1 , . . ., v m ).Both summands on the right hand side above vanish in the limit of ℓ → ∞, the first by uniform convergence and the second by continuity of the uniform limit of continuous functions.■ The next two results constitute the core of our argument.They both rely on the same induction used to reduce the length of consecutive pairings, the base case of which is provided by Lemma 3.5.To illustrate it at level 4, letting Y be a stochastic process (which below will be taken to be X ℓ and X) we have for Proof.We begin by bounding expectations corresponding to non-consecutive pairings.As done in the proof of [3,Theorem 31], we now consider the terms By Cauchy-Schwarz the same estimate as above holds in the case u + ℓ = v − ℓ , with a constant in the second inequality given by the fact that v − u ≤ 2ϱ ℓ .Let u + ℓ < v − ℓ : we have, by (9) and for any H ′ as in the statement In the second-last inequality we have used that there exists some L s.t. for all ℓ ≥ L ϑ We now consider terms corresponding to maximal sequences of consecutive pairings, i.e.
It is always possible (e.g. by Kolmogorov's extension theorem) to add independent components to X.With this in mind, by Wick's theorem we may write the above integral as ES(X ℓ ) α 1 α 1 ...α k α k st with α i ̸ = α j for all i ̸ = j.By the shuffle identity (8) we have When shuffling we have separated the cases in which all α h α h and ββ occur as consecutive pairs, from those in which at least one such pair is separated.We now take expectations: note that both independence and equal distribution of components are used.
where we are summing over a finite number of diagrams Q whose longest sequence of consecutive pairings contains k pairs or fewer.We now prove the statement in the case in which P has no single nodes, by induction on n.For n = 0, P ℓ st = ∅ ℓ st ≡ 1 there is nothing to show.Let n ≥ 1, and assume we have rewritten the integral according to (50) (where the middle integral may be skipped, since there are no Malliavin derivatives).If P is not given by a sequence of n/2 consecutive pairs, all maximal sequences of consecutive pairs in P consist of fewer than n pairs, and that thus the inductive hypothesis applies to them: this means that for each such sequence Q with k pairs, |Q ℓ uv | ≲ (v − u) 2kH ′ .Using the bounds for the first two types of integrand derived in the first part of this proof, the statement for P then follows from Proposition 3.1 applied in the modified case of Remark 3.2 and with exponent H ′ .Assume now n = 2(k + 1) and let P be given by the diagram consisting of k + 1 consecutive pairs: the only thing needed to conclude the induction is the bound.This follows from (54) thanks to the inductive hypothesis and the boundedness statement of Lemma 3.5.
Finally, we consider the general case in which P may have single nodes.This follows again by writing P ℓ st in nested form, bounding terms corresponding to non-consecutive pairings as done above, and bounding the middle integral in (50) thanks to the boundedness statement of Lemma 3.7.When invoking this lemma, f ℓ is going to be a product of terms of the form (52) (with the extrema s and t replaced with variables u i and u j already integrated in the outer or middle integral), which as already proved is bounded by ≲ (t − s) Proof.The inequality is an absolute estimate of P st using ( 9) and (10).The structure of the proof of the first statement closely mirrors that of the previous lemma: we first consider the case in which P does not have single nodes.For u We now proceed by induction on n.For n = 0 there is nothing to prove, so let n ≥ 1 and first consider the case in which P is not given by a sequence of n/2 consecutive pairs: the statement follows by dominated convergence applied to the outer integral in (50), by the above and the inductive hypothesis applied to sequences of consecutive nodes of length less than n, in conjunction with Lemma 3.8.Let now n = 2(k + 1) and let P be given by the diagram consisting of k + 1 consecutive pairs: recalling the argument (and indexing notation) of the previous proof, we have , which is convergent since S(X ℓ ) st → S(X) st in L 2 .By the same calculation of (53) applied to X instead of to X ℓ , and taking expectations We now expand the product: by the inductive hypothesis and Lemma 3.5, and using Fubini's theorem we have (setting

Conclusions and further directions
By providing a single formula for the expected signature of fractional Brownian motion that holds for any Hurst parameter H ∈ (1/4, 1), this article closes a gap in the literature left open by [3].Along the way, we have had opportunity to consider numerous other aspects of our computation, such as similar formulae for higher levels of the Wiener chaos expansion of the signature, and other examples of Gaussian processes.We believe this work recommends a variety of applications and further investigations.First and foremost, it would be interesting to write stochastic Taylor expansions as suggested by Example 2.8, under precise conditions on the vector fields, and by providing bounds on the mean square error.Making this calculation rigorous and providing precise asymptotic estimates such as those in [37] would be an interesting result, which could be applied to approximation problems for Gaussian RDEs on manifolds such as those considered in [1] for SDEs (although for this precise problem, the joint signature S(X, t) would have to be considered).A further step would involve proving conditional versions of the results in this paper, which would make it possible to estimate the error generated by multiple steps in an Euler scheme.
The fact that (e.g. for fBm) the integral ES(X) with α i ̸ = α j is actually convergent for any H > 0 raises the question of whether something can be said about the sequence S(X ℓ ) α 1 α 1 •••α k α k st , i.e. by considering the particular word on which S(X ℓ ), which is not convergent in probability for H ≤ 1/4, is evaluated.
It would be interesting to express the expected signature of a Gaussian process as the exponential of a formal series of tensors, thus computing its signature cumulants [8]: this is how the expected signature of Brownian motion ( 2) is usually presented (with the series a finite sum), but the analogous formulation for Gaussian processes that are not martingales appears more difficult to write down.
A more computational goal, though not one that appears trivial, is to explicitly compute Theorem 2.3 for certain semimartingales, such as the Brownian bridge, for which the integrals are all analytically solvable.An interesting question is how the relationship between Brownian motion and Brownian bridge is reflected by their expected signatures.It would also be helpful to see whether similar formulae to ours can be made available for non-centred Gaussian processes, e.g.general Ornstein-Uhlenbeck processes.Finally, it would be interesting to try to apply the main theorem to the Riemann-Liouville process Example 2.12.

A Equivalence with [3] for the expected signature of fBm at level 4
In [3,Theorem 34] the authors check that the explicit integral expressions for ES(X) (n) 01 with n = 2, 4, previously derived for X a fractional Brownian motion of Hurst parameter H ∈ (1/2, 1), continue to be valid for H ∈ (1/4, 1).This is done by performing transformations on ES(X ℓ ) 01 before passing to the limit.This calculation is specific to levels 2 and 4 and to fBm, and for this reason the expression for the expected signature is not immediately comparable to that obtained as a special case of Theorem 2.3.We devote this appendix to checking that the two agree.
Level 2 is easy to check, since (31) reduces to δ αβ /2.Referring to Example 2.7, we consider level 4 using (39): starting with the first integral above, we have where the last identity can be verified by showing that the difference of the two integrands is odd about the point u = 1/2, which in turn is seen by observing that has zero derivative and vanishes at u = 1/2.This shows equality with [3, coefficient of the first term of Γ 2 H in Corollary 33].We proceed with the second integral in Example 2.7: s<u<v<w<t R(du, dw) 1  2 R(dv) − R(u, dv) dudvdw = H 2 (2H − 1) .
For the third integral we have In the integration by parts we have used that lim z→v + (z 2H − v 2H )(z − v) 2H−1 = 0 which can be shown by using that for 1/4 < H < 1/2 0 ≤ (z for 0 < v < z and H < 1/2.In the last identity we have used a similar symmetry argument as the one used in the first calculation, solved trivial integrals and rearranged terms.Note how this calculation would have been simpler if H ≥ 1/2 since it would not have been necessary to integrate by parts to avoid integrating (z − v) 2H−2 (cf.[3,Lemma 32]).

Figure 1 :
Figure 1: Here we compare the two behaviours, corresponding to H > 1/2 and H < 1/2, of 0<u<v<1 E[ Ẋℓ u Ẋℓv ]dudv with X ℓ the sequence piecewise linear interpolations of X on a partition.On the left we have chosen H = 2/3, and the sequence of integrals converges to a finite improper integral, whereas on the right H = 1/3 and the on-and off-diagonal contributions diverge to opposite infinities.(The plots are oriented in different ways and the z-axis is rescaled, both for improved visibility.)This graphic has been created using Wolfram Mathematica.

Theorem 2 . 3 (
Wiener chaos expansion of the signature of a Gaussian process).Given m, n ∈ N, P ∈ P n m , γ 1 , . . ., γ n ∈ [d], 0 ≤ s ≤ t ≤ T , it holds that P γ 1 ,...,γn st ∈ H ⊗m in the sense of Definition 1.1, and the m th Wiener chaos projection of the signature of X is given by

st by the shuffle property ( 8 )Lemma 3 . 8 (
, using identical distribution of components to group together 2ES(Y ) ααββ st = ES(Y ) ααββ st + ES(Y ) ββαα st (and similar on the right hand side), and using independence of components to write E[S(Y ) αα st S(Y ) ββ st ] = ES(Y ) αα st • ES(Y ) ββ st .While the left hand side contains a sequence of two consecutive pairs, only sequences of consecutive pairs of length one appear on the right.Dominating function).For P ∈ P n m it holds that the integrand of the outermost integral of P ℓ st expressed in the nested form (50), is absolutely bounded by an integrable function, uniformly in ℓ and on ∆ m [s, t], so that |P ℓ st | ≲ (t − s) 2(n−m)H ′ for any 1/4 < H ′ < H.
by the intermediate value theorem applied twice.Pointwise convergence E[ Ẋℓ ∂ 12 R(u, v) then holds by continuity of ∂ 12 R and thanks to the fact that for any u < v there exists L s.t.u − ℓ < v − ℓ for all ℓ ≥ L. This takes care of convergence of terms corresponding to nonconsecutive pairings (of course, the same holds for consecutive pairings, but is not useful since ∂ 12 R(u, v) may not be integrable in this case).