Projections of SDEs onto Submanifolds

In [ABF19] the authors define three projections of $\mathbb R^d$-valued stochastic differential equations (SDEs) onto submanifolds: the Stratonovich, Itô-vector and Itô-jet projections. In this paper, after a brief survey of SDEs on manifolds, we begin by giving these projections a natural, coordinate-free description, each in terms of a specific representation of manifold-valued SDEs. We proceed by deriving formulae for the three projections in ambient $\mathbb R^d$-coordinates. We use these to show that the Itô-vector and Itô-jet projections satisfy respectively a weak and mean-square optimality criterion for small $t$: this is achieved by solving constrained optimisation problems. These results confirm, but do not rely on, the approach taken in [ABF19], which is formulated in terms of weak and strong Itô-Taylor expansions. In the final section we exhibit examples showing how the three projections can differ, and explore alternative notions of optimality.


Introduction
Consider the following problem: we are given an autonomous ODE $\dot X_t = F(X_t)$ in $\mathbb R^d$, and a smooth embedded manifold $M \hookrightarrow \mathbb R^d$. Let $\pi$ be the metric projection of a tubular neighbourhood of $M$ onto $M$ (see ( ) below). We seek an $M$-valued ODE, i.e. a vector field $\overline F$ on $M$, tangent at each point to $M$, with the property that the solution to $\dot Y_t = \overline F(Y_t)$ is optimal in the sense that the first coefficient of the Taylor expansion at $t = 0$ of either the distance between $Y_t$ and $X_t$ or the distance between $Y_t$ and $\pi(X_t)$ is minimised for any initial condition $X_0 = Y_0 = y_0 \in M$. This requirement represents the slowest possible divergence of $Y$ from the original solution $X$ (resp. from its metric projection on $M$), subject to the constraint of $Y$ arising as the solution of a closed form ODE on $M$. It is an easy exercise (using ( ) below) to check that these optimisation problems both result in the same solution, which consists in $\overline F(y)$ being the orthogonal projection of the vector $F(y)$ onto the tangent space $T_yM$.

The paper [ABF19], which is motivated by applications to non-linear filtering, explores an extension of this problem to the case of SDEs. The optimality criteria ( ) do not carry over in a straightforward fashion, and are formulated through the machinery of weak and strong Itô-Taylor expansions. In this paper we tackle the same problem through a different perspective, which we proceed to describe.
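The ODE case described above can be illustrated numerically. The following is a minimal sketch, assuming $M$ is the unit sphere in $\mathbb R^3$ and using a hypothetical ambient field `ambient_field` chosen only for illustration (neither choice comes from the text). It checks that the orthogonally projected vector is tangent to the sphere and is at least as close to $F(y)$ as another tangent vector.

```python
import math

def project_to_tangent(y, v):
    """Orthogonal projection of v onto T_y S^2, for a unit vector y."""
    dot = sum(a * b for a, b in zip(y, v))
    return [a - dot * b for a, b in zip(v, y)]

def ambient_field(x):
    """Hypothetical ambient vector field F on R^3 (illustrative assumption)."""
    return [1.0 + x[1], -x[0], 0.5 * x[2]]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

y = [0.6, 0.0, 0.8]                      # a point of the unit sphere
F_y = ambient_field(y)
F_bar = project_to_tangent(y, F_y)       # projected field at y
tangency = sum(a * b for a, b in zip(y, F_bar))

# Compare with another tangent vector w at y: it is no closer to F(y).
w = project_to_tangent(y, [0.3, -1.0, 2.0])
gap_projected = norm([a - b for a, b in zip(F_y, F_bar)])
gap_other = norm([a - b for a, b in zip(F_y, w)])
```

Here `tangency` vanishes up to rounding, and `gap_projected` does not exceed `gap_other`, in line with the orthogonal projection being the closest tangent vector.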
We begin with a survey of SDEs on manifolds. Here we introduce three ways of representing them: the Stratonovich, Schwartz-Meyer (or 2-jet) and Itô representations. The first and second have the advantage of not requiring a connection on the tangent bundle of the manifold, the second and third are defined in terms of the Itô integral, while the first and third have vector coefficients. Focusing on the diffusion case, we show how to pass from one representation to another. We then prepare the framework for manifolds $M$ embedded in $\mathbb R^d$. These are entirely general Riemannian manifolds, due to the Nash embedding theorem, and have the advantage of being describable using ambient coordinates. We use this framework to study the equations introduced in the previous section, on embedded manifolds. Next, we associate to each manifold-valued SDE representation a natural projection, which gives rise to an SDE on a submanifold: the Stratonovich projection (defined by projecting the Stratonovich coefficients), the Itô-jet projection (defined by projecting the Schwartz morphism, or 2-jet, which defines the SDE), and the Itô-vector projection (defined by projecting the Itô coefficients, and interpreting the resulting equation w.r.t. the Riemannian connection on the embedded submanifold). These projections coincide with the ones introduced in [ABF19], but are given a more solid theoretical underpinning, which sheds light on their analytic and probabilistic properties. We then derive formulae for the three projections, preferring ambient coordinates to local coordinates. We go on to formulate the optimality criteria satisfied by the Itô-vector and Itô-jet projections using respectively an explicit weak and mean-square formulation, instead of invoking Itô-Taylor expansions as done in [ABF19].
This has the advantage of representing a more tangible property of the solution, and is accompanied by an argument, based on martingale estimates, used to deal with the problem of the solution exiting the tubular neighbourhood of $M$. Our main theorems replicate the corresponding findings of [ABF19] in this new setting. The fact that the Stratonovich projection does not satisfy either of these optimality criteria confirms that Itô calculus on manifolds can be of great interest. Finally, we provide examples showing that the three projections are genuinely distinct, we prove that the Itô projections are also optimal when the optimality criteria are formulated using $M$'s intrinsic geometry, and we explore notions of optimality that are satisfied by the more naïve Stratonovich projection.
Although the material presented here overlaps to a significant degree with the ideas of [ABF19], this paper (the contents of which also appear in the third author's PhD thesis [Fer ]) is entirely self-contained. Moreover, we believe the framework chosen here has a number of advantages of which we hope to make use in future work, as described in Conclusions and further directions.

SDEs on manifolds
We begin this section with a primer on manifold-valued SDEs. Since manifolds, unlike Euclidean space, do not come naturally equipped with coordinates, especially not global ones, the challenge is to express an SDE using intrinsic, coordinate-free notions. Equivalently, one can define an SDE locally in an arbitrary chart, and show that the property of a process of being a solution does not depend on the chart. The coordinate-free definition of a time-homogeneous ODE on a smooth, $m$-dimensional manifold $M$ is well known: this consists of a tangent vector field, i.e. a section of the tangent bundle of $M$, $V \in \Gamma TM$. We will denote by $\Gamma$ the set of sections of a fibre bundle, i.e. the smooth right inverses to the bundle projection. A solution to the ODE defined by $V$ is a smooth curve $X$, defined on some interval of $\mathbb R$, with the property that $\dot X_t = V_{X_t}$ for all $t$. This is a coordinate-free definition, and in a chart $\varphi \colon U \to \mathbb R^m$ ($U$ an open set in $M$) it corresponds to requiring that, writing $\varphi(X_t) = {}^\varphi X_t$ and $V_x = {}^\varphi V^k_x\, \partial_x\varphi_k$, we have ${}^\varphi \dot X^k_t = {}^\varphi V^k_{X_t}$ for all $t$ for which both sides are defined. Notice the sum over $k$: this is the Einstein summation convention, which we will use throughout whenever possible; also, the $\partial_x\varphi_k$ are the elements of the basis of $T_xM$ defined by the chart $\varphi$, i.e. $\partial_x\varphi_k f = \frac{\partial (f\circ\varphi^{-1})}{\partial x^k}(\varphi(x))$ for $f \in C^\infty M$.

In this section we will give similar descriptions of Stratonovich and Itô (non path-dependent) SDEs on manifolds. From now on we will avoid the $\varphi$ superscripts when no ambiguity occurs, e.g. the previous identity will be written $\dot X^k_t = V^k_{X_t}$. We begin with the Stratonovich case, following mainly [É , Ch. VII], although the topic is well known. As for the familiar $\mathbb R^d$-valued case we will also need a driving semimartingale, which, given the context we are working in, can be taken to be valued in another manifold $N$, of dimension $n$.
Given a stochastic setup $(\Omega, \mathcal F_\cdot, P)$ satisfying the usual conditions, a continuous adapted stochastic process $Z \colon \Omega \times \mathbb R_{\ge 0} \to N$ is said to be a semimartingale if, for all $f \in C^\infty N$, $f(Z)$ is a semimartingale. Just as for the ODE case, what is needed to define a Stratonovich SDE in $M$ driven by $Z$ is a section of some vector bundle: in this case, however, the bundle is no longer just $TM$, but $\mathrm{Hom}(TN, TM) \to M \times N$, i.e. the vector bundle of linear maps from $TN$ to $TM$. An element $F \in \Gamma\mathrm{Hom}(TN, TM)$ defines a Stratonovich SDE which reads in local coordinates (this requires choosing a chart both on $N$ and on $M$) as $\mathrm dX^k_t = F^k_\gamma(X_t, Z_t) \circ \mathrm dZ^\gamma_t$ on random intervals that make both sides of the expression well defined. We will always use Greek letters as indices for the driving process, and Latin letters as indices for the solution. The key property that allows one to prove that the coordinate formulation of Stratonovich SDEs holds for all other charts (on the intersection of their respective domains) is that Stratonovich equations satisfy the first order chain rule: clearly ( ) would not be similarly well defined with Itô integration.

One can also define a solution without invoking charts: this entails defining a Stratonovich integral taking as integrator an $M$-valued semimartingale $X$ and as integrand a previsible process $H$ with values in the cotangent bundle of $M$ and relatively compact image (locally bounded), s.t. at each $t$, $H_t$ is in the fibre at $X_t$: this yields an $\mathbb R$-valued semimartingale which we can write as $\int \langle H_s, \circ\,\mathrm dX_s\rangle$. The angle brackets refer to dual pairing of vectors and covectors. This integral is characterised as being the unique map satisfying three properties: additivity (for all locally bounded previsible $H$, $G$ as above), associativity, and the change of variable formula $\int_0^t \langle \mathrm df(X_s), \circ\,\mathrm dX_s\rangle = f(X_t) - f(X_0)$, where $\mathrm df$ is the one-form given by taking the differential of $f$. One can then use this integral to say that $X$ solves ( ) if for all admissible integrands $H$ (even just those arising as the evaluation of a one-form at $X$) we have $\int \langle H_s, \circ\,\mathrm dX_s\rangle = \int \langle F(X_s, Z_s)^* H_s, \circ\,\mathrm dZ_s\rangle$, where the $*$ denotes dualisation.
Remark (Autonomousness and explicitness). If $N = \mathbb R^n$ we can call ( ) autonomous if $F(z, x)$ does not depend on $z$, and if $M = \mathbb R^m$ we can call it explicit if $F(z, x)$ does not depend on $x$. However, in the general manifold setting these two concepts do not carry over, at least not unless $N$ (resp. $M$) is parallelisable, with a chosen trivialisation of its tangent bundle. An analogous consideration applies to other flavours of SDEs introduced in this section.
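Example (Stratonovich diffusion). An important example is the case where $N = \mathbb R_{\ge 0} \times \mathbb R^n$ and $Z_t = (t, W_t)$, $W$ an $n$-dimensional Brownian motion, and $F$ not depending explicitly on $W$. This means ( ) becomes $\mathrm dX_t = F_\gamma(X_t, t) \circ \mathrm dW^\gamma_t + F_0(X_t, t)\,\mathrm dt$, with $\gamma = 1, \ldots, n$. Stratonovich diffusions are sections of a vector bundle over $M \times \mathbb R_{\ge 0}$, i.e. elements of the vector space $\Gamma\mathrm{Diff}^n_{\mathrm{Strat}}M$. Notice that the base space is not $M \times (\mathbb R_{\ge 0} \times \mathbb R^n)$, since independence of the Brownian motion allows us to forget the $\mathbb R^n$ component.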
We note that no additional structure on $N$ and $M$, apart from their smooth atlas, is needed to define Stratonovich equations. Stratonovich SDEs are the most used in stochastic differential geometry, as they behave well w.r.t. notions of first order calculus: for instance, if there exists an embedded submanifold $M'$ of $M$ such that $F(y, z)$ maps to $T_yM'$ for all $z \in N$ and all $y \in M'$, then the solution to the Stratonovich SDE defined by $F$ started on $M'$ will remain on $M'$ for the duration of its lifetime. This is evident from our intrinsic approach, by considering $F|_{M' \times N}$, but some authors who develop Stratonovich calculus on manifolds extrinsically prove this by showing that the distance between the solution and the manifold (embedded in Euclidean space) is zero [Hsu , Prop. . . ]. The existence and uniqueness of solutions to Stratonovich SDEs can be treated by using the Whitney embedding theorem to embed $N$ and $M$ in Euclidean spaces of high enough dimension, and smoothly extending $F$ so that it vanishes outside a compact set containing the manifolds. Invoking the usual existence and uniqueness theorem (e.g. [Pro , Theorems -]), and the good behaviour of Stratonovich SDEs w.r.t. submanifolds, immediately proves that a unique solution exists up to a positive stopping time, provided $F$ is smooth. We will mostly not be concerned with global-in-time existence in this paper, although sufficient conditions for such behaviour can usually be obtained by requiring global Lipschitz continuity w.r.t. complete Riemannian metrics.
We now pass to Itô theory on manifolds, as developed in [É , Ch.VI]. The difficulty lies in the second order chain rule of the Itô integral. For this reason, we need to invoke structures of order higher than 1. Let the second order tangent bundle of $M$, $\mathbb TM$, denote the bundle of second order differential operators without a constant term, i.e. given a local chart $\varphi$ containing $x$ in its domain, an element $L_x \in \mathbb T_xM$ consists of a map $C^\infty M \to \mathbb R$ of the form $L_x f = L^k_x\, \partial_x\varphi_k f + L^{ij}_x\, \partial^2_x\varphi_{ij} f$. The coefficients $L^k_x$, $L^{ij}_x$ obviously depend on $\varphi$, but their existence does not; moreover, requiring $L^{ij}_x = L^{ji}_x$ ensures their uniqueness for the given chart $\varphi$. Note that if the $L^{ij}_x$'s vanish then $L_x \in T_xM$. $\mathbb TM$ is given the unique topology and smooth structure that makes the projection $\mathbb TM \to M$, $L_x \mapsto x$ a locally trivial surjective submersion. Just as for the first order case, there is an obvious notion of induced bundle map $\mathbb Tf$ for a smooth map $f$ between manifolds. There is also a short exact sequence of vector bundles $0 \to TM \xrightarrow{\ i\ } \mathbb TM \xrightarrow{\ p\ } TM \odot TM \to 0$, with the third term denoting symmetric tensor product, the first map the obvious inclusion and the second map sending $L_x$ to its second order part $L^{ij}_x\, \partial_x\varphi_i \odot \partial_x\varphi_j$. Roughly speaking, this means that $\mathbb TM$ is "noncanonically the direct sum of $TM$ and $TM \odot TM$". This short exact sequence of course dualises to a short exact sequence of dual bundles. Elements of $\mathbb T^*_xM$ can always be represented as $\mathbb d_xf$, defined by $\langle \mathbb d_xf, L_x\rangle = L_xf$, for some $f \in C^\infty M$ (this is of course only true at a point: not all sections of $\mathbb T^*M$ are of the form $\mathbb df$).

We now wish to define an Itô-type equation using second order tangent bundles instead of ordinary tangent bundles. For this we need a notion of field of maps $\mathbb F(x, z) \colon \mathbb T_zN \to \mathbb T_xM$. Since the bundles in question are linear, it is tempting to allow $\mathbb F(x, z)$ to be an arbitrary linear map, but a more stringent condition is necessary to guarantee well-posedness: the correct requirement is that $\mathbb F(x, z)$ define a morphism of short exact sequences, i.e.
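a commutative diagram whose rows are the short exact sequences ( ) at $z \in N$ and at $x \in M$, and whose vertical maps are $F(x, z)$, $\mathbb F(x, z)$ and $F(x, z) \otimes F(x, z)$, where $F(x, z)$ is the restriction of $\mathbb F(x, z)$ to $T_zN$. Such an $\mathbb F(x, z)$ is then called a Schwartz morphism, and we can then view $\mathbb F$ as being the section of a sub-fibre bundle $\mathrm{Sch}(N, M)$ of $\mathrm{Hom}(\mathbb TN, \mathbb TM)$ over $M \times N$ consisting of such maps, which we call the Schwartz bundle. Note that $\mathrm{Sch}(N, M)$ is not closed under sum and scalar multiplication taken in the vector bundle $\mathrm{Hom}(\mathbb TN, \mathbb TM)$, and thus can only be treated as a fibre bundle.

Now, given $\mathbb F \in \Gamma\mathrm{Sch}(N, M)$, we will give a meaning to the SDE $\mathbb dX_t = \mathbb F(X_t, Z_t)\, \mathbb dZ_t$, which we will call a Schwartz-Meyer equation. Heuristically, if $X$ is an $M$-valued semimartingale the second order differential $\mathbb dX_t$ should be interpreted in local coordinates $\varphi$ as $\mathbb dX_t = \mathrm dX^k_t\, \partial_{X_t}\varphi_k + \tfrac12\, \mathrm d[X^i, X^j]_t\, \partial^2_{X_t}\varphi_{ij}$, where the first differential is an Itô differential; this expression is seen to be invariant under change of charts, thanks to the Itô formula. Then, given charts $\varphi$ in $M$ and $\vartheta$ on $N$, and writing $F^k_\gamma$, $F^{ij}_\gamma$, $F^k_{\alpha\beta}$, $F^{ij}_{\alpha\beta}$ for the components of $\mathbb F$ w.r.t. these charts, the equation reads $\mathrm dX^k_t = F^k_\gamma\, \mathrm dZ^\gamma_t + \tfrac12 F^k_{\alpha\beta}\, \mathrm d[Z^\alpha, Z^\beta]_t$ and $\tfrac12\, \mathrm d[X^i, X^j]_t = F^{ij}_\gamma\, \mathrm dZ^\gamma_t + \tfrac12 F^{ij}_{\alpha\beta}\, \mathrm d[Z^\alpha, Z^\beta]_t$. Computing the quadratic covariation matrix of $X$ from the first equation above, using the Kunita-Watanabe identity, and comparing with the second results in the requirement that $F^{ij}_\gamma = 0$ and $F^{ij}_{\alpha\beta} = \tfrac12 (F^i_\alpha F^j_\beta + F^i_\beta F^j_\alpha)$, which correspond precisely to the Schwartz condition ( ), and justifies this requirement. ( ) now reduces to its first line, i.e. the Itô SDE $\mathrm dX^k_t = F^k_\gamma(X_t, Z_t)\, \mathrm dZ^\gamma_t + \tfrac12 F^k_{\alpha\beta}(X_t, Z_t)\, \mathrm d[Z^\alpha, Z^\beta]_t$ on random intervals that make both sides of the expression well defined.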
Example (Schwartz-Meyer diffusion). Proceeding as in the Stratonovich diffusion example, but with Schwartz-Meyer equations, we can define the Schwartz-Meyer SDE, which in coordinates reads $\mathrm dX^k_t = F^k_\gamma(X_t, t)\, \mathrm dW^\gamma_t + \big(F^k_0 + \tfrac12 \sum_{\gamma=1}^n F^k_{\gamma\gamma}\big)(X_t, t)\, \mathrm dt$, where we can call $F_\gamma = \sigma_\gamma$ the diffusion coefficients, since they are elements of $C^\infty(\mathbb R_{\ge 0}, \Gamma TM)$; this also holds for $\gamma = 0$, but not for $F_{\alpha\beta} \in C^\infty(\mathbb R_{\ge 0}, \Gamma\mathbb TM)$. Therefore the coefficient of $\mathrm dt$, the "drift", cannot be interpreted as a vector. Note that setting $F^k_{\gamma\gamma} \equiv 0$ does not guarantee that such coefficients will vanish w.r.t. another chart, since the transformation rule for them involves the $F^{ij}_{\alpha\beta}$'s, which cannot vanish by the second Schwartz condition ( ); in other words, there is no way to do away with the non vector-valued drift in ( ). We can consider Schwartz-Meyer diffusions as being sections of a fibre bundle $\mathrm{Diff}^n_{\mathrm{Sch}}M$ over $M \times \mathbb R_{\ge 0}$. This means that, similarly to the case of ( ), we are only considering $\mathbb F$'s that do not depend explicitly on the Brownian motion, and we are quotienting out the part that is not relevant for ( ).
Just as for Stratonovich SDEs, Schwartz-Meyer equations can also be seen to come from an integral, now with integrands valued in the second order cotangent bundle $\mathbb T^*M$, which is characterised by analogous properties. These include additivity (for all locally bounded previsible integrands as above) and associativity (for a real-valued, locally bounded adapted process $\lambda$, multiplying the integrand by $\lambda$ amounts to taking the Itô integral of $\lambda$ against the original integral). Notice how Itô integration is used in the associativity axiom. The property of a process of being a solution of ( ) is then defined in complete analogy to ( ).

The recent paper [AB ] treats SDEs on manifolds using a representation which is similar to that of ( ), but which has a distinct advantage when it comes to numerical schemes. Here the authors focus on the autonomous diffusion case, without explicitly taking time as a driver ($N = \mathbb R^n$, $Z_t = W_t$), and take the field of Schwartz morphisms $\mathbb F$ to be induced by a field of maps $f_x \colon \mathbb R^n \to M$ with $f_x(0) = x$, i.e. a smooth function $f \colon M \times \mathbb R^n \to M$. In coordinates $\varphi$ on $M$ this amounts to $F^k_\gamma(x) = \frac{\partial f^k_x}{\partial z^\gamma}(0)$ and $F^k_{\alpha\beta}(x) = \frac{\partial^2 f^k_x}{\partial z^\alpha \partial z^\beta}(0)$, with $F_0 = 0$ (note how the drift comes from the quadratic variation of Brownian motion, without having to require time as a driving process). This particular form of $\mathbb F$ is useful because it automatically defines a numerical scheme for the solution of the SDE, similar to the Euler scheme, which cannot be defined in a coordinate-free way on a manifold: the linear structure lacked by $M$ is replaced with iterative interpolations along the $f_x$'s. This also has the advantage of guaranteeing that if the maps are valued in $M$, so are all the approximations.

"Itô-type" diffusions on manifolds have also been investigated by other authors, most notably by [BD , Ch. ] (although we refer to the more recent exposition [Gli , § . ]), who call the bundle $\mathrm{Diff}^n_{\mathrm{Sch}}M$ the Itô bundle, and give a local description of it. Although we will not need this formulation in the following sections, we include a description of it to establish the link with the other approach. There are (at least) two ways of describing a fibre bundle $\pi \colon E \to M$: one is by simply exhibiting the manifolds $E$, $M$ and the surjective submersion $\pi$, and by checking local triviality; this is the approach taken here.
The second approach involves declaring the base space $M$, the structure group $G$ (a Lie group), the typical fibre $F$ (a smooth manifold, carrying a left action of $G$ by smooth maps) and a covering $\{U_\lambda\}_\lambda$ of $M$ together with maps $g_{\nu\mu} \colon U_\mu \cap U_\nu \to G$ satisfying the cocycle conditions $g_{\nu\mu} g_{\mu\lambda} = g_{\nu\lambda}$ for all $\lambda, \mu, \nu$. Then the total space and bundle projection can be reconstructed by gluing all the $U_\lambda \times F$'s together according to the $g_{\nu\mu}$'s. Of course, the local description can be obtained from the ordinary one by fixing a local trivialisation, a model for the fibre, a Lie group capturing all transformations of the fibres, etc. Now, we define the candidate bundle of Schwartz-Meyer diffusions to have base space $M \times \mathbb R_{\ge 0}$ and typical fibre given by the coefficients of the equation in a chart. Recall that we observed that the Schwartz bundle is not linear: this should rule out the usual choices $G = GL(n, \mathbb R), O(n)$, valid for vector bundles. Indeed, the transformation laws for $\mathrm{Diff}^n_{\mathrm{Sch}}M$ are succinctly modelled by the Itô group, in whose action a trace is taken componentwise. Given an open covering $\{U_\lambda\}_\lambda$ (consisting of, say, open balls) of $M$, and charts $\varphi_\lambda \colon U_\lambda \to \mathbb R^m$, the transition maps are defined through the Jacobian and Hessian of the change of coordinates. The bundle that we have just described is then isomorphic to $\mathrm{Diff}^n_{\mathrm{Sch}}M$ (notation as in ( )).

There is a way of writing Itô equations on a manifold so that all the coefficients, drift included, are vectors. It involves considering the additional structure of a linear connection $\nabla$ on $M$, i.e. a covariant derivative $\nabla \colon \Gamma TM \times \Gamma TM \to \Gamma TM$, $(U, V) \mapsto \nabla_UV$. Equivalently, a connection is described through its Hessian $\nabla^2 \colon C^\infty M \to \Gamma(T^*M \otimes T^*M)$. These two data are equivalent and related by $\nabla^2 f(U, V) = U(Vf) - (\nabla_UV)f$; in a chart, with Christoffel symbols $\Gamma^k_{ij}$, the Hessian can be written as $\nabla^2 f = (\partial_{ij}f - \Gamma^k_{ij}\, \partial_kf)\, \mathrm dx^i \otimes \mathrm dx^j$. We will only be interested in connections modulo torsion, so it is not limiting for us to assume that a connection is symmetric or torsion-free, i.e. that its torsion tensor vanishes, or equivalently that its Hessian is valued in $\Gamma(T^*M \odot T^*M)$.
By far the most important example of such a connection is the Levi-Civita connection of a Riemannian metric $g$, whose Hessian is the Riemannian one. Torsion-free connections are relevant to our study of SDEs in that they correspond to the splittings of ( ), i.e. a linear left inverse $q$ to $i$ or a linear right inverse $j$ to $p$. The existence of the bundle maps $j$ and $q$ are equivalent to one another and to the isomorphism $(q, p) \colon \mathbb TM \to TM \oplus (TM \odot TM)$ (this is the well-known splitting lemma [Hat , p. ], valid in the category of vector bundles). A torsion-free connection $\nabla$ on $M$ is equivalent to a splitting by setting $q(L_x) = (L^k_x + \Gamma^k_{ij}(x) L^{ij}_x)\, \partial_x\varphi_k$. Another way to view this correspondence is by $j^* \mathbb d_xf = \nabla^2_xf$.

Now, given symmetric connections on $N$ and $M$, a field of Schwartz morphisms $\mathbb F \in \Gamma\mathrm{Sch}(N, M)$ can be viewed as a field of block matrices with diagonal blocks $F$ and $F \otimes F$, vanishing lower left block, and upper right block $G \colon TN \odot TN \to TM$. One can then require that $G \equiv 0$, so that $\mathbb F$ reduces to $F$, which defines the Itô equation. Such equations have been considered in [É ]. The data needed to define this equation is the same as that involved in the definition of the Stratonovich equation ( ), namely an element of $\Gamma\mathrm{Hom}(TN, TM)$, but the meaning of the equation depends on the connections on $N$ and $M$. In local coordinates, using ( ) to specify $F^k_{\alpha\beta}$ in ( ) to the case $G \equiv 0$, this equation takes the form $\mathrm dX^k_t = F^k_\gamma(X_t, Z_t)\, \mathrm dZ^\gamma_t + \tfrac12 \big({}^N\Gamma^\gamma_{\alpha\beta}(Z_t) F^k_\gamma(X_t, Z_t) - {}^M\Gamma^k_{ij}(X_t) F^i_\alpha F^j_\beta(X_t, Z_t)\big)\, \mathrm d[Z^\alpha, Z^\beta]_t$. Note that if the Christoffel symbols on both manifolds vanish the above equation reduces to its first term; however, unless a manifold is flat a chart cannot in general be chosen so that the Christoffel symbols vanish (except at a single chosen point: these are called normal coordinates).

Itô equations can be equivalently defined through the Itô integral. Relatedly, an $M$-valued semimartingale $X$ is called a local martingale if, for all $f \in C^\infty M$, $f(X) - f(X_0) - \tfrac12 \int \nabla^2f(\mathrm dX, \mathrm dX)$ is a real-valued local martingale (the last term is to be interpreted as half the quadratic variation of $X$ along the bilinear form $\nabla^2f$); this property coincides with the usual local martingale property when $M$ is a vector space.
In local coordinates an application of ( ) and ( ) shows that the local martingale property corresponds to the requirement that $X^k + \tfrac12 \int \Gamma^k_{ij}(X)\, \mathrm d[X^i, X^j]$ be a real-valued local martingale for each $k$. The Itô integral ( ) and Itô equations ( ) on manifolds behave well w.r.t. local martingales: if the integrand or driver is a local martingale, so is the integral or solution; this is again seen in local coordinates ( ).
In the following example we examine the case of diffusions, defined using Itô equations, in which the issue of the drift not being a vector is (partially) resolved:

Example (Itô diffusion). The Schwartz-Meyer diffusion example specified to the above case ($M$ has a symmetric connection, $G \equiv 0$ in ( )) becomes an Itô equation with diffusion coefficients $\sigma_\gamma = F_\gamma$ and a drift coefficient $\mu = F_0$ which can legitimately be referred to as the "drift vector". Note however that in an arbitrary chart $\varphi$ the drift will still carry a correction term, $\mathrm dX^k_t = \sigma^k_\gamma(X_t, t)\, \mathrm dW^\gamma_t + \big(\mu^k - \tfrac12 \sum_{\gamma=1}^n \Gamma^k_{ij}\, \sigma^i_\gamma \sigma^j_\gamma\big)(X_t, t)\, \mathrm dt$, which reduces to the ordinary Itô lemma if $M = \mathbb R^m$ and the chart $\varphi$ is a diffeomorphism of $\mathbb R^m$. The ${}^N\Gamma^\gamma_{\alpha\beta}$'s do not appear since the driver is already valued in a Euclidean space. The data needed to define such an equation coincides with that needed for ( ), so we can define the bundle $\mathrm{Diff}^n_{\text{Itô}}M$ to be the one already defined in ( ). Crucially, however, the Stratonovich and Itô calculi give different meanings to the equation defined by a section of this bundle; in particular, a torsion-free connection on $M$ is required in the latter case. The "Itô" and "Strat" therefore do not represent differences in the bundles, which are identical, but only serve as a reminder of which calculus is being used to give the section the meaning of an SDE.
Itô equations on manifolds are the true generalisation of their Euclidean space-valued counterparts, but have the disadvantage of only being defined w.r.t. a specific connection. For instance, if $F \in \Gamma\mathrm{Diff}^n_{\text{Itô}}M$, $M$ is Riemannian with $M'$ a Riemannian submanifold s.t. for all $z$ and $x \in M'$, $F(z, x)$ maps to $T_xM'$, $F$ does not in general define an Itô equation on $M'$, since the Riemannian connection on $M'$ is not in general the restriction of that of $M$. However, $F$, seen as a field of Schwartz morphisms, does define a Schwartz-Meyer equation on $M'$ (with a $G$ term that is in general non-zero w.r.t. the Riemannian connection on $M'$).
In the following table we summarise the advantages of these three ways of representing SDEs on manifolds:

Property | Stratonovich | Schwartz-Meyer/2-jet | Itô
Does not require ∇ | yes | yes | no
Uses Itô integration | no | yes | yes
Coefficients are vectors | yes | no | yes

It is natural to ask how these three types of equations are related to one another. In the case of diffusions, there exists a commutative diagram of bijections $a$, $b$, $c$ between $\Gamma\mathrm{Diff}^n_{\mathrm{Strat}}M$, $\Gamma\mathrm{Diff}^n_{\mathrm{Sch}}M$ and $\Gamma\mathrm{Diff}^n_{\text{Itô}}M$. All three $a$, $b$, $c$ are the identity on the diffusion coefficients, and differ in their behaviour on the Stratonovich, Schwartz-Meyer and Itô drifts. Note that, while $b$ and $c$ depend on the connection, $a$ does not.
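These maps preserve solutions (i.e. $X$ is a solution of $F \in \Gamma\mathrm{Diff}^n_{\mathrm{Strat}}M$ if and only if $X$ is a solution of $aF$, and the same for $b$, $c$). This is immediate by the expression of such equations in charts, by ( ) and the usual Itô-Stratonovich conversion formula.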
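In Euclidean coordinates the usual Itô-Stratonovich conversion formula adds $\tfrac12 \sum_\gamma (D\sigma_\gamma)\sigma_\gamma$ to the Stratonovich drift to obtain the Itô drift. The following minimal sketch evaluates this correction with central finite differences; the helper names and the planar rotation field are illustrative assumptions, not taken from the text.

```python
def directional_derivative(field, x, v, h=1e-6):
    """Central finite-difference approximation of D field(x)[v]."""
    xp = [a + h * b for a, b in zip(x, v)]
    xm = [a - h * b for a, b in zip(x, v)]
    fp, fm = field(xp), field(xm)
    return [(p - m) / (2.0 * h) for p, m in zip(fp, fm)]

def strat_to_ito_drift(sigmas, b, x):
    """Ito drift = Stratonovich drift + 1/2 * sum_gamma D sigma_gamma [sigma_gamma]."""
    drift = list(b(x))
    for sigma in sigmas:
        correction = directional_derivative(sigma, x, sigma(x))
        drift = [d + 0.5 * c for d, c in zip(drift, correction)]
    return drift

def rotation_field(x):
    """Illustrative diffusion coefficient, tangent to circles centred at 0."""
    return [-x[1], x[0]]

def zero_drift(x):
    return [0.0, 0.0]

ito_drift = strat_to_ito_drift([rotation_field], zero_drift, [1.0, 0.0])
# ito_drift is approximately [-0.5, 0.0]
```

For this field the Itô drift at $(1, 0)$ points towards the origin, which is the normal correction needed to keep an Itô solution on the unit circle.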
Remark. What makes Itô-Stratonovich conversion formulae difficult to state in the case of a general manifold-valued semimartingale driver $Z$ is that the change of calculus involves the emergence of new drivers which are not naturally valued in the manifold where $Z$ is valued (the quadratic covariation of $Z$). Nevertheless, the map $a$ can be defined in this general setting [É , Lemma . ], though its inverse cannot canonically.
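Returning to the representation of [AB ] discussed earlier in this section, where the Schwartz morphism is induced by maps $f_x$ with $f_x(0) = x$, the associated scheme simply iterates $X_{n+1} = f_{X_n}(\Delta W_n)$. The sketch below assumes $M$ is the unit circle and takes $f_x(w)$ to be the rotation of $x$ by the angle $w$; this choice of $f$ is an illustrative assumption. Its first and second derivatives at $w = 0$ are $(-x_2, x_1)$ and $-x$, so it corresponds to the Stratonovich equation driven by the rotation field.

```python
import math
import random

def f(x, w):
    """Hypothetical field of maps f_x: R -> S^1, rotation of x by the angle w."""
    c, s = math.cos(w), math.sin(w)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def two_jet_scheme(x0, n_steps, dt, rng):
    """Iterate X_{n+1} = f_{X_n}(dW_n) with Gaussian increments of variance dt."""
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = f(x, dw)
    return x

rng = random.Random(0)                      # fixed seed for reproducibility
x_end = two_jet_scheme((1.0, 0.0), 1000, 1e-3, rng)
radius = math.hypot(*x_end)
```

Since every $f_x$ is valued in the circle, all the approximations stay on it (up to rounding), which is the property highlighted in the text.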

Manifolds embedded in $\mathbb R^d$
In this paper we will mostly be concerned with manifolds embedded in $\mathbb R^d$: these can be studied using the extrinsic, canonical, $\mathbb R^d$-coordinates instead of non-canonical local ones. Let $M$ be an $m$-dimensional smooth manifold embedded in $\mathbb R^d$. We assume $M$ to be locally given by a non-degenerate Cartesian equation $F(x) = 0$: $M$ can be described globally in this way if and only if it is closed and its embedding has trivial normal bundle; therefore, to preserve generality, we only assume $F$ to be local. Throughout, the letter $x$ will denote a point in $\mathbb R^d$ and the letter $y$ a point in $M$. Thus $F \colon U \to \mathbb R^{d-m}$ for some open $U \subseteq \mathbb R^d$, with Jacobian $JF$ of full rank. Let $\pi \colon T \to M$ denote the metric projection of a tubular neighbourhood $T$ of $M$ onto $M$, sending $x$ to the closest point of $M$. This map can be seen to exist by using the normal exponential map defined in [Pet , p. ], and is constant on the affine $(d - m)$-dimensional slices of $T$ which intersect $M$ orthogonally: this is because the fibre $\pi^{-1}(y)$ coincides with the union of all geodesics in $\mathbb R^d$ (i.e. straight line segments) which start at $y$, with initial velocity orthogonal to $M$, each taken for $t$ in some open interval containing 0. It is important also to remember that $\pi$ is unique given the embedding of $M$ (on a thin enough $T$ such that it is well defined), whereas $F$ is not canonically determined. In what follows we will be concerned with understanding which quantities are dependent on the chosen $F$ and which instead only depend on the embedding of $M$.

The only properties of $\pi$ that we will need are that $\pi|_M = 1_M$ and $\pi \circ \pi = \pi$. Differentiating these (the second up to order 2) we obtain ( ). If $V_y \in T_yM$ and $X$ is a smooth curve in $M$ s.t. $X_0 = y$ and $\dot X_0 = V_y$, differentiating $\pi(X_t) = X_t$ results in $J\pi(y) V_y = V_y$: this shows that $J\pi|_{TM} = 1_{TM}$. By a similar argument, the fact that $\pi^{-1}(y)$ is a union of straight line segments that intersect $M$ orthogonally, on which $\pi$ is constant, implies that $J\pi|_{T^\perp M} = 0$ ($T^\perp_yM$ the fibre of the normal bundle of $M$ at $y$). These two statements mean that $J\pi(y) = P(y)$, where $P(y) \colon T_y\mathbb R^d \to T_yM$ is the orthogonal projection onto the tangent bundle of $M$, which can be defined in terms of $F$ as $P = 1 - Q$, $Q = JF^\top (JF\, JF^\top)^{-1} JF$. The notation is borrowed from [CDL ].
Note that we can use $F$ to define $P$, $Q$ on a tubular neighbourhood of $M$, but these will only be independent of $F$ on $M$. $Q(y) \colon T_y\mathbb R^d \to T^\perp_yM$ is the orthogonal projection onto the normal bundle. Another consequence of ( ) (evaluated at $y \in M$) that will be useful is a decomposition ( ) of the second derivatives of $\pi$ applied to $V_y, W_y \in T_y\mathbb R^d$, written in terms of the tangential and normal components $\overline U_y = P(y)U_y$, $\check U_y = Q(y)U_y$. Actually, to show the statement about the third term in the first line, we need a separate argument. Let $f$ be a smooth function defined on an open set $U \ni y$ of $\mathbb R^d$ and $A_y, B_y \in T_y\mathbb R^d$. Then $\frac{\partial^2 f}{\partial x^i \partial x^j}(y) A^i_y B^j_y$ only depends on $f$ restricted to the affine plane (or line) centred in $y$ and spanned by $A_y$, $B_y$. Indeed, intending with $A$ the extension of $A_y$ to a constant vector field on $U$, we can write this quantity as $B_y g$, where $g(x) = A_x f$. This is the directional derivative of $g$ at $y$ in the direction $B_y$, and therefore only depends on the restriction of $g$ to the affine line $y + \mathrm{span}\{B_y\}$. But $g(x)$ is itself a directional derivative, and only depends on $f$ restricted to the affine line $x + \mathrm{span}\{A_x\}$. Thus the whole expression only depends on $f$ restricted to $y + \mathrm{span}\{A_y, B_y\}$. This shows that the term in question only depends on $\pi$ restricted to $y + \mathrm{span}\{\check V_y, \check W_y\}$, which is the constant $y$ map, whose derivatives therefore vanish.
Remark. The other terms appearing in ( ) have a description that should be more familiar to differential geometers, given in ( ). Notice this is true independently of the chosen extension of $\overline W$, $\check W$ to local vector fields, a priori needed to give the RHSs a meaning. The first term is the second fundamental form of $V_y$, $W_y$ [Lee , p. ], whereas the second term is the second fundamental tensor [Jos , Def. . . ]. If $M$ is an open set of an affine subspace of $\mathbb R^d$, $\pi$ is an affine map and both terms vanish. We prove the first of the two equalities in ( ), the second is proved similarly: in the computation, the second equality follows from the fact that $Q\overline W = 0$ (and that the derivative is taken in a tangential direction, i.e. $V_y \in T_yM$), and the last equality is given by ( ) below. Note that the terms of ( ) are extrinsic, in the sense that they depend on the embedding of $M$, unlike the Levi-Civita connection of the Riemannian metric on $M$, which is intrinsic to $M$.

Finally, it will be necessary to consider the relationship between the derivatives of $P$, $Q$ and the second derivatives of $\pi$. This is obtained by differentiating ( ) at time 0 along a smooth curve, and using ( ).
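The relations between $\pi$, $P$ and the second derivatives of $\pi$ discussed above can be checked numerically in a simple case. The following is a minimal sketch, assuming $M$ is the unit sphere in $\mathbb R^3$ with defining function $F(x) = |x|^2 - 1$ (an illustrative choice), so that $\pi(x) = x/|x|$ and $P(y) = 1 - JF^\top (JF\, JF^\top)^{-1} JF = 1 - y y^\top / |y|^2$. It compares a finite-difference Jacobian of $\pi$ with $P$, and checks that the second derivative of $\pi$ along a normal direction vanishes.

```python
import math

def pi(x):
    """Metric projection onto the unit sphere (defined away from the origin)."""
    r = math.sqrt(sum(a * a for a in x))
    return [a / r for a in x]

def tangent_projection(y):
    """P(y) = I - JF^T (JF JF^T)^{-1} JF for F(x) = |x|^2 - 1, i.e. JF = 2 x^T."""
    n2 = sum(a * a for a in y)
    d = len(y)
    return [[(1.0 if i == j else 0.0) - y[i] * y[j] / n2 for j in range(d)]
            for i in range(d)]

def jacobian_fd(func, x, h=1e-6):
    """Central finite-difference Jacobian, entry [i][j] = d func_i / d x_j."""
    d = len(x)
    columns = []
    for j in range(d):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        columns.append([(p - m) / (2.0 * h) for p, m in zip(fp, fm)])
    return [[columns[j][i] for j in range(d)] for i in range(d)]

y = [0.6, 0.0, 0.8]                       # a point of the unit sphere
P = tangent_projection(y)
J = jacobian_fd(pi, y)
max_diff = max(abs(P[i][j] - J[i][j]) for i in range(3) for j in range(3))

# Second derivative of pi at y along the normal direction y itself.
h = 1e-3
plus = pi([a * (1.0 + h) for a in y])
minus = pi([a * (1.0 - h) for a in y])
normal_normal = [(p - 2.0 * c + m) / h ** 2 for p, c, m in zip(plus, pi(y), minus)]
```

Both `max_diff` and the entries of `normal_normal` are zero up to discretisation and rounding error, consistent with $J\pi = P$ on $M$ and with the vanishing of the purely normal second derivative.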
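We now consider a setup $S = (\Omega, \mathcal F, P)$ satisfying the usual conditions, $W$ an $n$-dimensional Brownian motion defined on $S$. Consider the $W$-driven diffusion Stratonovich SDE $\mathrm dX_t = \sigma_\gamma(X_t, t) \circ \mathrm dW^\gamma_t + b(X_t, t)\, \mathrm dt$. As already discussed, the natural condition on $\sigma_\gamma$, $b$ which guarantees that $X$ will stay on $M$ for its lifetime is their tangency to $M$: $Q\sigma_\gamma = 0$ and $Qb = 0$ on $M$. Our focus, however, will be mostly on the Itô SDE $\mathrm dX_t = \sigma_\gamma(X_t, t)\, \mathrm dW^\gamma_t + \mu(X_t, t)\, \mathrm dt$ with smooth coefficients defined in $[0, +\infty) \times \mathbb R^d$; we do not assume them to be globally Lipschitz, so the solution might only exist up to a positive stopping time, not in general bounded from below by a positive deterministic constant. We are interested in deriving the "tangency condition" for the above SDE, i.e. a condition on the coefficients that will guarantee that the solution will not leave $M$. One way to impose this is to convert ( ) to Stratonovich form, $\mathrm dX_t = \sigma_\gamma \circ \mathrm dW^\gamma_t + \big(\mu - \tfrac12 \sum_{\gamma=1}^n \frac{\partial \sigma_\gamma}{\partial x^h} \sigma^h_\gamma\big)\, \mathrm dt$, and to require tangency of the resulting coefficients. Now, given that $Q\sigma_\alpha$ vanishes on $M$, all its directional derivatives along the tangent directions $\sigma_\beta$ will too, which, using ( ), allows the normal component of the derivatives of $\sigma$ to be expressed through the second derivatives of $\pi$. We can thus reformulate the second equation in ( ) to obtain the condition $Q\sigma_\gamma = 0$, $Q\mu = \tfrac12 \sum_{\gamma=1}^n \frac{\partial^2 \pi}{\partial x^i \partial x^j} \sigma^i_\gamma \sigma^j_\gamma$ on $M$. This is useful because it removes the reliance of this constraint on the derivatives of $\sigma$, and can be interpreted as saying that the diffusion coefficients must be tangent to $M$ and the Itô drift must instead lie on the space parallel to the tangent space of $M$, displaced by an amount which depends on the second fundamental form of $M$ applied to the diffusion coefficients.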
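The role of the normal component of the Itô drift can be seen in a small simulation. The sketch below assumes $M$ is the unit circle in $\mathbb R^2$ with the single tangent diffusion coefficient $\sigma(x) = (-x_2, x_1)$; for this choice $\tfrac12 \frac{\partial^2 \pi}{\partial x^i \partial x^j}\sigma^i\sigma^j = -\tfrac12 y$ on the circle, extended here as $-\tfrac12 x$ nearby. The Euler-Maruyama scheme, step size and seed are illustrative assumptions rather than anything prescribed by the text.

```python
import math
import random

def sigma(x):
    """Diffusion coefficient tangent to circles centred at the origin."""
    return (-x[1], x[0])

def euler_maruyama(x0, n_steps, dt, rng, normal_drift):
    """Euler-Maruyama for dX = sigma(X) dW + mu(X) dt, mu = -X/2 or 0."""
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s = sigma(x)
        mu = (-0.5 * x[0], -0.5 * x[1]) if normal_drift else (0.0, 0.0)
        x = (x[0] + mu[0] * dt + s[0] * dw, x[1] + mu[1] * dt + s[1] * dw)
    return x

n_steps, dt = 10000, 1e-4                  # time horizon T = 1
r_without = math.hypot(*euler_maruyama((1.0, 0.0), n_steps, dt, random.Random(1), False))
r_with = math.hypot(*euler_maruyama((1.0, 0.0), n_steps, dt, random.Random(1), True))
```

Without the normal drift each Euler step multiplies the squared radius by $1 + (\Delta W)^2$, so `r_without` grows well above 1, while `r_with` stays close to 1 up to discretisation error.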
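Remark (Tangency of a second-order differential operator). ( ) can also be derived by writing the second order tangency condition for $\mathbb T_y\mathbb R^d \ni L_y = L^k_y\, \partial_y x_k + L^{ij}_y\, \partial^2_y x_{ij}$ to belong to $\mathbb T_yM$: this is done by writing $\mathbb T_y\pi L_y = L_y$ in $\mathbb R^d$-coordinates as $L^k_y = \frac{\partial \pi^k}{\partial x^h} L^h_y + \frac{\partial^2 \pi^k}{\partial x^i \partial x^j} L^{ij}_y$, $L^{ij}_y = \frac{\partial \pi^i}{\partial x^h} \frac{\partial \pi^j}{\partial x^l} L^{hl}_y$, and then applying it to $L_y = \sigma_\gamma(y, t), \eta(y, t)$, given in terms of a field of Schwartz morphisms $\mathbb F$ as in ( ). Note that it would instead be incorrect to split $\mathbb F$ according to the Euclidean connection into a matrix with $F$ and $G$ terms as in ( ), and then to require that $F$ and $G$ map to $TM$, since the splitting of $\mathbb F$ according to the connection on $M$ will be different, i.e. the corresponding diagram does not commute.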
We now compute the Hessian for embedded $M$ ( ), where we have used ( ), ( ) to reduce this to a computation of directional derivatives, and finally ( ) (the argument is similar to ( )). ${}^{\mathbb R^d}\nabla^2$ of course is just the ordinary Hessian. We can now compute ${}^Mq$, the splitting appearing in ( ) w.r.t. the connection on $M$, for $\mathbb T_yM \ni L_y = L^k_y\, \partial_y x_k + L^{ij}_y\, \partial^2_y x_{ij}$, using ( ). Therefore the condition on an arbitrary Schwartz morphism of being Itô w.r.t. the Riemannian connection on $M$ is that $F_{\alpha\beta}(y, t)$ be orthogonal to $T_yM$. Compare this with the stronger condition on $\mathbb F$ of being Itô w.r.t. the connection on $\mathbb R^d$, which is $F^k_{\alpha\beta}(y, t) = 0$. Thus, given an Itô equation $F$ on $M$, defined as in ( ) ($\sigma_\gamma = F_\gamma$, $\mu = F_0$), we have that the drift in $\mathbb R^d$ of such an equation is given by $\mu^k + \tfrac12 \sum_{\gamma=1}^n F^k_{\gamma\gamma}$, with the first term tangent to $M$ and the second orthogonal to $M$, and equal to $\tfrac12 \sum_{\gamma=1}^n \frac{\partial^2 \pi^k}{\partial x^i \partial x^j} \sigma^i_\gamma \sigma^j_\gamma$, by the preceding remark and ( ). Therefore an Itô equation on $M$ with coefficients $\sigma_\gamma$, $\mu$ is read in ambient coordinates as $\mathrm dX_t = \sigma_\gamma(X_t, t)\, \mathrm dW^\gamma_t + \big(\mu + \tfrac12 \sum_{\gamma=1}^n \frac{\partial^2 \pi}{\partial x^i \partial x^j} \sigma^i_\gamma \sigma^j_\gamma\big)(X_t, t)\, \mathrm dt$. Notice that the tangential part of the $\mathbb R^d$-drift, $\mu$, is arbitrary, while its orthogonal part is determined by the diffusion coefficients, and the condition that the solution remain on $M$.

The notion of $M$-valued local martingale also has a description in terms of ambient coordinates [É , ¶ . ]: for an $M$-valued Itô process (such as the solution to ( )) the local martingale property is equivalent to requiring that the drift be orthogonal to $M$ at each point (and thus determined by the diffusion coefficients; for ( ) this means $\mu = 0$). This condition is very reminiscent of the property of geodesics of having acceleration orthogonal to $M$ [Lee , Lemma . ]. Using ( ) and ( ) it is easy to verify that converting between Stratonovich, Schwartz-Meyer and Itô equations on $M$ is equivalent when treating the equations as being valued in $M$ or in $\mathbb R^d$.
By this we mean that, denoting with $\mathrm{Diff}^n_{\mathrm{Strat},M}\mathbb R^d$ the bundle of Stratonovich equations on $\mathbb R^d$ which restrict to equations on $M$ (and analogously for the other two diffusion bundles), the maps a, b, c of ( ) fit into
where vertical arrows denote restriction. An embedding argument immediately allows us to extend this assertion to the case where $\mathbb R^d$ is replaced with a Riemannian manifold of which $M$ is a Riemannian submanifold. This confirms that there is no ambiguity in converting an $M$-valued SDE between its various forms.
Figure : On the left, a sample path of the solution to the Itô equation (blue) with the two diffusion coefficients $2(x^2 + y^2 + z^2)^{-1}(-y, x, 0)$, $2(x^2 + y^2 + z^2)^{-1}(0, -z, y)$, which are tangent to $S^2 \hookrightarrow \mathbb R^3$, zero drift and initial condition $(0, 1, 0)$; in the same plot, a sample path (using the same random seed) of the solution to the Stratonovich equation (green) defined by the same vector fields and initial condition. The solution to the Itô equation drifts radially outwards, while the solution to the Stratonovich equation remains on $S^2$. On the right we compare the same Stratonovich path with a sample path of the solution to the Itô equation (red) with the same diffusion coefficients and initial condition, but with the orthogonal drift term necessary to keep the solution on $S^2$ ( ). The resulting solution is an $S^2$-valued local martingale, while the solution to the Stratonovich equation is not: this is illustrated by plotting the vector field on $S^2$ given by the tangential component of the Itô drift possessed by the Stratonovich equation, which can be viewed as a manifold-valued drift component.
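The orthogonal drift $\frac12 \sum_\gamma \partial^2\pi(\sigma_\gamma, \sigma_\gamma)$ that keeps an Itô equation on $M$ can be checked numerically in the setting of the sphere figure. The following is a minimal sketch and not the code used to produce the figure: it assumes the unit sphere with $\pi(x) = x/|x|$, evaluates the unnormalised tangent fields $(-y, x, 0)$ and $(0, -z, y)$ at the initial condition (the scalar prefactor of the caption is dropped), and approximates second derivatives by central finite differences. All function names are hypothetical.

```python
import numpy as np

def pi_sphere(x):
    """Metric projection onto the unit sphere (defined away from the origin)."""
    return x / np.linalg.norm(x)

def second_derivative(f, y, v, w, h=1e-4):
    """Central finite-difference approximation of D^2 f(y)[v, w]."""
    return (f(y + h * v + h * w) - f(y + h * v - h * w)
            - f(y - h * v + h * w) + f(y - h * v - h * w)) / (4 * h * h)

def normal_ito_drift(pi, y, sigmas):
    """Orthogonal Ito drift (1/2) sum_gamma D^2 pi(y)[sigma_gamma, sigma_gamma]."""
    return 0.5 * sum(second_derivative(pi, y, s, s) for s in sigmas)

def radial_growth_rate(y, sigmas, drift):
    """dt-coefficient of |X_t|^2 at X_t = y, by Ito's formula."""
    return 2.0 * float(y @ drift) + sum(float(s @ s) for s in sigmas)

# Initial condition of the figure and the two tangent fields evaluated there.
y0 = np.array([0.0, 1.0, 0.0])
sigma_1 = np.array([-y0[1], y0[0], 0.0])   # (-y, x, 0)
sigma_2 = np.array([0.0, -y0[2], y0[1]])   # (0, -z, y)

drift = normal_ito_drift(pi_sphere, y0, [sigma_1, sigma_2])
print("orthogonal drift:", drift)
print("d|X|^2/dt without drift:", radial_growth_rate(y0, [sigma_1, sigma_2], np.zeros(3)))
print("d|X|^2/dt with drift:   ", radial_growth_rate(y0, [sigma_1, sigma_2], drift))
```

With zero drift the squared radius has a strictly positive growth rate, in line with the outward drift visible in the left panel, while adding the orthogonal term cancels it.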

Example . (Time-dependent submanifold). Observe that the tangency conditions ( ) and ( ) can be written respectively as
for any smooth map $\pi$ defined on a tubular neighbourhood of $M$, with values in $M$, s.t. $\pi|_M = 1_M$, by exactly the same reasoning (for the Itô case we argue as in Remark . ). $J\pi(y)$ is no longer the orthogonal projection $P(y)$, but still restricts to the identity on $T_y M$ for $y \in M$, i.e. it has the property that $\ker(1 - J\pi) = TM$ on $M$. Allowing ourselves to consider all such tubular neighbourhood projections is useful in the following application. Given that we are considering time-dependent equations, it is very natural to also allow the submanifold $M$ to be time-dependent. Making this precise entails considering a smooth $(m + 1)$ We are looking for conditions on $\sigma$, $b$ (resp. $\mu$) which are sufficient to guarantee that the solution $X_t$ to ( ) (resp. ( )) belongs to $M_t$ for all $t$ for which it is defined. We then consider the $\mathbb R^{1+d}$-valued process $(t, X_t)$, which satisfies the dynamics Then, given a thin enough tubular neighbourhood of $M$ in $\mathbb R^{1+d}$, consider the map where $\pi_t$ is defined as in ( ) for the manifold $M_t$. Notice that this does not in general coincide with the Riemannian projection of a tubular neighbourhood onto $M$, which in general has no reason to preserve time, i.e. to be expressible as a union of $\pi_t$'s. The identity $J\pi J\pi = J\pi$ can be written in block matrix form as where we are denoting $\dot\pi_t(y) = \frac{d}{dt}\pi_t(y)$: this implies that at each point $y \in M_t$, $\dot\pi_t(y) \in T^\perp_y M_t$. This choice of the tubular neighbourhood projection will be further motivated later on, in Example . , Example . . In view of the above considerations, we can use it anyway to impose tangency of the SDE: this results in an unmodified condition on the diffusion coefficients, and the conditions on the orthogonal components of the Stratonovich and Itô drifts are given respectively by which keep track of the evolution of $M_t$ in time.
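The claim that the time derivative of $\pi_t$ is orthogonal to $M_t$ can be illustrated on a toy case. The sketch below is an assumption-laden example, not taken from the text: it takes $M_t$ to be the circle of radius $r(t) = 1 + t$ in $\mathbb R^2$, uses the time-preserving projection $(t, x) \mapsto (t, r(t)x/|x|)$ (named `pi_bar` here, a hypothetical name), and computes its Jacobian by central finite differences. It then checks idempotence of the Jacobian on the space-time manifold and reads off $\dot\pi_t$ from the first column.

```python
import numpy as np

def radius(t):
    """Hypothetical radius of the moving circle M_t."""
    return 1.0 + t

def pi_bar(z):
    """Time-preserving projection (t, x) -> (t, pi_t(x)) onto the circle M_t."""
    t, x = z[0], z[1:]
    return np.concatenate(([t], radius(t) * x / np.linalg.norm(x)))

def jacobian(f, z, h=1e-6):
    """Central finite-difference Jacobian of f at z (columns = partial derivatives)."""
    z = np.asarray(z, dtype=float)
    cols = []
    for k in range(z.size):
        e = np.zeros_like(z)
        e[k] = h
        cols.append((f(z + e) - f(z - e)) / (2 * h))
    return np.stack(cols, axis=1)

# A point (t0, y) with y on M_{t0}.
t0, theta = 0.5, 0.7
normal = np.array([np.cos(theta), np.sin(theta)])
z0 = np.concatenate(([t0], radius(t0) * normal))

J = jacobian(pi_bar, z0)
pi_dot = J[1:, 0]      # time derivative of pi_t at y
J_pi_t = J[1:, 1:]     # spatial Jacobian of pi_t at y (tangent projection on M_t)

print("max |J J - J|:", np.abs(J @ J - J).max())
print("tangential part of pi_dot:", J_pi_t @ pi_dot)
```

Reading the idempotence identity blockwise gives `J_pi_t @ pi_dot = 0`, i.e. `pi_dot` has no tangential component, which is what the printed values show up to discretisation error.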

Projecting SDEs
In Section we discussed three ways of representing SDEs on manifolds: Stratonovich, Schwartz-Meyer and Itô. In this section we will define, for each of these representations, a natural projection of the SDE onto a submanifold. We will mostly take the ambient manifold to be $\mathbb R^d$, which will allow us to use the theory of the previous section to derive formulae for the projections in ambient coordinates.
Remark . (The Itô-vector projection preserves local martingales). Although the Itô-vector projection is natural w.r.t. a smaller class of maps, it has the advantage of preserving the local martingale property: by this we mean that if the driver is a local martingale, then so is the solution to the Itô-vector-projected SDE. This follows from the good behaviour of Itô equations w.r.t. manifold-valued local martingales.
Remark . . One might wonder whether it is possible to "push forward" SDEs according to an arbitrary smooth and surjective map $f : D \to D$ . If $f$ is a surjective function admitting a smooth right inverse $\iota$, then we may write the pushforward of, say, the Stratonovich SDE $dX = F(X, Z) \circ dZ$ as $dY = F(Z, \iota(Y)) \circ dY$ . This condition on $f$ essentially corresponds to the condition ( ). For general smooth surjective $f$ (such as the bundle projection of a non-trivial principal bundle), however, we do not see a way of defining a new closed form SDE on $D$ .
We will now restrict our attention to the projections of $\mathbb R^d$-valued diffusions onto the embedded manifold $M$. Focusing on diffusions has the advantage of allowing us to use the maps ( ) to compare the projections. In other words, we can ask if the vertical rectangles in the diagram commute (compare with ( ), in which the equations on top already restrict to equations on $M$). We will show that they do not, and that all combinations of possibilities regarding their non-commutativity are possible. Examples of these cases are to be found in Subsection . below. We recall the notation $\overline V_y := P(y)V_y$, $\check V_y := Q(y)V_y$ and begin by considering the $\mathbb R^d$-valued Stratonovich SDE ( ). By ( ) the coefficients of the Stratonovich projection of this SDE will just be the projected coefficients: Throughout this chapter we will use $X$ for the initial SDE and $Y$ to denote the projected SDE. Now assume we start with ( ), and want an Itô SDE on $M$. We can still use the Stratonovich projection by converting the SDE to Stratonovich form as in ( ), projecting as above, and converting back to Itô form (by ( ) this last conversion can be seen to occur interchangeably in $M$ or in $\mathbb R^d$). We have Using ( ) we can split $\mu$ in its orthogonal and tangential components: on $M$ we have with implied evaluation of all terms at $(y, t)$.
We now move on to the Itô-jet projection. Let $F \in \Gamma\mathrm{Diff}^n_{\mathrm{Sch}}\mathbb R^d$ be as in ( ), so that the Schwartz-Meyer equation it defines coincides with the Itô equation ( ). We can then write ( ) using matrix notation as of which the first line reads
Remark . . We can write the Itô-jet-projected drift $\mu$ as the generator of the SDE, applied to the tubular neighbourhood projection $\pi$: In [AB ] the field of Schwartz morphisms F is taken to be induced by a (time-homogeneous) field of maps $f$ as in ( ). In this approach we can use functoriality of $\mathbb T$ to write thus obtaining an SDE defined by the field of ( -jets of) maps given by projecting the original field of maps onto $M$ with the tubular neighbourhood projection $\pi$.
Finally, we consider the Itô-vector projection of ( ). By ( ), in coordinates this amounts to projecting ( ) to the Itô SDE on $M$ with diffusion coefficients given by $\sigma_\gamma$ and drift To summarise, all three projections of the Itô equation ( ) agree on how to map the diffusion coefficients, and the orthogonal components of the drift terms will all be fixed by the constraint ( ), while their tangential projections are given by (respectively Stratonovich, Itô-jet, Itô-vector) By calculations similar to ( ) we can compute the projections of ( ) in Stratonovich form: again, all three projections will orthogonally project the diffusion coefficients, and behave as follows on the Stratonovich drifts.
From now on we will consider ( ) as being our starting point, unless otherwise mentioned, and thus refer to ( ) when comparing the three projections. We end this section with a brief comparison of the three projections, leaving a detailed analysis of their differences to Subsection . . The three projections coincide if $\sigma_\gamma \in TM$ for $\gamma = 1, \ldots, n$ (which includes the ODE case $\sigma_\gamma = 0$), in which case the diffusion coefficients remain unaffected, and the tangent component of the projected drift is simply the tangential component $P\mu$ of $\mu$. If $\sigma_\gamma \in T^\perp M$ for $\gamma = 1, \ldots, n$, all three projections result in an ODE on $M$, and the Itô-jet and Itô-vector projections coincide. Another case in which the Itô-jet and Itô-vector projections coincide is when the second derivatives of $\pi$ vanish: this occurs in particular if $M$ is embedded affinely, i.e. it coincides with some open set of an affine subspace of $\mathbb R^d$. All three projections forget the orthogonal part of the (Itô or Stratonovich) drift. We observe from ( ) that the Itô-jet and Itô-vector projections of ( ) only depend on the values of the Itô coefficients on $M$. The Stratonovich projection, instead, could additionally depend on the tangential components of the derivatives of the diffusion coefficients in the direction of their normal components. Naturally, the situation is reversed when projecting ( ): here it is the Stratonovich projection that only depends on the values of the coefficients on $M$, while the Itô-jet and Itô-vector projections might depend on the mentioned derivative term.
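The comparison above can be explored numerically. The sketch below computes the tangential parts of the three projected Itô drifts for the unit circle in $\mathbb R^2$, under the following assumptions, which are a reading of the surrounding text rather than formulas quoted from it: the Stratonovich projection is obtained by converting to Stratonovich form, applying $P$ to the coefficients and converting back; the Itô-jet drift is the generator applied to $\pi(x) = x/|x|$; and the Itô-vector tangential drift is $P\mu$. Derivatives are approximated by central finite differences, and all names are hypothetical.

```python
import numpy as np

def pi_circle(x):
    """Metric projection onto the unit circle."""
    return x / np.linalg.norm(x)

def tangent_projector(x):
    """P(x): orthogonal projection onto the line orthogonal to x."""
    n = x / np.linalg.norm(x)
    return np.eye(2) - np.outer(n, n)

def ddir(f, y, v, h=1e-5):
    """Directional derivative Df(y)[v] by central differences."""
    return (f(y + h * v) - f(y - h * v)) / (2 * h)

def d2dir(f, y, v, h=1e-4):
    """Second directional derivative D^2 f(y)[v, v] by central differences."""
    return (f(y + h * v) - 2 * f(y) + f(y - h * v)) / (h * h)

def tangential_drifts(sigmas, mu, y):
    """Tangential Ito drifts at y (on the circle) of the three projections.

    sigmas: list of vector fields x -> R^2 (Ito diffusion coefficients)
    mu:     vector field x -> R^2 (Ito drift)
    """
    Py = tangent_projector(y)
    # Ito -> Stratonovich drift of the original SDE.
    b = mu(y) - 0.5 * sum(ddir(s, y, s(y)) for s in sigmas)
    # Project the Stratonovich coefficients, then convert back to Ito.
    strat = Py @ b + 0.5 * sum(
        ddir(lambda x, s=s: tangent_projector(x) @ s(x), y, Py @ s(y))
        for s in sigmas)
    # Generator of the original SDE applied to pi (J pi(y) = P(y) on the circle).
    jet = Py @ mu(y) + 0.5 * sum(d2dir(pi_circle, y, s(y)) for s in sigmas)
    return {"stratonovich": Py @ strat,
            "ito_jet": Py @ jet,
            "ito_vector": Py @ mu(y)}

# Tangent diffusion coefficient: all three tangential drifts should agree.
y = np.array([np.cos(0.3), np.sin(0.3)])
rotation = lambda x: np.array([-x[1], x[0]])
constant_drift = lambda x: np.array([1.0, 2.0])
print(tangential_drifts([rotation], constant_drift, y))
```

For a diffusion coefficient with both tangential and normal components the three outputs generally differ, which can be seen by replacing `rotation` with a constant field.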
Example . (The projections in the case $M$ time-dependent). Recalling Example . (and the map $\pi$ defined therein) we may ask whether there is a way to consider the three SDE projections in the case of $M$ time-dependent. The most natural way to define this is to consider, as done in ( ), the joint equation satisfied by $(t, X_t)$, and to project its coefficients in the three ways onto $M$, thus obtaining a solution of the form $(t, Y_t)$: this uses that $\pi^0(t, y) = t$ (with time the $0$th coordinate), which is instead not necessarily satisfied by the Riemannian tubular neighbourhood projection onto $M$. It is easily checked that the formulae ( ) for the tangential component of the drift of $Y_t$ continue to hold with the substitution of $\pi_t$ for $\pi$ (so that also the projection onto the tangent space $P$ is now time-dependent), whereas in all three cases the orthogonal component of the drift picks up the term $\dot\pi_t$, needed to keep the process on the evolving manifold $M_t$. In particular, in the Itô-jet case we have where $L_t$ is the generator of $X$ and $L$ is that of $(t, X_t)$ (which can be considered as being a time-homogeneous Markov process). This identity extends the observation made in Remark . . The same term $\dot\pi_t$ should be added to the Stratonovich drifts ( ) for the extension to the case of $M$ time-dependent.

The optimal projection
In the previous section we showed how to abstractly project manifold-valued SDEs onto submanifolds in three (possibly) different ways, and specialised these constructions to the case of $M \hookrightarrow \mathbb R^d$-valued diffusions. In this section we will seek the optimal projection of an SDE for $X_t$, which we write in Itô form as ( ). Let be the $M$-valued SDE to be defined, which we write in $\mathbb R^d$-coordinates. Its coefficients $\mathring\sigma_\gamma$ and $\mathring\mu$ are to be treated as unknowns, to be determined by the optimisation criteria that involve the minimisation of the quantities asymptotically for small $t$. Before we define the optimality criteria precisely, it is important to note that such expectations are undefined if the solution to either SDE is explosive, or, in the second case, even if it exits the tubular neighbourhood of $M$ on which $\pi$ is defined. The problem must be slightly changed so as to ensure that we are minimising a well-defined quantity. One option is to take the above expectations on the event $\{t \leq \tau_r\}$, where for some suitable $r > 0$. However, since for such optimality criteria the values of the vector fields of both SDEs outside the ball $B_{(y_0,y_0)}(r) \subseteq \mathbb R^{2d}$ are irrelevant, it is simpler to just assume that they vanish outside $B_{(y_0,y_0)}(2r)$. Since the optimisation criteria will only determine the value of $\mathring\sigma$, $\mathring\mu$ at the initial condition, this is really only an assumption on $\sigma$ and $\mu$. The following lemma reassures us that, at least in well-behaved cases, this does not alter the problem in a way that interferes with the optimisation (which, as will be seen shortly, only involves the Taylor expansions of order of ( ) in $t = 0$).
Lemma . . Let $X, Y, y_0, \tau_r$ be as above, $U$ a neighbourhood of $(y_0, y_0)$ in $\mathbb R^{2d}$ and assume that there $f(y_0, y_0, 0) = 0$, and assume moreover that $E[\max_{0 \leq t \leq \varepsilon} |f(X_t, Y_t, t)|] < \infty$ (this holds, in particular, under the global Lipschitz assumptions that guarantee SDE exactness [RW , Theorem . ]). Then for any
Proof. Fix $r$, and let $\tau := \tau_r$. The Itô formula yields the decomposition $|(X_t, Y_t) - (y_0, y_0)|^2 = L_t + A_t$, with $L_t$ a sum of Brownian integrals and $A_t$ a time integral, all of which for $t \leq \tau \wedge \varepsilon$ have bounded integrand (by continuity of the SDE coefficients and compactness of $B_r(y_0, y_0) \times [0, \varepsilon]$).
$[L]_t$ can be expressed as a time integral with bounded integrand: let $R > 0$ bound the sum of the absolute values of all integrands mentioned for $t \in [0, \tau \wedge \varepsilon]$. Then, still for $t \leq \tau \wedge \varepsilon$ we have $|A_t|, [L]_t \leq Rt$, and for any $\xi > 0$ it holds since the first factor also vanishes as $t \to 0$, by the hypotheses on $f, X, Y$ and dominated convergence.
We proceed with the constrained optimisation problem, assuming all SDE coefficients to be compactly supported; this means all local martingales involved will be martingales, and that we may use Fubini to pass to the expectation inside integrals in $dt$. If we can write the Taylor expansion of the strong error ( ) a first goal could be to minimise the leading coefficient $a_1$ (of course there is no constant term because $Y_0 = y_0 = X_0$). Using Itô's formula, and intending with equality of differentials up to differentials of martingales, we have We now compute the expectation: and differentiating, with reference to ( ) we have Since $a_1$ only depends on the diffusion coefficients, its minimisation is expressed by the constrained optimisation problem whose solution is simply given by projecting the $\sigma_\gamma$'s onto $TM$: Here we have omitted evaluation at the initial condition $(0, y_0)$. Since we have not obtained a condition on $\mathring\mu$, our SDE ( ) is still underdetermined, and the condition would be satisfied by the Stratonovich projection of ( ).
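The minimisation of $a_1$ can be illustrated with a small numerical check. The sketch below assumes that $a_1$ is the sum over $\gamma$ of the squared Euclidean distances between the unknown coefficient $\mathring\sigma_\gamma$ and $\sigma_\gamma$ (which is what Itô's formula gives when $X$ and $Y$ are driven by the same Brownian motion), and takes $M$ to be the unit sphere in $\mathbb R^3$ purely for illustration. It compares the cost of the orthogonal projection with that of randomly drawn tangent competitors; all names are hypothetical.

```python
import numpy as np

def tangent_projector(y):
    """Orthogonal projection onto the tangent space of the unit sphere at y."""
    n = y / np.linalg.norm(y)
    return np.eye(y.size) - np.outer(n, n)

def a1(sigma_ring, sigma):
    """Assumed leading coefficient: sum_gamma |sigma_ring_gamma - sigma_gamma|^2.

    Both arguments are arrays of shape (n, d), one diffusion coefficient per row.
    """
    return float(np.sum((sigma_ring - sigma) ** 2))

rng = np.random.default_rng(0)
y0 = np.array([0.0, 0.0, 1.0])
P = tangent_projector(y0)

sigma = rng.normal(size=(2, 3))   # two arbitrary diffusion coefficients at y0
sigma_proj = sigma @ P            # rows P sigma_gamma (P is symmetric)

# Random tangent competitors never beat the orthogonal projection.
competitors = [rng.normal(size=(2, 3)) @ P for _ in range(1000)]
best_competitor = min(a1(c, sigma) for c in competitors)
print("a1 at projection:     ", a1(sigma_proj, sigma))
print("best random competitor:", best_competitor)
```

The value at the projection equals the squared norm of the normal components of the $\sigma_\gamma$'s, so it vanishes only when the original coefficients are already tangent.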
One idea to obtain a condition on $\mathring\mu$ would be to minimise $a_2$ in ( ). This attempt, however, has the drawback that we are minimising the second Taylor coefficient of a function without its first vanishing (unless the $\sigma_\gamma$'s are already tangent to start with: in this case the minimisation of $a_2$ can be seen to result in the three projections, which all coincide). Although this approach is discussed in [ABF ], we will not pursue it here, as there are sounder optimisation criteria. Indeed, we can look at the Taylor expansion of the weak error We compute the term on the left as which confirms that ( ) lacks a linear term, and we have Requiring the minimisation of $b_2$ is thus independent of the minimisation of $a_1$ above, and results in the constrained optimisation problem
Remark . . In defining the three projections in Section we intended for the projected coefficients to still be time-dependent if the original ones were. The optimality requirement only fixes the coefficients at the initial condition, at time 0, i.e.
Remark . . Note that the form (Itô or Stratonovich) in which the initial SDE is provided is irrelevant: if we had begun with ( ) instead of ( ) the optimality criterion would still have led us to the Itô-vector projection, which for the Stratonovich drift would have taken the form $\overrightarrow b$ in ( ). The only reason to start with an Itô SDE is that the calculations are simpler, and it is possible to express the optimal coefficients as functions of the values of the coefficients of the original SDE, without reference to their derivatives.
The optimisation of Theorem . has the disadvantage of coming from the two separate minimisations of $a_1$ and $b_2$, which are Taylor coefficients of different quantities. There is a different way of arriving at coefficients by successively minimising the Taylor coefficients of the same quantity, with the first minimisation resulting in a null term. The idea is to consider where $X, Y, \tau$ are respectively as in ( ), ( ), ( ), with the requirement on $r$ that $B_r(y_0)$ be contained in the domain of $\pi$. The map $\pi$ is the one defined in ( ), although it can more generally satisfy ( ). Letting The constrained optimisation problem for the minimisation of $c_2$ conditional on the previous minimisation of $c_1$ is thus given by Comparing with ( ) we see that we have proven the following Remarks analogous to Remark . and Remark . hold for Theorem . . The Itô-vector and Itô-jet projections therefore satisfy different optimality properties, while the Stratonovich projection is suboptimal in both senses. We end the section with the extension of the optimisations to the case of $M$ time-dependent.
Example . (Optimality for $M$ time-dependent). Recall the case in which the submanifold $M$ depends smoothly on time, for which we can define similar versions of all three projections (Example . ). For Theorem . the optimisation criterion does not require reformulation, while the constraint is modified as described in Example . : therefore the Itô-vector projection remains optimal in the case of $M$ time-dependent. For Theorem . the natural generalisation is given by substituting $\pi_t$ for $\pi$ in ( ). Since $|y - \pi_t(x)| = |(t, y) - \pi(t, x)|$, by the definition of the Itô-jet projection in the case of $M$ time-dependent (and since the calculations in this section never relied on $\pi$ being the Riemannian tubular neighbourhood projection), we have that the time-dependent Itô-jet projection ( ) is optimal in this case too.

Further considerations
In this final section we dig deeper into the details surrounding the Itô and Stratonovich projections of SDEs, and answer a few lingering questions.

. Differences between the projections
In this subsection we will provide examples to justify our claim that the vertical rectangles of ( ) do not commute.
We begin with an example in which the Itô-jet and Itô-vector projections coincide, but are different from the Stratonovich projection. This example also shows how the dependence of the Stratonovich projection of ( ) on the derivatives of the diffusion coefficients can be non-trivial.
Example . . Take $M = \{(x, 0) : x \in \mathbb R\} \hookrightarrow \mathbb R^2$, $n = 1$ and the Itô SDEs whose diffusion coefficients coincide, and are orthogonal to $M$, on $M$. Their Stratonovich projections onto the affine subspace $M = \mathbb R$ are respectively given by the ODEs The Itô-jet and Itô-vector projections of the two SDEs above coincide (since their coefficients on $M$ coincide) and are trivial. An example where Itô-jet = Itô-vector $\neq$ Stratonovich, and where the Itô projections are non-trivial, can be obtained from this by increasing $n$ to 2 and adding a tangent diffusion coefficient.
Next, we ask the question of when the Stratonovich and Itô-jet projections coincide. The following criterion is a rephrasing of [ABF , Theorem . ].
Remark . (Fibering property). In general the difference of the Stratonovich- and Itô-jet-projected drifts can be written as x in a neighbourhood of $M$ (again, if we are only interested in starting our equation at time zero, the above requirement only needs to be considered for $t = 0$), the derivative of the above quantity along any vector tangent to the fibre of $\pi$ (which at points in $M$ means orthogonal to $M$) vanishes: this means ( ) vanishes and the Itô-jet and Stratonovich projections are equal. Moreover, if, representing the original SDE in Stratonovich form as ( ), we additionally have that then it is immediate to verify that $\pi(X_t)$ is a solution of the Stratonovich projection, and therefore that, letting $Y$ be the solution to the Stratonovich = Itô-jet projection up to the exit time of $X_t$ from the tubular neighbourhood in which $\pi$ is defined. Observe that in the absence of these conditions we cannot expect, in general, to obtain a closed form SDE for $\pi(X_t)$, as the coefficients will depend explicitly on $X_t$. This is even true if ( ) holds but ( ) does not, as can be shown simply by considering the ODE case $\sigma_\gamma = 0$.
Example . . Consider the case in which $\sigma_\gamma(x, t) = \sigma_\gamma(t)$ do not depend on the state of the solution. In this case, even if ( ) and ( ) are equivalent, the projections may still be all different. ( ) however shows that $\hat\mu - \overrightarrow\mu = 2(\tilde\mu - \overrightarrow\mu)$ ( ) (with $\tilde\mu$, $\hat\mu$, $\overrightarrow\mu$ the Stratonovich-, Itô-jet- and Itô-vector-projected drifts), so that if any two projections coincide, all three must. An example where all projections are different is given by taking $M$, $d$ as in Example . and the single, constant diffusion coefficient $\sigma = (1, 1)$: all projections differ, for instance at the point $(1, 0)$. An example where the projections all coincide is when $n = d$ and $\sigma^k_\gamma = \delta^k_\gamma$: $\sum_\gamma \overline\sigma^i_\gamma \check\sigma^j_\gamma = \sum_\gamma P^i_\alpha \delta^\alpha_\gamma Q^j_\beta \delta^\beta_\gamma = \sum_\gamma P^i_\gamma Q^j_\gamma = P^i_\gamma Q^{\gamma j} = 0$ ( ) If the original drift also vanishes, we are in the presence of the trivial SDE for Brownian motion, whose Itô and Stratonovich projections coincide with the process $\pi(W_t)$ up to the exit time of $W$ from the domain of $\pi$, by the same reasoning as in Remark . .
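The vanishing of the mixed tangential-normal term for $\sigma^k_\gamma = \delta^k_\gamma$ amounts to the matrix identity $PQ = 0$, which the following sketch verifies numerically. It assumes the unit sphere in $\mathbb R^3$ and the unit circle in $\mathbb R^2$ as illustrative choices of $M$ (the example referred to above is not reproduced here), and the function names are hypothetical.

```python
import numpy as np

def projectors(y):
    """Tangential projection P and normal projection Q at a point y of a unit sphere."""
    n = y / np.linalg.norm(y)
    Q = np.outer(n, n)
    return np.eye(y.size) - Q, Q

def mixed_term(sigmas, y):
    """Matrix with entries sum_gamma (P sigma_gamma)^i (Q sigma_gamma)^j."""
    P, Q = projectors(y)
    return sum(np.outer(P @ s, Q @ s) for s in sigmas)

# n = d = 3 with sigma_gamma the standard basis vectors: the mixed term vanishes.
y_sphere = np.array([1.0, 2.0, 2.0]) / 3.0
identity_case = mixed_term(list(np.eye(3)), y_sphere)
print("identity coefficients:\n", identity_case)

# Single constant coefficient sigma = (1, 1) at the point (1, 0) of the unit circle:
# the mixed term does not vanish.
y_circle = np.array([1.0, 0.0])
constant_case = mixed_term([np.array([1.0, 1.0])], y_circle)
print("sigma = (1, 1):\n", constant_case)
```

The first matrix is zero up to rounding, while the second has a non-zero entry pairing the tangential component $(0, 1)$ with the normal component $(1, 0)$.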

Figure : In these figures we focus on [Example . , a = 1], with initial condition $(\cos(\pi/6), \sin(\pi/6))$, so that all projections are distinct. The two graphs above are respectively plots of the errors $|E[Y_t - X_t]|^2$ and $E[|Y_t - \pi(X_t)|^2]$ for $Y_t$ the solution to the Stratonovich, Itô-vector and Itô-jet projections, with the expectation taken over $10^4$ sample paths. We see confirmation of the fact that the Itô-vector projection performs better in the first error metric, that the Itô-jet projection does so in the second, and that the Stratonovich projection is markedly suboptimal in both senses (especially in the first, while in the second case it performs very similarly to the Itô-vector projection). The analogous plot for the error ( ) is not included, as the results for the three projections are visually indistinguishable, in accordance with the fact that all three projections minimise $a_1$ (without it vanishing in this case). The figure below displays one sample path $(t, Y_t)$ where $Y_t$ is each of the following processes: the solution to the original SDE, to the three projected SDEs, and the metric projection $\pi$ applied to the original solution. All sample paths are derived from the same random seed. Since the optimality criteria all involve taking expectations, we do not expect to be able to derive meaningful intuition from a single path, but it is nonetheless informative to have visual confirmation that all projections are distinct, but related.
In this section we have developed examples that cover all possible situations involving identities, and lack thereof, between the three projections. We summarise them in the
. Intrinsic optimality of the Itô projections
The fact that in ( ) we are comparing two points, $Y_t$ and $\pi(X_t)$, which lie in $M$ opens up the possibility of substituting the Euclidean distance with the Riemannian distance of $M$, $d_M$, inside the expectation. One can then ask whether this leads to a different optimisation. Let $U$ be a neighbourhood of the initial condition $y_0$ in $\mathbb R^d$, $V := U \cap M$, $\varphi : V \to \mathbb R^m$ a normal chart centred in $y_0$, $\overline\varphi := \varphi \circ \pi : U \to \mathbb R^m$. This means that if $G_t$ is a geodesic in $M$ starting at $y_0$, $\varphi(G_t) = vt$ where $\mathbb R^m \ni v = T_{y_0}\varphi(\dot G_0)$. As a consequence we have that, if $W_{y_0} \in T_{y_0}M$, picking the geodesic $G$ with $G_0 = y_0$, $\dot G_0 = W_{y_0}$, we have that ( ) since the acceleration of $G$ is orthogonal to $M$. Now, the problem consists of choosing $\mathring\sigma_\gamma$ and $\mathring\mu$ in such a way that $c_1$ vanishes and $c_2$ is minimal in . We have expressed $d_M$ in normal coordinates in order to be able to use the estimates of [Nic , Appendix A], which tell us that the derivatives of orders $\leq 3$ of $\varphi$ $d_M$ agree with those of the squared distance function of $\mathbb R^m$ (in particular those of order and vanish). Since we are only interested in $c_1$ and $c_2$, this means we can substitute the LHS of ( ) with $E|\varphi(Y_t) - \varphi(\pi(X_t))|^2$ ( ) Proceeding as in the computations of Section , we see that This quantity is made to vanish exactly as before, namely in the unique case