1 Introduction

Hamilton–Jacobi (HJ) partial differential equations and the associated theory lie at the center of classical mechanics (Abraham and Marsden 1978; Arnold 1989; Marsden and Ratiu 1999; Goldstein et al. 2002). Motivated by Hamilton’s approach to geometrical optics, where the action represents the time needed by a particle to move between two points, and by a variational principle due to Fermat, Jacobi extended this approach to Lagrangian and Hamiltonian mechanics. Jacobi designed a concept of “complete” solution of HJ equations that allowed him to recover all solutions simply by substitutions and differentiations. Although HJ equations are, in general, more complicated to solve than a system of ODEs such as Hamilton’s equations, they proved to be powerful tools for integrating classical equations of motion. In addition, Jacobi’s approach led him to ask which diffeomorphisms of the cotangent bundle, the geometric arena of canonical equations, preserve the structure of these first-order equations. Today these are called symplectic or canonical transformations, and Jacobi’s method of integration is precisely one of them.

It is not always recognized as it should be that HJ equations were also fundamental in the construction of quantum mechanics. Reading Schrödinger (1926), Fock (1978), Dirac (1933) and others, up to Feynman (1948), makes abundantly clear that most of the new ideas in the field made use of HJ equations for the classical system to be “quantized,” or of some quantum deformation of them. There are at least two ways to express this deformation. On the one hand, one can exponentiate the \(L^2\) wave function, call S its complex exponent and look for the equation solved by S (see Goldstein et al. 2002). When the system is a single particle in a scalar potential, one obtains the classical HJ equation with an additional Laplacian term multiplied by a factor \(i\hbar \), representing the regularization expected from the quantization of the system. This complex factor is symptomatic of the basic problem of quantum probability, at least for pure states. In a nutshell, it is the reason why Feynman’s diffusions, in his path integral approach, do not exist. On the other hand, there is a hydrodynamical interpretation of quantum mechanics, founded on the Madelung transform, a polar representation of the wave function whose modulus is the square root of a probability density. Its argument (the phase) solves another deformation of the HJ equation. The geometry of this transform has been thoroughly investigated recently, highlighting its relations with optimal transport theory (Khesin et al. 2021; von Renesse 2012).

However, the probabilistic content of quantum mechanics, especially for pure states, has remained a vexing mathematical mystery from the very beginning, despite several interesting (but unsuccessful) attempts (Nelson 2001). The current consensus is that regular probability theory and stochastic analysis have little or nothing to teach us about it, and in particular that all that can be saved from Feynman’s path integral theory is Wiener’s measure and its perturbations by potential terms. This is the “Euclidean approach,” one of the starting points of mathematical quantum field theory.

In 1931, however, Schrödinger suggested, in a paper almost forgotten until the 1980s (Schrödinger 1932) [but insightfully commented on by the probabilist Bernstein (1932)], the existence of a completely different Euclidean approach to quantum dynamics. In short, a stochastic variational boundary value problem for probability densities characterizes optimal diffusions on a given time interval as those whose density is the product of two positive solutions of time-adjoint heat equations. This idea, revived and elaborated from 1986 onward (Zambrini 1986), is known today as “Schrödinger’s problem” in the optimal transport community, where it has proved to provide, among other results, very efficient regularizations of fundamental problems of this field (Léonard 2014). In fact, Schrödinger’s problem hinted at the existence of a stochastic dynamical theory of processes considerably more general than its initial quantum motivation. In such a theory, various regularizations associated with the tools of stochastic calculus should play the role of those involved in quantum mechanics in Hilbert space, where the sought-for measures do not exist.

The variational side of the stochastic theory has been developed in the last decades, inspired by a number of results in stochastic optimal control (Haussmann 1986; Fleming and Soner 2006) and stochastic optimal transport (Mikami 2021). In this context, the crucial role of the (second-order) Hamilton–Jacobi–Bellman (HJB) equation has been known for a long time. It provides the proper regularization of the (first-order) HJ equation needed to construct well-defined stochastic dynamical theories. In contrast, for instance, with the notion of viscosity solution, whose initial target was the study of the classical PDE, the HJB equation becomes central here as the natural stochastic deformation of the HJ equation, compatible with Itô’s calculus. It is worth mentioning that in fields such as AI or reinforcement learning, where HJB equations play a fundamental role (Peyré et al. 2019), such a stochastic dynamical framework built on them can naturally be expected to be of some interest.

The geometric, and especially the Hamiltonian, side of the dynamical theory has resisted until now, and its construction constitutes the main contribution of this paper. We hope that it will be useful far beyond its initial motivation, referred to hereafter as its “inspirational examples.” In this sense, it can clearly be interpreted as a general contribution to stochastic geometric mechanics. More precisely, we try to answer the following questions:

  • Do we have any geometric interpretation of the Hamilton–Jacobi–Bellman equation? That is, can we derive the HJB equation from some sort of canonical transformations?

  • Can we formulate some variational problem that leads to an Euler–Lagrange equation which is equivalent to the HJB equation?

  • More systematically, can we develop some counterpart of Lagrangian and Hamiltonian mechanics associated with the HJB equation?

The first question indicates that canonical transformations should be somehow second-order, so that the corresponding symplectic and contact structures are also second-order. Meanwhile, the stochastic generalization of optimal control and optimal transport suggests that the variational problem of the second question should be formulated in a stochastic sense. Combining these hints, the third question amounts to seeking a new theory of geometric mechanics that integrates stochastic and second-order features.

The cornerstone of stochastic analysis, the well-known Itô’s formula, tells us that the generator of a diffusion process is a second-order differential operator. This provides a very natural way to connect stochastics with second-order structures. That is, in order to build a stochastic or second-order counterpart of geometric mechanics, we need to encode the rule of Itô’s formula into the geometric structures.

There is a theory named second-order differential geometry (“stochastic differential geometry” is also used by some authors, but we prefer to keep the original terminology), which was devised by Schwartz and Meyer around 1980 (Schwartz 1980, 1982, 1984; Meyer 1979, 1981a), and later developed by Belopolskaya and Dalecky (1990), Gliklikh (2011), Emery (1989), etc. See Emery (2007) for a survey of this aspect. The theory of stochastic analysis on manifolds (or geometric stochastic analysis) developed by Itô (1962, 1975), Malliavin (1997), Bismut (1981), Elworthy (1982), etc., focuses on Stratonovich stochastic differential equations on classical geometric structures, such as Riemannian manifolds, frame bundles and Lie groups, so that Leibniz’s rule is preserved. By contrast, Schwartz’ second-order differential calculus alters the underlying geometric structures to include second-order Itô correction terms; it provides a broader picture, even though it loses Leibniz’s rule and is much less known.

In this paper, we adopt the viewpoint of Schwartz–Meyer and enlarge their picture to develop a theory of stochastic geometric mechanics. We first give an equivalent and more intuitive description of the second-order tangent bundle by equivalence classes of diffusions, via Nelson’s mean derivatives. We then generalize this idea to construct stochastic jets, from which stochastic prolongation formulae are proved and the stochastic counterpart of Cartan symmetries is studied. The second-order cotangent bundle is also studied, which helps us to establish stochastic Hamiltonian mechanics. We formulate the stochastic Hamilton’s equations, a system of stochastic equations on the second-order cotangent bundle in terms of mean derivatives. By introducing the second-order symplectic structure and the mixed-order contact structure, we derive the second-order HJB equations via canonical transformations. Finally, we set up a stochastic variational problem on the space of diffusion processes, also in terms of mean derivatives. Two kinds of stochastic principles of least action are established: stochastic Hamilton’s principle and stochastic Maupertuis’s principle. Both of them yield a stochastic Euler–Lagrange equation. The equivalence between the stochastic Euler–Lagrange equation and the HJB equation is proved, which leads exactly to the equivalence between our stochastic variational problem and Schrödinger’s problem in optimal transport. Last but not least (actually vital), a stochastic Noether’s theorem is proved. It says that every symmetry of the HJB equation corresponds to a martingale, which is exactly a conservation law in the stochastic sense.
It should be observed, however, that the Schwartz–Meyer approach, together with that of Bismut (1981), has also inspired a distinct, Stratonovich-type stochastic Hamiltonian framework (Lázaro-Camí and Ortega 2008) leading to a stochastic HJ equation (Lázaro-Camí and Ortega 2009), with no relation to Schrödinger’s problem or optimal transport.

The key results of the present paper and the dependence among them are briefly expressed in the following diagram:

[Figure: diagram of the key results of the paper and their dependencies]

The organization of this paper is as follows:

Section 2 is a summary of the theory of stochastic differential equations on manifolds, from the perspective appropriate to our goal. In particular, diffusions will be characterized by their mean and quadratic mean derivatives as in Nelson’s stochastic mechanics (Nelson 2001), although the resulting dynamical content of our theory will have very little to do with his. In this way, we are able to rewrite Itô SDEs on manifolds as ODE-like equations that have a better geometric nature. The notion of second-order tangent bundle answers the question: the drift parts of Itô SDEs are sections of what?

Section 3 is devoted to the notion of stochastic jets. In the same way as tangent vectors on M are defined as equivalence classes of smooth curves through a given point, and then generalized to higher-order cases to produce the notion of jets, stochastic tangent vectors are defined as equivalence classes of diffusions, so that the stochastic tangent bundle is isomorphic to the elliptic subbundle of the second-order tangent bundle. Stochastic jets are also constructed. This provides an intrinsic definition of the SDEs under consideration.

Section 4 illustrates the use of the above geometric formulation of SDEs for the study of their symmetries. Prolongations of M-valued diffusions are defined as new processes with values in the stochastic tangent bundle. Among all deterministic space-time transformations, bundle homomorphisms will be the only subclass that transforms diffusions into diffusions. Total mean and quadratic mean derivatives are defined in conformity with the rules of Itô’s calculus. The prolongation of diffusions allows us to define symmetries of SDEs and their infinitesimal versions. Stochastic prolongation formulae are derived for infinitesimal symmetries, which yield determining equations for Itô SDEs.

In Sect. 5, the second-order cotangent bundle, as the dual bundle of the second-order tangent bundle, is defined and analyzed. The properties of second-order differential operators, pushforwards and pullbacks are described. When time is involved, i.e., when the base manifold is the product manifold \({\mathbb {R}}\times M\), the corresponding bundles are mixed-order tangent and cotangent bundles, where “mixed-order” means that they are second-order in space but first-order in time. More on this topic, such as mixed-order pushforwards and pullbacks, pushforwards and pullbacks by diffusions, and Lie derivatives, can be found in “Appendix A.” A generalized notion of stochastic Cartan distribution and its symmetries are discussed in “Appendix B,” based on the mixed-order contact structure.

The point of Sect. 6 is to use the tools developed before to construct stochastic Hamiltonian mechanics, which is one of the main goals of the paper. One of our inspirational examples will be the one underlying the dynamical content of Schrödinger’s problem. By analogy with the Poincaré 1-form on the cotangent bundle of classical mechanics and its associated symplectic form, one can construct counterparts on the second-order cotangent bundle. Using the canonical second-order symplectic form on second-order cotangent bundles, one defines second-order symplectomorphisms. For a given real-valued Hamiltonian function on the second-order cotangent bundle, classical Hamiltonian vector fields generalize to second-order operators. The resulting stochastic Hamiltonian system involves pairs of extra equations compared with their classical versions. Bernstein’s reciprocal processes, inspired by Schrödinger’s problem, are described in this framework, corresponding to a large class of second-order Hamiltonians on Riemannian manifolds. A mixed-order contact structure describes time-dependent stochastic Hamiltonian systems. The last subsection of this section is devoted to canonical transformations preserving the form of stochastic Hamilton’s equations. The corresponding generating function satisfies the Hamilton–Jacobi–Bellman equation.

Section 7 treats the stochastic version of classical Lagrangian mechanics on Riemannian manifolds. Itô’s stochastic deformation of the classical notion of parallel displacement is recalled. Another one, called damped parallel displacement in the mathematical literature and involving the Ricci tensor, is also indicated. Each of these displacements corresponds to a mean covariant derivative along diffusions. The action functional is defined as the expectation of the Lagrangian, and the stochastic Euler–Lagrange equation involves the damped mean covariant derivative. The dynamics of Schrödinger’s problem is, again, used as an illustration. The equivalence between stochastic Hamilton’s equations on Riemannian manifolds and the stochastic Euler–Lagrange equation, as well as the HJB equation, is derived via the Legendre transform. Relations with stochastic control are also mentioned. The section ends with the stochastic Noether’s theorem. The stochastic version of Maupertuis’ principle, as the twin of stochastic Hamilton’s principle, is deferred to “Appendix C.”

2 Stochastic Differential Equations on Manifolds

In this section, we will study several types of stochastic differential equations on manifolds which are weakly equivalent to Itô SDEs. We start with a d-dimensional smooth manifold M and a probability space \((\Omega , {\mathcal {F}}, {\textbf{P}})\), and equip the latter with a filtration \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\), i.e., a family of nondecreasing sub-\(\sigma \)-fields of \({\mathcal {F}}\). We call \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\) a past filtration. Unless otherwise specified, the manifold M will not be endowed with any structures other than the smooth structure. In some cases, it will be endowed with a linear connection, a Riemannian metric, or a Levi–Civita connection.

Recall from Hsu (2002, Definition 1.2.1) that by an M-valued (forward) \(\{{\mathcal {P}}_t\}\)-semimartingale, we mean a \(\{{\mathcal {P}}_t\}\)-adapted continuous M-valued process \(X= \{X(t)\}_{t\in [t_0,\tau )}\), where \(t_0\in {\mathbb {R}}\) and \(\tau \) is a \(\{{\mathcal {P}}_t\}\)-stopping time satisfying \(t_0<\tau \le +\infty \), such that f(X) is a real-valued \(\{{\mathcal {P}}_t\}\)-semimartingale on \([t_0,\tau )\) for all \(f\in C^\infty (M)\). The stopping time \(\tau \) is called the lifetime of X. If we adopt the convention to introduce the one-point compactification of M by \(M^*:= M \cup \{\partial _M\}\), then the process X can be extended to the whole time line \([t_0,+\infty )\) by setting \(X(t) = \partial _M\) for all \(t\ge \tau \). The point \(\partial _M\) is often called the cemetery point in the context of Markovian theory.
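The lifetime and cemetery conventions can be made concrete with a small simulation. The sketch below is illustrative only and rests on assumptions that are not in the text: M is taken to be the open interval (0, 1), X is a Brownian motion started at 1/2 and discretized by an Euler scheme, the cemetery point \(\partial _M\) is represented by Python's None, and the first grid time at which the path leaves M stands in for the stopping time \(\tau \) (it only approximates the true exit time).

```python
import math
import random


def killed_brownian_path(x0=0.5, t0=0.0, dt=1e-3, n_steps=5000, seed=0):
    """Euler sketch of a Brownian path on M = (0, 1), extended to the whole time grid
    by the cemetery convention: X(t) = None (the cemetery point) for t >= tau."""
    rng = random.Random(seed)
    times = [t0 + k * dt for k in range(n_steps + 1)]
    path = [x0]
    lifetime = math.inf  # stays +inf if the path never leaves M on this grid
    x = x0
    for k in range(1, n_steps + 1):
        x += math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if not 0.0 < x < 1.0:
            lifetime = times[k]  # first grid time outside M approximates tau
            break
        path.append(x)
    # Send the path to the cemetery point for all remaining grid times.
    path.extend([None] * (len(times) - len(path)))
    return times, path, lifetime


times, path, tau = killed_brownian_path()
```

Before \(\tau \) the simulated path takes values in M; from \(\tau \) on it is frozen at the cemetery point, mirroring the extension of X to \([t_0,+\infty )\) described above.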

2.1 Itô SDEs on Manifolds

Given \(N+1\) time-dependent vector fields \(b,\sigma _r, r=1,\ldots ,N\) on M, one can introduce a Stratonovich SDE in local coordinates, which has the same form as in Euclidean space (Hsu 2002, Section 1.2). The form of Stratonovich SDEs on M is invariant under changes of coordinates, since Stratonovich stochastic differentials obey Leibniz’s rule.

However, this is not the case for Itô stochastic differentials, because of Itô’s formula. Hence, we cannot directly write a Euclidean form of Itô SDE on M in local coordinates, since it is no longer invariant under changes of coordinates: a change of coordinates always produces an additional term. To balance this term, a common way is to add a correction term to the drift part of the Euclidean form of Itô SDE, by taking advantage of a linear connection. More precisely, under local coordinates \((x^i)\), we consider the following Itô SDE (Gliklikh 2011, Sections 7.1, 7.2):

$$\begin{aligned} dX^i(t)= & {} \left[ b^i(t,X(t)) - \frac{1}{2} \sum _{r=1}^N \Gamma ^i_{jk}(X(t)) (\sigma ^j_r \sigma ^k_r)(t,X(t)) \right] dt \nonumber \\{} & {} + \sigma ^i_r(t,X(t)) dW^r(t), \end{aligned}$$
(2.1)

where \((\Gamma ^i_{jk})\) is the family of Christoffel symbols for a given linear connection \(\nabla \) on TM. When conditioning on \(\{X(t)=q\}\) and taking \((x^i)\) as normal coordinates at \(q\in M\), (2.1) reduces to the Euclidean form, since at q,

$$\begin{aligned} \sum _{r=1}^N \Gamma ^i_{jk} \sigma ^j_r \sigma ^k_r = \frac{1}{2} \sum _{r=1}^N \left( \Gamma ^i_{jk} + \Gamma ^i_{kj}\right) \sigma ^j_r \sigma ^k_r = 0. \end{aligned}$$
(2.2)
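Only the symmetric part of \(\Gamma ^i_{jk}\) enters (2.2), because \(\sum _r \sigma ^j_r \sigma ^k_r\) is symmetric in (j, k); in particular, torsion plays no role in the correction term. A quick numerical check with random arrays (arbitrary illustrative values, using NumPy) is:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 2
Gamma = rng.normal(size=(d, d, d))  # Gamma[i, j, k] = Gamma^i_{jk}; not symmetric in (j, k)
sigma = rng.normal(size=(d, N))     # sigma[j, r] = sigma^j_r
a = sigma @ sigma.T                 # a[j, k] = sum_r sigma^j_r sigma^k_r (symmetric)

# Correction term sum_r Gamma^i_{jk} sigma^j_r sigma^k_r, contracted over j and k.
full = np.einsum("ijk,jk->i", Gamma, a)

# Split Gamma into parts symmetric and antisymmetric (torsion-like) in (j, k).
Gamma_sym = 0.5 * (Gamma + Gamma.transpose(0, 2, 1))
Gamma_anti = 0.5 * (Gamma - Gamma.transpose(0, 2, 1))

from_sym = np.einsum("ijk,jk->i", Gamma_sym, a)
from_anti = np.einsum("ijk,jk->i", Gamma_anti, a)
```

Since normal coordinates at q make the symmetric part of the Christoffel symbols vanish at q, the whole correction term vanishes there as well.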

Denote

$$\begin{aligned} \sigma \circ \sigma ^*:= \sum _{r=1}^N \sigma _r \otimes \sigma _r = \sum _{r=1}^N \sigma _r^j \sigma _r^k \frac{\partial }{\partial {x^j}}\otimes \frac{\partial }{\partial {x^k}}. \end{aligned}$$

Clearly, \(\sigma \circ \sigma ^*\) is a symmetric and positive semi-definite (2, 0)-tensor field. We also formally introduce a modified drift \({\mathfrak {b}}\) with the following coordinate expression

$$\begin{aligned} {\mathfrak {b}}^i = b^i - \frac{1}{2} \sum _{r=1}^N \Gamma ^i_{jk} \sigma ^j_r \sigma ^k_r. \end{aligned}$$
(2.3)

Now change the coordinate chart from \((U,(x^i))\) to \((V,(\tilde{x}^j))\) with \(U\cap V\ne \emptyset \). Since each \(\sigma _r\) transforms as a vector, the change-of-coordinate formula for Christoffel symbols (e.g., Kobayashi and Nomizu 1963, Proposition III.7.2) yields

$$\begin{aligned} \begin{aligned} \Gamma ^i_{jk} \sigma ^j_r \sigma ^k_r&=\, \left( {\tilde{\Gamma }}^l_{mn} \frac{\partial {\tilde{x}}^m}{\partial x^j} \frac{\partial {\tilde{x}}^n}{\partial x^k} \frac{\partial x^i}{\partial {\tilde{x}}^l} + \frac{\partial ^2{\tilde{x}}^l}{\partial x^j \partial x^k} \frac{\partial x^i}{\partial {\tilde{x}}^l} \right) \sigma ^j_r \sigma ^k_r \\&=\, \left( {\tilde{\Gamma }}^l_{mn} {\tilde{\sigma }}^m_r {\tilde{\sigma }}^n_r + \frac{\partial ^2{\tilde{x}}^l}{\partial x^j \partial x^k} \sigma ^j_r \sigma ^k_r \right) \frac{\partial x^i}{\partial {\tilde{x}}^l}. \end{aligned} \end{aligned}$$

It follows that the coefficients of the modified drift \({\mathfrak {b}}\) in (2.3) transform as

$$\begin{aligned} \begin{aligned} \tilde{{\mathfrak {b}}}^l&= {\tilde{b}}^l - \frac{1}{2} \sum _{r=1}^N {\tilde{\Gamma }}^l_{mn} {\tilde{\sigma }}^m_r {\tilde{\sigma }}^n_r = b^i\frac{\partial {\tilde{x}}^l}{\partial x^i} - \frac{1}{2} \sum _{r=1}^N \left( \Gamma ^i_{jk} \frac{\partial {\tilde{x}}^l}{\partial x^i} - \frac{\partial ^2 \tilde{x}^l}{\partial x^j \partial x^k} \right) \sigma ^j_r \sigma ^k_r \\&= {\mathfrak {b}}^i \frac{\partial {\tilde{x}}^l}{\partial x^i} + \frac{1}{2} \frac{\partial ^2 \tilde{x}^l}{\partial x^j \partial x^k} \sum _{r=1}^N \sigma ^j_r \sigma ^k_r. \end{aligned} \end{aligned}$$
(2.4)

Therefore, \({\mathfrak {b}}\) is not a vector field, as it does not transform pointwise as a vector.

Finally, using Itô’s formula, we derive the transformation of (2.1) as follows:

$$\begin{aligned} \begin{aligned} d{\tilde{x}}^l&= \frac{\partial {\tilde{x}}^l}{\partial x^i} dx^i + \frac{1}{2} \frac{\partial ^2 {\tilde{x}}^l}{\partial x^j \partial x^k} d[x^j,x^k] \\&= \left[ \frac{\partial {\tilde{x}}^l}{\partial x^i} \left( b^i - \frac{1}{2} \sum _{r=1}^N \Gamma ^i_{jk} \sigma ^j_r \sigma ^k_r \right) + \frac{1}{2} \sum _{r=1}^N \frac{\partial ^2 {\tilde{x}}^l}{\partial x^j \partial x^k} \sigma ^j_r \sigma ^k_r \right] dt + \frac{\partial {\tilde{x}}^l}{\partial x^i} \sigma ^i_r dW^r \\&= \left( {\tilde{b}}^l - \frac{1}{2} \sum _{r=1}^N {\tilde{\Gamma }}^l_{mn} {\tilde{\sigma }}^m_r {\tilde{\sigma }}^n_r \right) dt + {\tilde{\sigma }}^l_r dW^r, \end{aligned} \end{aligned}$$

where the bracket \([\cdot ,\cdot ]\) on the right-hand side (RHS) of the first equality denotes the quadratic variation. This shows that Eq. (2.1) is indeed invariant under changes of coordinates.

Remark 2.1

One can regard \(\sigma = (\sigma _r)_{r=1}^N \in ({\mathbb {R}}^N)^*\otimes {\mathfrak {X}}(M)\) as an \(({\mathbb {R}}^N)^*\)-valued vector field on M. In this way, the pair \((b,\sigma )\) is called an Itô vector field in Gliklikh (2011, Chapter 7), while the pair \(({\mathfrak {b}},\sigma )\) is called an Itô equation therein.

Now we present the definition of weak solutions to (2.1).

Definition 2.2

(Weak solutions to Itô SDEs) Given a linear connection on M, a weak solution of the Itô SDE (2.1) is a triple \((X, W)\), \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\), where

  1. (i)

    \((\Omega ,{\mathcal {F}},{\textbf{P}})\) is a probability space, and \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\) is a past (i.e., nondecreasing) filtration of \({\mathcal {F}}\) satisfying the usual conditions,

  2. (ii)

    \(X = \{X(t)\}_{t\in [t_0,\tau )}\) is a continuous, \(\{{\mathcal {P}}_t\}\)-adapted M-valued process with \(\{{\mathcal {P}}_t\}\)-stopping time \(\tau >t_0\), W is an N-dimensional \(\{{\mathcal {P}}_t\}\)-Brownian motion, and

  3. (iii)

    for every \(q\in M\), \(t\ge t_0\) and any coordinate chart \((U,(x^i))\) of q, it holds under the conditional probability \({\textbf{P}}(\cdot | X(t_0) = q)\) that almost surely in the event \(\{X(t)\in U\}\),

    $$\begin{aligned} X^i(t)= & {} X^i(t_0) + \int _{t_0}^t \left( b^i(s,X(s)) - \frac{1}{2} \sum _{r=1}^N \Gamma ^i_{jk}(X(s)) \left( \sigma ^j_r \sigma ^k_r\right) (s,X(s)) \right) {\textrm{d}}s \\{} & {} + \int _{t_0}^t \sigma ^i_r(s,X(s)) {\textrm{d}}W^r(s). \end{aligned}$$

Definition 2.3

(Uniqueness in law) We say that uniqueness in the sense of probability law holds for the Itô SDE (2.1) if, for any two weak solutions \((X, W)\), \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\), and \(({\hat{X}}, {\hat{W}})\), \(({\hat{\Omega }},{\hat{{\mathcal {F}}}},{\hat{{\textbf{P}}}})\), \(\{{\hat{{\mathcal {P}}}}_t\}_{t\in {\mathbb {R}}}\) with the same initial data, i.e., \({\textbf{P}}(X(t_0)=x_0) = {\hat{{\textbf{P}}}}({\hat{X}}(t_0)=x_0) = 1\), the two processes X and \({\hat{X}}\) have the same law.

Note that it is possible to change \(\sigma \) and W in the Itô SDE (2.1) while keeping the same weak solution in law. In other words, the form of (2.1) does not correspond univocally to its weak solution in law. For this reason, we will reformulate SDEs in a fashion that makes them look more like ODEs and gives them a better geometric nature. Moreover, we will see that it is the pair \(({\mathfrak {b}}, \sigma \circ \sigma ^*)\) that corresponds univocally to the weak solution of (2.1).
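A minimal NumPy illustration of this non-uniqueness, with arbitrary illustrative matrices: replacing \(\sigma \) by \(\sigma R\) for an orthogonal matrix R (which amounts to driving the equation by the Brownian motion \(R^{\top }W\)), or appending a zero column (which changes N), leaves \(\sigma \circ \sigma ^*\), i.e. \(\sigma \sigma ^{\top }\) in coordinates, unchanged.

```python
import numpy as np


def diffusion_matrix(sigma):
    """sigma o sigma^* = sum_r sigma_r (x) sigma_r, i.e. sigma @ sigma.T in coordinates."""
    return sigma @ sigma.T


rng = np.random.default_rng(1)
d, N = 2, 2
sigma = rng.normal(size=(d, N))  # sigma[i, r] = sigma^i_r at a fixed point

# Rotating the driving noise: sigma dW = (sigma R)(R^T dW), and R^T W is again Brownian.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
sigma_rotated = sigma @ R

# Adding an unused noise direction changes N but not sigma o sigma^*.
sigma_padded = np.hstack([sigma, np.zeros((d, 1))])

a = diffusion_matrix(sigma)
```

The three coefficient choices differ as matrices but share the same symmetric positive semi-definite tensor \(\sigma \circ \sigma ^*\).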

2.2 Mean Derivatives and Mean Differential Equations on Manifolds

In this part, we will recall the definitions of Nelson’s mean derivatives and extend them to M-valued processes. In Nelson’s stochastic mechanics (Nelson 2001), the probability space \((\Omega , {\mathcal {F}}, {\textbf{P}})\) is equipped with two different filtrations. The first one is just a usual nondecreasing filtration \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\), a past filtration. The second one is a family of nonincreasing sub-\(\sigma \)-fields of \({\mathcal {F}}\), which is denoted by \(\{{\mathcal {F}}_t\}_{t \in {\mathbb {R}}}\) and called a future filtration. For an \({\mathbb {R}}^d\)-valued process \(\{X(t)\}_{t \in I}\), its forward mean derivative DX and forward quadratic mean derivative QX are defined by conditional expectations as follows:

$$\begin{aligned} DX(t)= & {} \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{X(t+\epsilon )-X(t)}{\epsilon } \bigg | {\mathcal {P}}_t \right] , \\ Q X(t)= & {} \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{(X(t+\epsilon )-X(t))\otimes (X(t+\epsilon )-X(t))}{\epsilon } \bigg | {\mathcal {P}}_t \right] , \end{aligned}$$

Their backward versions, i.e., the backward mean derivative and backward quadratic mean derivative, are defined as follows:

$$\begin{aligned} \overleftarrow{D}X(t)= & {} \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{X(t)-X(t-\epsilon )}{\epsilon } \bigg | {\mathcal {F}}_t \right] , \\ \overleftarrow{Q} X(t)= & {} \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{(X(t)-X(t-\epsilon ))\otimes (X(t)-X(t-\epsilon ))}{\epsilon } \bigg | {\mathcal {F}}_t \right] . \end{aligned}$$

In our present paper, we will only focus on the “forward” case, so that only the past filtration \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\) will be invoked. The “backward” case is analogous and every part of this paper can have its “backward” counterpart (cf. Zambrini 2015).
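For a Markov diffusion, conditioning on \({\mathcal {P}}_t\) reduces to conditioning on X(t), so DX and QX can be estimated by Monte Carlo from the transition law. The sketch below assumes, for illustration only, the one-dimensional Ornstein–Uhlenbeck process \(dX = -\theta X\,dt + s\,dW\) with hypothetical parameter values, and uses a fixed small \(\epsilon \) instead of the limit; the estimates should be close to \(-\theta x\) and \(s^2\).

```python
import numpy as np


def ou_mean_derivatives(x, theta=0.5, s=1.0, eps=1e-2, n=1_000_000, seed=0):
    """Monte Carlo estimates of the forward mean derivative DX(t) and the forward
    quadratic mean derivative QX(t), conditionally on X(t) = x, for the 1-D
    Ornstein-Uhlenbeck process dX = -theta X dt + s dW."""
    rng = np.random.default_rng(seed)
    # Exact OU transition over a time step eps, started at x.
    mean = x * np.exp(-theta * eps)
    std = s * np.sqrt((1.0 - np.exp(-2.0 * theta * eps)) / (2.0 * theta))
    increments = mean + std * rng.standard_normal(n) - x
    DX = increments.mean() / eps
    QX = (increments ** 2).mean() / eps
    return DX, QX


DX, QX = ou_mean_derivatives(x=2.0)
# Limits as eps -> 0: DX -> -theta * x = -1.0 and QX -> s**2 = 1.0
```

The fixed \(\epsilon \) introduces a small bias of order \(\epsilon \), and the sample size controls the Monte Carlo error; both are illustrative choices.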

Denote by \(\textrm{Sym}^2(TM)\) (and \(\textrm{Sym}^2_+(TM)\)) the fiber bundle of symmetric (and respectively, symmetric positive semi-definite) (2, 0)-tensors on M. Now we define quadratic mean derivatives for M-valued semimartingales, cf. Gliklikh (2011, Chapter 9).

Definition 2.4

(Quadratic mean derivatives) The (forward) quadratic mean derivative of the M-valued semimartingale \(\{X(t)\}_{t \in [t_0,\tau )}\) is a \(\textrm{Sym}^2_+(TM)\)-valued process QX on \([t_0,\tau )\), whose value at time \(t\in [t_0,\tau )\) in any coordinate chart \((U,(x^i))\) and in the event \(\{X(t) \in U\}\) is given by

$$\begin{aligned} (Q X)^{ij}(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{(X^i(t+\epsilon )-X^i(t)) (X^j(t+\epsilon )-X^j(t))}{\epsilon } \bigg | {\mathcal {P}}_t \right] , \end{aligned}$$
(2.5)

where the limits are assumed to exist in \(L^1(\Omega , {\mathcal {F}}, {\textbf{P}})\).

More generally, we can define the (forward) quadratic mean derivative for two M-valued semimartingales X and Y in local coordinates by

$$\begin{aligned} (Q (X,Y))^{ij}(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{(X^i(t+\epsilon )-X^i(t)) (Y^j(t+\epsilon )-Y^j(t))}{\epsilon } \bigg | {\mathcal {P}}_t \right] . \end{aligned}$$

Due to Itô’s formula for semimartingales, QX(t) does transform as a (2, 0)-tensor and is obviously symmetric, so that the definition is independent of the choice of U. However, the formal limit \({\textbf{E}}[ \frac{1}{\epsilon } (X^i(t+\epsilon )-X^i(t)) | {\mathcal {P}}_t ]\) in arbitrary coordinates \((x^i)\) no longer transforms as a vector, as can be guessed from (2.4). In order to turn it into a vector, we need to specify a coordinate system. A natural choice is the normal coordinate system. For this purpose, we endow M with a linear connection \(\nabla \), which determines a normal coordinate system near each point of M.
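The different behavior of the two limits under a change of coordinates can be observed numerically. In the following sketch (all choices are illustrative assumptions), \(M={\mathbb {R}}\) carries the flat connection, so the Cartesian coordinate x is normal, X is a Brownian motion with constant drift b and diffusion coefficient s, and the new coordinate is \(y=\varphi (x)=e^x\). The estimated quadratic mean derivative of Y follows the tensor rule \(\varphi '(x)^2 s^2\), while the formal drift limit of Y matches \(\varphi '(x) b + \frac{1}{2}\varphi ''(x) s^2\), as in (2.4), rather than the vector rule \(\varphi '(x) b\).

```python
import numpy as np


def mean_derivatives_after_coordinate_change(x, b, s, eps=1e-2, n=2_000_000, seed=0):
    """Monte Carlo estimates, conditionally on X(t) = x, of the formal drift limit
    E[(Y(t+eps) - Y(t)) / eps] and of the quadratic mean derivative of Y = exp(X),
    where X(t+eps) - X(t) = b*eps + s*(W(t+eps) - W(t)) exactly."""
    rng = np.random.default_rng(seed)
    dX = b * eps + s * np.sqrt(eps) * rng.standard_normal(n)
    dY = np.exp(x + dX) - np.exp(x)
    return dY.mean() / eps, (dY ** 2).mean() / eps


x, b, s = 0.5, 0.3, 1.0
drift_Y, Q_Y = mean_derivatives_after_coordinate_change(x, b, s)

dphi = d2phi = np.exp(x)                    # phi'(x) and phi''(x) for phi = exp
vector_rule = dphi * b                      # what a genuine vector would give
ito_rule = dphi * b + 0.5 * d2phi * s ** 2  # transformation rule of the form (2.4)
tensor_rule = dphi ** 2 * s ** 2            # (2, 0)-tensor rule for QX
```

The gap between `drift_Y` and `vector_rule` is exactly the Itô correction \(\frac{1}{2}\varphi ''(x) s^2\), which is why a connection is needed to single out a vector-valued mean derivative.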

Definition 2.5

(\(\nabla \)-mean derivatives) Given a linear connection \(\nabla \) on M, the (forward) \(\nabla \)-mean derivative of the M-valued semimartingale \(\{X(t)\}_{t \in [t_0,\tau )}\) is a TM-valued process \(D_\nabla X\) on \([t_0,\tau )\), whose value at time \(t\in [t_0,\tau )\) is defined in normal coordinates \((x^i)\) on the normal neighborhood U of \(q\in M\) and under the conditional probability \({\textbf{P}}(\cdot | X(t) = q)\) as follows:

$$\begin{aligned} (D_\nabla X)^i(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{X^i(t+\epsilon )-X^i(t)}{\epsilon } \bigg | {\mathcal {P}}_t \right] , \end{aligned}$$

where the limits are assumed to exist in \(L^1(\Omega , {\mathcal {F}}, {\textbf{P}})\).

As we force \(D_\nabla X(t)\) to be vector-valued by definition, its coordinate expression under any other coordinate system can be calculated via Leibniz’s rule. Let us stress that the notation \(D_\nabla \) should not be confused with the one of covariant derivatives in geometry.

Taking forward mean derivatives formally in the Itô SDE (2.1), and noting that the correction term in the modified drift involving Christoffel symbols vanishes in normal coordinates by (2.2), we get an ODE-like system:

$$\begin{aligned} \left\{ \begin{aligned}&D_\nabla X(t) = b(t,X(t)), \\&Q X(t) = (\sigma \circ \sigma ^*)(t,X(t)). \end{aligned} \right. \end{aligned}$$
(2.6)

We call Eq. (2.6) a system of mean differential equations (MDEs). Note that both MDEs (2.6) and Itô SDE (2.1) rely on linear connections on M.

Definition 2.6

(Solutions to MDEs) Given a linear connection on M, a solution of MDEs (2.6) is a triple X, \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\), where

  1. (i)

    \((\Omega ,{\mathcal {F}},{\textbf{P}})\) is a probability space, and \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\) is a past filtration of \({\mathcal {F}}\) satisfying the usual conditions,

  2. (ii)

    \(X = \{X(t)\}_{t\in [t_0,\tau )}\) is a continuous, \(\{{\mathcal {P}}_t\}\)-adapted M-valued semimartingale with lifetime a \(\{{\mathcal {P}}_t\}\)-stopping time \(\tau >t_0\), and

  3. (iii)

    the \(\nabla \)-mean derivative and quadratic mean derivative of X exist and satisfy (2.6).

2.3 Second-Order Operators and Martingale Problems

Definition 2.7

(Second-order operators) A second-order operator on M is a linear operator \(A: C^\infty (M) \rightarrow C^\infty (M)\), which has the following expression in a coordinate chart \((U,(x^i))\),

$$\begin{aligned} Af = A^i \frac{\partial f}{\partial x^i} + A^{ij} \frac{\partial ^2 f}{\partial x^i\partial x^j}, \quad f\in C^\infty (M), \end{aligned}$$
(2.7)

where \((A^{ij})\) is a symmetric (2, 0)-tensor field, and the expression is required to be invariant under changes of coordinates. If \((A^{ij})\) is positive semi-definite, then we say the second-order operator A is elliptic; if \((A^{ij})\) is positive definite, we say A is nondegenerate elliptic.

There is a coordinate-free definition of second-order operators. A linear map \(A_q: C^\infty (M) \rightarrow {\mathbb {R}}\) is called a second-order derivation at \(q\in M\) if there is a symmetric (2, 0)-tensor \(\Gamma _{A_q}\) at q such that \(A_q(fg) = f(q) A_q g + g(q) A_qf + (df\otimes dg) (\Gamma _{A_q})\) for all \(f,g \in C^\infty (M)\). Then, a second-order operator is nothing but a smooth field of second-order derivations. From this and the identity \(A(fg) - fAg - gAf = 2A^{ij}\, \partial _i f\, \partial _j g\), which follows from (2.7) and the symmetry of \((A^{ij})\), we see that for A in (2.7), \(A^i = A(x^i)\), \(A^{ij} = \frac{1}{2}\left[ A(x^i x^j) - x^iA(x^j) - x^jA(x^i)\right] \), and

$$\begin{aligned} \Gamma _A = 2A^{ij} \frac{\partial }{\partial {x^i}}\otimes \frac{\partial }{\partial {x^j}}. \end{aligned}$$
(2.8)

We call \(\Gamma _A\) the squared field operator (originally “opérateur carré du champ”) associated with A. We also denote \(\Gamma _A(f,g):= (df\otimes dg) (\Gamma _A)\). Clearly, for a classical vector field V, \(\Gamma _V \equiv 0\) by Leibniz’s rule.
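The factor relating \(\Gamma _A\) to the coefficients in (2.7) can be sanity-checked numerically. The following Python sketch (an illustration, not part of the original construction) takes a constant-coefficient operator on \({\mathbb {R}}^2\) in the convention (2.7), evaluates it with central finite differences, and compares the Leibniz defect \(A(x^ix^j) - x^iA(x^j) - x^jA(x^i)\) with \(2A^{ij}\). The helper names, the step size `H` and the sample coefficients are assumptions made for the example; central differences are exact, up to rounding, on the quadratic test functions used here.

```python
# Illustrative sketch: a constant-coefficient operator on R^2 in the convention
# (2.7), A f = A^i d_i f + A^{ij} d_i d_j f (no factor 1/2), evaluated by central
# finite differences (exact up to rounding for quadratic f).

H = 1e-3  # finite-difference step (assumed value)


def _shift(p, i, s):
    """Return a copy of the point p with coordinate i shifted by s."""
    p = list(p)
    p[i] += s
    return p


def apply_operator(A1, A2, f, q):
    """Evaluate (A f)(q) for drift A1[i] and symmetric second-order part A2[i][j]."""
    d = len(q)
    total = 0.0
    for i in range(d):
        d_i = (f(_shift(q, i, H)) - f(_shift(q, i, -H))) / (2 * H)
        total += A1[i] * d_i
        for j in range(d):
            pp = f(_shift(_shift(q, i, H), j, H))
            pm = f(_shift(_shift(q, i, H), j, -H))
            mp = f(_shift(_shift(q, i, -H), j, H))
            mm = f(_shift(_shift(q, i, -H), j, -H))
            total += A2[i][j] * (pp - pm - mp + mm) / (4 * H * H)
    return total


def leibniz_defect(A1, A2, f, g, q):
    """A(fg) - f A g - g A f at q; by (2.7) this equals 2 A^{ij} d_i f d_j g."""
    fg = lambda p: f(p) * g(p)
    return (apply_operator(A1, A2, fg, q)
            - f(q) * apply_operator(A1, A2, g, q)
            - g(q) * apply_operator(A1, A2, f, q))


def coord(i):
    """The coordinate function x^i."""
    return lambda p: p[i]


A1 = [0.3, -1.0]
A2 = [[0.5, 0.2], [0.2, 1.5]]
q = [0.7, -0.4]

for i in range(2):
    for j in range(2):
        print(i, j, leibniz_defect(A1, A2, coord(i), coord(j), q), 2 * A2[i][j])
```

Each printed defect agrees with \(2A^{ij}\) up to rounding, so the squared field operator of (2.7) carries the factor 2 in front of \(A^{ij}\).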

It is easy to verify from the coordinate-change invariance that the coefficients \(A^i\)’s and \(A^{ij}\)’s transform under the change of coordinates from \((x^i)\) to \(({\tilde{x}}^j)\) by the following rule (e.g., Ikeda and Watanabe 1989, Section V.4),

$$\begin{aligned} {\tilde{A}}^i = \frac{\partial {\tilde{x}}^i}{\partial x^j} A^j + \frac{\partial ^2 {\tilde{x}}^i}{\partial x^j \partial x^k} A^{jk}, \quad {\tilde{A}}^{ij} = \frac{\partial {\tilde{x}}^i}{\partial x^k} \frac{\partial {\tilde{x}}^j}{\partial x^l} A^{kl}. \end{aligned}$$
(2.9)
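As an illustration of (2.9), the sketch below transforms the generator \(\frac{1}{2}\Delta \) of Brownian motion on \({\mathbb {R}}^2\) (that is, \(A^i = 0\), \(A^{ij} = \frac{1}{2}\delta ^{ij}\)) from Cartesian to polar coordinates \((r,\theta )\) at the point (3, 4). It should recover the familiar polar form \(\frac{1}{2}(\partial _r^2 + r^{-1}\partial _r + r^{-2}\partial _\theta ^2)\), i.e., \({\tilde{A}}^r = \frac{1}{2r}\), \({\tilde{A}}^\theta = 0\), \({\tilde{A}}^{rr} = \frac{1}{2}\), \({\tilde{A}}^{\theta \theta } = \frac{1}{2r^2}\). The derivatives of \((r,\theta )\) are written out by hand, and the function names are illustrative.

```python
import math

# Illustrative sketch: the transformation rule (2.9) applied to (1/2)Laplacian on
# R^2, moving from Cartesian (x, y) to polar (r, theta) coordinates.


def polar_jets(x, y):
    """First and second derivatives of r(x, y) and theta(x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    jac = [[x / r, y / r],            # d r / d(x, y)
           [-y / r2, x / r2]]         # d theta / d(x, y)
    hess = [[[y * y / r**3, -x * y / r**3],
             [-x * y / r**3, x * x / r**3]],                 # d^2 r
            [[2 * x * y / r2**2, (y * y - x * x) / r2**2],
             [(y * y - x * x) / r2**2, -2 * x * y / r2**2]]]  # d^2 theta
    return jac, hess


def transform(A1, A2, jac, hess):
    """New coefficients (tilde A^a, tilde A^{ab}) given by (2.9)."""
    d = len(A1)
    new_A1 = [sum(jac[a][j] * A1[j] for j in range(d))
              + sum(hess[a][j][k] * A2[j][k] for j in range(d) for k in range(d))
              for a in range(d)]
    new_A2 = [[sum(jac[a][k] * jac[b][l] * A2[k][l]
                   for k in range(d) for l in range(d))
               for b in range(d)] for a in range(d)]
    return new_A1, new_A2


A1 = [0.0, 0.0]
A2 = [[0.5, 0.0], [0.0, 0.5]]
jac, hess = polar_jets(3.0, 4.0)   # here r = 5
new_A1, new_A2 = transform(A1, A2, jac, hess)
print(new_A1)   # approximately [1/(2r), 0]
print(new_A2)   # approximately [[1/2, 0], [0, 1/(2 r^2)]]
```

The first-order coefficient \({\tilde{A}}^r = \frac{1}{2r}\) appears although \(\frac{1}{2}\Delta \) has no first-order term in Cartesian coordinates; this is the second-derivative term of (2.9) at work.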

The formal generator of Itô SDE (2.1) is given by,

$$\begin{aligned} A^X_t = {\mathfrak {b}}^i(t) \frac{\partial }{\partial x^i} + \frac{1}{2} \sum _{r=1}^N \sigma ^i_r(t) \sigma ^j_r(t) \frac{\partial ^2}{\partial x^i\partial x^j}, \end{aligned}$$
(2.10)

which is a time-dependent second-order elliptic operator due to the change-of-coordinate formula (2.4).

Denote by \({\mathcal {C}}_{t_0}\) the subspace of \(C([t_0,\infty ),M^*)\) consisting of all paths always staying in M or eventually stopped at \(\partial _M\). That is, \(\omega \in {\mathcal {C}}_{t_0}\) if and only if there exists \(\tau (\omega )\in (t_0,\infty ]\) such that \(\omega (t)\in M\) for \(t\in [t_0,\tau (\omega ))\) and \(\omega (t) = \partial _M\) for \(t\in [\tau (\omega ),\infty )\). Let \({\mathcal {B}}({\mathcal {C}}_{t_0})\) be the \(\sigma \)-field generated by Borel cylinder sets. Let \(X(t): {\mathcal {C}}_{t_0}\rightarrow M^*, X(t,\omega ) = \omega (t), t\ge t_0\) be the coordinate mapping. For each \(t\in {\mathbb {R}}\), define a sub-\(\sigma \)-field by \({\mathcal {B}}_t = \sigma \{X(s): t_0 \le s\le t_0\vee t\}\). Then, \(\{{\mathcal {B}}_t\}_{t\in {\mathbb {R}}}\) is a past filtration of \({\mathcal {B}}({\mathcal {C}}_{t_0})\) and \(\tau \) is a \(\{{\mathcal {B}}_t\}\)-stopping time.

Definition 2.8

(Martingale problems on manifolds, Hsu 2002, Definition 1.3.1) Given a time-dependent second-order elliptic operator \(A=(A_t)_{t\ge t_0}\), a solution to the martingale problem associated with A is a triple X, \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\), where

  1. (i)

    \((\Omega ,{\mathcal {F}},{\textbf{P}})\) is a probability space, and \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\) is a past filtration of \({\mathcal {F}}\) satisfying the usual conditions,

  2. (ii)

    \(X:\Omega \rightarrow {\mathcal {C}}_{t_0}\) is an \(M^*\)-valued \(\{{\mathcal {P}}_t\}\)-semimartingale, and

  3. (iii)

    for every \(f\in C^\infty ({\mathbb {R}}\times M)\), the process \(M^{f,X}(t):= f(t,X(t)) - f(t_0,X(t_0)) - \int _{t_0}^t (\frac{\partial }{\partial t}+A_s) f(s,X(s)) {\textrm{d}}s\), \(t\in [t_0,\tau (X))\), is a real-valued continuous \(\{{\mathcal {P}}_t\}\)-martingale.

The process \(\{X(t)\}_{t\in [t_0,\tau (X))}\) is called an M-valued \(\{{\mathcal {P}}_t\}\)-diffusion process with generator A (or simply an A-diffusion).
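The martingale condition (iii) can be probed by simulation. The sketch below takes \(M = {\mathbb {R}}\), \(X(s) = x_0 + bs + \sigma W(s)\) and \(f(t,x) = x^2\), and estimates \({\textbf{E}}[M^{f,X}(t)]\), which must vanish for a martingale started at 0; this checks only a necessary condition, not the full martingale property. The pair \((W(t), \int _0^t W(s)\,{\textrm{d}}s)\) is sampled exactly as a Gaussian vector, so there is no time-discretization error. The parameter values are arbitrary, and the keyword `c` is a hypothetical knob replacing the factor \(\frac{1}{2}\) of (2.10): with `c=1.0` the mean moves to \(-\sigma ^2 t\).

```python
import math
import random


def mean_residual(x0, b, sigma, t, n, c=0.5, seed=0):
    """Monte Carlo mean of M^{f,X}(t) for X(s) = x0 + b s + sigma W(s), f(x) = x^2,
    using the candidate generator A f = b f' + c sigma^2 f'' (c = 1/2 is correct)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # exact joint sample of W(t) and int_0^t W(s) ds (a Gaussian pair)
        w_t = math.sqrt(t) * rng.gauss(0.0, 1.0)
        int_w = 0.5 * t * w_t + math.sqrt(t ** 3 / 12.0) * rng.gauss(0.0, 1.0)
        x_t = x0 + b * t + sigma * w_t
        int_x = x0 * t + 0.5 * b * t * t + sigma * int_w   # int_0^t X(s) ds
        # int_0^t A f(X(s)) ds with f' = 2x and f'' = 2
        int_af = 2.0 * b * int_x + 2.0 * c * sigma ** 2 * t
        total += x_t ** 2 - x0 ** 2 - int_af
    return total / n


good = mean_residual(1.0, 0.5, 0.8, 1.0, 20000)          # close to 0
bad = mean_residual(1.0, 0.5, 0.8, 1.0, 20000, c=1.0)    # close to -sigma^2 t
print(good, bad)
```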

The uniqueness in the sense of probability law for both MDEs and martingale problems can be defined in a similar fashion to Definition 2.3. Note that unlike Itô SDEs or MDEs, the definition for martingale problems does not rely on linear connections.

When a linear connection on M is given, one can show, in the same way as in Stroock and Varadhan’s theory (e.g., Karatzas and Shreve 1991, Section 5.4), that the existence of a solution to the martingale problem associated with \(A^X=(A^X_t)_{t\ge t_0}\) in (2.10) is equivalent to the existence of a weak solution to the Itô SDE (2.1), and also to the existence of a solution to MDEs (2.6); the corresponding notions of uniqueness in law are also equivalent.

2.4 The Second-Order Tangent Bundle

As we have seen, the modified drift \({\mathfrak {b}}\) in (2.3) is not a vector field. Is \({\mathfrak {b}}\) a section of some other bundle, and if so, of which one? It is not a section of any bundle on its own, since its change-of-coordinates formula (2.4) involves \(\sigma \). But if we consider the formal generator \(A^X\) in (2.10), or the pair \(({\mathfrak {b}}, \sigma \circ \sigma ^*)\) of its coefficients, then we can construct a bundle whose structure group is governed by the change-of-coordinates formulae (2.9), so that its sections are exactly the second-order operators.

We denote by \(\textrm{Sym}^2({\mathbb {R}}^d)\) the space of all symmetric (2, 0)-tensors on \({\mathbb {R}}^d\), and by \(\textrm{Sym}^2_+({\mathbb {R}}^d)\) the subset of it consisting of all positive semi-definite (2, 0)-tensors (a closed convex cone, not a linear subspace). We also denote by \({\mathcal {L}}({\mathbb {R}}^n,{\mathbb {R}}^d)\) the space of all linear maps from \({\mathbb {R}}^n\) to \({\mathbb {R}}^d\).

Definition 2.9

(The second-order tangent bundle)

  1. (i)

    Gliklikh (2011, Definition 7.14) The Itô group \(G_I^d\) is the Cartesian product (but not direct product of groups) \(\textrm{GL}(d,{\mathbb {R}}) \times {\mathcal {L}}({\mathbb {R}}^d\otimes {\mathbb {R}}^d,{\mathbb {R}}^d)\) equipped with the following binary operation:

    $$\begin{aligned} (g_2, \kappa _2) \circ (g_1, \kappa _1) = (g_2\circ g_1, g_2\circ \kappa _1 + \kappa _2\circ (g_1\otimes g_1)), \end{aligned}$$

    for all \(g_1, g_2 \in \textrm{GL}(d,{\mathbb {R}})\), \(\kappa _1, \kappa _2\in {\mathcal {L}}({\mathbb {R}}^d\otimes {\mathbb {R}}^d,{\mathbb {R}}^d)\).

  2. (ii)

    The left group action of \(G_I^d\) on \({\mathbb {R}}^d \times \textrm{Sym}^2({\mathbb {R}}^d)\) is defined by

    $$\begin{aligned} (g, \kappa )\cdot ({\mathfrak {b}}, a) = (g{\mathfrak {b}} + \textstyle {{\frac{1}{2}}} \kappa a, (g\otimes g) a), \end{aligned}$$
    (2.11)

    for all \((g, \kappa ) \in G_I^d\), \({\mathfrak {b}}\in {\mathbb {R}}^d\), \(a\in \textrm{Sym}^2({\mathbb {R}}^d)\).

  3. (iii)

    The second-order tangent bundle \(({\mathcal {T}}^O M, \tau ^O_M, M)\) is the fiber bundle with base space M, typical fiber \({\mathbb {R}}^d \times \textrm{Sym}^2({\mathbb {R}}^d)\), and structure group \(G_I^d\).

  4. (iv)

    The fiber \({\mathcal {T}}^O_q M\) at \(q\in M\) is called the second-order tangent space to M at q. An element \(({\mathfrak {b}}, a)_q\in {\mathcal {T}}^O_q M\) is called a second-order tangent vector at q. A (global or local) section of \(\tau ^O_M\) is called a second-order vector field.

  5. (v)

    Denote by \({\mathcal {T}}^E M\) the subbundle of \({\mathcal {T}}^O M\) consisting of all elements \(({\mathfrak {b}}, a)_q\in {\mathcal {T}}^O_q M\), \(q\in M\), with \(a_q\) a positive semi-definite (2, 0)-tensor. Let \(\tau ^E_M = \tau ^O_M|_{{\mathcal {T}}^E M}\). We call \(({\mathcal {T}}^E M, \tau ^E_M, M)\) the second-order elliptic tangent bundle.
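The group law and the action in Definition 2.9 can be sanity-checked numerically. The sketch below (plain Python, d = 2, random data) verifies associativity of the composition and the compatibility \(h_2\cdot (h_1\cdot ({\mathfrak {b}},a)) = (h_2\circ h_1)\cdot ({\mathfrak {b}},a)\). The normalization c in front of \(\kappa a\) is kept as a parameter: \(c = \frac{1}{2}\) matches the frame convention \(({\mathfrak {b}},a) = {\mathfrak {b}}^i\partial _i + \frac{1}{2}a^{jk}\partial _j\partial _k\) introduced after (2.12) and the Itô-bundle action in Remark 2.10 (v), while compatibility holds for any constant c because the composition law is linear in \(\kappa \). All function names are illustrative.

```python
import random

# Sketch of the Itô group law and its action on R^d x Sym^2(R^d) for d = 2.
# Matrices are nested lists; kappa[i][j][m] stands for kappa(e_j (x) e_m)^i.


def compose(h2, h1):
    """(g2, k2) o (g1, k1) = (g2 g1, g2 k1 + k2 o (g1 (x) g1))."""
    (g2, k2), (g1, k1) = h2, h1
    d = len(g1)
    g = [[sum(g2[i][l] * g1[l][j] for l in range(d)) for j in range(d)]
         for i in range(d)]
    k = [[[sum(g2[i][l] * k1[l][j][m] for l in range(d))
           + sum(k2[i][l][p] * g1[l][j] * g1[p][m]
                 for l in range(d) for p in range(d))
           for m in range(d)] for j in range(d)] for i in range(d)]
    return g, k


def act(h, b, a, c=0.5):
    """(g, k).(b, a) = (g b + c k a, g a g^T); c is the normalization of k a."""
    g, k = h
    d = len(g)
    new_b = [sum(g[i][j] * b[j] for j in range(d))
             + c * sum(k[i][j][m] * a[j][m] for j in range(d) for m in range(d))
             for i in range(d)]
    new_a = [[sum(g[i][j] * a[j][m] * g[l][m] for j in range(d) for m in range(d))
              for l in range(d)] for i in range(d)]
    return new_b, new_a


def flat(x):
    """Flatten nested lists/tuples of floats."""
    if isinstance(x, (list, tuple)):
        return [y for item in x for y in flat(item)]
    return [x]


def max_diff(x, y):
    return max(abs(u - v) for u, v in zip(flat(x), flat(y)))


rng = random.Random(1)
d = 2


def rand_elem():
    # diagonally dominant g, hence invertible
    g = [[rng.uniform(-1, 1) + (2.0 if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    k = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
         for _ in range(d)]
    return g, k


h1, h2, h3 = rand_elem(), rand_elem(), rand_elem()
b = [rng.uniform(-1, 1) for _ in range(d)]
m = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
a = [[sum(m[i][l] * m[j][l] for l in range(d)) for j in range(d)]
     for i in range(d)]  # symmetric positive semi-definite

assoc = max_diff(compose(compose(h3, h2), h1), compose(h3, compose(h2, h1)))
compat = max_diff(act(h2, *act(h1, b, a)), act(compose(h2, h1), b, a))
compat_c1 = max_diff(act(h2, *act(h1, b, a, c=1.0), c=1.0),
                     act(compose(h2, h1), b, a, c=1.0))
print(assoc, compat, compat_c1)  # all at rounding level
```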

Remark 2.10

  1. (i)

    We indulge in some abuse of terminology. For example, the second-order vector fields should not be confused with the semisprays, which are sections of the double tangent bundle \(T^2M\) (e.g., Saunders 1989, Section 1.4; Lang 1999, Section IV.3).

  2. (ii)

    Some authors simply define second-order vector fields as second-order operators as in Definition 2.7 (Emery 1989, Definition 6.3 or Gliklikh 2011, Definition 2.74). Once we choose a frame for \({\mathcal {T}}^O M\), it will be clear that second-order vector fields are identified with second-order operators.

  3. (iii)

    Belopolskaya and Dalecky (1990) and Gliklikh (2011) define a bundle which has the Itô group as its structure group and has the pair \(({\mathfrak {b}}, \sigma )\) of coefficients in Itô SDE (2.1) as a section. They call it Itô’s bundle and denote it by \({\mathcal {I}} M\). The difference is that, in our formulation, the pair \(({\mathfrak {b}}, \sigma \circ \sigma ^*)\) of coefficients of the generator of Itô SDE (2.1) is a section of the second-order elliptic tangent bundle \(\tau ^E_M\). The advantage of the bundle \(\tau ^E_M\) is that it is a natural generalization of the tangent bundle to second order and has a good geometric interpretation, as we will see in Proposition 3.2.

  4. (iv)

    Note that the typical fiber \({\mathbb {R}}^d \times \textrm{Sym}^2({\mathbb {R}}^d)\) of \(\tau ^O_M\) is a vector space of dimension \(d+\frac{d(d+1)}{2}\). But \(\tau ^O_M\) is not a vector bundle, since its structure group \(G_I^d\) is not a linear group (a subgroup of a general linear group). The typical fiber of \(\tau ^E_M\) is \({\mathbb {R}}^d \times \textrm{Sym}^2_+({\mathbb {R}}^d)\), which is not even a vector space, so that \(\tau ^E_M\) is not a vector bundle either. Indeed, we may call them quadratic bundles, just as Itô’s bundle is called in Belopolskaya and Dalecky (1990, Chapter 4).

  5. (v)

    The Itô’s bundle \({\mathcal {I}} M\) defined in Gliklikh (2011, Definition 7.17) is the fiber bundle over manifold M, with fiber \({\mathbb {R}}^d \times {\mathcal {L}}({\mathbb {R}}^N,{\mathbb {R}}^d)\) and structure group \(G_I^d\) which acts on the fiber from the left by

    $$\begin{aligned} (g, \kappa )({\mathfrak {b}}, \sigma ) = \left( g{\mathfrak {b}} + \textstyle {{\frac{1}{2}}} \textrm{tr}\,(\kappa \circ (\sigma \otimes \sigma )), g \circ \sigma \right) , \end{aligned}$$

    for all \((g, \kappa ) \in G_I^d\), \({\mathfrak {b}}\in {\mathbb {R}}^d\), \(\sigma \in {\mathcal {L}}({\mathbb {R}}^N,{\mathbb {R}}^d)\). For the same reason as \({\mathcal {T}}^O M\) or \({\mathcal {T}}^E M\), Itô’s bundle \({\mathcal {I}} M\) is not a vector bundle. There is a bundle homomorphism over M from \({\mathcal {I}} M\) to \({\mathcal {T}}^E M\), which maps each fiber \({\mathcal {I}}_q M\) to \({\mathcal {T}}^E_q M\), \(q\in M\), by \(({\mathfrak {b}}, \sigma ) \mapsto ({\mathfrak {b}}, \sigma \circ \sigma ^*)\). It is easy to see that this bundle homomorphism is also a surjective submersion. If we identify \(g\in \textrm{GL}(d,{\mathbb {R}})\) with \((g,0) \in G_I^d\), then \(\textrm{GL}(d,{\mathbb {R}})\) is a subgroup of \(G_I^d\). We define Stratonovich’s bundle \({\mathcal {S}} M\) to be the reduction of \({\mathcal {I}} M\) to the structure group \(\textrm{GL}(d,{\mathbb {R}})\), that is, the fiber bundle over M, with fiber \({\mathbb {R}}^d \times {\mathcal {L}}({\mathbb {R}}^N,{\mathbb {R}}^d)\) and structure group \(\textrm{GL}(d,{\mathbb {R}})\) which acts on the fiber from the left by

    $$\begin{aligned} g({\mathfrak {b}}, \sigma ) = (g{\mathfrak {b}}, g \circ \sigma ). \end{aligned}$$

    Unlike \({\mathcal {T}}^O M\) or \({\mathcal {I}} M\), Stratonovich’s bundle \({\mathcal {S}} M\) is indeed a vector bundle, and the tangent bundle TM is a vector subbundle of \({\mathcal {S}} M\). It can be expected that Stratonovich’s bundle is a natural bundle to formulate Stratonovich SDEs. But, in this paper, we mainly focus on Itô SDEs and their generators.
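The bundle homomorphism \(({\mathfrak {b}}, \sigma ) \mapsto ({\mathfrak {b}}, \sigma \circ \sigma ^*)\) described in Remark 2.10 (v) has to intertwine the two actions of \(G_I^d\). The following sketch checks this on random data with d = 2 and N = 3, for the action on \({\mathbb {R}}^d\times \textrm{Sym}^2({\mathbb {R}}^d)\) written as \((g{\mathfrak {b}} + c\,\kappa a, (g\otimes g)a)\) with a free normalization c. Equivariance holds for \(c = \frac{1}{2}\), the factor matching the Itô-bundle action, and fails for generic data when c = 1. Names and random data are illustrative.

```python
import random

# Sketch: the fiber map (b, sigma) -> (b, sigma sigma^T) from the Itô bundle to
# T^E M intertwines the Itô-bundle action of Remark 2.10 (v) with the action
# (g, k).(b, a) = (g b + c k a, g a g^T) when c = 1/2.
# kappa[i][j][m] stands for kappa(e_j (x) e_m)^i.


def act_ito(h, b, sigma):
    """(g, k)(b, sigma) = (g b + 1/2 tr(k o (sigma (x) sigma)), g sigma)."""
    g, k = h
    d, n = len(sigma), len(sigma[0])
    new_b = [sum(g[i][j] * b[j] for j in range(d))
             + 0.5 * sum(k[i][j][m] * sigma[j][r] * sigma[m][r]
                         for j in range(d) for m in range(d) for r in range(n))
             for i in range(d)]
    new_sigma = [[sum(g[i][j] * sigma[j][r] for j in range(d)) for r in range(n)]
                 for i in range(d)]
    return new_b, new_sigma


def act_second_order(h, b, a, c):
    """(g, k).(b, a) = (g b + c k a, g a g^T)."""
    g, k = h
    d = len(g)
    new_b = [sum(g[i][j] * b[j] for j in range(d))
             + c * sum(k[i][j][m] * a[j][m] for j in range(d) for m in range(d))
             for i in range(d)]
    new_a = [[sum(g[i][j] * a[j][m] * g[l][m] for j in range(d) for m in range(d))
              for l in range(d)] for i in range(d)]
    return new_b, new_a


def to_elliptic(b, sigma):
    """Fiber map (b, sigma) -> (b, sigma sigma^T)."""
    d, n = len(sigma), len(sigma[0])
    a = [[sum(sigma[i][r] * sigma[j][r] for r in range(n)) for j in range(d)]
         for i in range(d)]
    return b, a


def max_diff(p, q):
    (b1, a1), (b2, a2) = p, q
    diffs = [abs(u - v) for u, v in zip(b1, b2)]
    diffs += [abs(u - v) for r1, r2 in zip(a1, a2) for u, v in zip(r1, r2)]
    return max(diffs)


rng = random.Random(3)
d, n = 2, 3
g = [[rng.uniform(-1, 1) + (2.0 if i == j else 0.0) for j in range(d)] for i in range(d)]
k = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)] for _ in range(d)]
b = [rng.uniform(-1, 1) for _ in range(d)]
sigma = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(d)]

lhs = to_elliptic(*act_ito((g, k), b, sigma))
err_half = max_diff(lhs, act_second_order((g, k), *to_elliptic(b, sigma), c=0.5))
err_one = max_diff(lhs, act_second_order((g, k), *to_elliptic(b, sigma), c=1.0))
print(err_half, err_one)  # err_half is at rounding level; err_one is not
```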

It is natural to regard the differential operators

$$\begin{aligned} \left\{ \frac{\partial }{\partial {x^i}}, \frac{\partial ^2}{\partial x^j \partial x^k}: 1\le i\le d, 1\le j\le k \le d \right\} \end{aligned}$$
(2.12)

as a local frame of \({\mathcal {T}}^O M\) over the local chart \((U,(x^i))\) on M. In the sequel, we will usually shorten them by

$$\begin{aligned} \left\{ \partial _i,\ \partial _j\partial _k: 1\le i\le d, 1\le j\le k \le d \right\} . \end{aligned}$$

We make the convention that \(\partial _k\partial _j = \partial _j\partial _k\) for all \(1\le j\le k \le d\). A second-order vector field \(({\mathfrak {b}},a)\) is expressed in terms of this local frame by

$$\begin{aligned} ({\mathfrak {b}},a) = {\mathfrak {b}}^i \partial _i + \textstyle {{\frac{1}{2}a^{jk}}} \partial _j\partial _k. \end{aligned}$$

In this way, every second-order vector field can be regarded as a second-order operator and vice versa. In particular, the generator \(A^X\) of an M-valued diffusion process X, for example the generator (2.10) of the Itô SDE, is a time-dependent second-order vector field, so that we can rewrite \(A^X\) as \(A^X_t = ({\mathfrak {b}}(t),(\sigma \circ \sigma ^*)(t))\).

The tangent bundle TM is a subbundle (but not a vector subbundle) and also an embedded submanifold of \({\mathcal {T}}^O M\), as the bundle monomorphism

$$\begin{aligned} \iota : (TM, \tau _M, M)\rightarrow \big ({\mathcal {T}}^O M, \tau ^O_M, M\big ), \quad v_q\mapsto (v,0)_q \end{aligned}$$
(2.13)

is also an embedding. However, there is no canonical bundle epimorphism from \({\mathcal {T}}^O M\) to TM which is a left inverse of \(\iota \) and linear on fibers. We call such a bundle epimorphism a fiber-linear bundle projection from \({\mathcal {T}}^O M\) to TM. Choosing such a projection essentially amounts to choosing a linear connection on M; more precisely, since only the symmetric part of the Christoffel symbols enters, it amounts to choosing a torsion-free linear connection. We have the following connection correspondence properties, the first of which can also be found in Gliklikh (2011, Section 2.9).

Proposition 2.11

(Connection correspondence) Any linear connection on M induces a fiber-linear bundle projection from \({\mathcal {T}}^O M\) to TM. Conversely, any fiber-linear bundle projection from \({\mathcal {T}}^O M\) to TM induces a torsion-free linear connection on M.

Remark 2.12

The connection correspondence is similar to the correspondence between horizontal subbundles of the tangent bundle of a vector bundle and connections on this vector bundle, cf. Saunders (1989, Section 3.1).

Proof

Let \((\Gamma _{ij}^k)\) be the Christoffel symbols of a linear connection \(\nabla \) on M. Define a projection by

$$\begin{aligned} \varrho _\nabla : {\mathcal {T}}^O M \rightarrow T M, \quad ({\mathfrak {b}}, a)_q \mapsto \left( {\mathfrak {b}}^i + \textstyle {{\frac{1}{2}}} a^{jk} \Gamma ^i_{jk}(q) \right) \partial _i\big |_q. \end{aligned}$$
(2.14)

Clearly, \(\varrho _\nabla \) is linear on fibers and \(\varrho _\nabla \circ \iota = \textbf{Id}_{TM}\). Conversely, let \(\varrho : {\mathcal {T}}^O M\rightarrow T M\) be a fiber-linear bundle projection. Then, on each coordinate chart \((U,(x^i))\) around \(q\in M\), there exists a smooth map \(B_U: U \rightarrow {\mathcal {L}}(\textrm{Sym}^2({\mathbb {R}}^d), {\mathbb {R}}^d)\) such that

$$\begin{aligned} \varrho ( {\mathfrak {b}}, a ) = \left( {\mathfrak {b}}^i + B_U(q)(a)^i \right) \partial _i\big |_q, \quad ( {\mathfrak {b}}, a ) \in {\mathcal {T}}^O_q M, q\in U. \end{aligned}$$

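The family of smooth maps \((B_U)\) determines a spray and then a torsion-free linear connection on M (see, e.g., Lang 1999, Section IV.3). Concretely, setting \(\Gamma ^i_{jk}(q):= B_U(q)(e_j\otimes e_k + e_k\otimes e_j)^i\) for the standard basis \((e_i)\) of \({\mathbb {R}}^d\), we have \(B_U(q)(a)^i = \frac{1}{2} a^{jk} \Gamma ^i_{jk}(q)\) for every symmetric a, so that \(\varrho \) takes the form (2.14). The \(\Gamma ^i_{jk}\) transform as Christoffel symbols because \(\varrho \) is globally defined, and their symmetry in j and k, i.e., torsion-freeness, reflects the fact that \(B_U(q)\) is only defined on symmetric tensors. \(\square \)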
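The converse half of the proof of Proposition 2.11 can be illustrated at a single point of a chart. In the sketch below, the linear map \(B_U(q)\) on symmetric matrices is written, non-uniquely, through coefficients \(c^i_{jk}\) as \(B_U(q)(a)^i = c^i_{jk}a^{jk}\) (an assumption of the example). The recovered symbols \(\Gamma ^i_{jk} = c^i_{jk} + c^i_{kj}\) are symmetric and reproduce the projection in the form (2.14), and adding any term antisymmetric in (j, k), i.e., a torsion term, leaves the projection unchanged. Names and data are illustrative.

```python
import random

# Sketch: recover Christoffel symbols from a fiber-linear projection
# rho(b, a) = b + B(a), with B(a)^i = sum_jk c[i][j][k] a[j][k] on symmetric a.


def christoffel_from_B(c):
    """Gamma^i_{jk} := B(e_j (x) e_k + e_k (x) e_j)^i = c[i][j][k] + c[i][k][j]."""
    d = len(c)
    return [[[c[i][j][k] + c[i][k][j] for k in range(d)] for j in range(d)]
            for i in range(d)]


def rho_from_B(c, b, a):
    d = len(b)
    return [b[i] + sum(c[i][j][k] * a[j][k] for j in range(d) for k in range(d))
            for i in range(d)]


def rho_nabla(gamma, b, a):
    """The projection (2.14): b^i + 1/2 a^{jk} Gamma^i_{jk}."""
    d = len(b)
    return [b[i] + 0.5 * sum(a[j][k] * gamma[i][j][k] for j in range(d) for k in range(d))
            for i in range(d)]


rng = random.Random(2)
d = 3
c = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)] for _ in range(d)]
gamma = christoffel_from_B(c)
b = [rng.uniform(-1, 1) for _ in range(d)]
m = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
a = [[sum(m[i][l] * m[j][l] for l in range(d)) for j in range(d)] for i in range(d)]

# a torsion-like term, antisymmetric in (j, k), does not change the projection
t = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)] for _ in range(d)]
gamma_twisted = [[[gamma[i][j][k] + t[i][j][k] - t[i][k][j] for k in range(d)]
                  for j in range(d)] for i in range(d)]

err = max(abs(u - v) for u, v in zip(rho_from_B(c, b, a), rho_nabla(gamma, b, a)))
err_twisted = max(abs(u - v) for u, v in
                  zip(rho_nabla(gamma, b, a), rho_nabla(gamma_twisted, b, a)))
symmetric = all(gamma[i][j][k] == gamma[i][k][j]
                for i in range(d) for j in range(d) for k in range(d))
print(err, err_twisted, symmetric)
```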

Observe that (2.11) induces a group action of \(\textrm{GL}(d,{\mathbb {R}})\) on \(\textrm{Sym}^2({\mathbb {R}}^d)\), given by \(g\cdot a = (g\otimes g) a\), since the second component of \((g,\kappa )\cdot ({\mathfrak {b}}, a)\) depends neither on \(\kappa \) nor on \({\mathfrak {b}}\). Thus, the second component a of each element \(( {\mathfrak {b}}, a ) \in {\mathcal {T}}^O_q M\) can be regarded as a symmetric (2, 0)-tensor. Recall that we denote by \(\textrm{Sym}^2(TM)\) the bundle of symmetric (2, 0)-tensors on M; there is then a canonical bundle epimorphism

$$\begin{aligned} {\hat{\varrho }}: {\mathcal {T}}^O M \rightarrow \textrm{Sym}^2(TM), \quad ({\mathfrak {b}}, a)_q \mapsto a_q, \end{aligned}$$
(2.15)

whose kernel is the image of \(\iota \). Moreover, there is a connection correspondence property for \(\textrm{Sym}^2(TM)\) similar to Proposition 2.11. That is, a linear connection \(\nabla \) on M induces a fiber-linear bundle monomorphism from \(\textrm{Sym}^2(TM)\) to \({\mathcal {T}}^O M\), which is a right inverse of \({\hat{\varrho }}\) and given by

$$\begin{aligned} {\hat{\iota }}_{\nabla }: \textrm{Sym}^2(TM) \rightarrow {\mathcal {T}}^O M, \quad a_q \mapsto a^{ij} \left( \partial _i\partial _j \big |_q - \Gamma ^k_{ij}(q) \partial _k \big |_q \right) = a^{ij} \nabla ^2_{\partial _i,\partial _j}\big |_q\qquad \quad \end{aligned}$$
(2.16)

where \(\nabla ^2\) is the second covariant derivative (Petersen 2016, Subsection 2.2.2.3) [which is also called the Hessian operator when acting on smooth functions (Jost 2017)]. In other words, \(\nabla ^2_{\partial _i,\partial _j}|_q = {\hat{\iota }}_{\nabla }(dx^i \odot dx^j |_q)\), where \(\odot \) is the symmetrization operator on \(T^2 M\).

Combining (2.13) and (2.15), we have the following short exact sequence:

$$\begin{aligned} 0 \longrightarrow TM {\mathop {\longrightarrow }\limits ^{\iota }} {\mathcal {T}}^O M {\mathop {\longrightarrow }\limits ^{{\hat{\varrho }}}} \textrm{Sym}^2(TM) \longrightarrow 0. \end{aligned}$$
(2.17)

Proposition 2.11 and (2.15), (2.16) imply that when a linear connection \(\nabla \) is given, the sequence is also split, in the fiber-wise sense. The induced decomposition

$$\begin{aligned} {\mathcal {T}}^O M = \iota (TM) \oplus {\hat{\iota }}_{\nabla } \left( \textrm{Sym}^2(TM) \right) \cong TM \oplus \textrm{Sym}^2(TM), \end{aligned}$$
(2.18)

where both the first direct sum \(\oplus \) and the isomorphism \(\cong \) are understood fiber-wise (not as a Whitney sum and a bundle isomorphism), while the second direct sum is the Whitney sum, is given by

$$\begin{aligned} ({\mathfrak {b}}, a)_q = b^i \partial _i\big |_q + \textstyle {{\frac{1}{2}}} a^{ij} \nabla ^2_{\partial _i,\partial _j}\big |_q \mapsto (b_q, a_q), \end{aligned}$$
(2.19)

for \(b_q = ( {\mathfrak {b}}^i + \textstyle {{\frac{1}{2}}} a^{jk} \Gamma ^i_{jk}(q) ) \partial _i |_q \in T_qM\). A similar short exact sequence as (2.17) holds with \({\mathcal {T}}^E M\) and \(\textrm{Sym}^2_+(TM)\) in place of \({\mathcal {T}}^O M\) and \(\textrm{Sym}^2(TM)\), respectively.
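For a concrete instance of (2.19), consider Brownian motion on the unit sphere \(S^2\). In spherical coordinates \((\theta ,\phi )\), its generator \(\frac{1}{2}\Delta _{S^2}\) has \({\mathfrak {b}} = (\frac{1}{2}\cot \theta , 0)\) and \(a = \textrm{diag}(1, \sin ^{-2}\theta )\). With the Levi-Civita connection of the round metric, the TM-component b in (2.19) should vanish, reflecting the fact that the Laplace-Beltrami operator is the trace of the Hessian. The sketch below (helper names are illustrative) checks this at \(\theta = 1\), and also checks that the map is inverted by \({\mathfrak {b}}^i = b^i - \frac{1}{2} a^{jk}\Gamma ^i_{jk}\).

```python
import math
import random


def sphere_christoffel(theta):
    """gamma[i][j][k] = Gamma^i_{jk} of the round metric; index 0 = theta, 1 = phi."""
    s, c = math.sin(theta), math.cos(theta)
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    gamma[0][1][1] = -s * c                    # Gamma^theta_{phi phi}
    gamma[1][0][1] = gamma[1][1][0] = c / s    # Gamma^phi_{theta phi} = Gamma^phi_{phi theta}
    return gamma


def _correction(a, gamma, i):
    """1/2 a^{jk} Gamma^i_{jk}."""
    d = len(a)
    return 0.5 * sum(a[j][k] * gamma[i][j][k] for j in range(d) for k in range(d))


def split(frak_b, a, gamma):
    """(frak b, a) -> (b, a) with b^i = frak b^i + 1/2 a^{jk} Gamma^i_{jk}, as in (2.19)."""
    return [fb + _correction(a, gamma, i) for i, fb in enumerate(frak_b)], a


def unsplit(b, a, gamma):
    """Inverse map (b, a) -> (frak b, a)."""
    return [bi - _correction(a, gamma, i) for i, bi in enumerate(b)], a


theta = 1.0
gamma = sphere_christoffel(theta)
# coefficients of (1/2) Laplace-Beltrami on S^2 in the frame b^i d_i + 1/2 a^{jk} d_j d_k
frak_b = [0.5 * math.cos(theta) / math.sin(theta), 0.0]
a = [[1.0, 0.0], [0.0, 1.0 / math.sin(theta) ** 2]]
b, _ = split(frak_b, a, gamma)
print(b)  # TM component of (1/2) Delta_{S^2}: vanishes up to rounding

# round trip on random data
rng = random.Random(4)
fb_rand = [rng.uniform(-1, 1) for _ in range(2)]
back, _ = unsplit(*split(fb_rand, a, gamma), gamma)
```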

Now we introduce a subclass of semimartingales on manifolds which contains diffusions. We call the M-valued process \(X= \{X(t)\}_{t\in [t_0,\tau )}\) an Itô process if there exists a \(\{{\mathcal {P}}_t\}\)-adapted continuous \({\mathcal {T}}^E M\)-valued process \(\{({\mathfrak {b}}, a)(t)\}_{t\in [t_0,\tau )}\) satisfying \(({\mathfrak {b}}, a)(t) \in {\mathcal {T}}^E_{X(t)} M\) for each \(t\in [t_0,\tau )\), such that for every \(f\in C^\infty ({\mathbb {R}}\times M)\), \(M^{f,X}(t):= f(t,X(t)) - f(t_0,X(t_0)) - \int _{t_0}^t (\frac{\partial }{\partial {t}}+ {\mathcal {A}}^X_s )f(s,X(s)) {\textrm{d}}s\), \(t\in [t_0,\tau )\), is a real-valued \(\{{\mathcal {P}}_t\}\)-martingale, where \({\mathcal {A}}^X_t = ({\mathfrak {b}}, a)(t) = {\mathfrak {b}}^i(t) \partial _i + \textstyle {{\frac{1}{2}}} a^{ij}(t) \partial _i\partial _j\). We call the process \(\{({\mathfrak {b}}, a)(t)\}_{t\in [t_0,\tau )}= \{{\mathcal {A}}^X_t\}_{t\in [t_0,\tau )}\) the random generator of X. A similar notion, “Brownian semimartingale,” is also used in the literature (e.g., Driver 1992). If X is a diffusion with generator \(A^X_t = ({\mathfrak {b}}(t), a(t))\), then it is an Itô process with random generator \({\mathcal {A}}^X_t = A^X_{(t,X(t))} = (\mathfrak b(t,X(t)), a(t,X(t)))\). The difference between Itô processes and diffusions is that the random generator of an Itô process may depend on the sample point not only through its base point \(X(t)\in M\) but also along the fibers.

Then, we can define forward mean derivatives in a coordinate-free way, without relying on linear connections.

Definition 2.13

(Mean derivatives) For an M-valued Itô process \(X= \{X(t)\}_{t\in [t_0,\tau )}\), we define its (forward) mean derivatives (DX(t), QX(t)) at time \(t\in [t_0,\tau )\) by

$$\begin{aligned} (DX(t), QX(t)) = ({\mathfrak {b}}, a)(t) \in {\mathcal {T}}^E_{X(t)} M, \end{aligned}$$

where \(({\mathfrak {b}}, a)\) is the random generator of X.

Comparing with forward mean derivatives defined in local coordinates before, we have the following relations. The proof follows the lines of Gliklikh (2011, Lemma 9.4).

Lemma 2.14

Let \(X= \{X(t)\}_{t\in [t_0,\tau )}\) be an M-valued Itô process and \((U,(x^i))\) a coordinate chart centered at \(q\in M\).

  1. (i)

    In the event \(\{X(t) \in U\}\), QX(t) has the coordinate expression (2.5) and

    $$\begin{aligned} (D X)^i(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{X^i(t+\epsilon )-X^i(t) }{\epsilon } \bigg | {\mathcal {P}}_t \right] . \end{aligned}$$
  2. (ii)

    Given a linear connection \(\nabla \) on M, we have, under the conditional probability \({\textbf{P}}(\cdot | X(t) = q)\), that

    $$\begin{aligned} (D_\nabla X)^i(t) = (D X)^i(t) + \frac{1}{2} \Gamma ^i_{jk}(X(t)) (QX)^{jk}(t). \end{aligned}$$
    (2.20)
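The limits in Lemma 2.14 (i) suggest a simple Monte Carlo approximation. The sketch below takes \(M={\mathbb {R}}^2\) (a single chart, so the event \(\{X(t)\in U\}\) is the whole space) and the Itô SDE \({\textrm{d}}X = b\,{\textrm{d}}t + \sigma \,{\textrm{d}}W\) with constant, arbitrarily chosen b and \(\sigma \). It takes one exact Gaussian step of length \(\epsilon \) and estimates \((DX)^i(0)\) by the averaged increment divided by \(\epsilon \), and \((QX)^{jk}(0)\) by the averaged product of increments divided by \(\epsilon \); the latter is assumed here to be the coordinate expression meant by (2.5). The estimate of D carries a Monte Carlo error of order \((\epsilon n)^{-1/2}\), and that of Q a bias of order \(\epsilon \).

```python
import math
import random

# Sketch: Monte Carlo estimates of (D X)^i(0) ~ E[(X^i(eps) - q^i) / eps] and
# (Q X)^{jk}(0) ~ E[dX^j dX^k / eps] for dX = b dt + sigma dW in R^2
# with constant b and sigma (assumed example), using one exact Gaussian step.


def mean_derivatives(q, b, sigma, eps, n, seed=0):
    rng = random.Random(seed)
    d, m = len(q), len(sigma[0])
    D = [0.0] * d
    Q = [[0.0] * d for _ in range(d)]
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        dx = [b[i] * eps + math.sqrt(eps) * sum(sigma[i][r] * z[r] for r in range(m))
              for i in range(d)]
        for i in range(d):
            D[i] += dx[i] / (eps * n)
            for j in range(d):
                Q[i][j] += dx[i] * dx[j] / (eps * n)
    return D, Q


b = [1.0, -0.5]
sigma = [[1.0, 0.0], [0.5, 0.8]]
D, Q = mean_derivatives([0.0, 0.0], b, sigma, eps=0.01, n=40000)
print(D)   # close to b
print(Q)   # close to sigma sigma^T = [[1.0, 0.5], [0.5, 0.89]]
```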

It follows from (2.20) that the map \(\varrho _\nabla \) in (2.14) acts on the generator \(A^X\) of a diffusion X by

$$\begin{aligned} \varrho _\nabla \left( A^X_{(t,X(t))}\right) = \varrho _\nabla (DX(t), QX(t)) = D_\nabla X(t) \end{aligned}$$
(2.21)

For a time-dependent second-order vector field \(A_t = ({\mathfrak {b}}(t), a(t))\), we can set up, by analogy with MDEs (2.6), a new type of MDEs in terms of the mean derivatives:

$$\begin{aligned} \left\{ \begin{aligned} D X(t)&= {\mathfrak {b}}(t,X(t)), \\ Q X(t)&= a(t,X(t)). \end{aligned} \right. \end{aligned}$$
(2.22)

Then, similarly to Definitions 2.6 and 2.3, we may also define solutions and uniqueness in law for MDEs (2.22). We call a solution of (2.22) an integral process of \(A = (A_t)\). Note that the system (2.22) does not rely on linear connections. The equivalence of the well-posedness of (2.22) and the martingale problem in Definition 2.8 is easy to verify. When a linear connection is specified, the system (2.22) and martingale problem associated with \(A^X\) in (2.10) are both equivalent to the Itô SDE (2.1) and MDEs (2.6).

3 Stochastic Jets

In classical differential geometry, a tangent vector to a manifold may be defined as an equivalence class of curves passing through a given point, where two curves are equivalent if they have the same derivative at that point (Lee 2013, Chapter 3). This idea can be generalized to higher-order cases, which leads to the notion of jets. The jet structures allow us to translate a system of differential equations to a system of algebraic equations, and make it more intuitive to study the symmetries of systems of differential equations.

In this section we shall generalize these ideas to the stochastic case. We will first give an equivalent description to the second-order elliptic tangent bundle \(\tau ^E_M\) by constructing an equivalence relation on diffusions. Then, we will define the stochastic jets and figure out the “jet-like” bundle structure involved in the space of stochastic jets. Finally, we shall see that the bundle structure is the appropriate platform to formulate SDEs intrinsically. In the next section, we will apply stochastic jets to study stochastic symmetries.

3.1 The Stochastic Tangent Bundle

Recall that a tangent vector can be represented as an equivalence class of smooth curves that have the same velocity at the base point. This leads to the following equivalent definition of the tangent bundle TM:

$$\begin{aligned} TM \cong \left\{ [\gamma ]_q: \gamma \in C^\infty _{(0,q)}(M), q\in M \right\} , \end{aligned}$$
(3.1)

where \(C^\infty _{(0,q)}(M)\) is the set of all smooth curves on M that pass through q at time \(t=0\), and two curves \(\gamma ,{\tilde{\gamma }}\in C^\infty _{(0,q)}(M)\) are equivalent if and only if \((f\circ \gamma )'(0)=(f\circ {\tilde{\gamma }})'(0)\) for every real-valued smooth function f defined in a neighborhood of q. If we replace smooth curves by diffusion processes, and time derivatives by mean derivatives, then we get the following definition.

Definition 3.1

(The stochastic tangent bundle) Two M-valued diffusion processes \(X=\{X(t)\}_{t\in [0,\tau )}\), \(Y=\{Y(t)\}_{t\in [0,\sigma )}\) are said to be stochastically equivalent at \((t,q)\in {\mathbb {R}}\times M\) if, almost surely, \(X(t)=Y(t)=q\) and \(D(f\circ X)(t) = D(f\circ Y)(t)\) for all \(f\in C^\infty (M)\). The equivalence class containing X is called the stochastic tangent vector of X at q and is denoted by \(j_{(t,q)} X\). When \(t=0\), we write \(j_q X:= j_{(0,q)}X\) for short. Let \(I_{(t,q)}(M)\) be the set of all M-valued diffusion processes starting from q at time t. The stochastic tangent bundle of M is the set

$$\begin{aligned} {\mathcal {T}}^S M = \{ j_q X: X\in I_{(0,q)}(M), q\in M \}. \end{aligned}$$

Note that since X and Y are M-valued diffusion processes, f(X) and f(Y) are real-valued Itô processes, and hence their mean derivatives exist.

At this stage, we have not yet used a genuinely jet-like formulation, even though we used the jet-like notation \(j_q X\). Indeed, if one strictly follows the definition of jet bundles over the trivial bundle \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\), it is more natural to use the time line \({\mathbb {R}}\) as “source” and the manifold M as “target” (cf. Saunders 1989, Example 4.1.16). Here, however, we only assign the “target” role to the manifold M, because, roughly speaking, one can talk about the velocity of a smooth curve at a moment t, but not about the generator of a diffusion at a moment t; instead, we can talk about the generator of a diffusion at a position \(q\in M\). Later on, we will define the “bona fide” stochastic jet space, which has the time line \({\mathbb {R}}\) as “source” and the manifold M as “target.”

Similarly to the one-to-one correspondence between tangent space and space of equivalence classes of smooth curves, we have the following:

Proposition 3.2

There is a one-to-one correspondence between the stochastic tangent bundle \({\mathcal {T}}^S M\) and the second-order elliptic tangent bundle \({\mathcal {T}}^E M\).

Proof

For an M-valued diffusion process \(X\in I_{(0,q)}(M)\), \(q\in M\), we denote by \(A^X\) its generator. Then, the map \(j_q X \mapsto A^X_{(0,q)} = (DX(0), QX(0))\) defines a one-to-one correspondence between \({\mathcal {T}}^S M\) and \({\mathcal {T}}^E M\). It is well defined and injective because, by the martingale property in Definition 2.8, \(D(f\circ X)(0) = (A^X f)(q)\) for all \(f\in C^\infty (M)\), and a second-order derivation at q is determined by its values on all smooth functions. The inverse map is \(A_q = ({\mathfrak {b}},a)_q \mapsto j_q X^A\), where A is a section of \({\mathcal {T}}^E M\) (i.e., an elliptic second-order operator) smoothly extending the element \(A_q\in {\mathcal {T}}^E_q M\), and \(X^A\in I_{(0,q)}(M)\) is a diffusion process having A as its generator. \(\square \)

Therefore, the stochastic tangent bundle \({\mathcal {T}}^S M\) admits a smooth structure that makes it a smooth manifold diffeomorphic to \({\mathcal {T}}^E M\), and hence a bona fide fiber bundle over M. In the sequel, we will identify \({\mathcal {T}}^S M\) with \({\mathcal {T}}^E M\) without ambiguity, and denote the projection map from \({\mathcal {T}}^S M\) to M by \(\tau ^S_M\), that is, \(\tau ^S_M(j_q X) = q\) for any \(j_q X\in {\mathcal {T}}^S M\).

Definition 3.3

(Canonical coordinate system on \({\mathcal {T}}^S M\)) Let \((U, (x^i))\) be a coordinate chart on M. The induced canonical coordinate chart \((U^{(1)}, x^{(1)})\) on \({\mathcal {T}}^S M\) is defined by

$$\begin{aligned} U^{(1)}:= \{ j_q X: q \in U, X\in I_{(0,q)}(M) \}, \quad x^{(1)}:= \big (x^i, D^i x, Q^{jk} x\big ), \end{aligned}$$

where \(x^i(j_q X) = x^i(q)\), \(D^i x(j_q X) = (DX)^i(0)\) and \(Q^{jk} x(j_q X) = (Q X)^{jk}(0)\).

Our slightly ambiguous notations \(D^i x\) and \(Q^{jk} x\) are chosen so as to avoid the worse one \(Qx^{jk}\).

When a linear connection \(\nabla \) is provided, we can also define the coordinates via the \(\nabla \)-mean derivative \(D_\nabla \) instead of D, as follows:

$$\begin{aligned} D^i_\nabla x(j_q X):= (D_\nabla X)^i(0). \end{aligned}$$

Then, \(x^{(1)}_\nabla := (x^i, D^i_\nabla x, Q^{jk} x)\) also forms a coordinate system on \({\mathcal {T}}^S M\), which we call the \(\nabla \)-canonical coordinate system. It follows from relation (2.20) that

$$\begin{aligned} D_\nabla ^i x = D^ix + \textstyle {{\frac{1}{2}}} \left( \Gamma ^i_{jk}\circ x\right) Q^{jk} x. \end{aligned}$$
(3.2)
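The following sketch illustrates (3.2) and why the \(\nabla \)-canonical coordinates, unlike \(D^ix\), behave like velocity coordinates. Take Brownian motion with a constant drift v on \({\mathbb {R}}^2\), so that \(DX = v\) and \(QX = I\) in Cartesian coordinates, where the Euclidean Christoffel symbols vanish. Re-expressed in polar coordinates at the point (3, 4) via (2.9), \(D^ix\) picks up the extra term \((\frac{1}{2r}, 0)\), whereas \(D^i_\nabla x\) computed from (3.2) with the polar Christoffel symbols \(\Gamma ^r_{\theta \theta } = -r\), \(\Gamma ^\theta _{r\theta } = \Gamma ^\theta _{\theta r} = \frac{1}{r}\) coincides with the push-forward Jv of the tangent vector v. The drift v and the point are arbitrary choices.

```python
import math

x, y = 3.0, 4.0
v = [1.0, 2.0]                       # constant drift (assumed example)
r2 = x * x + y * y
r = math.sqrt(r2)

# first and second derivatives of the polar coordinates (r, theta)
jac = [[x / r, y / r], [-y / r2, x / r2]]
hess = [[[y * y / r**3, -x * y / r**3], [-x * y / r**3, x * x / r**3]],
        [[2 * x * y / r2**2, (y * y - x * x) / r2**2],
         [(y * y - x * x) / r2**2, -2 * x * y / r2**2]]]

# D X = v and Q X = identity in Cartesian coordinates; generator A^{jk} = Q^{jk} / 2
A2 = [[0.5, 0.0], [0.0, 0.5]]
D_polar = [sum(jac[i][j] * v[j] for j in range(2))
           + sum(hess[i][j][k] * A2[j][k] for j in range(2) for k in range(2))
           for i in range(2)]
Q_polar = [[sum(jac[i][k] * jac[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]

# Euclidean Levi-Civita symbols in polar coordinates, gamma[i][j][k] = Gamma^i_{jk}
gamma = [[[0.0, 0.0], [0.0, -r]], [[0.0, 1.0 / r], [1.0 / r, 0.0]]]

# (3.2): D_nabla^i = D^i + 1/2 Gamma^i_{jk} Q^{jk}
D_nabla = [D_polar[i] + 0.5 * sum(gamma[i][j][k] * Q_polar[j][k]
                                  for j in range(2) for k in range(2))
           for i in range(2)]
push_v = [sum(jac[i][j] * v[j] for j in range(2)) for i in range(2)]  # J v
print(D_polar)  # J v plus the extra term (1/(2r), 0)
print(D_nabla)  # agrees with the push-forward J v
print(push_v)
```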

Using the identification of elements \(j_q X \in {\mathcal {T}}^S_q M\) and \(({\mathfrak {b}},a)_q \in {\mathcal {T}}^E_q M\) via Proposition 3.2, as well as their relations with the element \((b_q, a_q)\in TM \oplus \textrm{Sym}^2(TM)\), via (2.19), we have \(D^i x(j_q X) = {\mathfrak {b}}^i\), \(D^i_\nabla x(j_q X) = b^i = \mathfrak b^i + \textstyle {{\frac{1}{2}}} a^{jk} \Gamma ^i_{jk}(q)\) and \(Q^{jk} x(j_q X) = a^{jk}\). In this way the fiber-linear bundle projection \(\varrho _\nabla \) of (2.14) maps, under the canonical coordinates \((x,\dot{x})\) on TM, as follows:

$$\begin{aligned} \dot{x}^i \circ \varrho _\nabla (j_q X) = \left( D^ix + \textstyle {{\frac{1}{2}}} \left( \Gamma ^i_{jk}\circ x\right) Q^{jk} x \right) (j_q X) = D^i_\nabla x(j_q X), \end{aligned}$$
(3.3)

so that \(D_\nabla ^ix = \dot{x}^i \circ \varrho _\nabla \). Therefore, \((x^i, D_\nabla ^ix)\) is a partial coordinate system on \({\mathcal {T}}^S M\) that coincides with \((x^i,\dot{x}^i)\) when restricted to TM. Moreover, the decomposition in (2.19) yields the following expressions for second-order vector fields:

$$\begin{aligned} (Dx, Qx) = D^i x \partial _i + \textstyle {{\frac{1}{2}}} Q^{jk} x \partial _j\partial _k = D_\nabla ^i x \partial _i + \textstyle {{\frac{1}{2}}} Q^{jk} x \nabla ^2_{\partial _j,\partial _k}. \end{aligned}$$
(3.4)

Similarly to Definition 3.1, we define a \(\nabla \)-dependent equivalence relation as follows:

Definition 3.4

Two M-valued diffusion processes \(X=\{X(t)\}_{t\in [0,\tau )}\), \(Y=\{Y(t)\}_{t\in [0,\sigma )}\) are said to be \(\nabla \)-stochastically equivalent at \((t,q)\in {\mathbb {R}}\times M\) if, almost surely, \(X(t)=Y(t)=q\) and \(D_\nabla X(t) = D_\nabla Y(t)\). The equivalence class containing X is called the \(\nabla \)-tangent vector of X at q and is denoted by \(j^\nabla _{(t,q)} X\). When \(t=0\), we denote \(j^\nabla _q X:= j^\nabla _{(0,q)}X\) for short.

Then, similarly to Proposition 3.2, one can show that the tangent bundle TM can be identified with the following set of equivalence classes of diffusions:

$$\begin{aligned} \left\{ j_q^\nabla X: X\in I_{(0,q)}(M), q\in M \right\} , \end{aligned}$$
(3.5)

via \(j_q^\nabla X\mapsto D_\nabla X(0)\). Under this identification, it follows from (2.21) that \(j_q^\nabla X = \varrho _\nabla (j_q X)\). Clearly, if we regard all smooth curves as special diffusions, then the partition determined by (3.1) is the restriction of the one determined by (3.5) to the set of all smooth curves.

Remark 3.5

In the presence of a linear connection \(\nabla \) on M, one can easily follow Definition 3.1 and Proposition 3.2 with \(D_\nabla \) in place of D to verify the one-to-one correspondence between the set \({\mathcal {T}}^S M\) of equivalence classes and the Whitney sum \(TM \oplus \textrm{Sym}^2_+(TM)\), which recovers the fiber-wise isomorphism (2.18). But since such a correspondence requires a linear connection to be specified beforehand, in this paper we still endow \({\mathcal {T}}^S M\) with the structure of \({\mathcal {T}}^E M\) instead of that of \(TM \oplus \textrm{Sym}^2_+(TM)\), although the latter is also feasible and may make calculations easier.

3.2 The Stochastic Jet Space

In classical jet theory, for the trivial bundle \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\), there is a one-to-one correspondence between 1-jets and tangent vectors, and there is a canonical diffeomorphism between the first-order jet bundle \(J^1 \pi \) and \({\mathbb {R}}\times TM\) (Saunders 1989, Example 4.1.16).

Now using similar ideas, we will introduce the “bona fide” stochastic jet space. The key is to modify the definition of stochastic tangent vectors, to involve the time line \({\mathbb {R}}\) as the “source” as well as to randomize the initial datum of the diffusion processes. Intuitively, an M-valued diffusion process X can be regarded as a random “section” of the trivial “bundle” \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) which is merely continuous in time and depends on the sample point \(\omega \).

For a metric space \((F,d)\), we denote by \(L^0(\Omega , F)\) the space of all F-valued random elements modulo the following equivalence relation: two random elements are equivalent if and only if they are identical almost surely. We endow \(L^0(\Omega , F)\) with the topology of the following \({\textbf{P}}\)-essential metric (cf. Munkres 1975, Section 43):

$$\begin{aligned} \rho (\xi ,\zeta ) = \inf \{c>0: {\textbf{P}}(d(\xi ,\zeta )>c) =0 \} \wedge 1. \end{aligned}$$
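As a sanity check on \(\rho \), the sketch below evaluates it on a finite probability space. This is an assumption made only for illustration; the text works with a general probability space. On a finite space the infimum reduces to the largest distance over atoms of positive mass. The function name `essential_metric` and the dictionary encoding of random elements are hypothetical.

```python
def essential_metric(xi, zeta, prob, d):
    """Truncated P-essential metric on a finite probability space (illustrative sketch).

    xi, zeta: dicts mapping each sample point omega to a point of F.
    prob:     dict mapping omega to P({omega}).
    d:        the metric on F.
    """
    # P(d(xi, zeta) > c) = 0 iff c >= d(xi(w), zeta(w)) for every atom w with P({w}) > 0,
    # so the infimum over such c is the largest distance on atoms of positive mass.
    dists = [d(xi[w], zeta[w]) for w, p in prob.items() if p > 0]
    return min(max(dists, default=0.0), 1.0)


prob = {"a": 0.5, "b": 0.5, "c": 0.0}      # "c" is a null atom
xi = {"a": 0.0, "b": 0.2, "c": 100.0}
zeta = {"a": 0.1, "b": 0.2, "c": -100.0}
dist = lambda u, v: abs(u - v)

print(essential_metric(xi, zeta, prob, dist))  # 0.1: the disagreement on "c" is invisible
```

Because the null atom is ignored, random elements that differ only on a \({\textbf{P}}\)-null set are at distance 0, in line with the identification made in \(L^0(\Omega , F)\). The truncation by \(\wedge 1\) only matters when the essential supremum of the distance exceeds 1.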

Definition 3.6

Two M-valued diffusion processes \(X=\{X(s)\}_{s\in [t,\tau )}\), \(Y=\{Y(s)\}_{s\in [t,\sigma )}\) starting at time t, are said to be stochastically equivalent at \(t\in {\mathbb {R}}\), if, almost surely, \(X(t)= Y(t)\) and \((DX(t), QX(t)) = (DY(t), QY(t))\). The equivalence class containing X is called the stochastic jet of X at t, denoted by \(j_t X\). Let \(I_t(M)\) be the set of all M-valued diffusion processes starting at time t. Then, the stochastic jet space of M is the set

$$\begin{aligned} {\mathcal {J}}^S M = \{ j_t X: X\in I_t(M), t\in {\mathbb {R}}\}. \end{aligned}$$

The functions \(\pi ^S_1\) and \(\pi ^S_{1,0}\), called the stochastic source and target projections, respectively, are defined by

$$\begin{aligned} \pi ^S_1: {\mathcal {J}}^S M \rightarrow {\mathbb {R}}, \quad j_t X \mapsto t, \end{aligned}$$

and

$$\begin{aligned} \pi ^S_{1,0}: {\mathcal {J}}^S M \rightarrow {\mathbb {R}}\times L^0(\Omega , M), \quad j_t X \mapsto (t,X(t)). \end{aligned}$$
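In a chart of \({\mathbb {R}}^n\), a diffusion solving \(dX = {\mathfrak {b}}(t,X)\,dt + \sigma (t,X)\,dW\) has \(DX(t) = {\mathfrak {b}}(t,X(t))\) and \(QX(t) = (\sigma \circ \sigma ^*)(t,X(t))\). This is the same identification used in the proof of Lemma 4.8. The Python sketch below assumes a deterministic starting point, so that the almost-sure condition of Definition 3.6 is trivial. It illustrates that two diffusion coefficients with the same \(\sigma \sigma ^*\) (here the identity and a rotation) give the same stochastic jet, while a different \(\sigma \sigma ^*\) does not. All function names are hypothetical.

```python
import math


def outer(sigma):
    """Return sigma @ sigma^T for a matrix stored as a list of rows."""
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in sigma] for ri in sigma]


def jet_data(t, x, drift, diffusion):
    """Coordinate data (X(t), DX(t), QX(t)) of dX = b(t,X) dt + sigma(t,X) dW at X(t) = x."""
    return list(x), list(drift(t, x)), outer(diffusion(t, x))


def same_jet(j1, j2, tol=1e-12):
    """Stochastic equivalence at t (Definition 3.6) for a deterministic x: compare all parts."""
    flat = lambda j: j[0] + j[1] + [v for row in j[2] for v in row]
    return all(abs(u - v) <= tol for u, v in zip(flat(j1), flat(j2)))


theta = 0.7
drift = lambda t, x: [-x[0], 0.5]                       # shared drift b(t, x)
identity = lambda t, x: [[1.0, 0.0], [0.0, 1.0]]
rotation = lambda t, x: [[math.cos(theta), -math.sin(theta)],
                         [math.sin(theta), math.cos(theta)]]
doubled = lambda t, x: [[2.0, 0.0], [0.0, 2.0]]

x0 = [1.0, -2.0]
jX = jet_data(0.0, x0, drift, identity)
jY = jet_data(0.0, x0, drift, rotation)
jZ = jet_data(0.0, x0, drift, doubled)
print(same_jet(jX, jY), same_jet(jX, jZ))  # True False
```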

To characterize the relation between \({\mathcal {J}}^S M\) and \(\mathcal T^S M\) (or \({\mathcal {T}}^E M\)), we need the following definitions.

Definition 3.7

(Horizontal subspace) Let \((E,\pi _M, M)\) be a fiber bundle. The horizontal subspace of \(L^0(\Omega ,E)\) is defined by

$$\begin{aligned} L^h(\Omega ; \pi _M):= \{ \phi \circ \xi \in L^0(\Omega , E): \phi \text { is a section of } \pi _M, \xi \in L^0(\Omega , M) \}. \end{aligned}$$

In the above definition, since \(\pi _M\circ \phi = \textbf{Id}_M\), an element \(Y = \phi \circ \xi \) satisfies \(\pi _M(Y) = \pi _M\circ \phi (\xi ) = \xi \) a.s., that is, \(\xi \) is the projection of Y.

An element of the horizontal subspace \(L^h(\Omega ; \tau ^E_M)\) of \(L^0(\Omega , {\mathcal {T}}^E M)\) is then of the form \(A \circ \xi \), where A is a section of \(\tau ^E_M\) and \(\xi \in L^0(\Omega , M)\). Such an element \(A \circ \xi \) will be denoted by \(A_\xi \). By the correspondence of \({\mathcal {T}}^S M\) and \({\mathcal {T}}^E M\), one can easily get the following equivalent definition for \(L^h(\Omega ; \tau ^E_M)\),

$$\begin{aligned} L^h\left( \Omega ; \tau ^E_M\right) = L^h\left( \Omega ; \tau ^S_M\right) := \{j_{X(0)} X: X\in I_0(M) \} \subset L^0(\Omega , {\mathcal {T}}^S M). \end{aligned}$$

The correspondence is given explicitly by

$$\begin{aligned} j_{X(0)} X = A^X_{X(0)} = (DX(0), QX(0)), \quad \text {or} \quad A_\xi = j_\xi X^{A_\xi }, \end{aligned}$$

where \(X^{A_\xi }\) is an M-valued diffusion with generator A and with \(X^{A_\xi }(0) = \xi \) a.s.

Proposition 3.8

The stochastic jet space \({\mathcal {J}}^S M\) is trivial. More precisely, we have the homeomorphism

$$\begin{aligned} {\mathcal {J}}^S M \cong {\mathbb {R}}\times L^h\left( \Omega ; \tau ^S_M\right) , \end{aligned}$$

given by \(j_t X \mapsto (t, j_{X(t)} (\theta _t X))\), for any \(X\in I_t(M)\), where \(\theta _t\) is the shift operator on \({\mathcal {C}}\), that is, \(\theta _t \omega (\cdot ) = \omega (\cdot +t)\).

Proof

Let \({\mathcal {J}}^S_0 M := \{ j_0 X: X\in I_0(M) \}\) denote the set of stochastic jets at time 0. The homeomorphism \({\mathcal {J}}^S M \cong {\mathbb {R}}\times {\mathcal {J}}^S_0 M\) is given by \(j_t X \mapsto (t, j_0 (\theta _t X))\). The homeomorphism \({\mathcal {J}}^S_0 M \cong L^h(\Omega ; \tau ^S_M)\) is given by \(j_0 X \mapsto j_{X(0)} X\), whose inverse map is \(A_\xi \mapsto j_0 X^{A_\xi }\). \(\square \)

Definition 3.9

(Stochastic fibered space)

  1. (i)

    Given a fiber bundle \((E,\pi _M, M)\) with total space E, base space M and typical fiber manifold F, the stochastic fibered space associated with it is the triplet \((E^S,\pi ^S_M, M)\) where

    $$\begin{aligned} E^S:= \{ (q, \xi ): q\in M, \xi \in {\hat{L}}(\Omega , E_q) \}, \end{aligned}$$

    \(\pi ^S_M: E^S\rightarrow M\) is the natural projection given by \(\pi ^S_M(q, \xi ) = q\), and \({\hat{L}}(\Omega ,F)\) is a subspace of \(L^0(\Omega ,F)\), with \(E_q\) denoting the fiber of \(\pi _M\) over q. The fiber bundle E is called the model bundle of \(E^S\). There is a family of projections \(\{\pi _\omega \}_{\omega \in \Omega }\) from the stochastic fibered space \(E^S\) to its model bundle E, defined by

    $$\begin{aligned} \pi _\omega : E^S\rightarrow E, \quad (q, \xi ) \mapsto (q, \xi (\omega )). \end{aligned}$$
  2. (ii)

    A global section of \((E^S,\pi ^S_M, M)\) is called a random global section. A random local section is a map \(\sigma : U \rightarrow E\) defined on some measurable subset \(U\subset \Omega \times M\) and such that, for almost all \(\omega \in \Omega \), \(\sigma (\omega ): U_\omega \rightarrow E\) is a local section of \((E,\pi _M, M)\), where \(U_\omega = U\cap (\{\omega \}\times M)\).

Note that a random global section is a random local section defined on all of \(\Omega \times M\).

It follows from Proposition 3.8 that the stochastic jet space \(({\mathcal {J}}^S M, \pi _1^S, {\mathbb {R}})\) is a stochastic fibered space, whose associated model bundle is \(({\mathbb {R}}\times {\mathcal {T}}^S M, \pi _1, {\mathbb {R}})\). Just like the first-order jet bundle \(J^1 \pi \), which is diffeomorphic to \({\mathbb {R}}\times TM\), the model bundle \({\mathbb {R}}\times \mathcal T^S M\) is itself a jet bundle and has two bundle structures, with base spaces \({\mathbb {R}}\) and \({\mathbb {R}}\times M\), respectively. The corresponding source and target projections are defined, respectively, by

$$\begin{aligned} \pi _1: {\mathbb {R}}\times {\mathcal {T}}^S M \rightarrow {\mathbb {R}}, \quad (t, j_q X) \mapsto t, \end{aligned}$$

and

$$\begin{aligned} \pi _{1,0}: {\mathbb {R}}\times {\mathcal {T}}^S M \rightarrow {\mathbb {R}}\times M, \quad (t, j_q X)\mapsto (t,q). \end{aligned}$$

Moreover, we will denote the natural projection from \({\mathbb {R}}\times {\mathcal {T}}^S M\) to \({\mathcal {T}}^S M\) by \(\pi _{0,1}\). This projection map is indeed a bundle homomorphism from \(({\mathbb {R}}\times {\mathcal {T}}^S M, \pi _{1,0}, {\mathbb {R}}\times M)\) to \(({\mathcal {T}}^S M, \tau ^S_M, M)\), whose projection is the natural projection from \({\mathbb {R}}\times M\) to M, denoted by \({\hat{\pi }}\).

Similarly to Proposition 3.8, we have the following diffeomorphisms for the model bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\):

$$\begin{aligned} \{ j_{(t,q)} X: X\in I_{(t,q)}(M), t\in {\mathbb {R}}, q\in M \} \cong {\mathbb {R}}\times {\mathcal {T}}^S M \cong {\mathbb {R}}\times {\mathcal {T}}^E M, \end{aligned}$$

which are given by

$$\begin{aligned} j_{(t,q)} X \mapsto (t, j_q (\theta _t X)) \mapsto A^X_{(t,q)} = (t, DX(t), QX(t)), \end{aligned}$$
(3.6)

for any \(X\in I_{(t,q)}(M)\), where \(A^X\) is the generator of X as a section of \({\mathbb {R}}\times {\mathcal {T}}^E M\) (i.e., a time-dependent elliptic second-order differential operator). Furthermore, the proof of Proposition 3.2 allows us to write down the inverse maps explicitly, in particular that of the second diffeomorphism. That is, for any \((t,A_q) = (t,{\mathfrak {b}},a) \in \pi _{1,0}^{-1}(t,q)\),

$$\begin{aligned} (t,A_q) = (t,{\mathfrak {b}},a) \mapsto \big (t, j_q \big (\theta _t X^A\big )\big ) \mapsto j_{(t,q)} X^A, \end{aligned}$$
(3.7)

where A is a section of \({\mathbb {R}}\times {\mathcal {T}}^E M\) such that \(A_{(t,q)} = A_q\), and \(X^A\in I_{(t,q)}(M)\) is a diffusion process having A as its generator.
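For a given fiber point \((t,{\mathfrak {b}},a)\), the inverse map (3.7) requires some diffusion whose generator has these coefficients. One convenient choice in a chart is to take constant coefficients \({\mathfrak {b}}\) and \(\sigma \) with \(\sigma \sigma ^* = a\), where \(\sigma \) is the Cholesky factor of a. This choice is assumed here for illustration and is not necessarily the construction used in the proof of Proposition 3.2. The sketch below also assumes that a is symmetric positive definite; for a merely positive semidefinite a, a symmetric square root would be needed instead. Function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L @ L^T = a, for a symmetric positive definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("a is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def coefficients_from_fiber_point(b, a):
    """Constant coefficients (b, sigma) with sigma @ sigma^T = a. An SDE dX = b dt + sigma dW
    started at q at time t then has (DX, QX) = (b, a) at time t."""
    return list(b), cholesky(a)


b, sigma = coefficients_from_fiber_point([0.1, -0.3], [[4.0, 2.0], [2.0, 3.0]])
print(sigma)  # [[2.0, 0.0], [1.0, 1.4142135623730951]]
```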

The “stochastic target” of \({\mathcal {J}}^S M\), i.e., the trivial bundle \(({\mathbb {R}}\times L^0(\Omega , M), \pi ^S, {\mathbb {R}})\), is another example of a stochastic fibered space. Its model bundle is the trivial bundle \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\). The graph of an M-valued stochastic process defined on a random time interval \([0,\tau )\) is a random (local) section of \(({\mathbb {R}}\times L^0(\Omega , M), \pi ^S, {\mathbb {R}})\). The map induced by \(\pi _\omega \) on the targets, from \({\mathbb {R}}\times L^0(\Omega , M)\) to \({\mathbb {R}}\times M\), is denoted by \({\hat{\pi }}_\omega \).

We may summarize how all these maps fit together by the following diagram:

[Figure: diagram showing how the projections defined above fit together]

When a linear connection is specified on M, one can easily obtain, similarly to (3.6), the following homeomorphism:

$$\begin{aligned} \left\{ j^\nabla _t X: X\in I_t(M), t\in {\mathbb {R}}\right\} \cong {\mathbb {R}}\times L^h(\Omega ; \tau _M), \quad j^\nabla _t X \mapsto \left( t, j^\nabla _{X(t)} (\theta _t X) \right) , \end{aligned}$$

and the following diffeomorphisms:

$$\begin{aligned} \left\{ j^\nabla _{(t,q)} X: X\in I_{(t,q)}(M), t\in {\mathbb {R}}, q\in M \right\}\cong & {} {\mathbb {R}}\times \left\{ j_q^\nabla X: X\in I_{(0,q)}(M), q\in M \right\} \\\cong & {} {\mathbb {R}}\times T M \cong J^1 \pi , \end{aligned}$$

where the first two diffeomorphisms are given by

$$\begin{aligned} j^\nabla _{(t,q)} X \mapsto \left( t, j^\nabla _q (\theta _t X) \right) \mapsto (t, D_\nabla X(t)), \end{aligned}$$

and the last one is due to the classical theory.

3.3 Intrinsic Formulation of SDEs

With the classical machinery of jet bundles, it is possible to translate differential equations into algebraic equations on jet bundles (Saunders 1989). In this subsection, we follow this approach to give an intrinsic formulation of SDEs.

For a subset S of the model bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\) and \(t\in {\mathbb {R}}\), we denote by \(S_t\) the intersection of S with the fiber \(\{t\} \times {\mathcal {T}}^S M\).

Definition 3.10

A stochastic differential equation on M is a closed embedded submanifold S of the model jet bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\) with \(S_0 \ne \emptyset \). A (local) solution of the stochastic differential equation S is a triple \(\big (X, (\Omega ,{\mathcal {F}},{\textbf{P}}), \{{\mathcal {P}}_t\}_{t\ge 0}\big )\), where

  1. (i)

    \((\Omega ,{\mathcal {F}},{\textbf{P}})\) is a probability space, and \(\{{\mathcal {P}}_t\}_{t\ge 0}\) is a past filtration of \({\mathcal {F}}\) satisfying the usual conditions,

  2. (ii)

    \(X = \{X(t)\}_{t\in [0,\tau )}\) is a \(\{{\mathcal {P}}_t\}\)-adapted M-valued diffusion process over \([0,\tau )\), where \(\tau \) is a \(\{{\mathcal {P}}_t\}\)-stopping time, and

  3. (iii)

    almost surely \(j_t X = (t, j_{X(t)} (\theta _t X)) \in S\) for every \(t\in [0,\tau )\).

Remark 3.11

  1. (i)

    The condition that \(S_0 \ne \emptyset \) is just for convenience, in order to set the initial time at \(t=0\).

  2. (ii)

    There is an equivalent way to formulate the solution of a stochastic differential equation S. That is, a (local) solution is a pair \((P,\tau )\), where P is a probability measure on \(({\mathcal {C}},{\mathcal {B}}({\mathcal {C}}),\{{\mathcal {B}}_t\})\) and \(\tau \) is a \(\{{\mathcal {B}}_t\}\)-stopping time, such that for P-almost every \(\omega \), \(j_t \omega = (t, j_{\omega (t)} (\theta _t \omega )) \in S\) for every \(t\in [0,\tau (\omega ))\).

This definition does not look like the traditional definition of a stochastic differential equation, but the relationship between the two becomes apparent in coordinates. Since S is an embedded submanifold of \({\mathbb {R}}\times {\mathcal {T}}^S M\), it admits a local defining function in a neighborhood of each of its points (Lee 2013, Proposition 5.16). That is, for a coordinate chart \(({\mathbb {R}}\times U^{(1)}, (t, x^{(1)}))\) around the point \((0,j_q X) \in S_0\), there is a function \(\Theta : {\mathbb {R}}\times U^{(1)} \rightarrow {\mathbb {R}}^K\), where \(K = \dim ({\mathbb {R}}\times {\mathcal {T}}^S M) - \dim S\) is the codimension of S, such that \(S\cap ({\mathbb {R}}\times U^{(1)}) = \Theta ^{-1}(0)\) and 0 is a regular value of \(\Theta \). Then, as long as X(t) stays in the neighborhood \(U=\tau ^S_M(U^{(1)})\), the condition \(j_t X = (t, j_{X(t)} (\theta _t X)) \in S\) reads in local coordinates as

$$\begin{aligned} \Theta (t,x, Dx, Qx)(j_t X) = \Theta (t, X(t), DX(t), QX(t)) = 0, \end{aligned}$$
(3.8)

which defines a general MDE (in terms of mean derivatives). The use of a submanifold S is therefore a way to distinguish the definition of the equation from a definition of its solutions.

As an example, the system of MDEs (2.22) can be rewritten in the form (3.8) by choosing the defining function

$$\begin{aligned} \Theta (t, x, Dx, Q x) = \left( D x - {\mathfrak {b}}(t,x), Q x- (\sigma \circ \sigma ^*)(t,x) \right) . \end{aligned}$$
(3.9)
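The defining function (3.9) can be evaluated directly. The sketch below works in a chart of \({\mathbb {R}}^n\), uses hypothetical names, and stores Q as a full symmetric matrix. It returns the residual vector, listing only the components \(Q^{jk}x\) with \(j\le k\) because Q is symmetric. It then compares the length of that vector with the codimension of S in \({\mathbb {R}}\times {\mathcal {T}}^S M\). This comparison assumes that the fiber coordinates are \(D^i x\) and \(Q^{jk} x\), \(j\le k\).

```python
def outer(sigma):
    """Return sigma @ sigma^T for a matrix stored as a list of rows."""
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in sigma] for ri in sigma]


def theta(t, x, Dx, Qx, drift, diffusion):
    """Local defining function (3.9); only the components Q^{jk} with j <= k are listed."""
    a = outer(diffusion(t, x))
    n = len(x)
    drift_part = [d - bi for d, bi in zip(Dx, drift(t, x))]
    quad_part = [Qx[j][k] - a[j][k] for j in range(n) for k in range(j, n)]
    return drift_part + quad_part


def codimension(n):
    """dim(R x T^S M) - dim S for an equation of the form (3.9) on an n-dimensional M."""
    dim_model = 1 + n + n + n * (n + 1) // 2   # coordinates (t, x^i, D^i x, Q^{jk} x with j <= k)
    dim_S = 1 + n                              # S is the graph of (t, x) -> (b, sigma sigma^T)
    return dim_model - dim_S


# Hypothetical two-dimensional example.
drift = lambda t, x: [-x[0], -2.0 * x[1]]
diffusion = lambda t, x: [[1.0, 0.0], [0.5, 1.0]]

on_S = theta(0.0, [1.0, 1.0], [-1.0, -2.0], [[1.0, 0.5], [0.5, 1.25]], drift, diffusion)
off_S = theta(0.0, [1.0, 1.0], [0.0, -2.0], [[1.0, 0.5], [0.5, 1.25]], drift, diffusion)
print(on_S, codimension(2))  # [0.0, 0.0, 0.0, 0.0, 0.0] 5
```

A jet lies on S exactly when every residual component vanishes. For n = 2 there are 2 + 3 = 5 independent equations.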

So far we have not done anything but reformulate the basic problem of finding solutions of systems of stochastic differential equations in a more geometrical form, ideally suited to our investigation into symmetry groups thereof.

4 Stochastic Symmetries

The symmetry group of a system of differential equations is the largest local group of transformations acting on the independent and dependent variables of the system with the property that it transforms solutions of the system into other solutions (Olver 1998). In the stochastic case, we can proceed analogously.

All methods of this section work in the local case, that is, the vector fields are not necessarily complete and the bundle homomorphisms could be only locally defined.

4.1 Prolongations of Diffusions and Bundle Homomorphisms

Definition 4.1

(Prolongations of diffusions) Let X be an M-valued diffusion process defined on a stopping time interval \([t_0,\tau )\). The prolongation of X is the \(\mathcal T^S M\)-valued process jX defined, with \(\theta _t\) the shift operator, by

$$\begin{aligned} j X(t) = j_{X(t)} (\theta _t X), \quad t\in [t_0,\tau ). \end{aligned}$$

Note that \(j_t X = (t, j_{X(t)} (\theta _t X)) = (t, j X(t))\). Thus the graph of the prolongation process jX is precisely the random section \(t\mapsto j_t X\) of the stochastic jet space \({\mathcal {J}}^S M\). It is easy to see that if X is an M-valued diffusion process, then jX is a \({\mathcal {T}}^S M\)-valued diffusion process.

Given two smooth manifolds M and N, a bundle homomorphism F from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) is a projectable (or fiber-preserving) smooth map, which means it maps fibers of \(\pi \) to fibers of \(\rho \). Hence, there exist two smooth maps \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\) and \({\bar{F}}:{\mathbb {R}}\times M \rightarrow N\) such that \(F(t,q) = (F^0(t), {\bar{F}}(t,q))\). This leads to \(\rho \circ F = F^0\circ \pi \) which is the original definition of bundle homomorphisms. We denote \(F = (F^0, {\bar{F}})\) and say that F projects to \(F^0\).

The following lemma shows that a bundle homomorphism always transforms diffusions into diffusions. A proof can be found in Lemma 4.8 or Corollary A.5.

Lemma 4.2

Given a bundle homomorphism \(F = (F^0, {\bar{F}})\) from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\), where \(F^0\) is a diffeomorphism, for every M-valued diffusion process \(X = \{X(t)\}_{t\in [t_0,\tau )}\), the image of its graph (or its corresponding random local section) \(\{(t,X(t)): t\in [t_0,\tau ) \}\) by F, i.e.,

$$\begin{aligned} \{ F(t, X(t)): t \in [t_0,\tau )\} \end{aligned}$$

is almost surely the graph of a well-defined N-valued diffusion process \({\tilde{X}}\) given by

$$\begin{aligned} {\tilde{X}}(s) = {\bar{F}}\left( (F^0)^{-1}(s), X((F^0)^{-1}(s)) \right) , \quad s \in [F^0(t_0),F^0(\tau )). \end{aligned}$$
(4.1)
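Formula (4.1) says that the graph of \(F\cdot X\) is obtained by applying F pointwise to the graph of X. On a time grid, it is therefore simply a relabeling of the samples. The sketch below simulates a scalar path with the Euler–Maruyama scheme, a standard discretization used here only to produce sample data. It then pushes the path forward by a hypothetical F whose time component \(F^0\) is increasing, as the interval \([F^0(t_0),F^0(\tau ))\) in (4.1) suggests. All names are illustrative.

```python
import math
import random


def euler_maruyama(b, sigma, x0, t0, t1, steps, rng):
    """Sample a path of dX = b(t, X) dt + sigma(t, X) dW on [t0, t1] (scalar case)."""
    h = (t1 - t0) / steps
    times, values = [t0], [x0]
    for k in range(steps):
        t, x = times[-1], values[-1]
        x = x + b(t, x) * h + sigma(t, x) * math.sqrt(h) * rng.gauss(0.0, 1.0)
        times.append(t0 + (k + 1) * h)
        values.append(x)
    return times, values


def pushforward(times, values, F0, Fbar):
    """Graph of F . X as in (4.1): at s = F0(t_k) the new process equals Fbar(t_k, X(t_k))."""
    return [F0(t) for t in times], [Fbar(t, x) for t, x in zip(times, values)]


rng = random.Random(0)
times, values = euler_maruyama(lambda t, x: -x, lambda t, x: 0.3, 1.0, 0.0, 1.0, 100, rng)
F0 = lambda t: 2.0 * t + 1.0            # hypothetical increasing time reparametrization
Fbar = lambda t, x: math.exp(-t) * x    # hypothetical time-dependent space map
s_grid, new_values = pushforward(times, values, F0, Fbar)
```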

As observed in Remark A.6, among all (deterministic) smooth maps from \({\mathbb {R}}\times M\) to \({\mathbb {R}}\times N\), bundle homomorphisms are the only ones that map diffusions to diffusions.

Definition 4.3

(Pushforwards of diffusions by bundle homomorphisms) We call the diffusion \({\tilde{X}}\) of Lemma 4.2 the pushforward of X by F, and write \({\tilde{X}} = F\cdot X\). When \(M=N\) and F is a bundle endomorphism on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\), we also call \(F\cdot X\) the transform of X by F.

We now introduce the idea of stochastic prolongation whereby a bundle homomorphism may be extended to act upon the model jet bundle.

Definition 4.4

(Stochastic prolongations of bundle homomorphisms) Let F be a bundle homomorphism from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to a diffeomorphism \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\). The stochastic prolongation of F is the map \(j F: {\mathbb {R}}\times {\mathcal {T}}^S M \rightarrow {\mathbb {R}}\times {\mathcal {T}}^S N\) defined by

$$\begin{aligned} j F (j_{(t,q)} X) = j_{F(t,q)} (F\cdot X). \end{aligned}$$
(4.2)

It is easy to see from (4.1) that if \(j_{(t,q)} X = j_{(t,q)} Y\), then \(j_{F(t,q)} (F\cdot X) = j_{F(t,q)} (F\cdot Y)\). Therefore, the map jF is well defined. Writing \(F = (F^0, {\bar{F}})\), we can rewrite definition (4.2) more explicitly:

$$\begin{aligned} j F (t, j_q(\theta _t X)) = \big ( F^0(t), j_{{\bar{F}}(t,q)} \theta _{F^0(t)}(F\cdot X) \big ). \end{aligned}$$
(4.3)

The following properties are easy to check.

Corollary 4.5

  1. (i)

    The map \(jF: \pi _1 \rightarrow \rho _1\) is a bundle homomorphism projecting to \(F^0\).

  2. (ii)

    The map \(jF: \pi _{1,0} \rightarrow \rho _{1,0}\) is a bundle homomorphism projecting to F.

  3. (iii)

    \(j(\textbf{Id}_{{\mathbb {R}}\times M}) = \textbf{Id}_{{\mathbb {R}}\times {\mathcal {T}}^S M}\). Let F and G be two bundle endomorphisms on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) projecting to diffeomorphisms. Then, \(j(F\circ G) = jF \circ jG\).

By virtue of (4.3) and Corollary 4.5.(i), we may write \(jF = (F^0, \overline{jF})\), where \(\overline{jF}: {\mathbb {R}}\times {\mathcal {T}}^S M \rightarrow {\mathcal {T}}^S N\) is the smooth map given by

$$\begin{aligned} \overline{jF} (t, j_q(\theta _t X)) = j_{{\bar{F}}(t,q)} \theta _{F^0(t)}(F\cdot X). \end{aligned}$$
(4.4)

We can also consider the pushforward of the \({\mathcal {T}}^S M\)-valued process jX by the bundle homomorphism jF.

Corollary 4.6

Given a bundle homomorphism \(F: ({\mathbb {R}}\times M, \pi , {\mathbb {R}})\rightarrow ({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to a diffeomorphism on \({\mathbb {R}}\), and an M-valued diffusion process X, we have

$$\begin{aligned} jF \cdot jX = j(F\cdot X). \end{aligned}$$

Proof

It follows from (4.1), (4.4) and Definition 4.1 that

$$\begin{aligned} \begin{aligned} jF \cdot jX(s)&= \overline{jF}\left( (F^0)^{-1}(s),j X((F^0)^{-1}(s)) \right) \\&= \overline{jF}\left( (F^0)^{-1}(s),j_{X((F^0)^{-1}(s))} (\theta _{(F^0)^{-1}(s)} X) \right) \\&= j_{{\tilde{X}}(s)} (\theta _s {\tilde{X}}) = j{\tilde{X}}(s). \end{aligned} \end{aligned}$$

The result follows. \(\square \)

We next investigate the coordinate representation of jF in terms of stochastic analysis. First, we introduce the stochastic version of the notion of total derivative.

Definition 4.7

(Total mean derivatives) Let f be a smooth real-valued function on \({\mathbb {R}}\times M\). The total mean derivative and total quadratic mean derivative of f are the unique smooth functions \({\textbf{D}}_{\textrm{t}} f\) and \({\textbf{Q}}_{\textrm{t}} f\) defined on \({\mathbb {R}}\times {\mathcal {T}}^S M\), with the property that if \(X\in I_{(t_0,q)}(M)\) is a representative diffusion process of \(j_{(t_0,q)} X\), then

$$\begin{aligned} ({\textbf{D}}_{\textrm{t}} f)(j_{(t_0,q)} X)&= D[f(t_0, X(t_0))], \\ ({\textbf{Q}}_{\textrm{t}} f)(j_{(t_0,q)} X)&= Q[f(t_0, X(t_0))]. \end{aligned}$$

There is an abuse of notation in the above definition: the left-hand sides of the two equations carry the subscript t, whereas their right-hand sides do not depend on t. The two equations should be read as stating that the values of the functions \({\textbf{D}}_{\textrm{t}} f\) and \({\textbf{Q}}_{\textrm{t}} f\) at the point \(j_{(t_0,q)} X\in {\mathbb {R}}\times \mathcal T^S M\) equal the respective right-hand sides.

It is easy to check that the definitions of total mean derivatives are independent of the choice of representative diffusions. By Itô’s formula, we have the following coordinate representation for total mean derivatives in the local chart \(({\mathbb {R}}\times U^{(1)}, (t, x^{(1)}))\) on \({\mathbb {R}}\times {\mathcal {T}}^S M\),

$$\begin{aligned} {\textbf{D}}_{\textrm{t}} f&= \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x^i} D^i x + \frac{1}{2} \frac{\partial ^2 f}{\partial x^j \partial x^k} Q^{jk} x, \nonumber \\ {\textbf{Q}}_{\textrm{t}} f&= \frac{\partial f}{\partial x^j} \frac{\partial f}{\partial x^k} Q^{jk} x. \end{aligned}$$
(4.5)
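The coordinate formulas (4.5) can be checked numerically in one dimension. For constant coefficients, one Euler step \(X(t+h) = x + {\mathfrak {b}}h + \sigma \sqrt{h}\,Z\) is exact in law. The expected increment of f over this step, divided by h, approximates \({\textbf{D}}_{\textrm{t}} f\) at the jet \((t,x,Dx,Qx) = (t,x,{\mathfrak {b}},\sigma ^2)\) up to O(h). Likewise, the second moment of the increment divided by h approximates \({\textbf{Q}}_{\textrm{t}} f\). The Gaussian expectations are computed with the three-point Gauss–Hermite rule, which is exact for polynomials of degree at most 5 in Z. The test function \(f(t,x) = t x^2\) and all names are hypothetical.

```python
import math

# Three-point Gauss-Hermite rule for a standard normal Z: exact for polynomials of degree <= 5.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def expect(g):
    """E[g(Z)] for Z ~ N(0, 1), exact when g is a polynomial of degree <= 5."""
    return sum(w * g(z) for z, w in zip(NODES, WEIGHTS))


def total_mean_derivatives(f_t, f_x, f_xx, t, x, Dx, Qx):
    """Coordinate formula (4.5) with dim M = 1."""
    D = f_t(t, x) + f_x(t, x) * Dx + 0.5 * f_xx(t, x) * Qx
    Q = f_x(t, x) ** 2 * Qx
    return D, Q


def one_step_estimate(f, t, x, b, sigma, h=1e-6):
    """Difference quotients over one exact step of dX = b dt + sigma dW (constant b, sigma)."""
    def increment(z):
        return f(t + h, x + b * h + sigma * math.sqrt(h) * z) - f(t, x)
    return expect(increment) / h, expect(lambda z: increment(z) ** 2) / h


f = lambda t, x: t * x * x
b, sigma, t, x = 0.3, 0.5, 1.0, 2.0
exact = total_mean_derivatives(lambda t, x: x * x, lambda t, x: 2 * t * x,
                               lambda t, x: 2 * t, t, x, b, sigma ** 2)
approx = one_step_estimate(f, t, x, b, sigma)
print(exact)   # approximately (5.45, 4.0)
print(approx)  # agrees with `exact` up to O(h)
```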

If a linear connection \(\nabla \) is specified, we can use (3.4) to rewrite \({\textbf{D}}_{\textrm{t}}\) as follows:

$$\begin{aligned} {\textbf{D}}_{\textrm{t}} = \partial _t + D_\nabla ^i x \partial _i + \textstyle {{\frac{1}{2}}} Q^{jk} x \nabla ^2_{\partial _j,\partial _k}. \end{aligned}$$
(4.6)

Lemma 4.8

Let us be given a bundle homomorphism \(F = (F^0, {\bar{F}})\) from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to a diffeomorphism \(F^0\) and an M-valued diffusion process \(X = \{X(t)\}_{t\in [t_0,\tau )}\). If \({\tilde{X}} = F\cdot X\), then in local coordinates \((t,x^i)\) around \((t_0,q)\) and \((s,y^j)\) around \(F(t_0,q)\),

$$\begin{aligned} (D{\tilde{X}})^{j}(F^0(t))&= ({\textbf{D}}_{\textrm{t}} {\bar{F}}^j) \left( j_{(t, X(t))} X \right) \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(F^0(t)), \\ (Q{\tilde{X}})^{kl}(F^0(t))&= \left( \frac{\partial {\bar{F}}^k}{\partial x^i}\frac{\partial {\bar{F}}^l}{\partial x^j} \right) \left( t, X(t) \right) (QX)^{ij} \left( t \right) \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(F^0(t)). \end{aligned}$$

Proof

Assume that the diffusion X can be represented in local coordinates by

$$\begin{aligned} dX^i(t) = {\mathfrak {b}}^i(t,X(t)) dt + \sigma ^i_r(t,X(t)) dW^r(t), \quad X^i(t_0) = x^i(q), \end{aligned}$$

where W is an N-dimensional Brownian motion, so that

$$\begin{aligned} j_t X = (DX(t), QX(t)) = ({\mathfrak {b}}, \sigma \circ \sigma ^*)(t,X(t)). \end{aligned}$$

Let \((s_0,{\tilde{q}})=F(t_0,q) = (F^0(t_0), {\bar{F}}(t_0,q))\). Then,

$$\begin{aligned} X^i((F^0)^{-1}(s))= & {} x^i(q) + \int _{(F^0)^{-1}(s_0)}^{(F^0)^{-1}(s)} {\mathfrak {b}}^i(u,X(u)) {\textrm{d}}u \\{} & {} + \int _{(F^0)^{-1}(s_0)}^{(F^0)^{-1}(s)} \sigma ^i_r(u,X(u)) {\textrm{d}}W^r(u). \end{aligned}$$

Define

$$\begin{aligned} B(s) = \int _0^{(F^0)^{-1}(s)} \sqrt{(F^0)'(u)} {\textrm{d}}W(u). \end{aligned}$$

Then, by (Øksendal 2010, Theorem 8.5.7), B is an N-dimensional \(\{{\mathcal {F}}_{(F^0)^{-1}(s)}\}\)-Brownian motion, and the change of variable \(u=(F^0)^{-1}(v)\) yields

$$\begin{aligned} \int _{(F^0)^{-1}(s_0)}^{(F^0)^{-1}(s)} \sigma ^i_r(u,X(u)) {\textrm{d}}W^r(u)= & {} \int _{s_0}^{s} \sigma ^i_r((F^0)^{-1}(v),X((F^0)^{-1}(v))) \\{} & {} \left( \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(v) \right) ^{\frac{1}{2}} {\textrm{d}}B^r(v). \end{aligned}$$

Therefore,

$$\begin{aligned} \begin{aligned} X^i((F^0)^{-1}(s)) =\,&x^i(q) + \int _{s_0}^{s} {\mathfrak {b}}^i((F^0)^{-1}(v),X((F^0)^{-1}(v))) {\textrm{d}} (F^0)^{-1}(v) \\&+ \int _{s_0}^{s} \sigma ^i_r((F^0)^{-1}(v),X((F^0)^{-1}(v))) \left( \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(v) \right) ^{\frac{1}{2}} {\textrm{d}}B^r(v). \end{aligned} \end{aligned}$$

Recall that \({\tilde{X}}(s) = {\bar{F}}\left( (F^0)^{-1}(s), X((F^0)^{-1}(s)) \right) \). Using Itô’s formula, we have

$$\begin{aligned} {\tilde{X}}^j(s) =\,&y^j({\tilde{q}}) + \int _{s_0}^{s} \frac{\partial {\bar{F}}^j}{\partial t} \left( (F^0)^{-1}(v), X((F^0)^{-1}(v)) \right) {\textrm{d}}(F^0)^{-1}(v) \\&+ \int _{s_0}^{s} \frac{\partial {\bar{F}}^j}{\partial x^i} \left( (F^0)^{-1}(v), X((F^0)^{-1}(v)) \right) {\textrm{d}}X^i((F^0)^{-1}(v)) \\&+ \frac{1}{2} \int _{s_0}^{s} \frac{\partial ^2{\bar{F}}^j}{\partial x^k \partial x^l} \left( (F^0)^{-1}(v),X((F^0)^{-1}(v)) \right) {\textrm{d}}\left\langle X^k\circ (F^0)^{-1}, X^l\circ (F^0)^{-1} \right\rangle (v) \\ =\,&y^j({\tilde{q}}) + \int _{s_0}^{s} \left[ \frac{\partial {\bar{F}}^j}{\partial t} + \frac{\partial {\bar{F}}^j}{\partial x^i} {\mathfrak {b}}^i + \frac{1}{2} \frac{\partial ^2{\bar{F}}^j}{\partial x^k \partial x^l} \sigma ^k_r \sigma ^l_r \right] \left( (F^0)^{-1}(v), X((F^0)^{-1}(v)) \right) \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(v)\, {\textrm{d}}v \\&+ \int _{s_0}^{s} \left( \frac{\partial {\bar{F}}^j}{\partial x^i} \sigma ^i_r \right) \left( (F^0)^{-1}(v), X((F^0)^{-1}(v)) \right) \left( \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(v) \right) ^{\frac{1}{2}} {\textrm{d}}B^r(v). \end{aligned}$$

It follows that

$$\begin{aligned} (D{\tilde{X}})^j(s)&= \left[ \frac{\partial {\bar{F}}^j}{\partial t} + \frac{\partial {\bar{F}}^j}{\partial x^i} {\mathfrak {b}}^i + \frac{1}{2} \frac{\partial ^2{\bar{F}}^j}{\partial x^k \partial x^l} \sigma ^k_r \sigma ^l_r \right] \left( (F^0)^{-1}(s), X((F^0)^{-1}(s)) \right) \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(s) \\&= ({\textbf{D}}_{\textrm{t}}{\bar{F}}^j) \left( j_{((F^0)^{-1}(s), X((F^0)^{-1}(s)))} X \right) \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(s), \\ (Q{\tilde{X}})^{kl}(s)&= \left( \frac{\partial {\bar{F}}^k}{\partial x^i} \sigma ^i_r \frac{\partial {\bar{F}}^l}{\partial x^j} \sigma ^j_r \right) \left( (F^0)^{-1}(s), X((F^0)^{-1}(s)) \right) \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(s) \\&= \left( \frac{\partial {\bar{F}}^k}{\partial x^i}\frac{\partial {\bar{F}}^l}{\partial x^j} \right) \left( (F^0)^{-1}(s), X((F^0)^{-1}(s)) \right) (QX)^{ij}\left( (F^0)^{-1}(s) \right) \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}(s). \end{aligned}$$

This completes the proof. \(\square \)
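Lemma 4.8 can be illustrated in one dimension with an affine time change. The sketch makes the hypothetical choices \(F^0(t) = ct + 1\) and \({\bar{F}}(t,x) = e^t x^2\), and takes constant coefficients \({\mathfrak {b}}\) and \(\sigma \) for X. A step k in the new time s then corresponds to a step k/c in t. The one-step difference quotients of \({\tilde{X}}\) should therefore agree with \(({\textbf{D}}_{\textrm{t}}{\bar{F}})/c\) and \((\partial _x{\bar{F}})^2\sigma ^2/c\) up to O(k). The three-point Gauss–Hermite rule computes the Gaussian expectations exactly for these polynomial integrands.

```python
import math

NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def expect(g):
    """E[g(Z)], Z ~ N(0, 1); exact for polynomials in Z of degree <= 5."""
    return sum(w * g(z) for z, w in zip(NODES, WEIGHTS))


c = 2.0                                   # hypothetical F0(t) = c t + 1, so dF0/dt = c
Fbar = lambda t, x: math.exp(t) * x * x   # hypothetical space component
b, sigma = -0.4, 0.3                      # constant coefficients of dX = b dt + sigma dW


def lemma_4_8(t, x):
    """(D X~, Q X~) at s = F0(t) from Lemma 4.8, using (4.5) with Dx = b, Qx = sigma^2."""
    Dt_Fbar = math.exp(t) * (x * x + 2.0 * x * b + sigma ** 2)
    Fx = 2.0 * math.exp(t) * x
    return Dt_Fbar / c, Fx ** 2 * sigma ** 2 / c


def one_step(t, x, k=1e-6):
    """Difference quotients of X~ over a step k in the new time s (a step k / c in t)."""
    h = k / c
    def increment(z):
        return Fbar(t + h, x + b * h + sigma * math.sqrt(h) * z) - Fbar(t, x)
    return expect(increment) / k, expect(lambda z: increment(z) ** 2) / k


print(lemma_4_8(0.5, 1.5))
print(one_step(0.5, 1.5))   # close to the line above, up to O(k)
```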

We denote the induced local coordinates on \({\mathcal {T}}^S N\) by \((y^j, D^j y, Q^{kl} y)\). Then, clearly, \(y^j \circ jF = y^j \circ \overline{jF} = y^j \circ F = {\bar{F}}^j\). Now take \(j_{(t,q)} X \in {\mathbb {R}}\times {\mathcal {T}}^S M\). Then,

$$\begin{aligned} D^j y \circ jF\big (j_{(t,q)} X\big )= & {} D^j y \big (j_{F(t,q)} {\tilde{X}}\big ) = (D{\tilde{X}})^j (F^0(t)) \nonumber \\= & {} \big ({\textbf{D}}_{\textrm{t}}{\bar{F}}^j\big ) \big (j_{(t, q)} X\big ) \left( \frac{{\textrm{d}}F^0}{{\textrm{d}}t}(t) \right) ^{-1}, \end{aligned}$$
(4.7)
$$\begin{aligned} Q^{kl} y \circ jF\big (j_{(t,q)} X\big )= & {} Q^{kl} y \big (j_{F(t,q)} {\tilde{X}}\big ) = (Q{\tilde{X}})^{kl}(F^0(t)) \nonumber \\= & {} \left( \frac{\partial {\bar{F}}^k}{\partial x^i}\frac{\partial {\bar{F}}^l}{\partial x^j} \right) (t, X(t))(QX)^{ij}(t) \left( \frac{{\textrm{d}}F^0}{{\textrm{d}}t}(t) \right) ^{-1}. \end{aligned}$$
(4.8)
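Formulas (4.7)–(4.8) give jF in coordinates, so they can be used to test Corollary 4.5(iii), \(j(F\circ G) = jF\circ jG\), numerically. The sketch below works in one space dimension and approximates partial derivatives by central finite differences. The two endomorphisms are hypothetical examples whose time components are increasing diffeomorphisms. Both sides of the identity are evaluated at one point of \({\mathbb {R}}\times {\mathcal {T}}^S {\mathbb {R}}\).

```python
def prolong(F0, Fbar, e1=1e-5, e2=1e-3):
    """jF on coordinates (t, x, Dx, Qx) of R x T^S R via (4.7)-(4.8), with
    partial derivatives approximated by central finite differences."""
    def jF(t, x, Dx, Qx):
        dF0 = (F0(t + e1) - F0(t - e1)) / (2 * e1)
        Ft = (Fbar(t + e1, x) - Fbar(t - e1, x)) / (2 * e1)
        Fx = (Fbar(t, x + e1) - Fbar(t, x - e1)) / (2 * e1)
        Fxx = (Fbar(t, x + e2) - 2 * Fbar(t, x) + Fbar(t, x - e2)) / e2 ** 2
        Dy = (Ft + Fx * Dx + 0.5 * Fxx * Qx) / dF0   # (4.7): total mean derivative / F0'
        Qy = Fx ** 2 * Qx / dF0                        # (4.8) in one dimension
        return F0(t), Fbar(t, x), Dy, Qy
    return jF


# Two hypothetical bundle endomorphisms of R x R and their composition F o G.
F0, Fbar = (lambda t: 2.0 * t), (lambda t, x: x * x + t)
G0, Gbar = (lambda t: t + 1.0), (lambda t, x: 3.0 * x)
FG0 = lambda t: F0(G0(t))
FGbar = lambda t, x: Fbar(G0(t), Gbar(t, x))

jF, jG, jFG = prolong(F0, Fbar), prolong(G0, Gbar), prolong(FG0, FGbar)
point = (0.3, -0.7, 0.2, 1.5)
lhs = jFG(*point)
rhs = jF(*jG(*point))
print(lhs)
print(rhs)
```

For these maps, both sides equal (2.6, 5.71, 5.99, 119.07) up to rounding, which can also be confirmed by hand.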

4.2 Symmetries of SDEs

As an important application of the prolongations of diffusions and bundle homomorphisms, we now study the symmetries of stochastic differential equations. As in Lie’s classical theory of symmetries of ODEs, a symmetry of a stochastic differential equation is a space–time transformation that maps solutions to solutions. In the stochastic case, however, this requirement alone is not enough. As mentioned in Sect. 4.1, the only smooth transformations of \({\mathbb {R}}\times M\) mapping diffusions to diffusions are bundle endomorphisms. Moreover, a solution of a stochastic differential equation always comes with a filtration, which is also altered under space–time transformations. Thus, we adopt the following definition:

Definition 4.9

(Symmetries) Given a stochastic differential equation \(S\subset {\mathbb {R}}\times \mathcal T^S M\), a symmetry of S is a bundle automorphism F on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) projecting to \(F^0\) such that if \((X, \{{\mathcal {P}}_t\})\) is a solution of S, then so is \((F\cdot X, \{{\mathcal {P}}_{(F^0)^{-1}(s)}\})\).

Using the definitions of stochastic differential equations and pushforwards, we have the following equivalent characterization of symmetries.

Lemma 4.10

Let S be a stochastic differential equation on M. A bundle automorphism F on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) is a symmetry of S, if and only if, whenever \(j_{(t,q)} X \in S\) we have \(j F (j_{(t,q)} X) \in S\), or equivalently, \(j F(S)\subset S\).
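Lemma 4.10 reduces the symmetry test to checking \(jF(S)\subset S\). As an example, take the equation of standard Brownian motion on \({\mathbb {R}}\), i.e. (3.9) with \({\mathfrak {b}} = 0\) and \(\sigma = 1\), so that \(S = \{Dx = 0,\ Qx = 1\}\). Consider affine maps \(F(t,x) = (\alpha t + \beta , \lambda x + \mu t + \nu )\) with \(\alpha > 0\) and \(\lambda \ne 0\). For these maps, (4.7)–(4.8) give \(Dy = (\mu + \lambda Dx)/\alpha \) and \(Qy = \lambda ^2 Qx/\alpha \). The sketch below, with hypothetical names, checks the inclusion on a grid of points of S. It recovers the Brownian scaling \(\lambda ^2 = \alpha \) with \(\mu = 0\).

```python
def j_affine(alpha, beta, lam, mu, nu):
    """Prolongation via (4.7)-(4.8) of F(t, x) = (alpha t + beta, lam x + mu t + nu), alpha > 0."""
    def jF(t, x, Dx, Qx):
        Dy = (mu + lam * Dx) / alpha      # D_t Fbar = mu + lam Dx (no second-order term)
        Qy = lam ** 2 * Qx / alpha
        return alpha * t + beta, lam * x + mu * t + nu, Dy, Qy
    return jF


def preserves_brownian_equation(jF, samples, tol=1e-12):
    """Check jF(S) contained in S (Lemma 4.10) on sample points of S = {Dx = 0, Qx = 1}."""
    for t, x in samples:
        _, _, Dy, Qy = jF(t, x, 0.0, 1.0)
        if abs(Dy) > tol or abs(Qy - 1.0) > tol:
            return False
    return True


samples = [(t / 4.0, x / 3.0) for t in range(-4, 5) for x in range(-3, 4)]
print(preserves_brownian_equation(j_affine(4.0, 0.0, 2.0, 0.0, 0.0), samples))  # True: scaling
print(preserves_brownian_equation(j_affine(1.0, 0.0, 2.0, 0.0, 0.0), samples))  # False
```

Space-time translations also pass the test. A Galilean boost (\(\mu \ne 0\)) fails it, because it adds a drift.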

Recall that the infinitesimal counterparts of bundle homomorphisms are the so-called projectable or fiber-preserving vector fields. More precisely, a vector field V on \({\mathbb {R}}\times M\) is called \(\pi \)-projectable if the (local) flow (or one-parameter group action) generated by V consists of (local) bundle endomorphisms on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) (cf. Olver 1998, Example 2.22 or Saunders 1989, Proposition 3.2.15). For such a vector field, we define its prolongation to be the infinitesimal generator of the prolonged flow.

Definition 4.11

(Stochastic prolongations of projectable vector fields) Let V be a \(\pi \)-projectable vector field on \({\mathbb {R}}\times M\), with corresponding (local) flow \(\psi = \{\psi _\epsilon \}_{\epsilon \in (-\varepsilon ,\varepsilon )}\). Then, the stochastic prolongation of V, denoted by jV, will be a vector field on the model jet bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\), defined as the infinitesimal generator of the corresponding prolonged flow \(\{j\psi _\epsilon \}_{\epsilon \in (-\varepsilon ,\varepsilon )}\). In other words, jV is a vector field on \({\mathbb {R}}\times {\mathcal {T}}^S M\) defined by

$$\begin{aligned} j V \big |_{j_{(t,q)} X} = \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0} (j\psi _\epsilon ) (j_{(t,q)} X), \end{aligned}$$

for any \(j_{(t,q)} X\in {\mathbb {R}}\times {\mathcal {T}}^S M\).

Now we can define infinitesimal versions of symmetries.

Definition 4.12

(Infinitesimal symmetries) Let S be a stochastic differential equation on M. An infinitesimal symmetry of S is a \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) whose stochastic prolongation jV is tangent to S.

The following properties follow straightforwardly from definitions.

Lemma 4.13

Given a stochastic differential equation S on M, let V be a complete \(\pi \)-projectable vector field on \({\mathbb {R}}\times M\) and \(\psi = \{\psi _\epsilon \}_{\epsilon \in {\mathbb {R}}}\) be its flow. Then,

  1. (i)

    V is an infinitesimal symmetry of S if and only if \(jV(\Theta ) = 0\) on S for every local defining function \(\Theta \) of S;

  2. (ii)

    V is an infinitesimal symmetry of S if and only if for each \(\epsilon \in {\mathbb {R}}\), \(\psi _\epsilon \) is a symmetry of S.

4.3 Stochastic Prolongation Formulae

We consider a coordinate chart \(({\mathbb {R}}\times U^{(1)}, (t, x^{(1)}))\) on the model jet bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\), which is induced by the coordinate chart \((U, (x^i))\) on M. A \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) has the following local coordinate representation

$$\begin{aligned} V_{(t,q)} = V^0(t) \frac{\partial }{\partial {t}}\bigg |_t + V^i(t,q) \frac{\partial }{\partial {x^i}}\bigg |_q. \end{aligned}$$
(4.9)

Its prolongation jV is a vector field on \({\mathbb {R}}\times {\mathcal {T}}^S M\) of the form

$$\begin{aligned} j V\big |_{j_{(t,q)} X}= & {} V^0(t) \frac{\partial }{\partial {t}}\bigg |_t + V^i(t,q) \frac{\partial }{\partial {x^i}}\bigg |_{j_{(t,q)} X} + V^i_1(j_{(t,q)} X) \frac{\partial }{\partial {D^i x}}\bigg |_{j_{(t,q)} X} \\{} & {} + V^{jk}_2(j_{(t,q)} X) \frac{\partial }{\partial {Q^{jk} x}}\bigg |_{j_{(t,q)} X}. \end{aligned}$$

Now we use Lemma 4.8 to compute the coefficients \(V^i_1\) and \(V^{jk}_2\).

Theorem 4.14

Suppose V is complete and \(\pi \)-projectable and has the local representation (4.9). Then, in the canonical coordinates \((t, x^{(1)})\), the coefficient functions of its prolongation jV are given by the following formulae:

$$\begin{aligned} V^i_1\big (t, x^{(1)}\big )&= \big ({\textbf{D}}_{\textrm{t}} V^i\big )\big (t,x^{(1)}\big ) - \dot{V}^0(t) D^i x, \end{aligned}$$
(4.10)
$$\begin{aligned} V^{jk}_2\big (t, x^{(1)}\big )&= \frac{\partial V^j}{\partial x^i}(t,x) Q^{ik} x + \frac{\partial V^k}{\partial x^i}(t,x) Q^{ij} x - \dot{V}^0(t) Q^{jk} x. \end{aligned}$$
(4.11)

Proof

Let \(\psi = \{\psi _\epsilon \}_{\epsilon \in {\mathbb {R}}}\) be the flow generated by V. Since V is complete and \(\pi \)-projectable, each \(\psi _\epsilon \) is a bundle endomorphism on \({\mathbb {R}}\times M\) projecting to a diffeomorphism on \({\mathbb {R}}\). Let \(\psi _\epsilon (t, q) = (\psi ^0_\epsilon (t), {\bar{\psi }}_\epsilon (t,q))\). Note that \(\psi ^0_0(t)=t\), \({\bar{\psi }}_0(t,q) = q\) and

$$\begin{aligned} V^0(t) = \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0} \psi ^0_\epsilon (t), \quad V^i(t,q) = \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0} {\bar{\psi }}^i_\epsilon (t,q). \end{aligned}$$

Let \(X=\{X(t)\}_{t\in [t_0,\tau )}\) be a representative diffusion of \(j_{(t_0,q)} X \in U^{(1)}\). Then, by Lemma 4.2 and Definition 4.4, a representative diffusion of \(j\psi _\epsilon (j_{(t,q)} X)\) is

$$\begin{aligned} {\tilde{X}}_\epsilon (s) = \psi _\epsilon \cdot X(s) = {\bar{\psi }}_\epsilon \left( \big (\psi ^0_\epsilon \big )^{-1}(s), X\big (\big (\psi ^0_\epsilon \big )^{-1}(s)\big ) \right) , \quad s \in \big [\psi ^0_\epsilon (t_0), \psi ^0_\epsilon (\tau )\big ). \end{aligned}$$

Now we apply Lemma 4.8 and take derivatives with respect to \(\epsilon \). Since \(\frac{{\textrm{d}}}{{\textrm{d}}\epsilon }\) commutes with the total mean derivative \({\textbf{D}}_{\textrm{t}}\), as is clear from its coordinate representation, we have

$$\begin{aligned} \begin{aligned} V_1^i (j_{(t,q)} X)&= \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0} (D{\tilde{X}}_\epsilon )^i\big (\psi ^0_\epsilon (t)\big )\\&= \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0}\left[ \big ({\textbf{D}}_{\textrm{t}} {\bar{\psi }}_\epsilon ^i\big ) \left( j_{(t, X(t))} X \right) \frac{{\textrm{d}}\big (\psi _\epsilon ^0\big )^{-1}}{{\textrm{d}}s}\big (\psi ^0_\epsilon (t)\big ) \right] \\&= {\textbf{D}}_{\textrm{t}} V^i (j_{(t,q)} X) - (DX)^i(t) \dot{V}^0(t). \end{aligned} \end{aligned}$$

Also,

$$\begin{aligned} \begin{aligned} V_2^{kl}(j_{(t,q)} X)&= \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0} (Q{\tilde{X}}_\epsilon )^{kl}\big (\psi ^0_\epsilon (t)\big ) \\&= \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0} \left[ \left( \frac{\partial {\bar{\psi }}_\epsilon ^k}{\partial x^i}\frac{\partial {\bar{\psi }}_\epsilon ^l}{\partial x^j} \right) \left( t, X(t) \right) (QX)^{ij} \left( t \right) \frac{{\textrm{d}}\big (\psi _\epsilon ^0\big )^{-1}}{{\textrm{d}}s}\big (\psi ^0_\epsilon (t)\big ) \right] \\&= \left( \frac{\partial V^k}{\partial x^i} \delta _j^l + \delta _i^k \frac{\partial V^l}{\partial x^j} \right) (t,X(t)) (QX)^{ij}(t) - \delta _i^k \delta _j^l (QX)^{ij}(t) \dot{V}^0(t) \\&= \frac{\partial V^k}{\partial x^i}(t,q) (QX)^{il}(t) + \frac{\partial V^l}{\partial x^j} (t,q) (QX)^{jk}(t) - (QX)^{kl}(t) \dot{V}^0(t). \end{aligned} \end{aligned}$$

In the induced coordinate system \((t, x^{(1)})= (t, x^i, D^i x, Q^{jk} x)\), the last two formulae read as (4.10) and (4.11), respectively. \(\square \)
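As a sanity check of (4.10) and (4.11), the following Python sketch evaluates the prolongation coefficients at a single jet point from pointwise derivative data. It assumes that the total mean derivative acts as \(\partial _t + D^j x\,\partial _j + \frac{1}{2} Q^{jk}x\,\partial _j\partial _k\), the form used in the proof of Corollary 4.17; the function name and the argument layout are illustrative choices. For the Brownian scaling field \(V = 2t\,\partial _t + x^i\partial _{x^i}\), whose flow is \((t,x)\mapsto (e^{2\epsilon }t, e^{\epsilon }x)\), one expects \(V_1^i = -D^i x\) and \(V_2^{jk}=0\): the mean derivative scales like \(e^{-\epsilon }\), while the quadratic mean derivative is invariant.

```python
# Sketch: evaluate the prolongation coefficients (4.10)-(4.11) at one jet point.
# Assumption: D_t acts as d/dt + (D^j x) d/dx^j + (1/2)(Q^{jk} x) d^2/dx^j dx^k.

def prolongation_coefficients(dV0, dVdt, gradV, hessV, Dx, Qx):
    """Return (V1, V2) from pointwise derivative data.

    dV0   -- time derivative of V^0 at t
    dVdt  -- dVdt[i] = dV^i/dt at (t, q)
    gradV -- gradV[i][j] = dV^i/dx^j at (t, q)
    hessV -- hessV[i][j][k] = d^2 V^i / dx^j dx^k at (t, q)
    Dx    -- mean derivative coordinates D^i x
    Qx    -- symmetric matrix of quadratic mean derivative coordinates Q^{jk} x
    """
    d = len(Dx)
    V1 = []
    for i in range(d):
        total_mean_derivative = (
            dVdt[i]
            + sum(Dx[j] * gradV[i][j] for j in range(d))
            + 0.5 * sum(Qx[j][k] * hessV[i][j][k] for j in range(d) for k in range(d))
        )
        V1.append(total_mean_derivative - dV0 * Dx[i])  # formula (4.10)
    V2 = [
        [
            sum(gradV[j][i] * Qx[i][k] + gradV[k][i] * Qx[i][j] for i in range(d))
            - dV0 * Qx[j][k]  # formula (4.11)
            for k in range(d)
        ]
        for j in range(d)
    ]
    return V1, V2


# Brownian scaling V = 2t d/dt + x^i d/dx^i in d = 2: V^0 = 2t, V^i = x^i.
d = 2
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
zero_hessian = [[[0.0] * d for _ in range(d)] for _ in range(d)]
Dx = [0.3, -1.2]
Qx = [[2.0, 0.5], [0.5, 1.0]]
V1, V2 = prolongation_coefficients(2.0, [0.0, 0.0], identity, zero_hessian, Dx, Qx)
print(V1)  # [-0.3, 1.2]: the mean derivative is rescaled
print(V2)  # [[0.0, 0.0], [0.0, 0.0]]: Q is left invariant
```

The data are passed pointwise, rather than as functions, so that the sketch isolates the algebra of the two formulae from any differentiation scheme.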

Stochastic analogs of the contact structure on \({\mathbb {R}}\times {\mathcal {T}}^S M\) and of Cartan symmetries are discussed in “Appendix B.” It turns out that the infinitesimal symmetry condition for the mixed-order Cartan distribution is equivalent to the stochastic prolongation formulae of Theorem 4.14.

Applying Theorem 4.14 to the system of mean differential equations (2.22), we have

Corollary 4.15

The complete and \(\pi \)-projectable vector field V in (4.9) is an infinitesimal symmetry of MDEs (2.22) if and only if the coefficients \(V^0\) and \(V^i\)’s satisfy the following “determining equations”:

$$\begin{aligned}&V^0 \frac{\partial {\mathfrak {b}}^i}{\partial t} + V^j \frac{\partial {\mathfrak {b}}^i}{\partial x^j} = \frac{\partial V^i}{\partial t} + \frac{\partial V^i}{\partial x^j} {\mathfrak {b}}^j + \frac{1}{2} \frac{\partial ^2 V^i}{\partial x^j \partial x^k} \sigma ^j_r \sigma ^k_r - \dot{V}^0 {\mathfrak {b}}^i, \nonumber \\&V^0 \frac{\partial \big (\sigma _r^j \sigma _r^k\big )}{\partial t} + V^i \frac{\partial \big (\sigma _r^j \sigma _r^k\big )}{\partial x^i} = \frac{\partial V^j}{\partial x^i} \sigma ^i_r \sigma ^k_r + \frac{\partial V^k}{\partial x^i} \sigma ^i_r \sigma ^j_r - \dot{V}^0 \sigma ^j_r \sigma ^k_r. \end{aligned}$$
(4.12)

Proof

We apply Lemma 4.13(i) to (3.9) and then use Theorem 4.14 to get

$$\begin{aligned} V^0 \frac{\partial {\mathfrak {b}}^i}{\partial t} + V^j \frac{\partial {\mathfrak {b}}^i}{\partial x^j}&= {\textbf{D}}_{\textrm{t}} V^i - \dot{V}^0 D^i x, \\ V^0 \frac{\partial \big (\sigma _r^j \sigma _r^k\big )}{\partial t} + V^i \frac{\partial \big (\sigma _r^j \sigma _r^k\big )}{\partial x^i}&= \frac{\partial V^j}{\partial x^i} Q^{ik} x + \frac{\partial V^k}{\partial x^i} Q^{ij} x - \dot{V}^0 Q^{jk} x. \end{aligned}$$

Then, we use the coordinate representation (4.5) for the total mean derivative \({\textbf{D}}_{\textrm{t}}\) and plug Eq. (3.9) in; the results follow. \(\square \)
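The determining equations (4.12) can also be checked numerically. The sketch below is a minimal illustration, not a symbolic solver: it approximates all derivatives by central finite differences and returns the residuals (left-hand side minus right-hand side) of both equations at one point, with \({\mathfrak {b}}\) given as a drift function and \(\sigma ^j_r\sigma ^k_r\) as a diffusion-matrix function `a`. All helper names are hypothetical. For Brownian motion in \({\mathbb {R}}^2\), the scaling field \(2t\,\partial _t + x^i\partial _{x^i}\) and the rotation \(x^2\partial _{x^1} - x^1\partial _{x^2}\) should pass, while the purely spatial dilation \(x^i\partial _{x^i}\) should fail the second equation with residual \(-2\delta ^{jk}\).

```python
# Sketch: residuals of the determining equations (4.12) at a point (t, x),
# with derivatives approximated by central finite differences.
# b(t, x) is the drift, a(t, x) = sigma sigma^T the diffusion matrix.

def d_t(f, t, x, h=1e-5):
    return (f(t + h, x) - f(t - h, x)) / (2 * h)


def d_x(f, t, x, j, h=1e-5):
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (f(t, xp) - f(t, xm)) / (2 * h)


def d_xx(f, t, x, j, k, h=1e-3):
    def shifted(sj, sk):
        y = list(x)
        y[j] += sj * h
        y[k] += sk * h
        return f(t, y)
    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


def determining_residuals(V0, V, b, a, t, x, h=1e-5):
    """Return (r1, r2): LHS minus RHS of the two equations in (4.12)."""
    d = len(x)
    dV0 = (V0(t + h) - V0(t - h)) / (2 * h)

    def comp(F, i):      # i-th component of a vector-valued F(t, x)
        return lambda s, y: F(s, y)[i]

    def entry(j, k):     # (j, k) entry of a(t, x)
        return lambda s, y: a(s, y)[j][k]

    Vx, bx, ax = V(t, x), b(t, x), a(t, x)
    r1 = []
    for i in range(d):
        Vi, bi = comp(V, i), comp(b, i)
        lhs = V0(t) * d_t(bi, t, x) + sum(Vx[j] * d_x(bi, t, x, j) for j in range(d))
        rhs = (d_t(Vi, t, x)
               + sum(d_x(Vi, t, x, j) * bx[j] for j in range(d))
               + 0.5 * sum(d_xx(Vi, t, x, j, k) * ax[j][k] for j in range(d) for k in range(d))
               - dV0 * bx[i])
        r1.append(lhs - rhs)
    r2 = []
    for j in range(d):
        row = []
        for k in range(d):
            ajk = entry(j, k)
            lhs = V0(t) * d_t(ajk, t, x) + sum(Vx[i] * d_x(ajk, t, x, i) for i in range(d))
            rhs = (sum(d_x(comp(V, j), t, x, i) * ax[i][k] for i in range(d))
                   + sum(d_x(comp(V, k), t, x, i) * ax[i][j] for i in range(d))
                   - dV0 * ax[j][k])
            row.append(lhs - rhs)
        r2.append(row)
    return r1, r2


def max_abs(r1, r2):
    return max([abs(v) for v in r1] + [abs(v) for row in r2 for v in row])


# Brownian motion in R^2: zero drift, identity diffusion matrix.
drift = lambda t, x: [0.0, 0.0]
diffusion = lambda t, x: [[1.0, 0.0], [0.0, 1.0]]
t0, x0 = 0.7, [0.4, -1.1]

scaling = max_abs(*determining_residuals(lambda t: 2 * t, lambda t, x: list(x), drift, diffusion, t0, x0))
rotation = max_abs(*determining_residuals(lambda t: 0.0, lambda t, x: [x[1], -x[0]], drift, diffusion, t0, x0))
dilation = max_abs(*determining_residuals(lambda t: 0.0, lambda t, x: list(x), drift, diffusion, t0, x0))
print(scaling, rotation, dilation)  # close to 0, close to 0, close to 2
```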

Remark 4.16

In Gaeta and Quintero (1999), the authors proved a result similar to Corollary 4.15, with the following equation in place of the second equation of (4.12):

$$\begin{aligned} V^0 \frac{\partial \sigma _r^j}{\partial t} + V^i \frac{\partial \sigma _r^j }{\partial x^i} = \frac{\partial V^j}{\partial x^i} \sigma ^i_r - \frac{1}{2} \dot{V}^0 \sigma ^j_r. \end{aligned}$$
(4.13)

Multiplying both sides of (4.13) by \(\sigma _r^k\), summing over r, and adding the same identity with j and k interchanged, one easily obtains the second equation of (4.12). So our determining equations for infinitesimal symmetries are more general than those of Gaeta and Quintero (1999). Basically, the paper (Gaeta and Quintero 1999) concerns symmetries of the Itô equation \(({\mathfrak {b}}, \sigma )\), while we consider symmetries of the diffusion with generator \(({\mathfrak {b}}, \sigma \circ \sigma ^*)\), or equivalently, of a weak formulation of the SDE. Symmetries of the former kind are clearly symmetries of the latter kind, but not conversely.
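The implication described in Remark 4.16 is purely algebraic at each point, so it can be illustrated with random numbers. In the sketch below (a hypothetical check, not taken from either paper), random values stand in for \(\sigma \), \(\partial _x\sigma \), \(V\), \(\partial _x V\) and \(\dot{V}^0\) at one point; \(\partial _t\sigma \) is then chosen so that (4.13) holds exactly, which requires \(V^0\ne 0\), and the second equation of (4.12) is evaluated for \(a^{jk} = \sigma ^j_r\sigma ^k_r\) via the product rule. Only one direction is tested; as the remark states, the converse fails in general.

```python
import random

# Sketch: pointwise check that (4.13) implies the second equation of (4.12).
# Random numbers stand in for sigma, V and their derivatives at one point (t, x).
rng = random.Random(1)
d, m = 3, 2  # state dimension, number of driving Brownian motions


def rnd():
    return rng.uniform(-1.0, 1.0)


sigma = [[rnd() for r in range(m)] for j in range(d)]                           # sigma^j_r
dsigma_dx = [[[rnd() for i in range(d)] for r in range(m)] for j in range(d)]  # d sigma^j_r / dx^i
V = [rnd() for i in range(d)]                                                   # V^i
gradV = [[rnd() for i in range(d)] for j in range(d)]                          # dV^j / dx^i
V0, dV0 = 0.8, rnd()                                                            # V^0 (nonzero), its t-derivative

# Solve (4.13) for d sigma^j_r / dt, so that (4.13) holds exactly.
dsigma_dt = [[(sum(gradV[j][i] * sigma[i][r] for i in range(d))
               - 0.5 * dV0 * sigma[j][r]
               - sum(V[i] * dsigma_dx[j][r][i] for i in range(d))) / V0
              for r in range(m)] for j in range(d)]


def a(p, q):  # a^{pq} = sigma^p_r sigma^q_r, summed over r
    return sum(sigma[p][r] * sigma[q][r] for r in range(m))


def residual(j, k):
    """LHS minus RHS of the second equation of (4.12), via the product rule."""
    da_dt = sum(dsigma_dt[j][r] * sigma[k][r] + sigma[j][r] * dsigma_dt[k][r] for r in range(m))
    da_dx = [sum(dsigma_dx[j][r][i] * sigma[k][r] + sigma[j][r] * dsigma_dx[k][r][i] for r in range(m))
             for i in range(d)]
    lhs = V0 * da_dt + sum(V[i] * da_dx[i] for i in range(d))
    rhs = (sum(gradV[j][i] * a(i, k) for i in range(d))
           + sum(gradV[k][i] * a(i, j) for i in range(d))
           - dV0 * a(j, k))
    return lhs - rhs


worst = max(abs(residual(j, k)) for j in range(d) for k in range(d))
print(worst)  # at round-off level
```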

Now, given a linear connection \(\nabla \) on M, we define the \(\nabla \)-dependent versions of Definitions 4.1, 4.4 and 4.11. More precisely, for a diffusion X on M, we define its \(\nabla \)-prolongation to be the TM-valued diffusion \(j^\nabla X\) given by \(j^\nabla X(t) = j^\nabla _{X(t)} (\theta _t X)\). For a bundle homomorphism \(F:({\mathbb {R}}\times M, \pi , {\mathbb {R}})\rightarrow ({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to a diffeomorphism \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\), the \(\nabla \)-prolongation of F is the map \(j^\nabla F: {\mathbb {R}}\times T M \rightarrow {\mathbb {R}}\times T N\) defined by \(j^\nabla F (j^\nabla _{(t,q)} X) = j^\nabla _{F(t,q)} (F\cdot X)\). The \(\nabla \)-prolongation of V, denoted by \(j^\nabla V\), is defined to be the infinitesimal generator of the corresponding prolonged flow \(\{j^\nabla \psi _\epsilon \}_{\epsilon \in (-\varepsilon ,\varepsilon )}\), so that \(j^\nabla V\) is a vector field on \({\mathbb {R}}\times T M\) and has the form

$$\begin{aligned} j^\nabla V\big |_{j^\nabla _{(t,q)} X} = V^0(t) \frac{\partial }{\partial {t}}\bigg |_t + V^i(t,q) \frac{\partial }{\partial {x^i}}\bigg |_{j^\nabla _{(t,q)} X} + V^i_\nabla \left( j^\nabla _{(t,q)} X\right) \frac{\partial }{\partial {\dot{x}^i}}\bigg |_{j^\nabla _{(t,q)} X}, \end{aligned}$$

for V of the form (4.9). If we denote \({\bar{V}} = V^i \frac{\partial }{\partial {x^i}}\), so that \(V = V^0 \frac{\partial }{\partial {t}} + {\bar{V}}\), we have

Corollary 4.17

Under the canonical coordinates \((t, x, \dot{x})\), the coefficients \(V^i_\nabla \) of the \(\nabla \)-prolongation \(j^\nabla V\) are given by:

$$\begin{aligned} V^i_\nabla (t, x,\dot{x})= & {} \left( \partial _t + \dot{x}^j \partial _j \right) V^i(t,x) + \textstyle {{\frac{1}{2}}} Q^{jk}x \left[ \nabla ^2_{\partial _j,\partial _k} {\bar{V}} + R({\bar{V}},\partial _j)\partial _k \right] ^i (t,x) - \dot{V}^0(t) \dot{x}^i, \end{aligned}$$

where R is the curvature tensor.

Proof

By (4.10) and (4.11), we have

$$\begin{aligned} \begin{aligned} V_\nabla ^i (j_{(t,q)} X)&= \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0} (D_\nabla {\tilde{X}}_\epsilon )^i\big (\psi ^0_\epsilon (t)\big )\\&= \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \bigg |_{\epsilon =0} \left[ (D{\tilde{X}}_\epsilon )^i\big (\psi ^0_\epsilon (t)\big ) + \frac{1}{2} \Gamma ^i_{jk}\big ({\tilde{X}}_\epsilon (\psi ^0_\epsilon (t))\big ) (Q{\tilde{X}}_\epsilon )^{jk}\big (\psi ^0_\epsilon (t)\big ) \right] \\&= V_1^i (j_{(t,q)} X) + \frac{1}{2} \Gamma ^i_{jk}(X(t)) V_2^{jk} (j_{(t,q)} X) \\&\quad + \frac{1}{2} \frac{\partial \Gamma ^i_{jk}}{\partial x^l}(X(t)) (QX)^{jk}(t) V^l(t,X(t)) \\&= \left[ \frac{\partial }{\partial t} + \left( (D_\nabla X)^l(t) - \frac{1}{2}\Gamma _{jk}^l(X(t)) (QX)^{jk}(t) \right) \frac{\partial }{\partial x^l} \right. \\&\left. \quad + \frac{1}{2} (QX)^{jk}(t) \frac{\partial ^2}{\partial x^j \partial x^k} \right] V^i (t,X(t)) - (DX)^i(t) \dot{V}^0(t) \\&\quad + \frac{1}{2} \Gamma ^i_{jk}(X(t)) \left[ \frac{\partial V^j}{\partial x^l}(t,X(t)) (QX)^{kl}(t) \right. \\&\left. \quad + \frac{\partial V^k}{\partial x^m} (t,X(t)) (QX)^{jm}(t) - (QX)^{jk}(t) \dot{V}^0(t) \right] \\&\quad + \frac{1}{2} \frac{\partial \Gamma ^i_{jk}}{\partial x^l}(X(t)) (QX)^{jk}(t) V^l(t,X(t)) \\&= \left[ \frac{\partial }{\partial t} + (D_\nabla X)^l(t) \frac{\partial }{\partial x^l} \right] V^i (t,X(t)) \\&\quad + \frac{1}{2} (Q_\nabla X)^{jk}(t) \left[ \nabla ^2_{\partial _j,\partial _k} {\bar{V}} + R({\bar{V}},\partial _j)\partial _k \right] ^i (t,X(t)) \\&\quad - (D_\nabla X)^i(t) \dot{V}^0(t). \end{aligned} \end{aligned}$$

The proof is complete. \(\square \)

5 The Second-Order Cotangent Bundle

5.1 Second-Order Covectors

Definition 5.1

(Second-order cotangent space) The second-order cotangent space at \(q\in M\) is the dual vector space of \({\mathcal {T}}^S_q M\), denoted by \({\mathcal {T}}^{S*}_q M\). The pairing of \(\alpha \in {\mathcal {T}}^{S*}_q M\) and \(A\in {\mathcal {T}}^S_q M\) is denoted by \(\langle \alpha , A \rangle \) or \(\alpha (A)\). Elements of \({\mathcal {T}}^{S*}_q M\) are called second-order covectors at q. The disjoint union \({\mathcal {T}}^{S*} M:= \amalg _{q\in M} {\mathcal {T}}^{S*}_q M\) is called the stochastic cotangent bundle of M. The natural projection map from \({\mathcal {T}}^{S*} M\) to M is denoted by \(\tau ^{S*}_M\). A (local or global) smooth section of \({\mathcal {T}}^{S*} M\) is called a second-order covector field or a second-order form.

Dual to the left action (2.11) of \(G_I^d\) on fibers of \({\mathcal {T}}^S M\), \(G_I^d\) will act on those of \({\mathcal {T}}^{S*} M\) from the right.

Lemma 5.2

The stochastic cotangent bundle \(({\mathcal {T}}^{S*} M, \tau ^{S*}_M, M)\) is the fiber bundle dual to \(({\mathcal {T}}^S M, \tau ^S_M, M)\), with structure group \(G_I^d\) acting on the typical fiber \(({\mathbb {R}}^d \times \textrm{Sym}^2({\mathbb {R}}^d))^*\) from the right by

$$\begin{aligned} (p, o)\cdot (g, \kappa ) = (g^* p, \kappa ^* p + (g^*\otimes g^*) o), \end{aligned}$$

for all \((g, \kappa ) \in G_I^d\), \(p\in ({\mathbb {R}}^d)^*\), \(o\in (\textrm{Sym}^2({\mathbb {R}}^d))^*\).

The notion of second-order forms should not be confused with the classical one of 2-forms. Two basic examples of second-order forms are \(d^2 f\) and \(df\cdot dg\), where f and g are given smooth functions on M. They are defined as follows: for \(A\in {\mathcal {T}}^S M\),

$$\begin{aligned} \langle d^2 f, A \rangle := Af, \qquad \langle df\cdot dg, A \rangle := \Gamma _A(f,g) = A(fg) - fAg - gAf, \end{aligned}$$
(5.1)

where \(\Gamma _A\) is the squared field operator defined in (2.8). These notations go back to Schwartz (1984) and Meyer (1981a) (see also Emery 1989, Chapters VI and VII), where \(d^2 f\) is called the second differential of f and \(df\cdot dg\) the symmetric product of df and dg. Note that in these original references there is a factor \(\frac{1}{2}\) on the right-hand side of the definition of \(df\cdot dg\); here we drop this factor. Clearly, when restricted to TM, the second differential \(d^2f\) is just the differential df, whereas the symmetric product \(df\cdot dg\) vanishes.

The definition of the symmetric product \(df\cdot dg\) yields two properties: \(df\cdot dg\) is symmetric in f and g, and \((df\cdot dg)_q = 0\) if one of \(df_q\) and \(dg_q\) vanishes. These lead to a more general definition of the symmetric product of two 1-forms. More precisely, given \(\omega , \eta \in {\mathcal {T}}^*_q M\), there exist smooth functions f and g on M such that \(\omega = df_q\) and \(\eta = dg_q\). By the second property and the bilinearity of \((f,g)\mapsto df\cdot dg\), the second-order covector \((df\cdot dg)_q\) does not depend on the choice of f and g, and we denote it by \(\omega \cdot \eta \). Now if \(\omega , \eta \) are 1-forms, their symmetric product is defined pointwise through \((\omega \cdot \eta )_q = \omega _q \cdot \eta _q\). More formally, we have

Definition 5.3

(Symmetric product, Emery 1989, Chapter VI) There exists a unique fiber-linear bundle homomorphism \(\bullet \) from \(T^* M \otimes T^* M\) to \({\mathcal {T}}^{S*} M\), which is called the symmetric product, such that for all \(\omega , \eta \in T^* M\), \(\bullet (\omega \otimes \eta ) = \omega \cdot \eta \).

It is easy to verify from (5.1) that the local frame, dual to (2.12), for \(({\mathcal {T}}^{S*} M, \tau ^{S*}_M, M)\) over the local chart \((U,(x^i))\) is given by (see also Emery 1989, Chapter VI)

$$\begin{aligned} \left\{ d^2 x^i, \textstyle {\frac{1}{2}} dx^i\cdot dx^i, dx^j\cdot dx^k: 1\le i\le d, 1\le j< k \le d \right\} . \end{aligned}$$

We adopt the convention that \(dx^k\cdot dx^j = dx^j\cdot dx^k\) for all \(1\le j< k \le d\). Under this frame, a second-order covector \(\alpha \in {\mathcal {T}}^{S*}_q M\) has a local expression

$$\begin{aligned} \alpha = \alpha _i d^2 x^i\big |_q + \textstyle {\frac{1}{2}} \alpha _{jk} dx^j \cdot dx^k\big |_q, \end{aligned}$$
(5.2)

where \(\alpha _{jk}\) is symmetric in j and k. The coordinates \((x^i)\) induce a canonical coordinate system on \({\mathcal {T}}^{S*} M\), denoted by \((x^i, p_i, o_{jk})\) and defined by

$$\begin{aligned} x^i(\alpha ) = x^i(q), \quad p_i(\alpha ) = \alpha _i, \quad o_{jk}(\alpha ) = \alpha _{jk}. \end{aligned}$$
(5.3)

for \(\alpha \) in (5.2). Since the coefficients \((\alpha _i)\) transform like the components of a covector, as indicated in Lemma 5.2, no ambiguity arises if we retain \((x^i,p_i)\) as canonical coordinates on \(T^* M\). As in classical geometric mechanics (Abraham and Marsden 1978; Holm et al. 2009), we still call the coordinates \((p_i)\) the conjugate momenta, and we call the second-order coordinates \((o_{jk})\) the conjugate diffusivities.

The pairing of \(\alpha \) and the second-order vector field A in (2.7) is then

$$\begin{aligned} \langle \alpha , A \rangle = \alpha _i A^i + \alpha _{jk} A^{jk}. \end{aligned}$$

It follows from (5.1) and (2.8) that for smooth functions f and g on M,

$$\begin{aligned} d^2 f = \frac{\partial f}{\partial x^i} d^2 x^i + \frac{1}{2} \frac{\partial ^2 f}{\partial x^j \partial x^k} dx^j\cdot dx^k, \qquad df\cdot dg = \frac{\partial f}{\partial x^i} \frac{\partial g}{\partial x^j} dx^i\cdot dx^j. \end{aligned}$$
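The conventions (5.1), (5.2) and the pairing formula above fix where the factors \(\frac{1}{2}\) sit, and a short numerical sketch on \({\mathbb {R}}^2\) can confirm them. It assumes \(A = A^i\partial _i + A^{jk}\partial _j\partial _k\) with symmetric \(A^{jk}\), uses the exact coordinates \(\alpha _i=\partial _i f\), \(\alpha _{jk}=\partial _j\partial _k f\) of \(d^2f\) and \(\alpha _i = 0\), \(\alpha _{jk} = \partial _j f\,\partial _k g + \partial _k f\,\partial _j g\) of \(df\cdot dg\) (the symmetrized form of the expression above), and compares the pairings with \(Af\) and \(\Gamma _A(f,g)\) computed by finite differences. The test functions, the point and the helper names are arbitrary choices.

```python
import math


def partial(f, q, i, h=1e-5):
    qp, qm = list(q), list(q)
    qp[i] += h
    qm[i] -= h
    return (f(qp) - f(qm)) / (2 * h)


def second_partial(f, q, j, k, h=1e-3):
    def shifted(sj, sk):
        y = list(q)
        y[j] += sj * h
        y[k] += sk * h
        return f(y)
    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


def apply_A(A1, A2, f, q):
    """A f at q for A = A^i d_i + A^{jk} d_j d_k (finite differences)."""
    d = len(q)
    return (sum(A1[i] * partial(f, q, i) for i in range(d))
            + sum(A2[j][k] * second_partial(f, q, j, k) for j in range(d) for k in range(d)))


def pair(alpha1, alpha2, A1, A2):
    """<alpha, A> = alpha_i A^i + alpha_jk A^jk for alpha written as in (5.2)."""
    d = len(A1)
    return (sum(alpha1[i] * A1[i] for i in range(d))
            + sum(alpha2[j][k] * A2[j][k] for j in range(d) for k in range(d)))


f = lambda q: q[0] ** 2 * q[1]
g = lambda q: math.sin(q[0]) + q[1]
fg = lambda q: f(q) * g(q)
q = [0.7, 1.3]
A1, A2 = [0.5, -1.0], [[1.0, 0.3], [0.3, 2.0]]  # A^{jk} symmetric

x, y = q
grad_f, hess_f = [2 * x * y, x ** 2], [[2 * y, 2 * x], [2 * x, 0.0]]  # exact derivatives of f
grad_g = [math.cos(x), 1.0]                                          # exact derivatives of g
prod2 = [[grad_f[j] * grad_g[k] + grad_f[k] * grad_g[j] for k in range(2)] for j in range(2)]

lhs1 = pair(grad_f, hess_f, A1, A2)      # <d^2 f, A>
rhs1 = apply_A(A1, A2, f, q)             # A f
lhs2 = pair([0.0, 0.0], prod2, A1, A2)   # <df.dg, A>
rhs2 = (apply_A(A1, A2, fg, q)           # Gamma_A(f, g) = A(fg) - f Ag - g Af
        - f(q) * apply_A(A1, A2, g, q) - g(q) * apply_A(A1, A2, f, q))
print(abs(lhs1 - rhs1), abs(lhs2 - rhs2))  # both at finite-difference error level
```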

More generally, for 1-forms \(\omega \) and \(\eta \) with local expressions \(\omega = \omega _i dx^i\) and \(\eta = \eta _i dx^i\), the symmetric product \(\omega \cdot \eta \) has local expression

$$\begin{aligned} \omega \cdot \eta = \omega _i \eta _j dx^i\cdot dx^j. \end{aligned}$$
(5.4)

Dual to the tangent case, there is indeed a canonical bundle epimorphism \({\hat{\varrho }}^*: ({\mathcal {T}}^{S*} M, \tau ^{S*}_M, M) \rightarrow (T^* M, \tau ^*_M, M)\), given by

$$\begin{aligned} {\hat{\varrho }}^*(\alpha ) = \alpha |_{TM}. \end{aligned}$$

In particular \({\hat{\varrho }}^*(d^2 f) = df\). In local coordinates, \({\hat{\varrho }}^*\) reads as

$$\begin{aligned} {\hat{\varrho }}^*\left( \alpha _i d^2 x^i|_q + \textstyle {\frac{1}{2}} \alpha _{jk} dx^j \cdot dx^k|_q \right) = \alpha _i d x^i|_q, \end{aligned}$$

The map \({\hat{\varrho }}^*\) is well defined since \(\alpha |_{TM}\) is a covector. Clearly, \({\hat{\varrho }}^*\) is also a surjective submersion, so that \({\mathcal {T}}^{S*} M\) is a fiber bundle over \(T^* M\). Occasionally, we will use the notation \({\hat{\varrho }}^*_M\) to indicate the base manifold M.

However, there is no canonical bundle monomorphism from \(T^* M\) to \({\mathcal {T}}^{S*} M\) which is linear on fibers and whose composition with \({\hat{\varrho }}^*\) is the identity of \(T^*M\). We call such a bundle monomorphism a fiber-linear bundle injection from \(T^* M\) to \(\mathcal T^{S*} M\). Similarly to Proposition 2.11, we also have a connection correspondence property: a linear connection \(\nabla \) on M induces a fiber-linear bundle injection from \(T^* M\) to \({\mathcal {T}}^{S*} M\) by

$$\begin{aligned} {\hat{\iota }}^*_\nabla : T^* M \rightarrow {\mathcal {T}}^{S*} M, \quad d x^i|_q \mapsto d^2 x^i|_q + \textstyle {\frac{1}{2}} \Gamma _{jk}^i(q) dx^j \cdot dx^k|_q =: d^\nabla x^i|_q,\qquad \quad \end{aligned}$$
(5.5)

or in local coordinates \({\hat{\iota }}^*_\nabla (x, p) = (x, p, (\Gamma _{jk}^i(x) p_i))\). Any fiber-linear bundle injection from \(T^* M\) to \({\mathcal {T}}^{S*} M\) induces a torsion-free linear connection on M.

Denote by \(\textrm{Sym}^2(T^* M)\) the subbundle of \(T^* M \otimes T^* M\) consisting of all symmetric (0, 2)-tensors on M. Then, the symmetric product \(\bullet \), when restricted to \(\textrm{Sym}^2(T^* M)\), is a bundle monomorphism whose image is the kernel of \({\hat{\varrho }}^*\). Conversely, still by the connection correspondence, a linear connection \(\nabla \) induces a fiber-linear bundle epimorphism from \({\mathcal {T}}^{S*} M\) to \(\textrm{Sym}^2(T^* M)\) which is a right inverse of \(\bullet \) and is given by

$$\begin{aligned}{} & {} \varrho ^*_\nabla : {\mathcal {T}}^{S*} M \rightarrow \textrm{Sym}^2(T^* M), \quad \alpha _i d^2 x^i|_q + \textstyle {\frac{1}{2}} \alpha _{jk} dx^j \cdot dx^k|_q \\{} & {} \quad \mapsto \left( \alpha _{jk} - \alpha _i\Gamma _{jk}^i(q)\right) dx^j \otimes dx^k|_q. \end{aligned}$$

We introduce the \(\nabla \)-dependent coordinates \((o_{jk}^\nabla )\) by \(o_{jk}^\nabla (\alpha ) = \alpha _{jk} - \alpha _i \Gamma _{jk}^i(q)\) for \(\alpha \) in (5.2), i.e.,

$$\begin{aligned} o_{jk}^\nabla = o_{jk} - p_i\big (\Gamma _{jk}^i\circ x\big ). \end{aligned}$$
(5.6)

Then, \(\varrho ^*_\nabla (\alpha ) = o_{jk}^\nabla (\alpha ) dx^j \otimes dx^k|_q\) and in particular

$$\begin{aligned} \varrho ^*_\nabla (d^2 f) = \left( \frac{\partial ^2 f}{\partial x^j \partial x^k} - \Gamma _{jk}^i \frac{\partial f}{\partial x^i} \right) dx^j \otimes dx^k = \nabla ^2 f. \end{aligned}$$

The coordinates \((x^i, p_i, o_{jk}^\nabla )\) form a coordinate system on \({\mathcal {T}}^{S*} M\), which we call the \(\nabla \)-canonical coordinate system. The coordinates \((x^i, o_{jk}^\nabla )\) also form a coordinate system on \(\textrm{Sym}^2(T^* M)\) when restricted to it. We will call the coordinates \((o^\nabla _{jk})\) the tensorial conjugate diffusivities.
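For the Euclidean plane in polar coordinates \((r,\theta )\), the Levi-Civita connection has \(\Gamma ^r_{\theta \theta } = -r\), \(\Gamma ^\theta _{r\theta } = \Gamma ^\theta _{\theta r} = 1/r\), and all other symbols vanish. The sketch below uses these to compute the tensorial conjugate diffusivities (5.6) of \(\alpha = d^2 f\) and checks that they reproduce \(\nabla ^2 f\), which is obtained independently as the Cartesian Hessian pulled back by the Jacobian of the polar map. Derivatives are central finite differences, and the function f is an arbitrary choice.

```python
import math


def partial(f, q, i, h=1e-5):
    qp, qm = list(q), list(q)
    qp[i] += h
    qm[i] -= h
    return (f(qp) - f(qm)) / (2 * h)


def second_partial(f, q, j, k, h=1e-3):
    def shifted(sj, sk):
        y = list(q)
        y[j] += sj * h
        y[k] += sk * h
        return f(y)
    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


def christoffel_polar(r):
    """Gamma^i_{jk}, stored as G[i][j][k], of the Euclidean metric in (r, theta)."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r                    # Gamma^r_{theta theta}
    G[1][0][1] = G[1][1][0] = 1.0 / r  # Gamma^theta_{r theta} = Gamma^theta_{theta r}
    return G


f_cart = lambda y: y[0] ** 2 * y[1] + math.exp(y[0])            # f in Cartesian (x, y)
to_cart = lambda q: [q[0] * math.cos(q[1]), q[0] * math.sin(q[1])]
f_polar = lambda q: f_cart(to_cart(q))                          # the same f in (r, theta)

q = [1.5, 0.6]
r, th = q
# Canonical coordinates (p_i, o_jk) of alpha = d^2 f in the frame (5.2).
p = [partial(f_polar, q, i) for i in range(2)]
o = [[second_partial(f_polar, q, j, k) for k in range(2)] for j in range(2)]
# Tensorial conjugate diffusivities (5.6): o^nabla_jk = o_jk - p_i Gamma^i_jk.
G = christoffel_polar(r)
o_nabla = [[o[j][k] - sum(p[i] * G[i][j][k] for i in range(2)) for k in range(2)]
           for j in range(2)]

# Reference: Cartesian Hessian H pulled back by J = d(x, y)/d(r, theta).
H = [[second_partial(f_cart, to_cart(q), a, b) for b in range(2)] for a in range(2)]
J = [[math.cos(th), -r * math.sin(th)], [math.sin(th), r * math.cos(th)]]
pulled = [[sum(J[a][j] * H[a][b] * J[b][k] for a in range(2) for b in range(2))
           for k in range(2)] for j in range(2)]
err = max(abs(o_nabla[j][k] - pulled[j][k]) for j in range(2) for k in range(2))
print(err)  # at finite-difference error level
```

Without the Christoffel correction, the raw coordinates \(o_{jk}\) would not match the pulled-back Hessian, which reflects the fact that only \(o^\nabla _{jk}\) transforms as a tensor.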

To sum up, we have the following short exact sequence which is split when a linear connection is provided:

$$\begin{aligned} 0 \longrightarrow \textrm{Sym}^2(T^* M) {\mathop {\longrightarrow }\limits ^{\bullet }} {\mathcal {T}}^{S*} M {\mathop {\longrightarrow }\limits ^{{\hat{\varrho }}^*}} T^* M \longrightarrow 0. \end{aligned}$$
(5.7)

It is easy to check that the bundle homomorphisms \({\hat{\varrho }}^*\), \({\hat{\iota }}^*_\nabla \), \(\bullet \) and \(\varrho ^*_\nabla \) are dual to \(\iota \), \(\varrho _\nabla \), \({\hat{\varrho }}\) and \({\hat{\iota }}_\nabla \) in (2.13), (2.14), (2.15) and (2.16), respectively, so that the short exact sequence (5.7) is dual to (2.17). Similarly to (2.18), we have the following decomposition when a linear connection \(\nabla \) is given:

$$\begin{aligned} {\mathcal {T}}^{S*} M = {\hat{\iota }}^*_\nabla (T^*M) \oplus \bullet \left( \textrm{Sym}^2(T^* M) \right) \cong T^*M \oplus \textrm{Sym}^2(T^*M), \end{aligned}$$

where the first direct sum is internal and the fiberwise isomorphism \(\cong \) is given by

$$\begin{aligned} \alpha= & {} \alpha _i d^\nabla x^i|_q + \textstyle {\frac{1}{2}} \left( \alpha _{jk} - \alpha _i\Gamma _{jk}^i(q)\right) dx^j \cdot dx^k|_q \\{} & {} \quad \mapsto \left( \alpha _i d x^i|_q, \left( \alpha _{jk} - \alpha _i\Gamma _{jk}^i(q)\right) dx^j \otimes dx^k|_q \right) . \end{aligned}$$

In particular,

$$\begin{aligned} d^2 f = \partial _i f d^\nabla x^i + \textstyle {{\frac{1}{2}}} \nabla ^2_{\partial _j, \partial _k} f dx^j \cdot dx^k \mapsto (df, \nabla ^2 f). \end{aligned}$$
(5.8)

Similarly to the classical cotangent space, the second-order cotangent space may be defined via germs. To be precise, we denote by \(C_q^\infty (M)\) the set of all germs of smooth functions at \(q\in M\), and define an equivalence relation on germs: \([f]_q, [g]_q \in C_q^\infty (M)\) are equivalent if and only if their Taylor expansions at q agree in orders one and two, that is, all first and second partial derivatives of \(f-g\) vanish at q. Then, one can easily check that \([f]_q\mapsto d^2 f_q\) induces a one-to-one correspondence between the quotient space of \(C_q^\infty (M)\) by this equivalence relation and \({\mathcal {T}}^{S*}_q M\). Along the same lines, we also observe the following diffeomorphism:

$$\begin{aligned} {\mathcal {T}}^{S*} M\times {\mathbb {R}}\cong J^2{\hat{\pi }}, \end{aligned}$$
(5.9)

by mapping \((d^2 f_q, f(q))\) to \(j^2_q f\), where \(J^2{\hat{\pi }}\) is the classical second-order jet bundle of \((M\times {\mathbb {R}}, {\hat{\pi }}, M)\). This is analogous to the fact that \(T^*M\times {\mathbb {R}}\) is diffeomorphic to the first-order jet bundle \(J^1{\hat{\pi }}\) (e.g., Geiges 2008, Example 2.5.11 or Saunders 1989, Example 4.1.15). We denote the natural projection maps from \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) to \({\mathbb {R}}\) and from \(T^* M\times {\mathbb {R}}\) to \({\mathbb {R}}\) by \({\hat{\pi }}^2_{0,1}\) and \({\hat{\pi }}^1_{0,1}\), respectively.

The relations and projection maps are integrated into the following commutative diagram:

[Figure c: commutative diagram of the bundles and projection maps above]

Remark 5.4

  (i) As in Remark 3.5, given a linear connection \(\nabla \), we obtain a one-to-one correspondence between \((T^*M \oplus \textrm{Sym}^2(T^*M))\times {\mathbb {R}}\) and \(J^2{\hat{\pi }}\) by mapping \((df_q, \nabla ^2 f_q, f(q))\) to \(j^2_q f\). An application of the jet-like structure on \(T^*M \oplus \textrm{Sym}^2(T^*M)\) and on higher-order bundles to Martin Hairer’s theory of regularity structures (Hairer 2014) can be found in Dahlqvist et al. (2019).

  (ii) As we have seen, the product \({\mathbb {R}}\times {\mathcal {T}}^S M\) is the model bundle of the stochastic jet space \({\mathcal {J}}^S M\), while the product \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) is diffeomorphic to the second-order jet bundle \(J^2{\hat{\pi }}\). In this sense, the “stochastic” and the “second-order” structures are dual to each other. This stochastic–second-order duality is somewhat analogous to the particle–wave duality in quantum mechanics.

5.2 Second-Order Tangent and Cotangent Maps

Definition 5.5

(Second-order tangent and cotangent maps, Emery 1989, Chapter VI) Let M and N be two smooth manifolds, \(F: M\rightarrow N\) be a smooth map. The second-order tangent map of F at \(q\in M\) is a linear map \(d^2 F_q: {\mathcal {T}}^S_q M \rightarrow {\mathcal {T}}^S_{F(q)} N\) defined by

$$\begin{aligned} d^2 F_q (A) f = A(f\circ F), \quad \text {for } A\in {\mathcal {T}}^S_q M, f\in C^\infty (N). \end{aligned}$$

The second-order cotangent map of F at \(q\in M\) is a linear map \(d^2 F^*_q: {\mathcal {T}}^{S*}_{F(q)}N \rightarrow {\mathcal {T}}^{S*}_q M\) dual to \(d^2 F_q\), that is,

$$\begin{aligned} d^2 F^*_q (\alpha ) (A) = \alpha (d^2 F_q( A)), \quad \text {for } A\in {\mathcal {T}}^S_q M, \alpha \in {\mathcal {T}}^{S*}_{F(q)} N. \end{aligned}$$

The restriction of \(d^2 F_q\) to \(T_q M\) coincides with the classical tangent map \(d F_q\). This is not the case for \(d^2 F^*_q\) restricted to \(T^{*}_{F(q)} N\), since for \(\alpha \in T^*_{F(q)} N\), \(d^2 F^*_q (\alpha )\) is still a linear map on the whole of \({\mathcal {T}}^S_q M\). Both phenomena can be seen in local coordinates, as the following lemma shows.

Lemma 5.6

Let \((U,(x^i))\) and \((V,(y^j))\) be local coordinate charts around q and F(q), respectively, and suppose that

$$\begin{aligned} A = A^i \frac{\partial }{\partial x^i}\bigg |_q + A^{ij} \frac{\partial ^2}{\partial x^i\partial x^j}\bigg |_q \quad \text {and} \quad \alpha = \alpha _i d^2y^i|_{F(q)} + \alpha _{ij} dy^i\cdot dy^j|_{F(q)}. \end{aligned}$$

Then,

$$\begin{aligned} d^2 F_q (A)= & {} (A F^i) \frac{\partial }{\partial y^i}\bigg |_{F(q)} + \frac{1}{2}\Gamma _A(F^i, F^j) \frac{\partial ^2}{\partial y^i\partial y^j}\bigg |_{F(q)}, \\ d^2 F^*_q(\alpha )= & {} \alpha _i d^2F^i|_q + \alpha _{ij} dF^i\cdot dF^j|_q. \end{aligned}$$

Now if \(A\in {\mathcal {T}}_q M\), then all the \(A^{ij}\) vanish, and hence so do the \(\Gamma _A(F^i, F^j)\). Thus, \(d^2 F_q(A) = (A F^i) \frac{\partial }{\partial y^i}|_{F(q)} = d F_q(A)\), which makes clear that \(d^2 F_q|_{{\mathcal {T}}_{q} M} = d F_q\). But if \(\alpha \in {\mathcal {T}}^*_{F(q)} N\), then all the \(\alpha _{ij}\) vanish and

$$\begin{aligned} d^2 F^*_q (\alpha ) = \alpha _i d^2F^i|_q = \alpha _i \frac{\partial F^i}{\partial x^j}(q) d^2 x^j|_q + \frac{1}{2} \alpha _i \frac{\partial ^2 F^i}{\partial x^j \partial x^k}(q) dx^j\cdot dx^k|_q, \end{aligned}$$

while \(d F^*_q (\alpha ) = \alpha _i d F^i|_q = \alpha _i \frac{\partial F^i}{\partial x^j}(q) d x^j|_q\). Hence, \(d^2 F^*_q|_{{\mathcal {T}}^*_{F(q)} N} \ne d F^*_q\): the second-order pullback carries an additional term involving the second derivatives of F.
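The second-order tangent map of Definition 5.5 can be evaluated numerically for the polar map \(F(r,\theta ) = (r\cos \theta , r\sin \theta )\). With the normalization \(\Gamma _A(f,g) = A(fg) - fAg - gAf = 2A^{ab}\partial _a f\,\partial _b g\) that follows from (5.1) and (2.8), the chain rule applied to \(A(g\circ F)\) gives the coefficient \(A^{ab}\partial _a F^i\partial _b F^j = \frac{1}{2}\Gamma _A(F^i,F^j)\) in front of \(\partial ^2/\partial y^i\partial y^j\). The sketch below compares \(A(g\circ F)\) with \(d^2F_q(A)\) applied to g, and checks this relation between \(\Gamma _A(F^i,F^j)\) and the coefficient. The vector A, the point, the test function g and the helper names are arbitrary choices, and derivatives are central finite differences.

```python
import math


def partial(f, q, i, h=1e-5):
    qp, qm = list(q), list(q)
    qp[i] += h
    qm[i] -= h
    return (f(qp) - f(qm)) / (2 * h)


def second_partial(f, q, j, k, h=1e-3):
    def shifted(sj, sk):
        y = list(q)
        y[j] += sj * h
        y[k] += sk * h
        return f(y)
    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


def apply_A(A1, A2, f, q):
    """A f at q for A = A^i d_i + A^{ij} d_i d_j with symmetric A^{ij}."""
    n = len(q)
    return (sum(A1[i] * partial(f, q, i) for i in range(n))
            + sum(A2[i][j] * second_partial(f, q, i, j) for i in range(n) for j in range(n)))


F = lambda z: [z[0] * math.cos(z[1]), z[0] * math.sin(z[1])]  # (r, theta) -> (x, y)
g = lambda y: y[0] ** 2 * y[1] + math.sin(y[1])                 # test function on N
q = [1.2, 0.4]
A1, A2 = [0.7, -0.2], [[0.5, 0.1], [0.1, 0.3]]

F_comp = [lambda z, i=i: F(z)[i] for i in range(2)]             # component functions F^i
gradF = [[partial(F_comp[i], q, a) for a in range(2)] for i in range(2)]
B1 = [apply_A(A1, A2, F_comp[i], q) for i in range(2)]          # A F^i
B2 = [[sum(A2[a][b] * gradF[i][a] * gradF[j][b] for a in range(2) for b in range(2))
       for j in range(2)] for i in range(2)]                    # A^{ab} F^i_a F^j_b


def gamma_A(i, j):
    """Gamma_A(F^i, F^j) = A(F^i F^j) - F^i A F^j - F^j A F^i, as in (5.1)."""
    prod = lambda z: F_comp[i](z) * F_comp[j](z)
    return apply_A(A1, A2, prod, q) - F_comp[i](q) * B1[j] - F_comp[j](q) * B1[i]


p = F(q)
lhs = apply_A(A1, A2, lambda z: g(F(z)), q)                     # A(g o F)
rhs = (sum(B1[i] * partial(g, p, i) for i in range(2))          # d^2 F_q(A) applied to g
       + sum(B2[i][j] * second_partial(g, p, i, j) for i in range(2) for j in range(2)))
err_chain = abs(lhs - rhs)
err_gamma = max(abs(gamma_A(i, j) - 2 * B2[i][j]) for i in range(2) for j in range(2))
print(err_chain, err_gamma)  # both at finite-difference error level
```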

Definition 5.7

(Second-order pushforwards and pullbacks) Let \(F: M\rightarrow N\) be a smooth map. The second-order pushforward by F is a bundle homomorphism \(F^S_*: ({\mathcal {T}}^S M, \tau ^S_M, M) \rightarrow ({\mathcal {T}}^S N, \tau ^S_N, N)\) defined by

$$\begin{aligned} F^S_*|_{{\mathcal {T}}^S_q M} = d^2 F_q. \end{aligned}$$

Given a second-order form \(\alpha \) on N, the second-order pullback of \(\alpha \) by F is a second-order form \(F^{S*}\alpha \) on M defined by

$$\begin{aligned} \big (F^{S*}\alpha \big )_q = d^2 F^*_q \left( \alpha _{F(q)} \right) , \quad q\in M. \end{aligned}$$

Let F be a diffeomorphism. The second-order pullback by F is a bundle isomorphism \(F^{S*}: ({\mathcal {T}}^{S*} N, \tau ^{S*}_N, N) \rightarrow ({\mathcal {T}}^{S*} M, \tau ^{S*}_M, M)\) defined by

$$\begin{aligned} F^{S*}|_{{\mathcal {T}}^{S*}_{q'} N} = d^2 F^*_{F^{-1}(q')}. \end{aligned}$$

Given a second-order vector field A on M, the second-order pushforward of A by F is a second-order vector field \(F^S_*A\) on N defined by

$$\begin{aligned} (F^S_*A)_{q'} = d^2 F_{F^{-1}(q')} \left( A_{F^{-1}(q')} \right) , \quad q'\in N. \end{aligned}$$

Clearly, \(F^S_*|_{T M} = F_*\) is the usual pushforward, but \(F^{S*}|_{T^* N} \ne F^*\). The following properties are straightforward.

Lemma 5.8

Let \(F: M\rightarrow N\) and \(G:N\rightarrow K\) be two smooth maps. Let A be a second-order vector field on M, and let f and g be two smooth functions on N.

  (i) \(G^S_*\circ F^S_* = (G\circ F)^S_*\).

  (ii) If F is a diffeomorphism, then \(((F^S_*A)f)\circ F = A(f\circ F)\).

  (iii) \(F^{S*}(d^2 f) = d^2 (f\circ F)\) and \(F^{S*} (df\cdot dg) = d(f\circ F)\cdot d(g\circ F)\).

5.3 Mixed-Order Tangent and Cotangent Bundles

In this subsection, we will extend the notions of the previous two subsections to the product manifold \({\mathbb {R}}\times M\).

Definition 5.9

The mixed-order tangent bundle of \({\mathbb {R}}\times M\) is the product bundle (Saunders 1989, Definition 1.4.1) \((T {\mathbb {R}}\times {\mathcal {T}}^S M, \tau _{\mathbb {R}}\times \tau ^S_M, {\mathbb {R}}\times M)\). The mixed-order cotangent bundle of \({\mathbb {R}}\times M\) is the product bundle \((T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M, \tau ^*_{\mathbb {R}}\times \tau ^{S*}_M, {\mathbb {R}}\times M)\). A section of the mixed-order tangent or cotangent bundle is called a mixed-order vector field or mixed-order form, respectively.

The mixed-order tangent and cotangent bundles are dual to each other. The mixed-order tangent (or cotangent) bundle combines the first-order tangent (or cotangent) bundle in time with the second-order one in space, which is why we use the terminology “mixed-order.” This also matches the basic heuristic of stochastic analysis, summarized by Itô’s rule \((dX(t))^2 \sim dt\).

For an M-valued diffusion X with (time-dependent) generator \(A^X\), we call the operator \(\frac{\partial }{\partial {t}} + A^X\) its extended generator. This extended generator is a mixed-order vector field on \({\mathbb {R}}\times M\). Also note that the extended generator \(\frac{\partial }{\partial {t}} + A^X\) of \(X\in I_{t_0}(M)\) can be characterized by the property that for every \(f\in C^\infty ({\mathbb {R}}\times M)\), the process

$$\begin{aligned} f(t,X(t)) - f(t_0,X(t_0)) - \int _{t_0}^t \left( \frac{\partial }{\partial {t}} + A^X \right) f(s,X(s)) {\textrm{d}}s, \quad t\ge t_0, \end{aligned}$$

is a real-valued continuous \(\{{\mathcal {P}}_t\}\)-martingale. In general, a mixed-order vector field A has the following local expression:

$$\begin{aligned} A = A^0 \frac{\partial }{\partial {t}} + A^i \frac{\partial }{\partial {x^i}} + A^{jk} \frac{\partial ^2}{\partial x^j \partial x^k}. \end{aligned}$$
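The martingale characterization of the extended generator stated above can be illustrated by Monte Carlo simulation. The sketch below assumes an Ornstein-Uhlenbeck process \(dX = -X\,dt + dW\) on \({\mathbb {R}}\), whose extended generator is \(\partial _t - x\,\partial _x + \frac{1}{2}\partial _x^2\), and takes \(f(t,x) = x^2\), so that \((\partial _t + A^X)f = 1 - 2x^2\). Paths are generated with the Euler-Maruyama scheme and the time integral with a left-point Riemann sum. Since a martingale started at zero has mean zero, the sample mean at time 1 should be close to 0; only this necessary condition is checked, not the full martingale property. Dropping the second-order part of the generator shifts the sample mean by \(\int _0^1 1\,ds = 1\), which shows that the Itô correction cannot be omitted.

```python
import random


def dynkin_mean(generator_term, n_paths=4000, n_steps=100, T=1.0, x0=1.0, seed=0):
    """Sample mean of f(X_T) - f(X_0) - int_0^T L f(X_s) ds for f(x) = x^2,
    where X solves dX = -X dt + dW and L f is given by generator_term(x)."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, integral = x0, 0.0
        for _ in range(n_steps):
            integral += generator_term(x) * dt          # left-point Riemann sum
            x += -x * dt + rng.gauss(0.0, dt ** 0.5)    # Euler-Maruyama step
        total += x * x - x0 * x0 - integral
    return total / n_paths


# Full extended generator applied to f(t, x) = x^2: 0 + (-x)(2x) + (1/2)(2).
full = dynkin_mean(lambda x: 1.0 - 2.0 * x * x)
# First-order part only (second-order Ito term dropped), on the same paths.
first_order_only = dynkin_mean(lambda x: -2.0 * x * x)
print(full, first_order_only)  # roughly 0 and roughly 1
```

Both calls reuse the same seed, so they see identical paths and their difference isolates the contribution of the second-order term.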

To give an example of mixed-order forms, we consider a smooth function f on \({\mathbb {R}}\times M\) and define in local coordinates

$$\begin{aligned} d^\circ f:= \frac{\partial f}{\partial t} dt + \frac{\partial f}{\partial x^i} d^2 x^i + \frac{1}{2} \frac{\partial ^2 f}{\partial x^j \partial x^k} dx^j\cdot dx^k. \end{aligned}$$

Then, \(d^\circ f\) is a mixed-order form, and we call it the mixed differential of f. Clearly, the pairing of the mixed differential \(d^\circ f\) and a mixed-order vector field A is \(\langle d^\circ f, A \rangle = Af\).

Given a bundle homomorphism \(F:({\mathbb {R}}\times M, \pi , {\mathbb {R}})\rightarrow ({\mathbb {R}}\times N, \rho , {\mathbb {R}})\), we define its mixed-order tangent map at \((t,q)\in {\mathbb {R}}\times M\) by

$$\begin{aligned} d^\circ F_{(t,q)} = d^2 F_{(t,q)}|_{T_t {\mathbb {R}}\times {\mathcal {T}}^S_q M}: T {\mathbb {R}}\times {\mathcal {T}}^S M|_{(t,q)} \rightarrow T {\mathbb {R}}\times {\mathcal {T}}^S N|_{F(t,q)}. \end{aligned}$$

Its mixed-order cotangent map at \((t,q)\in {\mathbb {R}}\times M\) is defined as the linear map \(d^\circ F^*_{(t,q)}: T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} N|_{F(t,q)} \rightarrow T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M|_{(t,q)}\) dual to \(d^\circ F_{(t,q)}\). If, moreover, F is a bundle isomorphism, its mixed-order pushforward and pullback, denoted by \(F^R_*\) and \(F^{R*}\), respectively, can be defined in a similar manner to Definition 5.7. We leave their detailed but cumbersome definitions and properties to “Appendix A.1.”

6 Stochastic Hamiltonian Mechanics

6.1 Horizontal Diffusions

In this subsection, we consider a general fiber bundle \((E,\pi _M, M)\) over a manifold M, with fiber dimension n. We first introduce a special class of diffusions on this fiber bundle, which we call horizontal diffusions. They are defined in a similar fashion to the horizontal subspaces in Definition 3.7. Roughly speaking, a horizontal diffusion process on E is a diffusion that is random only “horizontally,” not along the fibers.

Definition 6.1

(Horizontal diffusions on fiber bundles) Let \((E,\pi _M, M)\) be a fiber bundle. An E-valued diffusion process \({\textbf{X}}\) is said to be horizontal if there exist an M-valued diffusion process X and a smoothly time-dependent section \(\phi =(\phi _t)\) of \(\pi _M\) such that, almost surely, \({\textbf{X}}(t) = \phi (t, X(t))\) for all t.

The process X in the above definition is just the projection of \({\textbf{X}}\), because \(\pi _M({\textbf{X}}(t)) = \pi _M(\phi (t,X(t))) = X(t)\) a.s. Since the projection map \(\pi _M\) is smooth, X is again a diffusion process.

We now define a subclass of “integral processes” for second-order vector fields on E by means of horizontal diffusions. We use \((x^i,u^\mu )\) for an adapted coordinate system on E (see Saunders 1989, Definition 1.1.5), with Greek indices labeling the fiber coordinates.

Given a second-order vector field with local expression

$$\begin{aligned} A = A^i \frac{\partial }{\partial {x^i}} + A^\mu \frac{\partial }{\partial {u^\mu }} + A^{jk} \frac{\partial ^2}{\partial x^j \partial x^k} + A^{j\mu } \frac{\partial ^2}{\partial x^j \partial u^\mu } + A^{\mu \nu } \frac{\partial ^2}{\partial u^\mu \partial u^\nu }, \end{aligned}$$
(6.1)

where \(A^i, A^\mu , A^{jk}, A^{j\mu }, A^{\mu \nu }\) are smooth functions in the local chart of E, by a horizontal integral process of A in (6.1) we mean an E-valued horizontal diffusion process \({\textbf{X}}\) such that \({\textbf{X}}\) is an integral process of A in the sense of (2.22), that is, it is determined by the system

$$\begin{aligned} \left\{ \begin{aligned}&(D(x\circ {\textbf{X}}))^i(t) = A^i({\textbf{X}}(t)), \\&(Q(x\circ {\textbf{X}}))^{jk}(t) = 2A^{jk}({\textbf{X}}(t)), \\&(D(u\circ {\textbf{X}}))^\mu (t) = A^\mu ({\textbf{X}}(t)), \\&(Q(x\circ {\textbf{X}}, u\circ {\textbf{X}}))^{j\mu }(t) = A^{j\mu }({\textbf{X}}(t)), \\&(Q(u\circ {\textbf{X}}))^{\mu \nu }(t) = 2A^{\mu \nu }({\textbf{X}}(t)), \end{aligned}\right. \end{aligned}$$
(6.2)

where the expression \(x\circ {\textbf{X}}\) means that the family of coordinate functions \((x^i)\) acts on \({\textbf{X}}\), and so on. Set \({\textbf{X}}(t) = \phi (t, X(t))\) for some time-dependent section \(\phi \) of \(\pi _M\) and M-valued diffusion X. Denote \(\phi ^\mu = u^\mu \circ \phi \). By Itô’s formula, the system (6.2) can be written as

$$\begin{aligned} \left\{ \begin{aligned}&(DX)^i(t) = A^i(\phi (t, X(t))), \\&(QX)^{jk}(t) = 2A^{jk}(\phi (t, X(t))), \\&\bigg ( \frac{\partial }{\partial t} + A^i(\phi (t, X(t))) \frac{\partial }{\partial {x^i}} + A^{jk}(\phi (t, X(t))) \frac{\partial ^2}{\partial x^j\partial x^k}\bigg )\phi ^\mu (t, X(t))\\ {}&\qquad \quad = A^\mu (\phi (t, X(t))) \\&2 A^{jk}(\phi (t, X(t))) \frac{\partial \phi ^\mu }{\partial x^k}(t, X(t)) = A^{j\mu }(\phi (t, X(t))) \\&A^{jk}(\phi (t, X(t))) \frac{\partial \phi ^\mu }{\partial x^j}\frac{\partial \phi ^\nu }{\partial x^k}(t, X(t)) = A^{\mu \nu }(\phi (t, X(t))). \end{aligned}\right. \end{aligned}$$
(6.3)

If X(t) has full support for all t, then the last three equations in (6.3) hold at every point \(q\in M\) and translate into the following system for the section \(\phi \), whose first equation is (possibly degenerate) parabolic:

$$\begin{aligned} \left\{ \begin{aligned}&\bigg ( \frac{\partial }{\partial t} + A^i(\phi (t, q)) \frac{\partial }{\partial {x^i}} + A^{jk}(\phi (t, q)) \frac{\partial ^2}{\partial x^j\partial x^k} \bigg )\phi ^\mu (t, q) = A^\mu (\phi (t, q)), \\&2 A^{jk}(\phi (t, q)) \frac{\partial \phi ^\mu }{\partial x^k}(t, q) = A^{j\mu }(\phi (t, q)) \\&A^{jk}(\phi (t, q)) \frac{\partial \phi ^\mu }{\partial x^j}\frac{\partial \phi ^\nu }{\partial x^k}(t, q) = A^{\mu \nu }(\phi (t, q)). \end{aligned}\right. \end{aligned}$$
(6.4)

Under suitable assumptions on the coefficients \(A^i, A^\mu , A^{jk}, A^{j\mu }, A^{\mu \nu }\), Eq. (6.4) is therefore solvable, at least locally, by some time-dependent local section \(\phi = (\phi _t)\) over a time interval [0, T]. Plugging \(\phi (t)\) into the first two equations of (6.3), we can then find X and hence \({\textbf{X}}\). We call X a projective integral process of A.
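As a numerical illustration of the fourth and fifth equations of (6.3), the following minimal sketch assumes the simplest possible setting, which is not taken from the text: \(M={\mathbb {R}}\), \(E={\mathbb {R}}\times {\mathbb {R}}\) with a single fiber coordinate u, the base diffusion \(dX = \sigma dW\) (so \(A^i=0\) and \(A^{xx}=\sigma ^2/2\)), and the time-independent section \(\phi (x)=\sin x\). The function name and parameters are hypothetical. The sketch compares the realized covariations of the lift \((X, \phi (X))\) with the predictions \(Q(x,u) = 2A^{xx}\phi '(X)\) and \(Q(u) = 2A^{xx}\phi '(X)^2\), integrated over [0, T].

```python
import numpy as np


def horizontal_lift_covariations(sigma=0.7, T=1.0, n_steps=200_000, x0=0.3, seed=0):
    """Simulate dX = sigma dW on M = R and its horizontal lift (X, phi(X)), phi = sin.

    Returns realized covariations of the lift together with the values predicted
    by the fourth and fifth equations of (6.3), integrated over [0, T].
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    X = np.concatenate(([x0], x0 + np.cumsum(sigma * dW)))  # base diffusion path
    U = np.sin(X)                                          # fiber coordinate u = phi(X)
    dX, dU = np.diff(X), np.diff(U)

    realized_xu = np.sum(dX * dU)  # approximates the time integral of Q(x, u)
    realized_uu = np.sum(dU * dU)  # approximates the time integral of Q(u)

    a_xx = 0.5 * sigma**2          # A^{xx}, since Q(x) = sigma^2 = 2 A^{xx}
    pred_xu = np.sum(2 * a_xx * np.cos(X[:-1])) * dt       # 2 A^{xx} phi'(X)
    pred_uu = np.sum(2 * a_xx * np.cos(X[:-1]) ** 2) * dt  # 2 A^{xx} phi'(X)^2
    return realized_xu, pred_xu, realized_uu, pred_uu


rxu, pxu, ruu, puu = horizontal_lift_covariations()
print(rxu, pxu, ruu, puu)
```

On a fine time grid the realized and predicted values should agree up to discretization and sampling error. The fiber component carries no noise of its own: its covariations are entirely determined by those of X through \(\phi \), which is what horizontality means.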

6.2 The Second-Order Symplectic Structure on \({\mathcal {T}}^{S*} M\) and Stochastic Hamilton’s Equations

It is well known that the classical cotangent bundle \(T^* M\) has a natural symplectic structure, given by the canonical symplectic form \(\omega _0 = dx^i \wedge dp_i\), where \((x^i,p_i)\) are the canonical local coordinates on \(T^* M\) induced by local coordinates \((x^i)\) on M. Clearly, \(\omega _0\) is closed, since it is exact: \(\omega _0 = -d \theta _0\), where \(\theta _0 = p_i dx^i\) is called the Poincaré (or tautological) 1-form.

We now define an analogous structure on the second-order cotangent bundle \({\mathcal {T}}^{S*} M\), a second-order counterpart of the symplectic structure. First, we adapt the coordinate-free definition of the tautological 1-form to the second-order case.

Definition 6.2

The second-order tautological form \(\theta \) is a second-order form on \({\mathcal {T}}^{S*} M\) defined by

$$\begin{aligned} \theta _{\alpha } = d^2\left( \tau _M^{S*}\right) _{\alpha }^* (\alpha ), \quad \alpha \in {\mathcal {T}}^{S*}_q M. \end{aligned}$$

Under the induced coordinate system on \({\mathcal {T}}^{S*} M\) defined in (5.3), the second-order tautological form \(\theta \) has the following coordinate representation

$$\begin{aligned} \theta = p_i d^2 x^i + \textstyle {\frac{1}{2}} o_{jk} dx^j\cdot dx^k. \end{aligned}$$
(6.5)

We introduce the canonical second-order symplectic form \(\omega \) on \({\mathcal {T}}^{S*} M\) by writing \(\omega = -d^2 \theta \). Although we do not define the exterior differential for second-order forms, we can still take \(d^2\) formally on both sides of (6.5), using Leibniz’s rule and the composition rule \(d\circ d=d^2\) (cf. Meyer 1981b, Section 6.(e)), and forcing \(d^3 = 0\) and \((d^2-)\cdot (d-) = (d-)\cdot (d^2-) = 0\). Then, we get

$$\begin{aligned} \begin{aligned} \omega =\,&d\left( d^2 x^i \wedge d p_i + \textstyle {\frac{1}{2}} dx^j\cdot dx^k \wedge d o_{jk} - p_i d^3 x^i + o_{jk} d^2x^j\wedge dx^k \right) \\ =\,&d^2 x^i \wedge d^2 p_i + \textstyle {\frac{1}{2}} dx^j\cdot dx^k \wedge d^2 o_{jk}. \end{aligned} \end{aligned}$$
(6.6)

We call the pair \(({\mathcal {T}}^{S*} M, \omega )\) a second-order symplectic manifold. The complete axiom system for a second-order differential system \((d,d^2,\wedge ,\cdot )\) is beyond the scope of this paper.

Remark 6.3

In the formal expression \((d\circ d) f=d^2 f\), \(f\in C^\infty (M)\), the two differential operators d on the LHS are different. The second d is de Rham’s exterior differential on M, while the first must be understood as the exterior differential on TM, obtained by regarding the differential df as a function on TM. Thus, the complete expression should be \(d_{TM}\circ d_M = d^2\). Along these lines, the differential operator \(d_{TM}\) can be extended to a linear map that sends 1-forms to second-order forms and satisfies Leibniz’s rule; see Emery (1989, Theorem 7.1). To distinguish it from d, we denote this extension of \(d_{TM}\) by \({{\varvec{d}}}\). In local coordinates, it acts on a 1-form \(\eta = \eta _i dx^i\) by \({{\varvec{d}}}\eta = \eta _i d^2x^i + \frac{1}{2} \frac{\partial \eta _i}{\partial x^j} dx^i\cdot dx^j\), so that \({\hat{\varrho }}^*({{\varvec{d}}}\eta ) = \eta \) and \(d^2 = {{\varvec{d}}}\circ d\). When a linear connection \(\nabla \) is specified, \({{\varvec{d}}}\eta = \eta _i d^\nabla x^i + \frac{1}{2} \nabla \eta (\partial _i,\partial _j) dx^i\cdot dx^j\), which covers (5.8).

As in the classical case, we have the following property for the second-order tautological form.

Lemma 6.4

The second-order tautological form \(\theta \) is the unique second-order form on \({\mathcal {T}}^{S*} M\) with the property that, for every second-order form \(\alpha \) on M, \(\alpha ^{S*} \theta = \alpha \).

Proof

From Lemma 5.8, we have, for any second-order vector \(A\in {\mathcal {T}}^S_q M\),

$$\begin{aligned} \big \langle \big (\alpha ^{S*} \theta \big )_q, A \big \rangle= & {} \big \langle \theta _{\alpha _q}, d^2 \alpha _q(A) \big \rangle = \big \langle d^2\big (\tau _M^{S*}\big )_{\alpha _q}^* (\alpha _q), d^2 \alpha _q(A) \big \rangle \\= & {} \big \langle \alpha _q, d^2\big (\tau _M^{S*}\big )_{\alpha _q} \circ d^2 \alpha _q(A) \big \rangle = \langle \alpha _q, A \rangle , \end{aligned}$$

since \(\tau _M^{S*} \circ \alpha = \textbf{Id}_M\). \(\square \)

Recall that in Definition 5.7 we defined the second-order pullbacks of second-order forms. Now, given a smooth map \({\textbf{F}}: {\mathcal {T}}^{S*} M \rightarrow {\mathcal {T}}^{S*} N\) and a second-order 2-form \(\eta \) on \({\mathcal {T}}^{S*} N\), we may also define the second-order pullback \({\textbf{F}}^{S*} \eta \) of \(\eta \) by \({\textbf{F}}\) by requiring \({\textbf{F}}^{S*}\) to commute with both the symmetric product \(\cdot \) and the wedge product \(\wedge \). Then, as a corollary of Lemma 6.4, we have

$$\begin{aligned} \alpha ^{S*} \omega = -d^2\alpha . \end{aligned}$$

Definition 6.5

Let \(\omega \) and \(\eta \) be the canonical second-order symplectic forms on \({\mathcal {T}}^{S*} M\) and \({\mathcal {T}}^{S*} N\), respectively. A bundle homomorphism \({\textbf{F}}: ({\mathcal {T}}^{S*} M, {\hat{\varrho }}^*_M, T^* M) \rightarrow ({\mathcal {T}}^{S*} N, {\hat{\varrho }}^*_N, T^*N)\) is called second-order symplectic or a second-order symplectomorphism if \({\textbf{F}}^{S*} \eta = \omega \).

Theorem 6.6

Let \(F: N\rightarrow M\) be a diffeomorphism. The second-order pullback \(F^{S*}: {\mathcal {T}}^{S*} M \rightarrow {\mathcal {T}}^{S*} N\) by F is second-order symplectic; in fact \((F^{S*})^{S*} \vartheta = \theta \), where \(\vartheta \) is the second-order tautological form on \(\mathcal T^{S*} N\).

Proof

For \(q\in M\), \(\alpha _q\in {\mathcal {T}}^{S*}_q M\) and \(A \in {\mathcal {T}}^S_{\alpha _q} {\mathcal {T}}^{S*} M\),

$$\begin{aligned} \begin{aligned} \big \langle \big (F^{S*}\big )^{S*} \vartheta , A \big \rangle&= \big \langle \vartheta , d^2\big (F^{S*}\big )_{\alpha _q} A \big \rangle = \big \langle d^2\big (\tau _N^{S*}\big )_{F^{S*}(\alpha _q)}^* \big (F^{S*}(\alpha _q)\big ), d^2(F^{S*})_{\alpha _q} A \big \rangle \\&= \big \langle F^{S*}(\alpha _q), d^2\big (\tau _N^{S*}\big )_{F^{S*}(\alpha _q)} \circ d^2(F^{S*})_{\alpha _q} A \big \rangle \\&= \big \langle \alpha _q, d^2F_{F^{-1}(q)} \circ d^2\big (\tau _N^{S*}\big )_{F^{S*}(\alpha _q)} \circ d^2(F^{S*})_{\alpha _q} A \big \rangle \\&= \big \langle \alpha _q, d^2\big (\tau _M^{S*}\big )_{\alpha _q} A \big \rangle \\&= \big \langle d^2\big (\tau _M^{S*}\big )^*_{\alpha _q} (\alpha _q), A \big \rangle \\&= \langle \theta _{\alpha _q}, A \rangle , \end{aligned} \end{aligned}$$

where we used the fact that \(F\circ \tau _N^{S*}\circ F^{S*} = \tau _M^{S*}\) in the fourth line. \(\square \)

Clearly, the counterparts of Hamiltonian vector fields on \(T^* M\) are now second-order vector fields on \({\mathcal {T}}^{S*} M\). Note that for a second-order vector field A on \({\mathcal {T}}^{S*} M\), the form \(A \lrcorner \, \omega \) takes values in the second-order cotangent bundle \({\mathcal {T}}^{S*} {\mathcal {T}}^{S*} M\).

Definition 6.7

Let \(H: {\mathcal {T}}^{S*} M \rightarrow {\mathbb {R}}\) be a given smooth function. A second-order vector field \(A_H\) on \({\mathcal {T}}^{S*} M\) satisfying

$$\begin{aligned} A_H \lrcorner \, \omega = d^2 H \end{aligned}$$
(6.7)

is called a second-order Hamiltonian vector field of H. We call the triple \(({\mathcal {T}}^{S*} M, \omega , H)\) a second-order Hamiltonian system. The function H is called the second-order Hamiltonian of the system.

According to (6.7), the second-order vector field \(A_H\) satisfies

$$\begin{aligned} A_H H = d^2H (A_H) = \omega (A_H,A_H) = 0. \end{aligned}$$
(6.8)

Condition (6.7) does not determine \(A_H\) uniquely. It is easy to verify that \(A_H\) has the general form

$$\begin{aligned} \begin{aligned} A_H =\,&\frac{\partial H}{\partial p_i} \frac{\partial }{\partial {x^i}} - \frac{\partial H}{\partial x^i} \frac{\partial }{\partial {p_i}} + \frac{\partial H}{\partial o_{jk}} \frac{\partial ^2}{\partial x^j \partial x^k} - \left( \frac{\partial ^2 H}{\partial x^j \partial x^k} + C_{jk} \right) \frac{\partial }{\partial {o_{jk}}} \\&+ A_{jk} \frac{\partial ^2}{\partial p_j \partial p_k} + A_{ijkl} \frac{\partial ^2}{\partial o_{ij} \partial o_{kl}} + A^j_k \frac{\partial ^2}{\partial x^j \partial p_k} \\&+ A^j_{kl} \frac{\partial ^2}{\partial x^j \partial o_{kl}} + A_{jkl} \frac{\partial ^2}{\partial p_j \partial o_{kl}}, \end{aligned} \end{aligned}$$
(6.9)

where the coefficients \(C_{jk}, A_{jk}, A_{ijkl}, A^j_k, A^j_{kl}, A_{jkl}\) are smooth functions on the local chart satisfying

$$\begin{aligned} C_{jk} \frac{\partial H}{\partial o_{jk}}= & {} A_{jk} \frac{\partial ^2 H}{\partial p_j \partial p_k} + A_{ijkl} \frac{\partial ^2 H}{\partial o_{ij} \partial o_{kl}} + A^j_k \frac{\partial ^2 H}{\partial x^j \partial p_k} \\{} & {} + A^j_{kl} \frac{\partial ^2 H}{\partial x^j \partial o_{kl}} + A_{jkl} \frac{\partial ^2 H}{\partial p_j \partial o_{kl}}, \end{aligned}$$

such that the local expression (6.9) is invariant under the canonical change of coordinates on \({\mathcal {T}}^{S*} M\) induced by a change of coordinates on M, governed by the structure group in Lemma 5.2.

Given such a second-order Hamiltonian vector field of H, its horizontal integral process is a \({\mathcal {T}}^{S*} M\)-valued horizontal diffusion \({\textbf {X}}\) determined by the following MDEs on \({\mathcal {T}}^{S*} M\),

$$\begin{aligned} \left\{ \begin{aligned}&(D (x\circ {\textbf {X}}))^i(t) = \frac{\partial H}{\partial p_i}({\textbf {X}}(t)), \\&(Q (x\circ {\textbf {X}}))^{jk}(t) = 2\frac{\partial H}{\partial o_{jk}}({\textbf {X}}(t)), \\&(D (p\circ {\textbf {X}}))_i(t) = - \frac{\partial H}{\partial x^i}({\textbf {X}}(t)), \\&(D (o\circ {\textbf {X}}))_{jk}(t) = - \left( \frac{\partial ^2 H}{\partial x^j \partial x^k} + C_{jk} \right) ({\textbf {X}}(t)), \\&\left( C_{ij} \frac{\partial H}{\partial o_{ij}}\right) ({\textbf {X}}(t)) = \frac{1}{2} (Q (p\circ {\textbf {X}}))_{jk}(t) \frac{\partial ^2 H}{\partial p_j \partial p_k} ({\textbf {X}}(t)) \\&\qquad \qquad \qquad \qquad \qquad \quad + \frac{1}{2} (Q (o\circ {\textbf {X}}))_{ijkl}(t) \frac{\partial ^2 H}{\partial o_{ij} \partial o_{kl}} ({\textbf {X}}(t)) \\&\qquad \qquad \qquad \qquad \qquad \quad + (Q (x\circ {\textbf {X}}, p\circ {\textbf {X}}))^j_k \frac{\partial ^2 H}{\partial x^j \partial p_k} ({\textbf {X}}(t)) \\&\qquad \qquad \qquad \qquad \qquad \quad + (Q (x\circ {\textbf {X}}, o\circ {\textbf {X}}))^j_{kl} \frac{\partial ^2 H}{\partial x^j \partial o_{kl}} ({\textbf {X}}(t)) \\&\qquad \qquad \qquad \qquad \qquad \quad + (Q (p\circ {\textbf {X}}, o\circ {\textbf {X}}))_{jkl} \frac{\partial ^2 H}{\partial p_j \partial o_{kl}} ({\textbf {X}}(t)), \end{aligned}\right. \end{aligned}$$
(6.10)

or, in coordinates,

$$\begin{aligned} \left\{ \begin{aligned}&D^i x = \frac{\partial H}{\partial p_i}, \\&Q^{jk} x = 2\frac{\partial H}{\partial o_{jk}}, \\&D_i p = - \frac{\partial H}{\partial x^i}, \\&D_{jk} o = - \left( \frac{\partial ^2 H}{\partial x^j \partial x^k} + C_{jk} \right) , \\&C_{ij} \frac{\partial H}{\partial o_{ij}} = \frac{1}{2} Q_{jk} p \frac{\partial ^2 H}{\partial p_j \partial p_k} + \frac{1}{2} Q_{ijkl} o \frac{\partial ^2 H}{\partial o_{ij} \partial o_{kl}} + Q^j_k(x,p) \frac{\partial ^2 H}{\partial x^j \partial p_k} \\&\qquad \qquad \quad + Q^j_{kl}(x,o) \frac{\partial ^2 H}{\partial x^j \partial o_{kl}} + Q_{jkl}(p,o) \frac{\partial ^2 H}{\partial p_j \partial o_{kl}}, \end{aligned}\right. \end{aligned}$$

where \(\big ( x^i, p_i, o_{jk}, D^i x, D_i p, D_{jk} o, Q^{jk} x, Q_{jk} p, Q_{ijkl} o, Q^j_k(x,p), Q^j_{kl}(x,o), Q_{jkl}(p,o) \big )\) are canonical coordinates on \({\mathcal {T}}^S {\mathcal {T}}^{S*} M\). The first and third equations were conjectured in Zambrini (2015) to be the stochastic Hamilton’s equations in Euclidean space, since they have the same form as the classical Hamilton’s equations (e.g., Abraham and Marsden 1978, Proposition 3.3.2), except that the mean derivative D replaces the classical time derivative.

At first glance, one may think that the system (6.10) is underdetermined, as there are fewer equations than unknowns (the number of unknowns equals the fiber dimension of \({\mathcal {T}}^S {\mathcal {T}}^{S*} M\)). Moreover, we have not yet equipped (6.10) with initial or terminal data. Both points become clear after the following observations. First, the first two equations of (6.10) constitute MDEs that are equivalent, in the weak sense, to an Itô SDE for \(x({\textbf{X}})\), as we have seen in Sect. 2.4. So \(x({\textbf{X}})\) should be assigned an initial value, say,

$$\begin{aligned} \text {Law}((x\circ {\textbf{X}})(0)) = \mu _0, \end{aligned}$$
(6.11)

where \(\mu _0\) is a given probability measure on M. Second, the third and fourth equations of (6.10) specify only the “drift” of \(p({\textbf{X}})\) and \(o({\textbf{X}})\). To compensate for this missing information, we assign terminal values to \(p({\textbf{X}})\) and \(o({\textbf{X}})\), say,

$$\begin{aligned} \left\{ \begin{aligned}&(p\circ {\textbf{X}})(T) = p^*(x\circ {\textbf{X}}(T)), \\&(o\circ {\textbf{X}})(T) = o^*(x\circ {\textbf{X}}(T)), \end{aligned}\right. \end{aligned}$$
(6.12)

where \((p^*, o^*)\) is a given second-order form. The third and fourth equations are thus understood as backward SDEs, whose drifts depend on the diffusion coefficients through the last equation. The system (6.10), together with the boundary values (6.11) and (6.12), can then be understood as a (coupled) forward–backward system of SDEs (Yong and Zhou 1999) (where “backward” is taken in a different sense from ours in Sect. 2).

Notice that such forward–backward SDEs are not necessarily solvable (see Yong and Zhou 1999, Proposition 7.5.2 for an example). In order to solve (6.10)–(6.12), we have to take the horizontal condition into account and impose a compatibility assumption. More precisely, we set \(X=\tau _M^{S*}({\textbf{X}})\) and

$$\begin{aligned} {\textbf{X}}(t) = \alpha (t, X(t)), \end{aligned}$$
(6.13)

for some time-dependent second-order form \(\alpha \) on M, and denote \(p_i(t,x) = p_i(\alpha (t,x))\) and \(o_{jk}(t,x) = o_{jk}(\alpha (t,x))\), so that \(\alpha (t,x) = (p(t,x), o(t,x))\). Assume that for each \(t\in (0,T)\), X(t) has full support. Then, applying Itô’s formula as in the derivation of (6.4), we find that the system (6.10) reduces to

$$\begin{aligned} \left\{ \begin{aligned}&\bigg ( \frac{\partial }{\partial t} + \frac{\partial H}{\partial p_j} \frac{\partial }{\partial {x^j}} + \frac{\partial H}{\partial o_{jk}} \frac{\partial ^2}{\partial x^j\partial x^k}\bigg )p_i = - \frac{\partial H}{\partial x^i}, \\&\bigg ( \frac{\partial }{\partial t} + \frac{\partial H}{\partial p_k} \frac{\partial }{\partial {x^k}} + \frac{\partial H}{\partial o_{kl}} \frac{\partial ^2}{\partial x^k\partial x^l}\bigg )o_{ij} = - \left( \frac{\partial ^2 H}{\partial x^i \partial x^j} + C_{ij} \right) , \\&C_{ij} \frac{\partial H}{\partial o_{ij}} = \frac{\partial H}{\partial o_{ij}} \bigg ( \frac{\partial p_k}{\partial x^i} \frac{\partial p_l}{\partial x^j} \frac{\partial ^2 H}{\partial p_k \partial p_l} + \frac{\partial o_{kl}}{\partial x^i} \frac{\partial o_{mn}}{\partial x^j} \frac{\partial ^2 H}{\partial o_{kl} \partial o_{mn}} \\&\qquad + 2 \frac{\partial p_k}{\partial x^i} \frac{\partial ^2 H}{\partial x^j \partial p_k}\, + 2 \frac{\partial o_{kl}}{\partial x^i} \frac{\partial ^2 H}{\partial x^j \partial o_{kl}} + 2 \frac{\partial p_k}{\partial x^i} \frac{\partial o_{lm}}{\partial x^j} \frac{\partial ^2 H}{\partial p_k \partial o_{lm}} \bigg ). \end{aligned}\right. \nonumber \\ \end{aligned}$$
(6.14)

Next, by taking the partial derivative \(\frac{\partial }{\partial {x^j}}\) on both sides of the first equation of (6.14) and comparing the result with the last two equations, we find the following sufficient condition for the last two equations of (6.14) to hold:

$$\begin{aligned} o_{ij}(t, x) = \frac{\partial p_i}{\partial x^j}(t, x) = \frac{\partial p_j}{\partial x^i}(t, x), \end{aligned}$$
(6.15)

or, equivalently, for the terminal value \((p^*,o^*)\),

$$\begin{aligned} o^*_{ij}(x) = \frac{\partial p^*_i}{\partial x^j}(x) = \frac{\partial p^*_j}{\partial x^i}(x). \end{aligned}$$
(6.16)

Equation (6.15) implies that \(\alpha \) in (6.13) is “exact,” in the sense that \(\alpha = {{\varvec{d}}}\eta \) for the time-dependent 1-form \(\eta = p_i dx^i\), where \({{\varvec{d}}}\) is the extended differential operator defined in Remark 6.3. Similarly, Eq. (6.16) implies that \((p^*,o^*) = {{\varvec{d}}}\eta ^*\) for 1-form \(\eta ^* = p^*_i dx^i\). The second equality of (6.15) [or (6.16)], called Onsager reciprocity or Maxwell relations (Abraham and Marsden 1978, Section 5.3), implies that the 1-form \(\eta \) (or \(\eta ^*\)) is closed. We will refer to Eq. (6.15) or (6.16) as second-order Maxwell relations.
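The coordinates of \({{\varvec{d}}}\eta \) and the second-order Maxwell relations can be checked numerically. The sketch below is an illustration only. It assumes a 1-form \(\eta \) on an open subset of \({\mathbb {R}}^n\), given as a Python callable returning the components \(\eta _i\), and it approximates the Jacobian \(\partial \eta _i/\partial x^j\) by central differences. The sketch returns the coordinates (p, o) of \({{\varvec{d}}}\eta \), with o the symmetric part of the Jacobian because only that part survives the symmetric product \(dx^i\cdot dx^j\), and then tests the second equality in (6.15)/(6.16). The helper names `bold_d` and `satisfies_maxwell` are hypothetical.

```python
import numpy as np


def bold_d(eta, x, h=1e-5):
    """Coordinates of the second-order form bold-d(eta) at the point x (cf. Remark 6.3).

    eta : callable mapping a point of R^n to the components (eta_1, ..., eta_n).
    Returns p = eta(x), the symmetric coefficient matrix o, and the raw Jacobian
    jac[i, j] = d eta_i / d x^j, approximated by central differences.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    p = np.asarray(eta(x), dtype=float)
    jac = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        jac[:, j] = (np.asarray(eta(x + e), dtype=float)
                     - np.asarray(eta(x - e), dtype=float)) / (2 * h)
    # Only the symmetric part survives the symmetric product dx^i . dx^j.
    o = 0.5 * (jac + jac.T)
    return p, o, jac


def satisfies_maxwell(jac, tol=1e-6):
    """Second equality in (6.15)/(6.16): d eta_i / d x^j = d eta_j / d x^i."""
    return bool(np.allclose(jac, jac.T, atol=tol))


# eta = df for f(x, y) = x^2 y: exact, hence closed, and bold-d(df) = d^2 f.
p, o, jac = bold_d(lambda z: np.array([2 * z[0] * z[1], z[0] ** 2]), [1.5, -0.5])
print(p, o, satisfies_maxwell(jac))

# eta = -y dx + x dy: not closed, so the Maxwell relations fail.
_, _, jac_rot = bold_d(lambda z: np.array([-z[1], z[0]]), [0.2, 0.3])
print(satisfies_maxwell(jac_rot))
```

For \(\eta = df\), the Jacobian is the Hessian of f. It is therefore symmetric, and \({{\varvec{d}}}\eta = d^2 f\). For the rotation form, the Jacobian is antisymmetric, which reflects the fact that \(\eta \) is not closed.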

Under the second-order Maxwell relations, the original stochastic Hamilton’s system (6.10) reduces to the following coupled MDE-PDE system:

$$\begin{aligned} \left\{ \begin{aligned}&(DX)^i(t) = \frac{\partial H}{\partial p_i}(X(t), p(t, X(t)), o(t, X(t)) ), \\&(QX)^{jk}(t) = 2\frac{\partial H}{\partial o_{jk}}(X(t), p(t, X(t)), o(t, X(t)) ), \\&\bigg ( \frac{\partial }{\partial t} + \frac{\partial H}{\partial p_j}(x, p(t, x), o(t, x)) \frac{\partial }{\partial {x^j}} + \frac{\partial H}{\partial o_{jk}}(x, p(t, x), o(t, x)) \frac{\partial ^2}{\partial x^j\partial x^k}\bigg )p_i(t, x) \\&\qquad = - \frac{\partial H}{\partial x^i}(x, p(t, x), o(t, x)), \\&o_{ij}(t, x) = \frac{\partial p_i}{\partial x^j}(t, x), \end{aligned}\right. \nonumber \\ \end{aligned}$$
(6.17)

The boundary values in (6.11) and (6.12) now read

$$\begin{aligned} \text {Law}(X(0)) = \mu _0, \quad (p,o)(T) = {{\varvec{d}}}\eta ^*. \end{aligned}$$
(6.18)

We first use the terminal value in (6.18), which satisfies (6.16), to solve the last two equations in (6.17). This gives (p, o) and hence the second-order form \(\alpha \). We then plug p and o into the first two MDEs and solve them with the initial distribution in (6.18). This yields, in law, the M-valued diffusion \(X = \tau _M^{S*}({\textbf {X}})\) as a projective integral process of \(A_H\).
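The two-step procedure just described, in which the backward PDE is solved for (p, o) and the forward MDEs are then solved for X, can be sketched numerically. This is a minimal illustration under assumptions that the text does not make. We take M to be the flat circle \({\mathbb {R}}/2\pi {\mathbb {Z}}\) (g = 1, all \(\Gamma = 0\)) and the Hamiltonian \(H(x,p,o) = b p + \frac{1}{2} o\) with a constant b, which is of the form (6.20) with F = 0. The terminal data are \((p,o)(T) = d^2 S_T\) with \(S_T = \cos x\). The third equation of (6.17) then reads \(\partial _t p + b\,\partial _x p + \frac{1}{2}\partial _x^2 p = 0\), which we solve by explicit finite differences, and \(o = \partial _x p\) by (6.15). The function names are hypothetical.

```python
import numpy as np


def solve_backward_p(p_T, b, T, n_x=128, n_t=1000):
    """Explicit finite differences for the third equation of (6.17) with
    H(x, p, o) = b p + o/2 on the flat circle:
        dp/dt + b dp/dx + (1/2) d^2p/dx^2 = 0,   p(T, .) = p_T,
    marched backward in time. Returns the grid, p[k] ~ p(k dt, .) and o = dp/dx, cf. (6.15).
    """
    x = np.linspace(0.0, 2 * np.pi, n_x, endpoint=False)
    dx, dt = x[1] - x[0], T / n_t
    p = np.empty((n_t + 1, n_x))
    p[-1] = p_T(x)
    for k in range(n_t, 0, -1):
        q = p[k]
        q_x = (np.roll(q, -1) - np.roll(q, 1)) / (2 * dx)   # periodic central difference
        q_xx = (np.roll(q, -1) - 2 * q + np.roll(q, 1)) / dx**2
        p[k - 1] = q + dt * (b * q_x + 0.5 * q_xx)          # since dp/dt = -(b p_x + p_xx / 2)
    o = (np.roll(p, -1, axis=1) - np.roll(p, 1, axis=1)) / (2 * dx)
    return x, p, o


def simulate_lift(x, p, o, b, T, x0, n_paths=200, seed=0):
    """Forward step: MDEs DX = dH/dp = b and QX = 2 dH/do = 1, i.e. dX = b dt + dW,
    followed by the horizontal lift bold-X(t) = (p, o)(t, X(t)) via periodic interpolation."""
    rng = np.random.default_rng(seed)
    n_t = p.shape[0] - 1
    dt = T / n_t
    X = np.empty((n_t + 1, n_paths))
    X[0] = x0
    for k in range(n_t):
        X[k + 1] = X[k] + b * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    X = np.mod(X, 2 * np.pi)  # fold the paths onto the circle
    P = np.array([np.interp(X[k], x, p[k], period=2 * np.pi) for k in range(n_t + 1)])
    O = np.array([np.interp(X[k], x, o[k], period=2 * np.pi) for k in range(n_t + 1)])
    return X, P, O


b, T = 0.5, 1.0
grid, p, o = solve_backward_p(lambda z: -np.sin(z), b, T)  # p(T) = dS_T/dx for S_T = cos
X, P, O = simulate_lift(grid, p, o, b, T, x0=1.0)
```

For this choice, the exact solution is \(p(t,x) = -e^{-(T-t)/2}\sin (x + b(T-t))\), which the finite-difference solution should match up to discretization error. Because the scheme is explicit, the time step must stay small compared with the squared grid spacing.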

We call the system (6.10) or (6.17) the stochastic Hamilton’s equations (S-H equations for short). The second-order Maxwell relations are sufficient for the component o of \(\alpha \) in (6.13) to solve the last two equations of (6.10), so we refer to them as an integrability condition of (6.10). When restricted to Riemannian manifolds, the S-H equations (6.10) can be simplified to a global Hamiltonian-type system on \(T^*M\), as we will see in Sect. 7.4.2.

Lemma 6.8

Let \(H: {\mathcal {T}}^{S*} M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be a time-dependent second-order Hamiltonian, and \({\textbf{X}}\) be a horizontal integral process of \(A_H\). Then, the total mean derivative of H along \({\textbf{X}}\) is

$$\begin{aligned} {\textbf{D}}_{\textrm{t}} H = \frac{\partial H}{\partial t}. \end{aligned}$$

Proof

We use (6.10) and local coordinates to derive

$$\begin{aligned} \begin{aligned} {\textbf{D}}_{\textrm{t}} H&= D[H({\textbf{X}}(t),t)] \\ {}&= \frac{\partial H}{\partial t} + D^i x \frac{\partial H}{\partial x^i} + D_i p \frac{\partial H}{\partial p_i} + D_{jk} o \frac{\partial H}{\partial o_{jk}} + \frac{1}{2} Q^{jk} x \frac{\partial ^2 H}{\partial x^j \partial x^k} \\&\quad + \frac{1}{2} Q_{jk} p \frac{\partial ^2 H}{\partial p_j \partial p_k} + \frac{1}{2} Q_{ijkl} o \frac{\partial ^2 H}{\partial o_{ij} \partial o_{kl}} + Q^j_k(x,p) \frac{\partial ^2 H}{\partial x^j \partial p_k} \\&\quad + Q^j_{kl}(x,o) \frac{\partial ^2 H}{\partial x^j \partial o_{kl}} + Q_{jkl}(p,o) \frac{\partial ^2 H}{\partial p_j \partial o_{kl}} \\&= \frac{\partial H}{\partial t} + D^i x \frac{\partial H}{\partial x^i} + D_i p \frac{\partial H}{\partial p_i} + D_{jk} o \frac{\partial H}{\partial o_{jk}} + \frac{1}{2} Q^{jk} x \frac{\partial ^2 H}{\partial x^j \partial x^k} + C_{ij} \frac{\partial H}{\partial o_{ij}}\\ {}&= \frac{\partial H}{\partial t}. \end{aligned} \end{aligned}$$

The result follows. \(\square \)

In particular, when H is time-independent, we have

$$\begin{aligned} {\textbf{D}}_{\textrm{t}} H = 0, \end{aligned}$$
(6.19)

which is also a consequence of (6.8). Equivalently, H is harmonic with respect to the horizontal integral process \({\textbf{X}}\). In this case, we say that H is stochastically conserved, or that it is a stochastic conserved quantity. In particular, the expectation \({\textbf{E}}[H({\textbf{X}})]\) is constant in time.
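A quick Monte Carlo check of (6.19) can be made under illustrative assumptions. On the flat circle, take b = 0 and F = 0 in (6.20), i.e., \(H(x,p,o) = \frac{1}{2} o\). Then X is a Brownian motion and, for \(S_T = \cos x\), the function \(S(t,x) = e^{-(T-t)/2}\cos x\) solves \(\partial _t S + \frac{1}{2}\partial _x^2 S = 0\). Along the horizontal lift \({\textbf{X}}(t) = d^2 S(t, X(t))\), one has \(H({\textbf{X}}(t)) = -\frac{1}{2} e^{-(T-t)/2}\cos X(t)\), whose expectation should not depend on t. The function name is hypothetical.

```python
import numpy as np


def expected_H_along_lift(times, T=1.0, x0=0.4, n_paths=200_000, seed=1):
    """Monte Carlo estimates of E[H(bold-X(t))] for H(x, p, o) = o / 2 on the flat circle.

    X(t) = x0 + W(t) is a Brownian motion, S(t, x) = exp(-(T - t) / 2) cos x solves
    dS/dt + (1/2) d^2S/dx^2 = 0 with S(T, x) = cos x, and bold-X(t) = d^2 S(t, X(t)),
    whose o-component is d^2S/dx^2 (t, X(t)).
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for t in times:
        X_t = x0 + np.sqrt(t) * rng.standard_normal(n_paths)  # exact Gaussian marginal of X(t)
        o_t = -np.exp(-(T - t) / 2) * np.cos(X_t)             # o-component of the lift
        estimates.append(np.mean(0.5 * o_t))                   # H(bold-X(t)) = o / 2
    return np.array(estimates)


est = expected_H_along_lift([0.0, 0.25, 0.5, 1.0])
print(est, -0.5 * np.exp(-0.5) * np.cos(0.4))
```

Each entry of the returned array should be close to \(-\frac{1}{2} e^{-T/2}\cos x_0\), the value at t = 0, up to sampling error.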

6.3 Two Inspirational Examples

Let M be a Riemannian manifold with Riemannian metric g. Assume for simplicity that M is compact. Let \(\nabla \) be the Levi–Civita connection on TM with Christoffel symbols \((\Gamma ^k_{ij})\). In this subsection, we will consider two types of processes on M, to provide some intuition for our stochastic Hamiltonian formalism.

6.3.1 Diffusion Processes on Riemannian Manifolds

Consider a second-order Hamiltonian H on \({\mathcal {T}}^{S*} M\) with the following coordinate expression:

$$\begin{aligned} H(x,p,o) = b^i(x) p_i - \frac{1}{2} g^{ij}(x) \Gamma _{ij}^k(x) p_k + \frac{1}{2} g^{ij}(x) o_{ij} + F(x), \end{aligned}$$
(6.20)

where b is a given smooth vector field on M and F a smooth function on M. One can easily verify that the expression on the RHS of (6.20) is indeed invariant under changes of coordinates. We consider the S-H equations (6.17) subject to the boundary conditions \(\text {Law} (X(0)) = \mu _0\) and \((p,o)(T) = d^2 S_T\), where \(\mu _0\) is a given probability distribution and \(S_T\) a given smooth function on M.

By the first two equations of system (6.17), the projection diffusion X satisfies the following MDEs,

$$\begin{aligned} \left\{ \begin{aligned}&(DX)^i(t) = b^i(X(t)) - \frac{1}{2} g^{jk}(X(t)) \Gamma _{jk}^i(X(t)), \\&(QX)^{jk}(t) = g^{jk}(X(t)), \end{aligned}\right. \end{aligned}$$
(6.21)

subject to the initial distribution \(\text {Law} (X(0)) = \mu _0\); equivalently (see the end of Sect. 2.4), it can be rewritten as the following Itô SDE in the weak sense,

$$\begin{aligned} dX^i(t)= & {} \left[ b^i(X(t)) - \frac{1}{2} g^{jk}(X(t)) \Gamma _{jk}^i(X(t)) \right] dt + \sigma _r^i(X(t)) dW^r(t), \nonumber \\ \text {Law} (X(0))= & {} \mu _0, \end{aligned}$$
(6.22)

where \(\sigma \) is the positive-definite square root (1, 1)-tensor of g, i.e., \(\sum _{r=1}^d \sigma ^i_r\sigma ^j_r = g^{ij}\), and W denotes an \({\mathbb {R}}^d\)-valued standard Brownian motion. Note that Eqs. (6.21) do not involve the fiber coordinates (p, o), so they form a closed system on the base manifold M and can be solved independently. Indeed, the solution X is a diffusion on M with generator \(A^X = (b^i - \frac{1}{2} g^{jk} \Gamma _{jk}^i) \partial _i + \frac{1}{2}g^{jk} \partial _j\partial _k = \nabla _b + \frac{1}{2} \Delta \).
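The following one-dimensional sketch with b = 0 shows the role of the correction term \(-\frac{1}{2} g^{jk}\Gamma ^i_{jk}\) in (6.22). As an assumption made only for illustration (the manifold is not compact), take \(M={\mathbb {R}}\) with metric \(g = \cosh ^2(x)\, dx^2\). Then \(g^{xx} = 1/\cosh ^2 x\), \(\Gamma ^x_{xx} = \tanh x\) and \(\sigma = 1/\cosh x\). Since \(y = \sinh x\) is an isometry onto the Euclidean line, \(\sinh X(t)\) should be a standard Brownian motion, which gives a simple test of the Euler–Maruyama scheme below. The function name is hypothetical.

```python
import numpy as np


def riemannian_bm_cosh_metric(x0=0.5, T=1.0, n_steps=500, n_paths=50_000, seed=2):
    """Euler-Maruyama scheme for (6.22) with b = 0 on M = R, metric g = cosh(x)^2 dx^2.

    g^{xx} = 1 / cosh(x)^2 and Gamma^x_{xx} = tanh(x), so the drift is
    -(1/2) g^{xx} Gamma^x_{xx} and the diffusion coefficient is sigma = sqrt(g^{xx}).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        g_inv = 1.0 / np.cosh(X) ** 2
        christoffel = np.tanh(X)
        drift = -0.5 * g_inv * christoffel
        sigma = np.sqrt(g_inv)
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return X


Y_T = np.sinh(riemannian_bm_cosh_metric())  # should be approximately N(sinh(0.5), 1)
print(Y_T.mean(), Y_T.var())
```

If the Christoffel term were dropped from the drift, Itô’s formula would give \(\sinh X(t)\) the spurious drift \(\frac{1}{2}\sinh X/\cosh ^2 X\), so the process would no longer be a Riemannian Brownian motion.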

Now we consider the last two equations of (6.17). The LHS of the third equation reads

$$\begin{aligned} \bigg [ \frac{\partial }{\partial t} + \left( b^j - \frac{1}{2} g^{kl} \Gamma _{kl}^j \right) \frac{\partial }{\partial {x^j}} + \frac{1}{2} g^{jk} \frac{\partial ^2}{\partial x^j\partial x^k}\bigg ] p_i = \left( \frac{\partial }{\partial t} + \langle b, \nabla \rangle + \frac{1}{2} \Delta \right) p_i, \end{aligned}$$

where \(\langle \cdot ,\cdot \rangle \) denotes the pairing of vectors and covectors, \(\Delta \) is the Laplace–Beltrami operator and \(\nabla \) the gradient, with respect to g. In order to find the solution of the third equation of (6.17), we consider the following linear backward parabolic equation (where “backward” has a meaning different from that in Sect. 2.2)

$$\begin{aligned} \frac{\partial S}{\partial t} + \langle b, \nabla S\rangle + \frac{1}{2} \Delta S + F = 0, \quad t\in [0,T), \end{aligned}$$
(6.23)

with terminal value \(S(T,x) = S_T(x)\). We let

$$\begin{aligned} p_i = \frac{\partial S}{\partial x^i}, \end{aligned}$$
(6.24)

and use (6.23) and (6.15) to derive

$$\begin{aligned} \begin{aligned} -\frac{\partial F}{\partial x^i}&= \frac{\partial }{\partial {x^i}} \left( \frac{\partial S}{\partial t} + \langle b, \nabla S\rangle + \frac{1}{2} \Delta S \right) \\&= \left( \frac{\partial }{\partial t} + \langle b, \nabla \rangle + \frac{1}{2} \Delta \right) p_i \\&\quad + \left( \frac{\partial b^j}{\partial x^i} p_j - \frac{1}{2} \frac{\partial g^{kl}}{\partial x^i} \Gamma _{kl}^j p_j - \frac{1}{2} g^{kl}\frac{\partial \Gamma _{kl}^j}{\partial x^i} p_j + \frac{1}{2} \frac{\partial g^{jk}}{\partial x^i} o_{jk} \right) \\&= \left( \frac{\partial }{\partial t} + \langle b, \nabla \rangle + \frac{1}{2} \Delta \right) p_i + \frac{\partial }{\partial x^i}(H-F), \end{aligned} \end{aligned}$$
(6.25)

which agrees with the third equation of (6.17).

Finally, we combine (6.24) with (6.15) to conclude that the horizontal integral process \({\textbf{X}}\) is

$$\begin{aligned} {\textbf{X}}(t) = (p,o)(t,X(t)) = \left( \frac{\partial S}{\partial x^i}, \frac{\partial ^2 S}{\partial x^j\partial x^k} \right) (t,X(t)) = d^2S(t,X(t)). \end{aligned}$$

Example 6.9

(Brownian motions) When \(b\equiv 0\) and \(F\equiv 0\), the second-order Hamiltonian is \(H(x,p,o) = \frac{1}{2} g^{ij}(x) (o_{ij} - \Gamma _{ij}^k(x) p_k)\), and the solution process X is a standard Brownian motion on M with initial distribution \(\mu _0\). Such a second-order Hamiltonian H can be regarded as a “stochastic deformation” of the trivial classical Hamiltonian \(H_0 = 0\). Indeed, H is the g-canonical lift of \(H_0\), to be defined in Sect. 6.6. Therefore, we may regard Brownian motions as the “stochastization,” or “stochastic deformation,” of constant curves on the base manifold M.

The next example describes a dynamical approach to diffusions inspired by Schrödinger, which is elaborated further in Sect. 7.3.

6.3.2 Reciprocal Processes and Diffusion Bridges on Riemannian Manifolds

With the same coefficients b, F and boundary data \(\mu _0,S_T\) as in Sect. 6.3.1, we consider the S-H system (6.17) with the following second-order Hamiltonian H on \({\mathcal {T}}^{S*} M\):

$$\begin{aligned} H(x,p,o) = \frac{1}{2} g^{ij}(x) p_ip_j + b^i(x) p_i - \frac{1}{2} g^{ij}(x) \Gamma _{ij}^k(x) p_k + \frac{1}{2} g^{ij}(x) o_{ij} + F(x),\nonumber \\ \end{aligned}$$
(6.26)

subject to the boundary conditions \(\text {Law} (X(0)) = \mu _0\) and \((p,o)(T) = d^2 S_T\). In classical mechanics, b and F are called the vector and scalar potentials, respectively. Again, it is easy to verify that the expression on the RHS of (6.26) is invariant under changes of coordinates.

The LHS of the third equation in (6.17) now reads

$$\begin{aligned}{} & {} \bigg [ \frac{\partial }{\partial t} + \left( g^{jk} p_k + b^j - \frac{1}{2} g^{kl} \Gamma _{kl}^j \right) \frac{\partial }{\partial {x^j}} + \frac{1}{2} g^{jk} \frac{\partial ^2}{\partial x^j\partial x^k}\bigg ] p_i \\{} & {} \quad = \left( \frac{\partial }{\partial t} + p \cdot \nabla + \langle b, \nabla \rangle + \frac{1}{2} \Delta \right) p_i, \end{aligned}$$

To solve the third equation of (6.17), we first consider the positive solution of the following backward parabolic equation on M

$$\begin{aligned} \frac{\partial u}{\partial t} + \langle b, \nabla u\rangle + \frac{1}{2}\Delta u + F u = 0, \quad t\in [0,T), \end{aligned}$$
(6.27)

with terminal value \(u(T,x) = e^{S_T(x)}\), where \(\langle \cdot , \cdot \rangle \) denotes the Riemannian inner product with respect to g. If we let \(S=\ln u\), then it is easy to verify that S satisfies the following Hamilton–Jacobi–Bellman (HJB) equation

$$\begin{aligned} \frac{\partial S}{\partial t} + \langle b, \nabla S\rangle + \frac{1}{2} |\nabla S|^2 + \frac{1}{2} \Delta S + F = 0, \quad t\in [0,T), \end{aligned}$$
(6.28)

with terminal value \(S(T,x) = S_T(x)\), where \(|\cdot |\) denotes the Riemannian norm with respect to g. Now we let

$$\begin{aligned} p_i = \frac{\partial S}{\partial x^i} = \frac{\partial \ln u}{\partial x^i}, \end{aligned}$$
(6.29)

and use (6.28) and (6.15) to derive, in a way similar to (6.25),

$$\begin{aligned} -\frac{\partial F}{\partial x^i}= & {} \frac{\partial }{\partial {x^i}} \left( \frac{\partial S}{\partial t} + \langle b, \nabla S\rangle + \frac{1}{2} |\nabla S|^2 + \frac{1}{2} \Delta S \right) \\= & {} \left( \frac{\partial }{\partial t} + p \cdot \nabla + \langle b, \nabla \rangle + \frac{1}{2} \Delta \right) p_i + \frac{\partial }{\partial x^i}(H-F), \end{aligned}$$

which agrees with the third equation of (6.17). Therefore, the projection diffusion X of the system (6.17) satisfies the following MDEs,

$$\begin{aligned} \left\{ \begin{aligned}&(DX)^i(t) = g^{ij}(X(t)) \frac{\partial \ln u}{\partial x^j}(t, X(t)) + b^i(X(t)) - \frac{1}{2} g^{jk}(X(t)) \Gamma _{jk}^i(X(t)), \\&(QX)^{jk}(t) = g^{jk}(X(t)), \end{aligned}\right. \end{aligned}$$
(6.30)

subject to the initial distribution \(\text {Law} (X(0)) = \mu _0\); or equivalently (according to the end of Sect. 2.4), it can be rewritten as the following Itô SDE in weak sense,

$$\begin{aligned} \left\{ \begin{aligned}&dX^i(t) = \left[ g^{ij}(X(t)) \frac{\partial \ln u}{\partial x^j}(t, X(t)) + b^i(X(t)) - \frac{1}{2} g^{jk}(X(t)) \Gamma _{jk}^i(X(t)) \right] dt \\&\qquad \qquad \quad + \sigma _r^i(X(t)) dW^r(t), \\&\text {Law} (X(0)) = \mu _0, \end{aligned}\right. \end{aligned}$$
(6.31)

where \(\sigma \) is the positive-definite square root (1, 1)-tensor of g, i.e., \(\sum _{r=1}^d \sigma ^i_r\sigma ^j_r = g^{ij}\), and W denotes an \({\mathbb {R}}^d\)-valued standard Brownian motion.
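As an illustration of how (6.31) may be simulated, the following Python sketch treats a hypothetical flat case: \(M={\mathbb {R}}^2\) with a constant inverse metric \((g^{ij})\), so that all Christoffel symbols vanish and \(\sigma \) is the positive-definite square root of the constant matrix \((g^{ij})\). For a \(2\times 2\) symmetric positive-definite matrix A this root is \((A + \sqrt{\det A}\, I)/\sqrt{\textrm{tr}\,A + 2\sqrt{\det A}}\), by the Cayley–Hamilton theorem. The Euler–Maruyama discretization, the function names and the test data (\(b\equiv 0\), \(F\equiv 0\), \(S_T(x) = a_i x^i\), for which \(\nabla \ln u\) is the constant covector a) are assumptions made for this sketch only.

```python
import math
import random

def spd_sqrt_2x2(mat):
    """Positive-definite square root of a symmetric positive-definite 2x2 matrix.

    Uses sqrt(A) = (A + s I) / sqrt(tr A + 2 s) with s = sqrt(det A),
    which follows from the Cayley-Hamilton theorem for 2x2 matrices.
    """
    (p, q), (_, r) = mat
    s = math.sqrt(p * r - q * q)
    t = math.sqrt(p + r + 2.0 * s)
    return [[(p + s) / t, q / t], [q / t, (r + s) / t]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def euler_maruyama(x0, g_inv, grad_log_u, b, T, n_steps, rng):
    """One Euler-Maruyama path of (6.31) on R^2 with a constant metric (Gamma = 0).

    Drift: g^{ij} d_j ln u + b^i; diffusion matrix sigma with sigma sigma^T = (g^{ij}).
    """
    sigma = spd_sqrt_2x2(g_inv)
    dt = T / n_steps
    x = list(x0)
    for k in range(n_steps):
        drift = [gi + bi for gi, bi in zip(matvec(g_inv, grad_log_u(k * dt, x)), b(x))]
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(2)]
        x = [xi + di * dt + ni for xi, di, ni in zip(x, drift, matvec(sigma, dw))]
    return x

# Hypothetical data: b = 0, F = 0, S_T(x) = <a, x>. Then
# u(t, x) = exp(<a, x> + (T - t) a^T g^{-1} a / 2) solves (6.27), so grad ln u = a.
g_inv = [[2.0, 0.5], [0.5, 1.0]]
a = [1.0, -1.0]
rng = random.Random(0)
ends = [euler_maruyama([0.0, 0.0], g_inv, lambda t, x: a, lambda x: [0.0, 0.0], 1.0, 20, rng)
        for _ in range(2000)]
mean_end = [sum(e[i] for e in ends) / len(ends) for i in range(2)]
print(mean_end)  # Monte Carlo estimate of E[X(T)]
```

With these data X(T) is Gaussian with mean \(T g^{ij}a_j = (1.5, -0.5)\) and covariance \(T(g^{ij})\), which the printed Monte Carlo mean should approximate.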

The solution process X of (6.31) is called a Bernstein process (Bernstein 1932; Cruzeiro et al. 2000) [or the reciprocal process derived from the M-valued diffusion in (6.22) (Jamison 1975)]. The time marginal distribution \(\mu _t\) of X satisfies a Born-type formula \(\mu _t(dx) = u(t,x) v(t,x) dx\) (see, e.g., Zambrini 1986, Corollary 3.3.1 or Cruzeiro and Zambrini 1991, Equations (2.9), (4.6) and (4.8)), where v satisfies the adjoint equation of (6.27). The terminal law of X can be determined as follows: first solve (6.27) to get u(0, x); then determine the initial value of v from \(\mu _0(dx) = u(0,x) v(0,x) dx\) and solve the equation for v to get v(T, x); finally, the terminal law of X is given by \(\mu _T(dx) = u(T,x)v(T,x)dx\). In particular, when \(\mu _0 = \delta _{q_1}\) and \(\mu _T = \delta _{q_2}\) for \(q_1, q_2 \in M\), the solution X of (6.31) is the Markovian bridge of the diffusion Y conditioned to end at \(q_2\) (Çetin and Danilova 2016).
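The three-step procedure for the terminal law can be traced on a hypothetical one-dimensional example, chosen for this sketch: \(M={\mathbb {R}}\), \(b\equiv 0\), \(F\equiv 0\), \(S_T(x) = ax\) and \(\mu _0 = N(0,1)\). Then \(u(t,x) = e^{ax + a^2(T-t)/2}\) solves (6.27), the adjoint equation is the forward heat equation \(\partial _t v = \frac{1}{2}\partial _x^2 v\), and matching \(u(0,\cdot )v(0,\cdot )\) with the standard normal density gives \(v(t,x) = e^{a^2(1-T)/2}(2\pi (1+t))^{-1/2}e^{-(x+a)^2/(2(1+t))}\). Completing the square shows \(\mu _t = N(at, 1+t)\), which is also the law of \(X(t) = X(0) + at + W(t)\) from (6.31). The Python code below checks this by integrating \(u v\) numerically.

```python
import math

a, T = 0.8, 2.0  # hypothetical data: S_T(x) = a x on the horizon [0, T]

def u(t, x):
    """Solution of du/dt + u''/2 = 0 with u(T, x) = exp(a x)."""
    return math.exp(a * x + 0.5 * a * a * (T - t))

def v(t, x):
    """Solution of dv/dt = v''/2 with u(0, x) v(0, x) = standard normal density."""
    var = 1.0 + t
    const = math.exp(0.5 * a * a * (1.0 - T))
    return const * math.exp(-(x + a) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def born_moments(t, lo=-20.0, hi=20.0, n=4000):
    """Mass, mean and variance of mu_t(dx) = u(t, x) v(t, x) dx (trapezoid rule)."""
    h = (hi - lo) / n
    m0 = m1 = m2 = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 * h if i in (0, n) else h
        dens = u(t, x) * v(t, x)
        m0 += w * dens
        m1 += w * x * dens
        m2 += w * x * x * dens
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean ** 2

for t in (0.0, 1.0, T):
    print(t, born_moments(t))  # expected approximately (1, a t, 1 + t)
```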

Again, we combine (6.29) with (6.15) to conclude that the horizontal integral process \({\textbf{X}}\) is

$$\begin{aligned} {\textbf{X}}(t) = (p,o)(t,X(t)) = \left( \frac{\partial S}{\partial x^i}, \frac{\partial ^2 S}{\partial x^j\partial x^k} \right) (t,X(t)) = d^2S(t,X(t)). \end{aligned}$$
(6.32)
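The passage from (6.27) to (6.28) via \(S=\ln u\) rests on a pointwise identity that holds for every smooth positive u, not only for solutions: \(\frac{\partial S}{\partial t} + \langle b, \nabla S\rangle + \frac{1}{2}|\nabla S|^2 + \frac{1}{2}\Delta S + F = u^{-1}\big (\frac{\partial u}{\partial t} + \langle b, \nabla u\rangle + \frac{1}{2}\Delta u + F u\big )\). Since \(u>0\), u solves (6.27) exactly when S solves (6.28). The Python sketch below checks the identity by central finite differences in a hypothetical flat setting (\(M={\mathbb {R}}\), \(g=1\), so \(\Delta = \partial _x^2\)); the functions b, F and u are arbitrary test choices, not taken from the text.

```python
import math

# Hypothetical flat setting: M = R, g = 1, Gamma = 0, so Delta = d^2/dx^2.
def b(x):
    return math.sin(x)        # arbitrary vector potential (assumption)

def F(x):
    return 0.5 * math.cos(x)  # arbitrary scalar potential (assumption)

def u(t, x):
    # arbitrary smooth positive function; it need not solve (6.27)
    return 2.0 + math.sin(x) * math.cos(t)

def S(t, x):
    return math.log(u(t, x))

STEP = 1e-4  # finite-difference step

def d_t(f, t, x):
    return (f(t + STEP, x) - f(t - STEP, x)) / (2 * STEP)

def d_x(f, t, x):
    return (f(t, x + STEP) - f(t, x - STEP)) / (2 * STEP)

def d_xx(f, t, x):
    return (f(t, x + STEP) - 2 * f(t, x) + f(t, x - STEP)) / STEP ** 2

def linear_lhs(t, x):
    """Left-hand side of (6.27) evaluated at u."""
    return d_t(u, t, x) + b(x) * d_x(u, t, x) + 0.5 * d_xx(u, t, x) + F(x) * u(t, x)

def hjb_lhs(t, x):
    """Left-hand side of (6.28) evaluated at S = ln u."""
    s_x = d_x(S, t, x)
    return d_t(S, t, x) + b(x) * s_x + 0.5 * s_x ** 2 + 0.5 * d_xx(S, t, x) + F(x)

for t, x in [(0.3, -1.2), (0.7, 0.4), (1.1, 2.5)]:
    print(t, x, hjb_lhs(t, x), linear_lhs(t, x) / u(t, x))  # last two columns agree
```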

Remark 6.10

  1. (i)

    The derivation of the reciprocal process (6.31) from the diffusion (6.22) was the route chosen by Jamison (1975), inspired by Schrödinger’s original problem (Schrödinger 1932). No geometry or dynamical equation such as the HJB equation (6.28) was involved in his construction. As here, Jamison’s construction involved only the past (nondecreasing) filtration. The dynamical content dates back to Zambrini (1986), Cruzeiro and Zambrini (1991), Chung and Zambrini (2003), where a reciprocal process was constructed from the sole data of a Hamiltonian operator, as required by Schrödinger’s original problem, and the future (nonincreasing) filtration was also used to study the time-reversed dynamics. Cf. also Example 6.12 and Sect. 7.3.

  2. (ii)

    Equations (6.30) suggest that the transformation from coordinates (x, p, o) to coordinates (x, Dx, Qx) is not invertible. More precisely, the coordinates \((D^i x)\) are obtained from (x, p), whereas the coordinates \((Q^{jk} x)\) depend only on \((x^i)\). Moreover, neither equation involves the coordinates \((o_{jk})\). However, if we look at the \(\nabla \)-canonical coordinates \((D^i_\nabla x)\) for (6.30), then

    $$\begin{aligned} (D_\nabla X)^i(t) = g^{ij}(X(t)) p_j(t, X(t)) + b^i(X(t)), \end{aligned}$$

    which indicates that the transformation from (x, p) to \((x,D_\nabla x)\) is invertible. This will help us establish stochastic Lagrangian mechanics and second-order Legendre transforms in the forthcoming Sect. 7.

  3. (iii)

    As observed in Sect. 2.2, every result presented here has a backward version (in the sense of backward mean derivatives with respect to the future filtration \(\{{\mathcal {F}}_t\}\)). Indeed, two forward–backward SDE systems for Bernstein diffusions on Euclidean space were derived in Cruzeiro and Vuillermot (2015): one is under the past filtration and coincides with ours, whereas the other one is under the future filtration.

There are some special cases which are of independent interest and have been considered in the literature.

Example 6.11

(Brownian (free) reciprocal processes and Brownian bridges) Consider the case where \(b\equiv 0\), \(F\equiv 0\). In this case, Y is a Brownian motion on M, so we call X a Brownian reciprocal process. In particular, the Brownian bridge from \(q_1\) to \(q_2\) of time length \(T>0\) is driven by the Itô SDE (6.31) where \(X(0) = q_1\), \(b\equiv 0\) and u satisfies the backward heat equation (6.27) with \(F\equiv 0\) and final value \(u(T,x) = \delta _{q_2}(x)\). See also Hsu (2002, Theorem 5.4.4). Thus, Brownian bridges can be understood as stochastic Hamiltonian flows of the second-order Hamiltonian \(H(x,p,o) = \frac{1}{2} g^{ij}(x) p_ip_j - \frac{1}{2} g^{ij}(x) \Gamma _{ij}^k(x) p_k + \frac{1}{2} g^{ij}(x) o_{ij}\), in analogy with geodesics as Hamiltonian flows of the classical Hamiltonian \(H_0(x,p) = \frac{1}{2} g^{ij}(x) p_ip_j\) (cf. Abraham and Marsden 1978, Theorem 3.7.1). Here, the second-order Hamiltonian H is the g-canonical lift of \(H_0\). We can also say that Brownian bridges are a “stochastization” or “stochastic deformation” of geodesics; cf. Example 6.9. Relations between geodesics and Brownian motions have been the subject of many studies; for example, various interpolation relations between geodesics and Brownian motions can be found in Angst et al. (2015) and Li (2016).
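In the flat case \(M={\mathbb {R}}\), the solution of the backward heat equation with final value \(\delta _{q_2}\) is the heat kernel \(u(t,x) = (2\pi (T-t))^{-1/2} e^{-(x-q_2)^2/(2(T-t))}\), so the drift of (6.31) becomes \(\partial _x \ln u = (q_2 - x)/(T-t)\). The Python sketch below simulates this SDE with an Euler–Maruyama scheme (a discretization chosen for illustration; the step count, seed and endpoints are arbitrary) and compares the empirical mean and variance at \(t = T/2\) with the Brownian bridge values \((q_1+q_2)/2\) and \(T/4\).

```python
import math
import random

def bridge_path(q1, q2, T, n_steps, rng):
    """Euler-Maruyama path of (6.31) with M = R, b = 0, F = 0 and u(T, .) = delta_{q2}.

    Drift: d/dx ln u(t, x) = (q2 - x) / (T - t), u being the heat kernel centred at q2.
    """
    dt = T / n_steps
    x = q1
    path = [x]
    for k in range(n_steps):
        t = k * dt  # t <= T - dt, so T - t > 0 and the drift is finite
        x += (q2 - x) / (T - t) * dt + rng.gauss(0.0, math.sqrt(dt))
        path.append(x)
    return path

q1, q2, T, n = 0.0, 2.0, 1.0, 200
rng = random.Random(1)
paths = [bridge_path(q1, q2, T, n, rng) for _ in range(1000)]
mid = [p[n // 2] for p in paths]
mid_mean = sum(mid) / len(mid)
mid_var = sum((m - mid_mean) ** 2 for m in mid) / (len(mid) - 1)
print(mid_mean, mid_var)  # compare with (q1 + q2)/2 = 1.0 and T/4 = 0.25
```

The last Euler step sets \(X\) to \(q_2\) plus a single Gaussian increment of variance \(T/n\), so the discrete paths end close to, but not exactly at, \(q_2\).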

Example 6.12

(Euclidean quantum mechanics; Chung and Zambrini 2003; Albeverio et al. 1989, 2006) It is insightful to consider the case \(M={\mathbb {R}}^d\) and \(b\equiv 0\), where the Riemannian metric is the flat Euclidean one. To bring out the analogy with quantum mechanics, we introduce the reduced Planck constant \(\hbar \) into the second-order Hamiltonian H of (6.26), so that

$$\begin{aligned} H_\hbar (x,p,o) = \frac{1}{2}|p|^2 + \frac{\hbar }{2} \textrm{tr}\,o + F(x). \end{aligned}$$

The system (6.10) then reads as

$$\begin{aligned}\left\{ \begin{aligned}&(DX)^i(t) = p_i(t,X(t)), \\&(QX)^{jk}(t) = \hbar \delta ^{jk}, \\&D [p_i(t,X(t))] = - \frac{\partial F}{\partial x^i}(X(t)), \\&o_{ik}(t, x) = \frac{\partial p_k}{\partial x^i}(t, x). \end{aligned}\right. \end{aligned}$$

Note that the first three equations form a subsystem that can be solved separately, since they do not involve the coordinates \(o_{ij}\). Equation (6.27) and its adjoint now reduce to the following \(\hbar \)-dependent backward and forward heat equations, respectively,

$$\begin{aligned} \hbar \frac{\partial u}{\partial t} + \frac{\hbar ^2}{2}\Delta u + F u = 0, \qquad -\hbar \frac{\partial v}{\partial t} + \frac{\hbar ^2}{2}\Delta v + F v = 0, \end{aligned}$$

which, together with the Born-type formula \(\mu _t(dx) = u(t,x) v(t,x) dx\), display a strong analogy with quantum mechanics (Zambrini 1986).

The function \(S=\hbar \ln u\) solves the following \(\hbar \)-dependent HJB equation:

$$\begin{aligned} \frac{\partial S}{\partial t} + \frac{1}{2} |\nabla S|^2 + \frac{\hbar }{2} \Delta S + F = 0. \end{aligned}$$

The first three equations then can be solved by letting \(p = \nabla S\). The first and third equations imply a Newton-type equation

$$\begin{aligned} DDX(t) = -\nabla F(X(t)). \end{aligned}$$
(6.33)

This is indeed the equation of motion of the Euclidean version of quantum mechanics, which was the original motivation of Schrödinger in his well-known problem to be discussed in Sect. 7.3. See Chung and Zambrini (2003, p. 158) and Zambrini (2015, Eq. (4.17)) for more. Note that Chung and Zambrini (2003) and Zambrini (2015) used \(V=-F\) to denote the physical scalar potential and used the relation \(S= -\hbar \ln u\) and \(p = -\nabla S\) to formulate the HJB equation from backward heat equation in the case of nondecreasing (past) filtration.

There are two special cases that will be studied further later.

  1. (i)

    When \(d=1\) and \(F(x) = \frac{1}{2} x^2\), i.e., \(H(x,p,o) = \frac{1}{2}(p^2 + x^2)+ \frac{1}{2}o\), we call its projective integral process X the (forward) stochastic harmonic oscillator. It is a stochastization of the classical harmonic oscillator with Hamiltonian \(H_0(x,p) = \frac{1}{2}(p^2 + x^2)\) (Abraham and Marsden 1978, Example 5.2.3). As before, H is here the canonical lift of \(H_0\); see Sect. 6.6.

  2. (ii)

    When \(d=1\) and \(F(x) = -\frac{1}{2} x^2\), i.e., \(H(x,p,o) = \frac{1}{2}(p^2 - x^2)+ \frac{1}{2}o\), we call it the (forward) Euclidean harmonic oscillator.
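Both oscillators admit explicit solutions of the \(\hbar \)-dependent HJB equation above that are quadratic in x; these were derived for this sketch and are not stated in the text. For \(F(x) = \frac{1}{2}x^2\) one may take \(S(t,x) = -\frac{1}{2}\tan (t+k)\,x^2 - \frac{\hbar }{2}\ln \cos (t+k)\) while \(t+k<\pi /2\), and for \(F(x) = -\frac{1}{2}x^2\) one may take \(S(t,x) = -\frac{1}{2}x^2 + \frac{\hbar }{2}t\), whose momentum \(p=-x\) yields the Ornstein–Uhlenbeck drift \(DX = -X\). The Python code checks by finite differences that each S solves the HJB equation and that \(p = \partial _x S\) satisfies (6.33) in the form \(\big (\partial _t + p\,\partial _x + \frac{\hbar }{2}\partial _x^2\big )p = -F'\); the values of \(\hbar \) and of the shift k are arbitrary.

```python
import math

HBAR = 0.5   # arbitrary test value of the reduced Planck constant
SHIFT = 0.2  # arbitrary phase shift k; t + SHIFT < pi/2 on the window tested
STEP = 1e-4  # finite-difference step

def partials(f, t, x):
    """Central finite differences (f_t, f_x, f_xx) of f(t, x)."""
    ft = (f(t + STEP, x) - f(t - STEP, x)) / (2 * STEP)
    fx = (f(t, x + STEP) - f(t, x - STEP)) / (2 * STEP)
    fxx = (f(t, x + STEP) - 2 * f(t, x) + f(t, x - STEP)) / STEP ** 2
    return ft, fx, fxx

def hjb_residual(S, F, t, x):
    """S_t + S_x^2 / 2 + (hbar/2) S_xx + F, which vanishes for a solution."""
    st, sx, sxx = partials(S, t, x)
    return st + 0.5 * sx ** 2 + 0.5 * HBAR * sxx + F(x)

def newton_residual(p, dF, t, x):
    """(d/dt + p d/dx + (hbar/2) d^2/dx^2) p + F'(x): the form of (6.33) along p = S_x."""
    pt, px, pxx = partials(p, t, x)
    return pt + p(t, x) * px + 0.5 * HBAR * pxx + dF(x)

# Forward stochastic harmonic oscillator: F(x) = x^2 / 2.
osc = dict(
    S=lambda t, x: -0.5 * math.tan(t + SHIFT) * x ** 2 - 0.5 * HBAR * math.log(math.cos(t + SHIFT)),
    p=lambda t, x: -math.tan(t + SHIFT) * x,
    F=lambda x: 0.5 * x ** 2,
    dF=lambda x: x,
)
# Euclidean harmonic oscillator: F(x) = -x^2 / 2; p = -x is the Ornstein-Uhlenbeck drift.
euc = dict(
    S=lambda t, x: -0.5 * x ** 2 + 0.5 * HBAR * t,
    p=lambda t, x: -x,
    F=lambda x: -0.5 * x ** 2,
    dF=lambda x: -x,
)

for name, ex in [("stochastic", osc), ("Euclidean", euc)]:
    print(name, hjb_residual(ex["S"], ex["F"], 0.6, 1.3), newton_residual(ex["p"], ex["dF"], 0.6, 1.3))
```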

6.4 The Mixed-Order Contact Structure on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\)

In the following subsections we investigate time-dependent systems, for which the proper space is \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\). Recall from (5.9) that \({\mathcal {T}}^{S*} M\times {\mathbb {R}}= J^2{\hat{\pi }}\), where the latter is the second-order jet bundle of \((M\times {\mathbb {R}}, {\hat{\pi }}, M)\).

In classical differential geometry, the first-order jet bundle \(J^1{\hat{\pi }} = T^* M\times {\mathbb {R}}\) can be equipped with an exact contact structure in several ways (Abraham and Marsden 1978, Section 5.1). Among others, the canonical symplectic form \(\omega _0\) on \(T^* M\) corresponds to a contact structure on \(J^1{\hat{\pi }}\) via \({\tilde{\omega }}_0 = {\hat{\pi }}^* \omega _0\), which is indeed exact as \({\tilde{\omega }}_0 = -d\tilde{\theta }_0\) for \({\tilde{\theta }}_0 = dt + {\hat{\pi }}^*\theta _0\). Another commonly used contact structure is the Poincaré–Cartan form \(\omega ^0_{H_0} = {\tilde{\omega }}_0 + dH_0\wedge dt\) for a given function \(H_0\in C^\infty (J^1{\hat{\pi }})\). It is also exact, as \(\omega ^0_{H_0} = - d\theta ^0_{H_0}\), where \(\theta ^0_{H_0} = {\hat{\pi }}^*\theta _0 - H_0dt\). The advantage of the Poincaré–Cartan form, compared with the contact form \({\tilde{\omega }}_0\), is that it can be related to the (time-dependent) Hamiltonian vector field \(V_{H_0}\) of \({H_0}\) on \(T^* M\). More precisely, the vector field \(\tilde{V}_{H_0} = \frac{\partial }{\partial {t}} + V_{H_0}\), treated as a vector field on \(J^1{\hat{\pi }}\) and called the characteristic vector field of \(\omega ^0_{H_0}\), is the unique vector field satisfying \(\tilde{V}_{H_0} \lrcorner \, \omega ^0_{H_0} = 0\) and \(\tilde{V}_{H_0}\lrcorner \, dt=1\).

Now we proceed in a similar way for the second-order jet bundle \(J^2{\hat{\pi }}\). Define

$$\begin{aligned} {\tilde{\omega }} = {\hat{\pi }}^{S*} \omega \quad \text {and}\quad {\tilde{\theta }} = dt + {\hat{\pi }}^{S*}\theta . \end{aligned}$$

Then, \({\tilde{\omega }} = -d{\tilde{\theta }}\). We call the pair \((J^2{\hat{\pi }}, {\tilde{\omega }})\) a second-order contact manifold and the pair \((J^2{\hat{\pi }}, {\tilde{\theta }})\) a mixed-order exact contact manifold. In local coordinates, \({\tilde{\omega }}\) has the same expression as \(\omega \) in (6.6), but we stress that it is now a second-order form on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\). The form \({\tilde{\theta }}\) has the local expression

$$\begin{aligned} {\tilde{\theta }} = dt + p_i d^2 x^i + \textstyle {\frac{1}{2}} o_{jk} dx^j\cdot dx^k. \end{aligned}$$

This makes clear that \({\tilde{\theta }}\) is a mixed-order form on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\).

A time-dependent second-order Hamiltonian H is a smooth function on \(J^2{\hat{\pi }} \cong {\mathcal {T}}^{S*} M\times {\mathbb {R}}\). Its second-order Hamiltonian vector field \(A_H\) is now a time-dependent second-order vector field on \({\mathcal {T}}^{S*} M\), and its horizontal integral processes satisfy the same equations (6.10) or (6.17), only with H depending explicitly on time. Define a mixed-order vector field \({\tilde{A}}_H\) on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) by

$$\begin{aligned} {\tilde{A}}_H:= A_H + \frac{\partial }{\partial {t}}, \end{aligned}$$

where \(A_H\) is a second-order Hamiltonian vector field of the form (6.9). We call \({\tilde{A}}_H\) the extended second-order Hamiltonian vector field of H.

We define the second-order counterpart of Poincaré–Cartan form by

$$\begin{aligned} \omega _H:= {\tilde{\omega }} + d^\circ H\wedge dt = d^2 x^i \wedge d^2 p_i + \textstyle {\frac{1}{2}} dx^j\cdot dx^k \wedge d^2 o_{jk} + d^2 H\wedge dt, \end{aligned}$$

and call it the mixed-order Poincaré–Cartan form on \(\mathcal T^{S*} M\times {\mathbb {R}}\). It is exact in the sense that \(\omega _H = - d^\circ \theta _H\), where \(\theta _H = {\hat{\pi }}^{S*}\theta - Hdt = p_i d^2 x^i + \textstyle {\frac{1}{2}} o_{jk} dx^j\cdot dx^k - Hdt\).

The following lemma gives the relations between \(\omega _H\) and \({\tilde{A}}_H\).

Lemma 6.13

The class of extended second-order Hamiltonian vector fields \(\tilde{A}_H\) is the unique class of mixed-order vector fields on \(\mathcal T^{S*} M\times {\mathbb {R}}\) satisfying

$$\begin{aligned} {\tilde{A}}_H \lrcorner \, \omega _H = 0 \quad \text {and}\quad \tilde{A}_H\lrcorner \, dt=1. \end{aligned}$$

Proof

Firstly, we show that \({\tilde{A}}_H\) satisfies the two equalities. The second equality is trivial. For the first one, we pick a mixed-order vector field B on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\); then,

$$\begin{aligned} \begin{aligned} \omega _H({\tilde{A}}_H, B)&= {\tilde{\omega }}({\tilde{A}}_H, B) + d^\circ H({\tilde{A}}_H) dt(B) - dt({\tilde{A}}_H) d^\circ H(B) \\&= \omega \big (A_H, {\hat{\pi }}^{S}_*(B)\big ) + \left[ d^\circ H(A_H) + d^\circ H\big (\textstyle {\frac{\partial }{\partial {t}}}\big ) \right] dt(B) - d^\circ H(B) \\&= d^2 H({\hat{\pi }}^{S}_*(B)) + \textstyle {\frac{\partial H}{\partial t}} dt(B) - d^\circ H(B) \\&= 0. \end{aligned} \end{aligned}$$

To prove the uniqueness, it suffices to show that any mixed-order vector field A on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) satisfying \(A \lrcorner \, \omega _H = 0\) is a multiple of \({\tilde{A}}_H\). Suppose that A has the local expression

$$\begin{aligned} \begin{aligned} A =\,&A^0 \frac{\partial }{\partial {t}} + A^i \frac{\partial }{\partial {x^i}} + A_i \frac{\partial }{\partial {p_i}} + A^{jk} \frac{\partial ^2}{\partial x^j \partial x^k} + A^2_{jk} \frac{\partial }{\partial {o_{jk}}} \\&+ A^{11}_{jk} \frac{\partial ^2}{\partial p_j \partial p_k} + A_{ijkl} \frac{\partial ^2}{\partial o_{ij} \partial o_{kl}} + A^j_k \frac{\partial ^2}{\partial x^j \partial p_k} + A^j_{kl} \frac{\partial ^2}{\partial x^j \partial o_{kl}} + A_{jkl} \frac{\partial ^2}{\partial p_j \partial o_{kl}}. \end{aligned} \end{aligned}$$

Then, it follows that

$$\begin{aligned} \begin{aligned} 0 = A \lrcorner \, \omega _H =&A^i d^2p_i - A_i d^2 x^i + A^{jk} d^2o_{jk} - \textstyle {\frac{1}{2}} A^2_{jk} dx^j\cdot dx^k \\&+ \text {terms}\left( A^{11}_{jk}, A_{ijkl}, A^j_k, A^j_{kl}, A_{jkl} \right) \\&- A^0 \left( \frac{\partial H}{\partial x^i} d^2 x^i + \frac{\partial H}{\partial p_i} d^2 p_i + \frac{\partial H}{\partial o_{jk}} d^2 o_{jk} + \frac{1}{2} \frac{\partial ^2 H}{\partial x^j \partial x^k} dx^j\cdot dx^k + \cdots \right) \\&+ \left( A^i \frac{\partial H}{\partial x^i} + A_i \frac{\partial H}{\partial p_i} + A^{jk} \frac{\partial ^2H}{\partial x^j \partial x^k} + A^2_{jk} \frac{\partial H}{\partial o_{jk}} + \cdots \right) dt. \end{aligned} \end{aligned}$$

The vanishing of each coefficient gives

$$\begin{aligned} A^i = A^0 \frac{\partial H}{\partial p_i}, \quad A_i = -A^0 \frac{\partial H}{\partial x^i}, \quad A^{jk} = A^0 \frac{\partial H}{\partial o_{jk}}, \quad A^2_{jk} = -A^0 \left( \frac{\partial ^2 H}{\partial x^j \partial x^k} + \cdots \right) , \quad \cdots . \end{aligned}$$

Therefore, \(A = A^0 {\tilde{A}}_H\). \(\square \)

6.5 Canonical Transformations and Hamilton–Jacobi–Bellman Equations

Let us study the second-order analogs of canonical transformations and their generating functions. To do so, we need to find a change of coordinates from \((x^i, p_i, o_{jk},t)\) to \((y^i, P_i, O_{jk}, s)\) that preserves the form of stochastic Hamilton’s equations (6.10) (with time-dependent second-order Hamiltonian). More precisely, we have the following definition of canonical transformations between mixed-order contact structures, which is adapted from those between classical contact structures in Asorey et al. (1983).

Definition 6.14

Let \(({\mathcal {T}}^{S*} M\times {\mathbb {R}}, {\tilde{\omega }})\) and \((\mathcal T^{S*} N\times {\mathbb {R}}, {\tilde{\eta }})\) be two second-order contact manifolds corresponding to second-order tautological forms \(\theta \) and \(\vartheta \). A bundle isomorphism \({\textbf{F}}: ({\mathcal {T}}^{S*} M\times {\mathbb {R}}, {\hat{\pi }}_{2,1}, T^* M\times {\mathbb {R}})\rightarrow ({\mathcal {T}}^{S*} N\times {\mathbb {R}}, {\hat{\rho }}_{2,1}, T^* N\times {\mathbb {R}})\) is called a canonical transformation if its projection \({\mathbb {F}}\) is a bundle isomorphism from \((T^* M\times {\mathbb {R}}, {\hat{\pi }}^1_{0,1}, {\mathbb {R}})\) to \((T^* N\times {\mathbb {R}}, {\hat{\rho }}^1_{0,1}, {\mathbb {R}})\) projecting to \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\), and there is a function \(H_{{\textbf{F}}} \in C^\infty ({\mathcal {T}}^{S*} M\times {\mathbb {R}})\) such that

$$\begin{aligned} {\textbf{F}}^{R*}{\tilde{\eta }} = \omega _{H_{{\textbf{F}}}}, \end{aligned}$$
(6.34)

where \(\omega _{H_{{\textbf{F}}}} = {\tilde{\omega }} + d^\circ H_{\textbf{F}} \wedge dF^0\).

The map \({\textbf{F}}\) in the definition is also a bundle isomorphism from \(({\mathcal {T}}^{S*} M\times {\mathbb {R}}, {\hat{\pi }}^2_{0,1}, {\mathbb {R}})\) to \(({\mathcal {T}}^{S*} N\times {\mathbb {R}}, {\hat{\rho }}^2_{0,1}, {\mathbb {R}})\) projecting to \(F^0\). Hence, we may assume \({\textbf{F}}(\alpha _q, t) = (\bar{\textbf{F}}(\alpha _q, t), F^0(t))\) for all \((\alpha _q, t) \in {\mathcal {T}}^{S*} M\times {\mathbb {R}}\), where \(\bar{{\textbf{F}}}\) is a smooth map from \(\mathcal T^{S*} M\times {\mathbb {R}}\) to \({\mathcal {T}}^{S*} N\). For each \(t\in {\mathbb {R}}\), we define a map \(\bar{{\textbf{F}}}_t: {\mathcal {T}}^{S*} M \rightarrow \mathcal T^{S*} N\) by \(\bar{{\textbf{F}}}_t(\alpha _q) = \bar{\textbf{F}}(\alpha _q,t)\). We also introduce an injection \(\jmath _t: \mathcal T^{S*} M \rightarrow {\mathcal {T}}^{S*} M\times {\mathbb {R}}\) by \(\jmath _t(\alpha _q) = (\alpha _q,t)\). Then, we have \(\bar{{\textbf{F}}}_t = {\hat{\rho }}_{1,1} \circ {\textbf{F}} \circ \jmath _t\).

Lemma 6.15

The map \(\bar{{\textbf{F}}}_t\) is second-order symplectic for each \(t\in {\mathbb {R}}\) if and only if there is a mixed-order form \(\alpha \) on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) such that

$$\begin{aligned} {\textbf{F}}^{R*}{\tilde{\eta }} = {\tilde{\omega }} + \alpha \wedge dt. \end{aligned}$$

In particular, condition (6.34) implies that each \(\bar{{\textbf{F}}}_t\) is a second-order symplectomorphism.

Proof

The sufficiency follows from

$$\begin{aligned} \begin{aligned} (\bar{{\textbf{F}}}_t)^{S*} \eta&= (\jmath _t)^{R*} \circ {\textbf{F}}^{R*} \circ ({\hat{\rho }}_{1,1})^{S*} \eta = (\jmath _t)^{R*} \circ {\textbf{F}}^{R*} {\tilde{\eta }} \\&= (\jmath _t)^{R*} {\tilde{\omega }} + (\jmath _t)^{R*} \alpha \wedge (\jmath _t)^{R*} dt = \omega + (\jmath _t)^{R*} \alpha \wedge 0 = \omega . \end{aligned} \end{aligned}$$

For the necessity, we observe that

$$\begin{aligned} (\jmath _t)^{R*}({\textbf{F}}^{R*}{\tilde{\eta }} - {\tilde{\omega }}) = (\bar{{\textbf{F}}}_t)^{S*} \eta - \omega = 0. \end{aligned}$$

So we can write \({\textbf{F}}^{R*}{\tilde{\eta }} - {\tilde{\omega }} = \alpha \wedge dt + \gamma \), where \(\gamma \) is a mixed-order form which does not involve dt. This leads to \(\gamma = ({\hat{\pi }}_{1,1})^{R*} \circ (\jmath _t)^{R*}\gamma = ({\hat{\pi }}_{1,1})^{R*} \circ (\jmath _t)^{R*}({\textbf{F}}^{R*}{\tilde{\eta }} - {\tilde{\omega }} - \alpha \wedge dt) = 0\). The result follows. \(\square \)

The following lemma gives some equivalent statements to the condition (6.34).

Lemma 6.16

Condition (6.34) is equivalent to the following:

  1. (i)

    \({\textbf{F}}^{R*}{\tilde{\vartheta }} - {\tilde{\theta }} + H_{{\textbf{F}}} dF^0\) is mixed-order closed;

  2. (ii)

    for all \(K\in C^\infty ({\mathcal {T}}^{S*} N\times {\mathbb {R}})\), \({\textbf{F}}^{R*} \eta _K = \omega _H\);

  3. (iii)

    for all \(K\in C^\infty ({\mathcal {T}}^{S*} N\times {\mathbb {R}})\), \({\textbf{F}}^{R}_* {\tilde{A}}_H = {\tilde{A}}_K\);

where \(H = (K\circ {\textbf{F}} + H_{{\textbf{F}}})\dot{F}^0\).

Proof

The equivalence between (6.34) and (i) is clear. For (6.34) \(\Rightarrow \) (ii), since \({\textbf{F}}\) projects to \(F^0\),

$$\begin{aligned} \begin{aligned} {\textbf{F}}^{R*} \eta _K&= {\textbf{F}}^{R*} {\tilde{\eta }} + d^\circ (K\circ {\textbf{F}})\wedge d(t\circ {\textbf{F}}) = {\tilde{\omega }} + d^\circ H_{{\textbf{F}}} \wedge dF^0 + d^\circ (K\circ {\textbf{F}})\wedge dF^0 \\&= {\tilde{\omega }} + d^\circ H \wedge dt = \omega _H. \end{aligned} \end{aligned}$$

The converse (ii) \(\Rightarrow \) (6.34) is straightforward by letting \(K\equiv 0\). To show (ii) \(\Rightarrow \) (iii), by applying Lemma 6.13, it suffices to prove that

$$\begin{aligned} {\textbf{F}}^{R}_* {\tilde{A}}_H \lrcorner \, \eta _K = 0 \quad \text {and}\quad {\textbf{F}}^{R}_* {\tilde{A}}_H\lrcorner \, dt=1, \end{aligned}$$

while

$$\begin{aligned} {\textbf{F}}^{R}_* {\tilde{A}}_H \lrcorner \, \eta _K = (\textbf{F}^{R*})^{-1} ({\tilde{A}}_H \lrcorner \, {\textbf{F}}^{R*}\eta _K ) = ({\textbf{F}}^{R*})^{-1} ({\tilde{A}}_H \lrcorner \, \omega _H ) = 0, \end{aligned}$$

and

$$\begin{aligned} {\textbf{F}}^{R}_* {\tilde{A}}_H\lrcorner \, dt = ({\textbf{F}}^{R*})^{-1} ({\tilde{A}}_H \lrcorner \, {\textbf{F}}^{R*}dt ) = ({\textbf{F}}^{R*})^{-1} (\dot{F}^0 {\tilde{A}}_H \lrcorner \, dt ) = ({\textbf{F}}^{R*})^{-1} (\dot{F}^0) = 1. \end{aligned}$$

(iii) \(\Rightarrow \) (ii) is similar. \(\square \)

Definition 6.17

Let \({\textbf{F}}: {\mathcal {T}}^{S*} M\times {\mathbb {R}}\rightarrow {\mathcal {T}}^{S*} N\times {\mathbb {R}}\) be canonical. If we can locally write

$$\begin{aligned} {\textbf{F}}^{R*}{\tilde{\vartheta }} - {\tilde{\theta }} + H_{{\textbf{F}}} d F^0 = -d^{\circ } G \end{aligned}$$
(6.35)

for \(G\in C^\infty (M\times {\mathbb {R}})\), then we call G a generating function for the canonical transformation \({\textbf{F}}\).

We use (x, p, o, t) for local coordinates on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) and (y, P, O, s) for those on \({\mathcal {T}}^{S*} N\times {\mathbb {R}}\). Recall that \({\textbf{F}}(\alpha _q, t) = (\bar{\textbf{F}}(\alpha _q, t), F^0(t))\). Then, using (A.4), the relation (6.35) reads in coordinates as

$$\begin{aligned} \begin{aligned}&\left[ \dot{F}^0 + (P_i\circ {\textbf{F}}) \frac{\partial \bar{{\textbf{F}}}^i}{\partial t} \right] dt + (P_i\circ {\textbf{F}}) \frac{\partial \bar{{\textbf{F}}}^i}{\partial x^j} d^2x^j \\ {}&\quad + \frac{1}{2} \left[ (P_i\circ {\textbf{F}}) \frac{\partial ^2\bar{{\textbf{F}}}^i}{\partial x^k\partial x^l} + (O_{ij}\circ {\textbf{F}})\frac{\partial \bar{{\textbf{F}}}^i}{\partial x^k}\frac{\partial \bar{{\textbf{F}}}^j}{\partial x^l} \right] dx^k \cdot dx^l \\&\quad - \left( dt + p_i d^2 x^i + \frac{1}{2} o_{jk} dx^j\cdot dx^k \right) + H_{{\textbf{F}}} dF^0 \\ {}&\quad + \frac{\partial G}{\partial t} dt + \frac{\partial G}{\partial x^i} d^2 x^i + \frac{1}{2} \frac{\partial ^2 G}{\partial x^j \partial x^k} dx^j\cdot dx^k = 0. \end{aligned} \end{aligned}$$

Balancing the coefficients of dt (recall that \(dF^0 = \dot{F}^0 dt\)), we get

$$\begin{aligned} \frac{\partial G}{\partial t} + H_{{\textbf{F}}} \dot{F}^0 + (P_i\circ {\textbf{F}}) \frac{\partial \bar{{\textbf{F}}}^i}{\partial t} + \dot{F}^0 - 1 = 0. \end{aligned}$$

By Lemma 6.16, the new Hamiltonian function K after the transformation \({\textbf{F}}\) is related to the old Hamiltonian H by \(H = (K\circ {\textbf{F}} + H_{{\textbf{F}}})\dot{F}^0\). Let us further assume that we can choose coordinates in which \((y^i)\) and \((x^i)\) are independent, so that the independent variables in (6.35) are (x, y, t). Then, relation (6.35) means

$$\begin{aligned}{} & {} \left( P_i d^2 y^i + \textstyle {\frac{1}{2}} O_{jk} dy^j\cdot dy^k + dF^0 \right) - \left( p_i d^2 x^i + \textstyle {\frac{1}{2}} o_{jk} dx^j\cdot dx^k + dt \right) \nonumber \\{} & {} \quad + (H dt - K d F^0 ) = - d^\circ G, \end{aligned}$$
(6.36)

which implies that the generating function G(x, y, t) of the canonical transformation satisfies

$$\begin{aligned} \left\{ \begin{aligned}&p_i = \frac{\partial G}{\partial x^i}, \quad o_{jk} \frac{\partial x^k}{\partial y^l} = \frac{\partial ^2 G}{\partial x^j \partial x^k} \frac{\partial x^k}{\partial y^l} + \frac{\partial ^2 G}{\partial x^j \partial y^l}, \\&P_i = - \frac{\partial G}{\partial y^i}, O_{jk} = - \frac{\partial ^2 G}{\partial y^j \partial y^k} - \frac{\partial ^2 G}{\partial y^j \partial x^l} \frac{\partial x^l}{\partial y^k}, \\&(K - 1) \dot{F}^0 - H + 1 = \frac{\partial G}{\partial t}. \end{aligned}\right. \end{aligned}$$
(6.37)

The expressions for \((o_{jk})\) and \((O_{jk})\) are due to the mixed differential term in \(d^\circ G\) and correspond to the relation (6.15).

Remark 6.18

Unlike the canonical transformations of classical Hamiltonian systems, which have four types of generating functions related via the classical Legendre transform (see Goldstein et al. 2002, Section 9.1), here we can only have the type using (x, y, t) as independent variables. This can be attributed to the ill-behavedness of the second-order analog of the Legendre transform, as indicated in Remark 6.10.(ii). However, if the configuration space M is a Riemannian manifold, stochastic Hamiltonian mechanics can be simplified to share the same phase space \(T^* M\) as classical Hamiltonian mechanics, so that we can also have four types of generating functions. See Sect. 7.4.2 for details and examples of canonical transformations.

The Hamilton–Jacobi–Bellman (HJB) equation can be introduced as a special case of a time-dependent canonical transformation (6.37). In the case where \(F^0 = \textbf{Id}_{\mathbb {R}}\) and the new Hamiltonian K vanishes formally, we denote by S the corresponding generating function G. It follows from (6.37) that S solves the Hamilton–Jacobi–Bellman equation,

$$\begin{aligned} \frac{\partial S}{\partial t} + H\left( x^i, \frac{\partial S}{\partial x^i}, \frac{\partial ^2 S}{\partial x^j \partial x^k}, t \right) = 0. \end{aligned}$$
(6.38)

We will refer to Eq. (6.38) as the HJB equation associated with second-order Hamiltonian H, and a solution S of (6.38) as a second-order Hamilton’s principal function of H.

More generally, we have

Theorem 6.19

Let \(A_H\) be a second-order Hamiltonian vector field on \(({\mathcal {T}}^{S*} M, \omega )\) and let \(S\in C^\infty (M\times {\mathbb {R}})\). Then, the following statements are equivalent:

  1. (i)

    for every M-valued diffusion X satisfying

    $$\begin{aligned} (DX(t), QX(t)) = d^2 \big (\tau _M^*\big )_{d^2 S(t,X(t))} A_H, \end{aligned}$$

    the \({\mathcal {T}}^{S*} M\)-valued process \(d^2S\circ X\) is a horizontal integral process of \(A_H\);

  2. (ii)

    S satisfies the Hamilton–Jacobi–Bellman equation

    $$\begin{aligned} \frac{\partial S}{\partial t} + H(d^2 S, t) = f(t), \end{aligned}$$
    (6.39)

    for some function f depending only on t.

Proof

Let \({\textbf{X}} = d^2S\circ X\) and set \(x^i = x^i\circ d^2\,S\), \(p_i = p_i\circ d^2\,S\), \(o_{jk} = o_{jk}\circ d^2\,S\). Then,

$$\begin{aligned} p_i(t,x) = \frac{\partial S}{\partial x^i}(t,x), \quad o_{jk}(t,x) = \frac{\partial ^2 S}{\partial x^j \partial x^k}(t,x). \end{aligned}$$
(6.40)

These imply that the last equation of the system (6.17) holds. Since

$$\begin{aligned} d^2 (\tau _M^*)_{{\textbf{X}}(t)} A_H = \frac{\partial H}{\partial p_i}({\textbf{X}}(t)) \frac{\partial }{\partial {x^i}} + \frac{\partial H}{\partial o_{jk}}({\textbf{X}}(t)) \frac{\partial ^2}{\partial x^j \partial x^k}, \end{aligned}$$

the first two equations in (6.10) or (6.17) hold. Hence, for the process \({\textbf{X}} = d^2S\circ X\) to be a horizontal integral process of \(A_H\), it is necessary and sufficient that the third equation in (6.17) hold. Plugging the first equation of (6.40) into it, the third equation reads

$$\begin{aligned} \bigg ( \frac{\partial }{\partial t} + \frac{\partial H}{\partial p_j} \frac{\partial }{\partial {x^j}} + \frac{\partial H}{\partial o_{jk}} \frac{\partial ^2}{\partial x^j\partial x^k}\bigg ) \frac{\partial S}{\partial x^i} = - \frac{\partial H}{\partial x^i}. \end{aligned}$$

By the chain rule, this can be rearranged as

$$\begin{aligned} \frac{\partial }{\partial {x^i}} \left[ \frac{\partial S}{\partial t} + H\left( x^j, \frac{\partial S}{\partial x^j}, \frac{\partial ^2 S}{\partial x^j \partial x^k}, t \right) \right] = 0. \end{aligned}$$

The result follows. \(\square \)

Remark 6.20

If S solves the HJB equation (6.39), then \({\tilde{S}} = S - {\tilde{f}}\) solves (6.38), where \({\tilde{f}}\) is a primitive of f. In fact, one can always absorb the time-dependent function f into the second-order Hamiltonian function H so that the HJB equation (6.39) takes the same form as (6.38). More precisely, if we let \({\tilde{H}} = H - f\), then Theorem 6.19 also holds with \({\tilde{H}}\) and the zero function in place of H and f, respectively. A similar argument holds for the S-H equations (6.10): adding a function f depending only on time to a second-order Hamiltonian does not change its S-H equations.

Example 6.21

The function \(S=\ln u\) considered in Sect. 6.3 satisfies the Hamilton–Jacobi–Bellman equation (6.28), which is exactly \(\frac{\partial S}{\partial t} + H(d^2 S) = 0\) with the second-order Hamiltonian H given in (6.26). Hence, Theorem 6.19 yields that the process \(d^2S\circ X\) is a horizontal integral process of \(A_H\), which coincides with (6.32). The Euclidean case of this argument can be found in Chung and Zambrini (2003, p. 180) and Zambrini (2015, Eq. (4.20)).

By (6.38) and (6.40), the total mean derivative of a second-order Hamilton’s principal function S is given by

$$\begin{aligned} {\textbf{D}}_t S= & {} \frac{\partial S}{\partial t} + D^i x \frac{\partial S}{\partial x^i} + \frac{1}{2} Q^{jk} x \frac{\partial ^2 S}{\partial x^j \partial x^k} \nonumber \\= & {} p_i D^i x + \frac{1}{2} o_{jk} Q^{jk} x - H(x,p,o,t), \end{aligned}$$
(6.41)

where \((p(t,x),o(t,x)) = d^2 S(t,x)\) as in (6.40).
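In the one-dimensional setting of Example 6.12, \(H_\hbar = \frac{1}{2}p^2 + \frac{\hbar }{2}o + F\), \(Dx = p\) and \(Qx = \hbar \), so (6.41) reduces to \({\textbf{D}}_t S = \frac{1}{2}p^2 - F\). The Python sketch below compares the two lines of (6.41) by finite differences for \(F(x) = \frac{1}{2}x^2\), using as an assumption the explicit HJB solution \(S(t,x) = -\frac{1}{2}\tan (t+k)\,x^2 - \frac{\hbar }{2}\ln \cos (t+k)\) obtained for this illustration (it is not taken from the text); \(\hbar \) and k are arbitrary test values.

```python
import math

HBAR = 0.5    # arbitrary test value
SHIFT = 0.2   # arbitrary shift k; keeps t + SHIFT < pi/2 on the window tested
STEP = 1e-4   # finite-difference step

def F(x):
    return 0.5 * x ** 2

def S(t, x):
    # explicit HJB solution for F(x) = x^2/2 (derived for this sketch)
    return -0.5 * math.tan(t + SHIFT) * x ** 2 - 0.5 * HBAR * math.log(math.cos(t + SHIFT))

def H(x, p, o):
    """Second-order Hamiltonian H_hbar of Example 6.12 with d = 1."""
    return 0.5 * p ** 2 + 0.5 * HBAR * o + F(x)

def total_mean_derivative(t, x):
    """First line of (6.41): S_t + (Dx) S_x + (1/2)(Qx) S_xx with Dx = p = S_x, Qx = hbar."""
    s_t = (S(t + STEP, x) - S(t - STEP, x)) / (2 * STEP)
    s_x = (S(t, x + STEP) - S(t, x - STEP)) / (2 * STEP)
    s_xx = (S(t, x + STEP) - 2 * S(t, x) + S(t, x - STEP)) / STEP ** 2
    return s_t + s_x * s_x + 0.5 * HBAR * s_xx

def second_line(t, x):
    """Second line of (6.41): p Dx + (1/2) o Qx - H with (p, o) = d^2 S."""
    p = -math.tan(t + SHIFT) * x   # S_x
    o = -math.tan(t + SHIFT)       # S_xx
    return p * p + 0.5 * o * HBAR - H(x, p, o)

print(total_mean_derivative(0.5, 1.0), second_line(0.5, 1.0))  # the two values agree
```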

6.6 Second-Order Hamiltonian Functions from Classical Ones

In the presence of a linear connection \(\nabla \) on M, we are able to reduce (or produce) second-order Hamiltonian functions to (from) classical ones.

Let a second-order Hamiltonian function \(H: {\mathcal {T}}^{S*} M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be given. We make use of the fiber-linear bundle injection \({\hat{\iota }}^*_\nabla : T^* M \rightarrow {\mathcal {T}}^{S*} M\) in (5.5) to define a classical Hamiltonian by

$$\begin{aligned} H_0 = H\circ \big ({\hat{\iota }}^*_\nabla \times \textbf{Id}_{{\mathbb {R}}}\big ): T^* M\times {\mathbb {R}}\rightarrow {\mathbb {R}}. \end{aligned}$$
(6.42)

In canonical coordinates, \(H_0(x, p, t) = H(x, p, (\Gamma _{jk}^i(x) p_i), t)\). We introduce the auxiliary variables

$$\begin{aligned} {\hat{o}}_{jk} = {\hat{o}}_{jk}(x,p):= \Gamma _{jk}^i(x) p_i. \end{aligned}$$
(6.43)

Then, we can write

$$\begin{aligned} H_0(x,p,t) = H(x,p,{\hat{o}}(x,p),t). \end{aligned}$$

We say H reduces to \(H_0\) under the connection \(\nabla \), or \(H_0\) is the \(\nabla \)-reduction of H.

Clearly, the lift of a classical Hamiltonian \(H_0: T^* M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) to a second-order Hamiltonian function that reduces to \(H_0\) under \(\nabla \) is not unique. However, there is a canonical lift when a symmetric (2, 0)-tensor field g (not necessarily Riemannian) is given, namely

$$\begin{aligned} {\overline{H}}^g_0(x,p,o,t)&:= H_0(x,p,t) + \textstyle {\frac{1}{2}} g^{jk}(x) \left( o_{jk} - \Gamma ^i_{jk}(x)p_i \right) \nonumber \\&= H_0(x,p,t) + \textstyle {\frac{1}{2}} g^{jk}(x) o_{jk}^\nabla . \end{aligned}$$
(6.44)

Then, \(H_0\) is the \(\nabla \)-reduction of \({\overline{H}}^g_0\), and

$$\begin{aligned} \textstyle {\frac{1}{2}} o_{jk} g^{jk} - {\overline{H}}^g_0(x,p,o,t) = \textstyle {\frac{1}{2}} {\hat{o}}_{jk} g^{jk} - \overline{H}^g_0(x,p,{\hat{o}},t). \end{aligned}$$
(6.45)

We call \({\overline{H}}^g_0\) the \((g,\nabla )\)-canonical lift of \(H_0\). If g is a Riemannian metric and \(\nabla \) is the associated Levi–Civita connection, then we simply call \({\overline{H}}^g_0\) the g-canonical lift of \(H_0\). If there is a classical Hamiltonian \(H_0\) such that the second-order Hamiltonian H is the \((g,\nabla )\)- (or g-) canonical lift of \(H_0\), we say H is \((g,\nabla )\)- (or g-) canonical.

As an example, the second-order Hamiltonian H of (6.26) is g-canonical and reduces to \(H_0(x,p) = \frac{1}{2} g^{ij}(x) p_ip_j + b^i(x)p_i + F(x)\).
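The identities behind (6.43)–(6.45) are pointwise algebra, so they can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it works in dimension 2 with constant (x-independent) coefficients \(g^{jk}\) and \(\Gamma ^i_{jk}\), and with a made-up classical Hamiltonian H0; the helper names `o_hat`, `canonical_lift` and `half_trace` are hypothetical. It verifies that the canonical lift reduces back to H0 at \(o = {\hat{o}}\), and that the left-hand side of (6.45) does not depend on o.

```python
import math
import random

def o_hat(gamma, p):
    """Auxiliary variables (6.43): o_hat[j][k] = sum_i Gamma^i_{jk} p_i, with gamma[i][j][k] = Gamma^i_{jk}."""
    d = len(p)
    return [[sum(gamma[i][j][k] * p[i] for i in range(d)) for k in range(d)] for j in range(d)]

def canonical_lift(H0, g_inv, gamma):
    """Return the (g, nabla)-canonical lift (6.44) of a classical Hamiltonian H0(x, p, t)."""
    def H_bar(x, p, o, t):
        d = len(p)
        oh = o_hat(gamma, p)
        trace = sum(g_inv[j][k] * (o[j][k] - oh[j][k]) for j in range(d) for k in range(d))
        return H0(x, p, t) + 0.5 * trace
    return H_bar

def half_trace(g_inv, o):
    """(1/2) o_{jk} g^{jk}."""
    d = len(o)
    return 0.5 * sum(g_inv[j][k] * o[j][k] for j in range(d) for k in range(d))

def random_symmetric(rng, d):
    """A random symmetric d x d matrix (used for o, which is symmetric in (j, k))."""
    m = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j, d):
            m[j][k] = m[k][j] = rng.uniform(-1.0, 1.0)
    return m

# Hypothetical data: symmetric g^{jk}, and Gamma^i_{jk} symmetric in (j, k).
rng = random.Random(1)
d = 2
g_inv = [[2.0, 0.3], [0.3, 1.0]]
gamma = [random_symmetric(rng, d) for _ in range(d)]

def H0(x, p, t):
    """A made-up classical Hamiltonian."""
    return 0.5 * sum(pi * pi for pi in p) + t * math.sin(x[0])

H_bar = canonical_lift(H0, g_inv, gamma)
x, p, t = [0.4, -1.2], [0.7, 1.5], 0.3
o = random_symmetric(rng, d)
oh = o_hat(gamma, p)

# nabla-reduction: H_bar(x, p, o_hat(x, p), t) equals H0(x, p, t)
reduction_gap = H_bar(x, p, oh, t) - H0(x, p, t)
# identity (6.45): (1/2) o g - H_bar is the same at o and at o_hat
identity_gap = (half_trace(g_inv, o) - H_bar(x, p, o, t)) - (half_trace(g_inv, oh) - H_bar(x, p, oh, t))
print(reduction_gap, identity_gap)
```

Both gaps vanish up to round-off, reflecting that the correction term in (6.44) is linear in \(o - {\hat{o}}\).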

Furthermore, for the canonical transformation \({\textbf{F}}: \mathcal T^{S*} M \rightarrow {\mathcal {T}}^{S*} N\) in Definition 6.14, we can reduce its associated function \(H_{{\textbf{F}}} \in C^\infty ({\mathcal {T}}^{S*} M\times {\mathbb {R}})\) to a classical function \(H^0_{{\textbf{F}}} \in C^\infty (T^* M\times {\mathbb {R}})\) via (6.42). As a consequence of (6.34), the projection of \({\textbf{F}}\), i.e., the map \({\mathbb {F}}: T^* M\times {\mathbb {R}}\rightarrow T^* N\times {\mathbb {R}}\), satisfies \({\mathbb {F}}^*{\tilde{\eta }}_0 = \omega ^0_{H^0_{{\textbf{F}}}}\), where \(\omega ^0_{H^0_{{\textbf{F}}}} = {\tilde{\omega }}_0 + d H^0_{{\textbf{F}}} \wedge dF^0\). It follows that \({\mathbb {F}}\) is a classical canonical transformation (Abraham and Marsden 1978, Definition 5.2.6).

We will return to this issue in Sect. 7.4, where the second-order Legendre transform is developed. In particular, we will show there that, for the canonical second-order Hamiltonian in (6.44), the corresponding second-order Hamilton's equations (6.17) can be rewritten globally on the cotangent bundle \(T^* M\); see Theorem 7.22.

7 Stochastic Lagrangian Mechanics

In this section, we fix a Riemannian metric g on the manifold M and a g-compatible linear connection \(\nabla \). Such g and \(\nabla \) always exist but are in general not unique.

We will denote by \(|\cdot |\) and \(\langle \cdot ,\cdot \rangle \) the Riemannian norm and inner product, respectively. Also, denote by \({\check{g}}\) the inverse metric tensor of g, and \((\Gamma _{jk}^i)\) the Christoffel symbols of \(\nabla \). We observe that \({\check{g}}\) is a (2, 0)-tensor field. Denote by R the Riemann curvature tensor and \(\textrm{Ric}\) the Ricci (1, 1)-tensor.

7.1 Mean Covariant Derivatives

Definition 7.1

(Vector fields and 1-forms along diffusions) Let X be a diffusion on M. By a vector field along X, we mean a TM-valued process V such that \(\tau _M (V(t)) = X(t)\) for all t. Similarly, by a 1-form along X, we mean a \(T^* M\)-valued process \(\eta \) such that \(\tau ^*_M (\eta (t)) = X(t)\) for all t.

Clearly, for a time-dependent vector field V on M, the restriction of V to X, i.e., \(\{V_{(t,X(t))}\}\), is a vector field along X. In this case, we call \(\{V_{(t,X(t))}\}\) a vector field restricted on X. In this way, vector fields restricted on X are TM-valued horizontal diffusions projecting to X. The same holds for 1-forms.

Definition 7.2

(Parallelisms along diffusions) Let \(X \in I_{t_0}(M)\). A vector field V along X is said to be parallel along X if the following Stratonovich SDE in local coordinates holds,

$$\begin{aligned} d V^i(t) + \Gamma _{jk}^i(X(t)) V^j(t) \circ dX^k(t) = 0. \end{aligned}$$
(7.1)

A 1-form \(\eta \) along X is said to be parallel along X if

$$\begin{aligned} d \eta _j(t) - \Gamma _{jk}^i(X(t)) \eta _i(t) \circ dX^k(t) = 0. \end{aligned}$$

Definition 7.3

(Stochastic parallel displacements) Given a diffusion \(X\in I_{t_0} (M)\) and a (random) vector \(v \in T_{X(t_0)} M\), the stochastic parallel displacement of v along X is the extension of v to a parallel vector field V along X, that is, V satisfies the SDE (7.1) with initial condition \(V(t_0) = v\). We denote \(\Gamma (X)_{t_0}^t v:= V(t)\) and \(\Gamma (X)_t^{t_0} V(t):= v\). The stochastic parallel displacement of a (random) covector \(\eta \in T^*_{X(t_0)} M\) along X is defined in a similar fashion.

Definition 7.4

(Damped parallel displacements) Let \(X\in I_{t_0} (M)\). Given a (random) vector \(v \in T_{X(t_0)} M\) and covector \(\eta _0 \in T^*_{X(t_0)} M\), the damped parallel displacement of v along X is the extension of v to a vector field V along X that satisfies the SDE

$$\begin{aligned}{} & {} d V^i(t) + \Gamma _{jk}^i(X(t)) V^j(t) \circ dX^k(t) + \frac{1}{2} R^i_{kjl}(X(t)) V^j(t) (QX)^{kl}(t) dt = 0, \nonumber \\{} & {} V(t_0) = v. \end{aligned}$$
(7.2)

The damped parallel displacement of \(\eta _0\) along X is the extension of \(\eta _0\) to a 1-form \(\eta \) along X that satisfies the SDE

$$\begin{aligned}{} & {} d \eta _j(t) - \Gamma _{jk}^i(X(t)) \eta _i(t) \circ dX^k(t) - \frac{1}{2} R^i_{kjl}(X(t)) \eta _i(t) (QX)^{kl}(t) dt = 0, \nonumber \\{} & {} \eta (t_0) = \eta _0. \end{aligned}$$
(7.3)

We denote \(\overline{\Gamma }(X)_{t_0}^t v:= V(t)\), \(\overline{\Gamma }(X)_{t_0}^t \eta _0:= \eta (t)\), and \(\overline{\Gamma }(X)_t^{t_0} V(t):= v\), \(\overline{\Gamma }(X)_t^{t_0} \eta (t):= \eta _0\).

If V and \(\eta \) are restrictions on X, that is, \(V(t) = V_{(t,X(t))}\) and \(\eta (t) = \eta _{(t,X(t))}\), then equations (7.2) and (7.3) can be rewritten, respectively, as

$$\begin{aligned}{} & {} \frac{\partial V}{\partial t} dt + \nabla ^{}_{\circ dX} V + \frac{1}{2} R(V, \circ dX) \circ dX = 0, \\{} & {} \frac{\partial \eta }{\partial t} dt + \nabla ^{}_{\circ dX} \eta - \frac{1}{2} R(\eta , \circ dX) \circ dX = 0, \end{aligned}$$

where we mean by \(R(\eta , V) W\) the 1-form \([R(\eta ^\sharp , V) W]^\flat \). The Stratonovich stochastic differentials can be transformed into Itô ones. For example, (7.3) is equivalent to

$$\begin{aligned} d \eta _j(t)= & {} \Gamma _{jk}^i(X(t)) \eta _i(t) dX^k(t) + \frac{1}{2} (QX)^{kl}(t)\left( \frac{\partial \Gamma _{jk}^i}{\partial x^l} + \Gamma _{jk}^m \Gamma _{ml}^i \right) (X(t)) \eta _i(t) dt \nonumber \\{} & {} + \frac{1}{2} R^i_{kjl}(X(t)) \eta _i(t) (QX)^{kl}(t) dt. \end{aligned}$$
(7.4)

Remark 7.5

The notion of stochastic parallel displacement was introduced by Itô (1975) and Dynkin (1968). The terminology of damped parallel displacement is due to Malliavin (1997), but the notion itself was originally introduced by Dohrn and Guerra (1979), who called it the geodesic correction to the stochastic parallel displacement.

Lemma 7.6

Let \(X \in I_{t_0}(M)\).

  1. (i)

    Let \(\eta \) be a 1-form on M parallel along X. If V is a vector field on M which is also parallel along X, then \(\eta (V)(t) = \eta (V)(t_0)\) for all \(t\ge t_0\); if \(v \in T_{X(t_0)} M\), then \(\eta (\Gamma (X)_{t_0}^t v)(t) = \eta (v)(t_0)\) for all \(t\ge t_0\).

  2. (ii)

    Let \(\eta \) be a 1-form along X satisfying the SDE (7.3). If V is a vector field along X satisfying the SDE (7.2), then \(\eta (V)(t) = \eta (V)(t_0)\) for all \(t\ge t_0\); if \(v \in T_{X(t_0)} M\), then \(\eta (\overline{\Gamma }(X)_{t_0}^t v)(t) = \eta (v)(t_0)\) for all \(t\ge t_0\).

Proof

We only prove Assertion (ii), as (i) is similar. Since Stratonovich stochastic differentials obey Leibniz’s rule, we have

$$\begin{aligned} \begin{aligned} d[\eta (V)]&= \eta _i \circ dV^i + V^j \circ d\eta _j \\&= -\eta _i \Gamma _{jk}^i V^j \circ dX^k - \frac{1}{2} \eta _i R^i_{kjl} V^j (QX)^{kl} dt + V^j \Gamma _{jk}^i \eta _i \circ dX^k \\&\quad + \frac{1}{2} V^j R^i_{kjl} \eta _i (QX)^{kl} dt \\&= 0. \end{aligned} \end{aligned}$$

This proves the first statement of (ii). The second statement of (ii) follows by letting \(V(t):= \overline{\Gamma }(X)_{t_0}^t v\). \(\square \)
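Lemma 7.6 can be illustrated numerically on a concrete manifold. The Python sketch below is an illustration under explicit assumptions, not the paper's construction. It takes M to be the unit sphere \(S^2\subset {\mathbb {R}}^3\) with its round metric and Levi–Civita connection, and uses the extrinsic form of stochastic parallel displacement on \(S^2\), \(dV = -X\,(V\cdot \circ dX)\), instead of the local-coordinate SDE (7.1). X is Brownian motion on \(S^2\), \(dX = P(X)\circ dB\) with \(P(X) = I - XX^\top \), so that \(QX = {\check{g}}\). Since \(\textrm{Ric} = \textbf{Id}\) on the unit \(S^2\), the damped displacements (7.2) and (7.3) add the drifts \(-\frac{1}{2}W\,dt\) and \(+\frac{1}{2}E\,dt\), where the 1-form is represented by its metric dual E. The Stratonovich system is integrated with a Heun predictor-corrector step; the function names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def axpy(a, b, s):
    """Return a + s * b for 3-vectors."""
    return [ai + s * bi for ai, bi in zip(a, b)]

def increments(x, v, w, e, db, dt):
    """Stratonovich increments on the unit sphere S^2 (extrinsic form, an assumption of this sketch).

    dX = P(X) dB                   Brownian motion on S^2, so QX = g^{-1}
    dV = -X (V . dX)               stochastic parallel displacement
    dW = -X (W . dX) - W dt / 2    damped displacement of a vector (Ric = Id on S^2)
    dE = -X (E . dX) + E dt / 2    damped displacement of a 1-form, via its metric dual
    """
    dx = axpy(db, x, -dot(x, db))          # P(X) dB = dB - (X . dB) X
    dv = [-dot(v, dx) * xi for xi in x]
    dw = axpy([-dot(w, dx) * xi for xi in x], w, -0.5 * dt)
    de = axpy([-dot(e, dx) * xi for xi in x], e, 0.5 * dt)
    return dx, dv, dw, de

def simulate(T=1.0, n_steps=20000, seed=0):
    rng = random.Random(seed)
    dt = T / n_steps
    x = [0.0, 0.0, 1.0]                    # start at the north pole q
    v = [1.0, 0.0, 0.0]                    # unit tangent vector at q
    w, e = list(v), list(v)
    for _ in range(n_steps):
        db = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(3)]
        state = (x, v, w, e)
        k1 = increments(*state, db, dt)                      # predictor
        pred = [axpy(s, k, 1.0) for s, k in zip(state, k1)]
        k2 = increments(*pred, db, dt)                       # corrector
        x, v, w, e = [axpy(axpy(s, a, 0.5), b, 0.5) for s, a, b in zip(state, k1, k2)]
        nx = math.sqrt(dot(x, x))
        x = [xi / nx for xi in x]          # remove small discretization drift off the sphere
    return x, v, w, e

x, v, w, e = simulate()
print(math.sqrt(dot(v, v)))                  # close to 1: parallel displacement preserves length
print(math.sqrt(dot(w, w)), math.exp(-0.5))  # damped vector shrinks like e^{-t/2}
print(dot(e, w))                             # close to 1: pairing preserved, cf. Lemma 7.6(ii)
```

The first printed value reflects Lemma 7.6(i) with \(\eta = V^\flat \); the last reflects Lemma 7.6(ii): the damped 1-form grows like \(e^{t/2}\) while the damped vector decays like \(e^{-t/2}\), so their pairing stays constant.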

Definition 7.7

(Mean covariant derivatives along diffusions) Let X be a diffusion on M, and let V and \(\eta \) be a time-dependent vector field and a time-dependent 1-form along X, respectively. The (forward) mean covariant derivative of V with respect to X is a time-dependent vector field \(\frac{{\textbf{D}}V}{dt}\) along X, defined by

$$\begin{aligned} \frac{{\textbf{D}}V}{dt} (t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{\Gamma (X)_{t+\epsilon }^t V(t+\epsilon ) - V(t) }{\epsilon } \Bigg | {\mathcal {P}}_t \right] . \end{aligned}$$
(7.5)

The damped mean covariant derivative of V with respect to X is a time-dependent vector field \(\frac{\overline{{\textbf{D}}}V}{dt}\) along X with \(\overline{\Gamma }\) instead of \(\Gamma \) in (7.5). Similarly, we can define \(\frac{{\textbf{D}}\eta }{dt}\) and \(\frac{\overline{{\textbf{D}}}\eta }{dt}\).

Lemma 7.8

  1. (i)

    Let V and \(\eta \) be a vector field and a 1-form along X. If \(\eta \) is parallel along X, then

    $$\begin{aligned} \textstyle {{\textbf{E}}\left[ \eta \left( \frac{{\textbf{D}}V}{dt} \right) \right] = {\textbf{E}}\left( D[\eta (V)] \right) .} \end{aligned}$$
    (7.6)

    If \(\eta \) satisfies the SDE (7.3), then (7.6) holds true with \(\frac{\overline{{\textbf{D}}}}{dt}\) instead of \(\frac{{\textbf{D}}}{dt}\).

  2. (ii)

    Let V be a vector field restricted on X. Then,

    $$\begin{aligned} \frac{\overline{{\textbf{D}}}V}{dt}= & {} \frac{{\textbf{D}}V}{dt} + \frac{1}{2} (QX)^{ij} R(V,\partial _i)\partial _j = \frac{\partial V}{\partial t} + \nabla ^{}_{D_\nabla X} V \\{} & {} + \frac{1}{2} (QX)^{ij} \left( \nabla ^2_{\partial _i,\partial _j} V + R(V,\partial _i)\partial _j \right) . \end{aligned}$$
  3. (iii)

    Let \(\eta \) be a 1-form restricted on X. Then,

    $$\begin{aligned} \frac{\overline{{\textbf{D}}}\eta }{dt}= & {} \frac{{\textbf{D}}\eta }{dt} - \frac{1}{2} (QX)^{ij} R(\eta ,\partial _j)\partial _i = \frac{\partial \eta }{\partial t} + \nabla ^{}_{D_\nabla X} \eta \\{} & {} + \frac{1}{2} (QX)^{ij} \left( \nabla ^2_{\partial _i,\partial _j} \eta - R(\eta ,\partial _j)\partial _i \right) . \end{aligned}$$
  4. (iv)

    Let V and \(\eta \) be a vector field and a 1-form restricted on X. Then,

    $$\begin{aligned} {\textbf{D}}_{\textrm{t}}[\eta (V)]= & {} \eta \left( \frac{{\textbf{D}}V}{dt} \right) + \frac{{\textbf{D}}\eta }{dt} (V) + (QX)^{ij} (\nabla _{\partial _i} \eta ) (\nabla _{\partial _j} V) \\= & {} \eta \left( \frac{\overline{{\textbf{D}}}V}{dt} \right) + \frac{\overline{{\textbf{D}}}\eta }{dt} (V) + (QX)^{ij} (\nabla _{\partial _i} \eta ) (\nabla _{\partial _j} V). \end{aligned}$$

Proof

  1. (i)

    By Lemma 7.6.(i), we have

    $$\begin{aligned} \begin{aligned} {\textbf{E}}\left[ \eta \left( \frac{{\textbf{D}}V}{dt} \right) (t) \right]&= \lim _{\epsilon \rightarrow 0} {\textbf{E}}\left[ \frac{ \eta (t) (\Gamma (X)_{t+\epsilon }^t V(t+\epsilon ) ) - \eta (t)(V(t)) }{\epsilon } \right] \\&= \lim _{\epsilon \rightarrow 0} {\textbf{E}}\left[ \frac{ \eta ( V ) (t+\epsilon ) - \eta ( V ) (t) }{\epsilon } \right] \\&= {\textbf{E}}\left( D[\eta (V)(t)] \right) . \end{aligned} \end{aligned}$$

    This proves the first statement of (i). The second statement of (i) follows by a similar argument with \(\frac{\overline{{\textbf{D}}}}{dt}\) in place of \(\frac{{\textbf{D}}}{dt}\) and \(\overline{\Gamma }\) in place of \(\Gamma \).

  2. (ii)

    It suffices to derive the expression for \(\frac{\overline{{\textbf{D}}}V}{dt}\). Suppose that \(\eta \) is a 1-form satisfying the SDE (7.3) and the diffusion X satisfies \(QX(t) = (\sigma \circ \sigma ^*)(t,X(t))\). Then, we apply Itô’s formula to \(\eta (V)(X(t))\) and make use of (2.20) and (7.4). We get

    $$\begin{aligned} \begin{aligned} d[\eta (V)]&= d(\eta _i V^i) = \eta _i \left( \frac{\partial V^i}{\partial t} dt + \frac{\partial V^i}{\partial x^j} dX^j + \frac{1}{2} \frac{\partial ^2 V^i}{\partial x^j\partial x^k} d[X^j, X^k] \right) \\&\quad + V^j d\eta _j + d[\eta _j, V^j] \\&= \eta _i \left( \frac{\partial V^i}{\partial t} + \frac{\partial V^i}{\partial x^j} (DX)^j + \frac{1}{2} \frac{\partial ^2 V^i}{\partial x^j\partial x^k} (QX)^{jk} \right) dt + \eta _i \frac{\partial V^i}{\partial x^j} \sigma _r^j dB^r \\&\quad + V^j \left[ \Gamma _{jk}^i (DX)^k + \frac{1}{2} (QX)^{kl}\left( \frac{\partial \Gamma _{jk}^i}{\partial x^l} + \Gamma _{jk}^m \Gamma _{ml}^i \right) + \frac{1}{2} R^i_{kjl} (QX)^{kl} \right] \eta _i dt \\&\quad + V^j \Gamma _{jk}^i \eta _i \sigma _r^k dB^r + \Gamma _{jk}^i \eta _i \frac{\partial V^j}{\partial x^l} (QX)^{kl} dt \\&= \eta _i \left[ \frac{\partial V^i}{\partial t} + \left( \frac{\partial V^i}{\partial x^k} + V^j \Gamma _{jk}^i \right) (D_\nabla X)^k \right] dt \\&\quad + \frac{1}{2} \eta _i (QX)^{kl} \left[ -\frac{\partial V^i}{\partial x^j} \Gamma ^j_{kl} + \frac{\partial ^2 V^i}{\partial x^k\partial x^l} + V^j \left( -\Gamma _{jm}^i \Gamma ^m_{kl} + \frac{\partial \Gamma _{jk}^i}{\partial x^l} + \Gamma _{jk}^m \Gamma _{ml}^i \right) \right. \\&\quad \left. + 2 \Gamma _{jk}^i \frac{\partial V^j}{\partial x^l} \right] dt \\&\quad + \frac{1}{2} \eta _i R^i_{kjl} (QX)^{kl} V^j dt + \eta _i \left( \frac{\partial V^i}{\partial x^k} + V^j \Gamma _{jk}^i \right) \sigma _r^k dB^r \\&= \eta \left( \frac{\partial V}{\partial t} + \nabla ^{}_{D_\nabla X} V + \frac{1}{2} (QX)^{ij} \left( \nabla ^2_{\partial _i,\partial _j} V + R(V,\partial _i)\partial _j \right) \right) dt \\&\quad + \eta \left( \nabla _{\sigma _r} V \right) dB^r. \end{aligned} \end{aligned}$$

    Hence, part (i) implies

    $$\begin{aligned} {\textbf{E}}\left[ \eta \left( \frac{\overline{{\textbf{D}}}V}{dt} \right) \right]= & {} {\textbf{E}}\left( D[\eta (V)(t)] \right) \\= & {} {\textbf{E}}\left[ \eta \left( \frac{\partial V}{\partial t} + \nabla ^{}_{D_\nabla X} V + \frac{1}{2} (QX)^{ij} \left( \nabla ^2_{\partial _i,\partial _j} V + R(V,\partial _i)\partial _j \right) \right) \right] . \end{aligned}$$

    The arbitrariness of \(\eta \) yields (ii).

  3. (iii)

    Similar to (ii).

  4. (iv)

    We only prove the first equality as the second is similar. By (4.6),

    $$\begin{aligned} \begin{aligned} {\textbf{D}}_{\textrm{t}} [\eta (V)]&= \left( \frac{\partial }{\partial t} + (D_\nabla X)^i \partial _i + \frac{1}{2} (QX)^{ij} \nabla ^2_{\partial _i,\partial _j} \right) [\eta (V)] \\&= \left( \frac{\partial \eta }{\partial t} \right) (V) + \eta \left( \frac{\partial V}{\partial t} \right) + \left( \nabla ^{}_{D_\nabla X} \eta \right) (V) + \eta \left( \nabla ^{}_{D_\nabla X} V \right) \\&\quad + \frac{1}{2} (QX)^{ij} \left[ \left( \nabla ^2_{\partial _i,\partial _j} \eta \right) (V) + \eta \left( \nabla ^2_{\partial _i,\partial _j} V \right) \right. \\&\quad \left. + \left( \nabla _{\partial _i} \eta \right) \left( \nabla _{\partial _j} V \right) + \left( \nabla _{\partial _j} \eta \right) \left( \nabla _{\partial _i} V \right) \right] \\&= \eta \left( \frac{{\textbf{D}}V}{dt} \right) + \frac{{\textbf{D}}\eta }{dt} (V) + (QX)^{ij} (\nabla _{\partial _i} \eta ) (\nabla _{\partial _j} V). \end{aligned} \end{aligned}$$

    The result follows.

\(\square \)

If \(QX(t) = {\check{g}}(X(t))\), then

$$\begin{aligned} \frac{\overline{{\textbf{D}}}V}{dt} = \frac{\partial V}{\partial t} + \nabla ^{}_{D_\nabla X} V + \frac{1}{2} \Delta V + \frac{1}{2} \textrm{Ric}(V), \end{aligned}$$

and similarly,

$$\begin{aligned} \frac{\overline{{\textbf{D}}}\eta }{dt} = \frac{\partial \eta }{\partial t} + \nabla ^{}_{D_\nabla X} \eta + \frac{1}{2} \Delta \eta - \frac{1}{2} \textrm{Ric}(\eta ) = \frac{\partial \eta }{\partial t} + \nabla ^{}_{D_\nabla X} \eta + \frac{1}{2} \Delta _{\textrm{LD}} \eta , \end{aligned}$$
(7.7)

where \(\Delta \) is the connection Laplacian, and \(\Delta _{\textrm{LD}} = -( dd^*+d^* d)\) is the Laplace–de Rham operator on forms. The relation \(\Delta _{\textrm{LD}} = \Delta - \textrm{Ric}\) follows from the Weitzenböck identity (Petersen 2016, Theorem 9.4.1). We remark that the operator \(\Delta + \textrm{Ric}\) acting on vector fields is also called the Laplace–de Rham operator in Dohrn and Guerra (1979).
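As an illustration only, take M to be the unit sphere \(S^n\) with its round metric, for which \(\textrm{Ric} = (n-1)\,\textbf{Id}\). For a diffusion with \(QX(t) = {\check{g}}(X(t))\), the damped mean covariant derivatives above then read

$$\begin{aligned} \frac{\overline{{\textbf{D}}}V}{dt} = \frac{\partial V}{\partial t} + \nabla ^{}_{D_\nabla X} V + \frac{1}{2} \Delta V + \frac{n-1}{2} V, \qquad \frac{\overline{{\textbf{D}}}\eta }{dt} = \frac{\partial \eta }{\partial t} + \nabla ^{}_{D_\nabla X} \eta + \frac{1}{2} \Delta \eta - \frac{n-1}{2} \eta , \end{aligned}$$

so on the sphere the curvature corrections are constant multiples of the field, with opposite signs for vector fields and 1-forms.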

In fluid dynamics, the operator \(\frac{\partial }{\partial t}+ \nabla _v\), with v a vector field, is often referred to as the material derivative or hydrodynamic derivative. Thus, the mean covariant derivative \(\frac{{\textbf{D}}}{dt}\) and its damped variant \(\frac{\overline{{\textbf{D}}}}{dt}\) can be regarded as stochastic deformations of the material derivative.

7.2 A Stochastic Stationary-Action Principle

In this subsection, we will establish a type of stochastic stationary-action principle: the stochastic Hamilton’s principle. Another version for systems with conserved energy, the stochastic Maupertuis’s principle, can be found in “Appendix C.”

In contrast to second-order Hamiltonians, not every real-valued function on \({\mathcal {T}}^S M\) can serve as a second-order Lagrangian in stochastic Lagrangian mechanics. This was already hinted at in Sect. 6.3; see Remark 6.10. For this reason, we produce a class of second-order Lagrangians from classical Lagrangians, via the fiber-linear bundle projection \(\varrho _\nabla \) in (3.3) and the \(\nabla \)-canonical coordinates \((D_\nabla ^i x)\) in (3.2).

Definition 7.9

By an admissible second-order Lagrangian, we mean a function \(L:{\mathbb {R}}\times {\mathcal {T}}^S M\rightarrow {\mathbb {R}}\) such that there exists a classical Lagrangian \(L_0: {\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\) satisfying \(L = L_0 \circ (\textbf{Id}_{\mathbb {R}}\times \varrho _\nabla )\). We call L the \(\nabla \)-lift of \(L_0\).

In local coordinates, the \(\nabla \)-lift L of \(L_0\) is expressed as

$$\begin{aligned} L(t, x, Dx, Qx) = L_0 \circ \varrho _\nabla (t, x, Dx, Qx) = L_0(t, x, D_\nabla x). \end{aligned}$$
(7.8)
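The following Python sketch evaluates the \(\nabla \)-lift (7.8) pointwise. It assumes the convention \(D_\nabla ^i x = D^i x + \frac{1}{2}\Gamma ^i_{jk} Q^{jk}x\) for the \(\nabla \)-canonical coordinates in (3.2), which is consistent with the rearrangement of the Itô drift in the proof of Lemma 7.8(ii). As hypothetical test data it uses \({\mathbb {R}}^2\) in polar coordinates \((r,\theta )\) with \(g = dr^2 + r^2 d\theta ^2\) (not compact, so this only illustrates the pointwise formula) and the kinetic Lagrangian \(L_0 = \frac{1}{2}|\dot{x}|_g^2\). For planar Brownian motion, \(dr = dt/(2r) + dB^1\) and \(d\theta = dB^2/r\), so Dx is nonzero, yet the \(\nabla \)-corrected velocity vanishes.

```python
def nabla_mean_velocity(Dx, Q, gamma):
    """D_nabla^i x = D^i x + (1/2) Gamma^i_{jk} Q^{jk} (assumed convention; gamma[i][j][k] = Gamma^i_{jk})."""
    d = len(Dx)
    return [Dx[i] + 0.5 * sum(gamma[i][j][k] * Q[j][k] for j in range(d) for k in range(d))
            for i in range(d)]

def polar_christoffel(r):
    """Levi-Civita Christoffel symbols of g = dr^2 + r^2 dtheta^2, with indices (r, theta) = (0, 1)."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    gamma[0][1][1] = -r                         # Gamma^r_{theta theta}
    gamma[1][0][1] = gamma[1][1][0] = 1.0 / r   # Gamma^theta_{r theta}
    return gamma

def lifted_kinetic_lagrangian(r, Dx, Q):
    """nabla-lift (7.8) of the hypothetical choice L0(x, v) = |v|_g^2 / 2."""
    vr, vth = nabla_mean_velocity(Dx, Q, polar_christoffel(r))
    return 0.5 * (vr ** 2 + r ** 2 * vth ** 2)

# Planar Brownian motion at radius r = 2, written in polar coordinates.
r = 2.0
Dx = [1.0 / (2 * r), 0.0]            # Ito drift of (r, theta)
Q = [[1.0, 0.0], [0.0, 1.0 / r ** 2]]  # diffusion matrix QX = g^{-1}
print(nabla_mean_velocity(Dx, Q, polar_christoffel(r)))   # [0.0, 0.0]
print(lifted_kinetic_lagrangian(r, Dx, Q))                 # 0.0
```

The example shows why the lift uses \(D_\nabla x\) rather than Dx: the drift \(1/(2r)\) of the radial coordinate is a coordinate effect that the Christoffel correction removes.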

Let \(T>0\). Our stochastic variational problem consists in finding the extrema (maxima or minima) of the stochastic action functional

$$\begin{aligned} {\mathcal {S}} [X;0,T]&:= {\textbf{E}}\int _0^T L\left( t, X(t), D X(t), Q X(t) \right) {\textrm{d}}t \nonumber \\&= {\textbf{E}}\int _0^T L_0\left( t, X(t), D_\nabla X(t) \right) {\textrm{d}}t \end{aligned}$$
(7.9)

over a suitable domain of diffusions X on M, where L is an admissible second-order Lagrangian lifted from \(L_0\).

In order to formulate a well-posed stochastic variational problem economically, we assume that the manifold M is compact, that the metric g is geodesically complete (which will be used to characterize the variations of diffusions in Lemma 7.13), and that \(\nabla \) is the associated Levi–Civita connection. Since M is compact, geodesic completeness holds, for example, whenever M is connected (see, e.g., Lee 2013, p. 346). Once the metric g is given, the associated Levi–Civita connection is uniquely determined by the fundamental theorem of Riemannian geometry (Kobayashi and Nomizu 1963, Theorem IV.2.2). We will refer to such a geodesically complete Riemannian metric as a reference metric tensor.

For a fixed point \(q \in M\) and a probability distribution \(\mu \in {\mathcal {P}}(M)\) on M, we define an admissible class of diffusions by

$$\begin{aligned} {\mathcal {A}}_g([0,T];q, \mu ) = \left\{ X\in I_{(0,q)}^{(T,\mu )}(M): QX(t) = {\check{g}}(X(t)), \forall t\in [0,T], \text {a.s.} \right\} ,\nonumber \\ \end{aligned}$$
(7.10)

where \(I_{(0,q)}^{(T,\mu )}(M)\) denotes the set of all M-valued diffusion processes starting from q at \(t=0\) with final distribution \(\mu \), i.e., \({\textbf{P}}\circ (X(T))^{-1} = \mu \). The action functional \({\mathcal {S}}\) is now defined on the set \({\mathcal {A}}_g([0,T];q, \mu )\), that is, \({\mathcal {S}}: {\mathcal {A}}_g([0,T];q, \mu ) \rightarrow {\mathbb {R}}\).

Note that the admissible class \({\mathcal {A}}_g\) resembles the Wiener space, so a natural candidate for its "tangent space" is the Cameron–Martin space. Denote by \({\mathcal {H}}([0,T];q)\) the Hilbert space of absolutely continuous curves \(v:[0,T]\rightarrow T_q M\) such that \(\int _0^T |\dot{v}(t)|^2 {\textrm{d}}t < \infty \). Let \({\mathcal {H}}_0([0,T];q)\) be the subspace consisting of all \(v\in {\mathcal {H}}([0,T];q)\) satisfying \(v(0) = v(T) = 0\).

Definition 7.10

Let \(X\in {\mathcal {A}}_g([0,T];q, \mu )\). For a curve \(v\in \mathcal H_0([0,T];q)\), the vector field along X given by \(V(t):=\Gamma (X)_0^t v(t)\) is called a tangent vector to \({\mathcal {A}}_g([0,T];q, \mu )\) at X. The tangent space to \({\mathcal {A}}_g ([0,T];q, \mu )\) at X is the set of all such tangent vectors, that is,

$$\begin{aligned} T_X {\mathcal {A}}_g([0,T];q, \mu ):= \left\{ \Gamma (X)_0^\cdot v(\cdot ): v\in {\mathcal {H}}_0([0,T];q) \right\} . \end{aligned}$$

Definition 7.11

By a variation (or deformation) of a diffusion \(X\in {\mathcal {A}}_g([0,T];q, \mu )\) along \(v\in {\mathcal {H}}_0([0,T];q)\), we mean a one-parameter family of diffusions \(\{X^v_\epsilon \}_{\epsilon \in (-\varepsilon ,\varepsilon )}\), where for each \(t\in [0,T]\), \(X^v_\epsilon (t)\) satisfies the following ODE

$$\begin{aligned} \frac{\partial }{\partial \epsilon }X^v_\epsilon (t) = \Gamma \big (X^v_\epsilon \big )_0^t v(t), \quad X^v_0(t) = X(t). \end{aligned}$$
(7.11)

The diffusion \(X\in {\mathcal {A}}_g([0,T];q, \mu )\) is called a stationary (or critical) point of \({\mathcal {S}}\), if the first variation \(\delta {\mathcal {S}}\) vanishes at X, i.e.,

$$\begin{aligned} \frac{{\textrm{d}}}{{\textrm{d}}\epsilon }\bigg |_{\epsilon =0} {\mathcal {S}}\big [X^v_\epsilon ;0,T\big ] = 0, \quad \text {for all } v\in {\mathcal {H}}_0([0,T];q). \end{aligned}$$
(7.12)

Remark 7.12

  1. (i)

    Variations of diffusions on manifolds via the differential equation (7.11) are standard in stochastic analysis on path spaces of Riemannian manifolds; see, for example, Driver (1992, Eq. (2.3)) and Hsu (1995, Theorem 4.1), where it is shown that Wiener measure is quasi-invariant under such variations. Such variations admit several equivalent constructions. For instance, the two references above also give an approach that lifts to the frame bundle and projects to the Euclidean space (a stochastic analog of Cartan's development), while Fang and Malliavin (1993) give an alternative perspective via the Bismut connection.

  2. (ii)

    The stochastic variational problem (7.9)–(7.12) in the Euclidean setting is also familiar from stochastic optimal transport and control. See Sects. 7.3 and 7.4.4 for connections to these areas.

  3. (iii)

    Unlike the infinitesimal variation used in Definition 4.11 for studying symmetries of SDEs, the infinitesimal variation here in (7.11) needs to be a parallel vector field.

The following lemma is the key to establishing the stochastic Hamilton's principle. The first statement shows that the variation \(X^v_\epsilon \) is well defined on the path space \({\mathcal {A}}_g([0,T];q, \mu )\). The second describes the infinitesimal change of \(D_\nabla X^v_\epsilon \) with respect to the variation parameter \(\epsilon \). Its proof is based on a geodesic approximation technique that goes back to Itô (1962).

Lemma 7.13

Let \(X\in {\mathcal {A}}_g([0,T];q, \mu )\) and \(v\in {\mathcal {H}}_0([0,T];q)\). Then

  1. (i)

    for each \(\epsilon \in (-\varepsilon ,\varepsilon )\), \(X^v_\epsilon \in {\mathcal {A}}_g([0,T];q, \mu )\); and

  2. (ii)

    for all \(t\in [0,T]\),

    $$\begin{aligned} \frac{D}{d\epsilon }\bigg |_{\epsilon =0} D_\nabla X^v_\epsilon (t) = \Gamma (X)_0^t \dot{v}(t) + \frac{1}{2}(QX)^{ij}(t) R\left( \Gamma (X)_0^t v(t),\partial _i\right) \partial _j, \end{aligned}$$
    (7.13)

    where \(\dot{v}(t) = \frac{{\textrm{d}}}{{\textrm{d}}t}v(t) \in T_{v(t)} T_{q} M \cong T_{q} M\), and \(\frac{D}{d\epsilon }\) is the (classical) covariant derivative with respect to the parameter \(\epsilon \).

Proof

(i) Let \(\xi \) and \(\xi _\epsilon \) be the anti-developments (Hsu 2002, Definition 2.3.1) of X and \(X^v_\epsilon \), respectively, with a fixed initial frame \(r(0) \in O_{q}M\). Equivalently, \(\xi \) is the \({\mathbb {R}}^d\)-valued diffusion related to X by the following SDEs (Hsu 2002, Section 2.3)

$$\begin{aligned}\left\{ \begin{aligned}&dX^i(t) = r_j^i(t) \circ d\xi ^j(t), \\&dr_j^i(t) = -\Gamma _{kl}^i(X(t)) r_j^l(t)r_m^k(t) \circ d\xi ^m(t). \end{aligned}\right. \end{aligned}$$

Applying the fact that \(\sum _{k=1}^d r_k^i r_k^j= g^{ij}\) (e.g., Kobayashi and Nomizu 1963, Proposition 1.5) and the condition \(QX(t) = \check{g}(X(t))\), we have

$$\begin{aligned} r_k^i(t) r_l^j(t) \delta ^{kl} = g^{ij}(X(t)) = (QX)^{ij}(t) = r_k^i(t) r_l^j(t) (Q\xi )^{kl}(t), \end{aligned}$$
(7.14)

and consequently, \(Q\xi \equiv {\textbf{I}}_d\). Meanwhile, it follows from Fang and Malliavin (1993, Section 3.5) (or Driver 1992, Theorem 5.1, Hsu 1995, Section 3) that

$$\begin{aligned} d\xi _\epsilon (t) = \exp \left( \epsilon \int _0^t \Omega \left( \left( r(0)^{-1} v\right) (s), \circ d\xi (s) \right) \right) \circ {\textrm{d}}\xi (t) + \epsilon d\left( r(0)^{-1} v\right) (t), \end{aligned}$$

where \(\Omega \) is the curvature form on the orthonormal frame bundle OM, taking values in \(\mathfrak {so}(d)\), and the frame r(0) is viewed as an isomorphism from \({\mathbb {R}}^d\) to \(T_{q} M\). Since \(\Omega \) takes values in \(\mathfrak {so}(d)\), the exponential factor is an orthogonal matrix, while the last term has finite variation and does not contribute to the quadratic variation. It follows that \(Q\xi _\epsilon = Q\xi \equiv {\textbf{I}}_d\). By an argument similar to (7.14), we have \(QX^v_\epsilon (t) = {\check{g}}(X^v_\epsilon (t))\). The result follows. See Driver (1992, Theorem 8.3) for a more elaborate proof.

(ii) Fix \(n,m \in {\mathbb {N}}_+\). Let \(0=t_0<t_1<\cdots <t_n=T\) be a division of the time interval [0, T], and let \(-\varepsilon =\epsilon _{-m}<\cdots <\epsilon _{-1}<0= \epsilon _0<\epsilon _1<\cdots <\epsilon _m=\varepsilon \) be a division of the variation parameter interval \((-\varepsilon ,\varepsilon )\). Denote \(\Delta t_i:= t_i - t_{i-1}\). Consider the polygonal curve \(x^n = \{x^n(t)\}_{t\in [0,T]}\), an approximation of X made of minimizing geodesic segments joining \(X(t_{i-1})\) with \(X(t_i)\) for all \(1\le i\le n\); such segments exist by geodesic completeness. We now construct an approximation scheme for the varied processes \(X^v_\epsilon \).

For \(\epsilon \in [\epsilon _0,\epsilon _1]\), we construct the approximation \(x^n_\epsilon \) of \(X^v_\epsilon \) as follows. We extend each \(X(t_i)\), \(0\le i\le n\), to a geodesic

$$\begin{aligned} \gamma _0^{(i)}(\epsilon ) = \exp _{X(t_i)} \left( \epsilon \Gamma (x^n)_0^{t_i} v(t_i) \right) , \quad \epsilon \in [\epsilon _0,\epsilon _1]. \end{aligned}$$

Let \(x^n_\epsilon = \{x^n_\epsilon (t)\}_{t\in [0,T]}\) be the polygonal curve consisting of minimizing geodesic segments joining \(\gamma _0^{(i-1)}(\epsilon )\) with \(\gamma _0^{(i)}(\epsilon )\) for all \(1\le i\le n\).

Then, we construct \(x^n_\epsilon \) for \(\epsilon \in [\epsilon _j,\epsilon _{j+1}]\), \(1\le j\le m-1\), by induction. Suppose \(x^n_\epsilon \), \(\epsilon \in [\epsilon _{j-1},\epsilon _j]\), has been defined. Then, in particular, we have a curve \(x^n_{\epsilon _j}\). Extend each \(x^n_{\epsilon _j}(t_i)\), \(0\le i\le n\), to a geodesic by

$$\begin{aligned} \gamma _j^{(i)}(\epsilon ) = \exp _{x^n_{\epsilon _j}(t_i)} \left( (\epsilon -\epsilon _j) \Gamma \big (x^n_{\epsilon _j}\big )_0^{t_i} v(t_i) \right) , \quad \epsilon \in [\epsilon _j,\epsilon _{j+1}]. \end{aligned}$$

Let \(x^n_\epsilon \) be the polygonal curve consisting of minimizing geodesic segments joining \(\gamma _j^{(i-1)}(\epsilon )\) with \(\gamma _j^{(i)}(\epsilon )\) for all \(1\le i\le n\). In a similar way, we can define \(x^n_\epsilon \) for \(\epsilon \in [\epsilon _j,\epsilon _{j+1}]\), \(-m\le j\le -1\).

Now we have a family of polygonal curves \(\{x^n_\epsilon : \epsilon \in (-\varepsilon ,\varepsilon )\}\), which satisfies \(x^n_0 = x^n\) and

$$\begin{aligned} \frac{\partial ^{\textrm{sign}(\epsilon )}}{\partial \epsilon }\bigg |_{\epsilon =\epsilon _j} x^n_\epsilon (t_i) = \Gamma \big (x^n_{\epsilon _j}\big )_0^{t_i} v(t_i). \end{aligned}$$

Since, for each \(\epsilon \in (-\varepsilon ,\varepsilon )\) and \(1\le i\le n\), the curve \(\{x^n_\epsilon (t)\}_{t\in [t_{i-1},t_i]}\) is a geodesic, the vector field

$$\begin{aligned} J(t):=\frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} x^n_\epsilon (t), \quad t\in [t_{i-1},t_i] \end{aligned}$$

is a Jacobi field along \(\{x^n(t)\}_{t\in [t_{i-1},t_i]}\). It therefore satisfies the Jacobi equation

$$\begin{aligned} \frac{D^2}{dt^2} J(t) + R\left( J(t), \dot{x}^n(t) \right) \dot{x}^n(t) = 0, \quad t\in [t_{i-1},t_i], \end{aligned}$$
(7.15)

with boundary values

$$\begin{aligned} J(t_{i-1}) = \Gamma (x^n)_0^{t_{i-1}} v(t_{i-1}), \quad J(t_i) = \Gamma (x^n)_0^{t_i} v(t_i). \end{aligned}$$
(7.16)

Since the connection is torsion-free, we can interchange the covariant differentiations with respect to t and \(\epsilon \) to obtain

$$\begin{aligned} \frac{D}{dt} J(t_{i-1})= & {} \frac{D}{dt} \frac{\partial }{\partial \epsilon } x^n_\epsilon (t) \bigg |_{\epsilon =0,t=t_{i-1}} = \frac{D}{d\epsilon } \frac{\partial }{\partial t} x^n_\epsilon (t) \bigg |_{\epsilon =0,t=t_{i-1}} \nonumber \\= & {} \frac{D}{d\epsilon }\bigg |_{\epsilon =0} \dot{x}^n_\epsilon (t_{i-1}), \end{aligned}$$
(7.17)

On the other hand, Taylor’s theorem yields

$$\begin{aligned} \Gamma (x^n)_{t_i}^{t_{i-1}} J(t_i)= & {} J(t_{i-1}) + \frac{D}{dt} J(t_{i-1}) \Delta t_i \nonumber \\{} & {} + \frac{1}{2}\frac{D^2}{dt^2} J(t_{i-1}) (\Delta t_i)^2 + o\left( (\Delta t_i)^2 \right) . \end{aligned}$$
(7.18)

Combining (7.15)–(7.18), we have

$$\begin{aligned} \frac{D}{d\epsilon }\bigg |_{\epsilon =0} \dot{x}^n_\epsilon (t_{i-1})= & {} \Gamma (x^n)_0^{t_{i-1}} \frac{v(t_i) - v(t_{i-1})}{\Delta t_i} \\{} & {} + \frac{1}{2} R\left( \Gamma (x^n)_0^{t_{i-1}} v(t_{i-1}), \dot{x}^n(t_{i-1}) \right) \dot{x}^n(t_{i-1}) \Delta t_i + o\left( \Delta t_i \right) . \end{aligned}$$

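Letting the mesh of the time division tend to zero, a standard limit theorem yields the result (ii): in the limit, \(\dot{x}^n(t_{i-1})\Delta t_i\) plays the role of the increment of X over \([t_{i-1},t_i]\), so the curvature term produces the quadratic-variation term \(\frac{1}{2}(QX)^{ij}(t) R(\Gamma (X)_0^t v(t),\partial _i)\partial _j\) in (7.13). \(\square \)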
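The curvature term in (7.13) originates in the second-order Taylor correction (7.18). As a hypothetical test case, take the unit sphere (sectional curvature 1): the normal component of a Jacobi field along a unit-speed geodesic solves \(J'' + J = 0\), i.e., (7.15) with \(R(J,\dot{x})\dot{x} = J\). Combining (7.15) and (7.18) suggests the corrected difference quotient \((J(h)-J(0))/h + \frac{1}{2}J(0)h\) for \(J'(0)\). The Python sketch below compares it with the naive quotient; the function names are hypothetical.

```python
import math

def jacobi(a, b, t):
    """Normal Jacobi field on the unit sphere along a unit-speed geodesic: J'' + J = 0, J(0) = a, J'(0) = b."""
    return a * math.cos(t) + b * math.sin(t)

def derivative_estimates(a, b, h):
    """Return (naive, corrected) estimates of J'(0) from the values J(0) and J(h)."""
    naive = (jacobi(a, b, h) - jacobi(a, b, 0.0)) / h
    corrected = naive + 0.5 * jacobi(a, b, 0.0) * h   # + (1/2) R(J, x')x' dt, with R(J, x')x' = J here
    return naive, corrected

a, b = 0.8, -0.5
for h in (1e-1, 1e-2, 1e-3):
    naive, corrected = derivative_estimates(a, b, h)
    print(h, abs(naive - b), abs(corrected - b))
```

The naive error decreases like h, while the corrected error decreases like \(h^2\); dropping the curvature term therefore costs a first-order error, which is what survives as the \(\frac{1}{2}(QX)^{ij}R(\cdot ,\partial _i)\partial _j\) term in the stochastic limit.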

Remark 7.14

  1. (i)

    The constraint \(QX(t) = {\check{g}}(X(t))\) in (7.10) may look restrictive. A perhaps better viewpoint is to require all diffusions under consideration to share the same nondegenerate diffusion tensor a, i.e., \(QX(t) = a(X(t))\); the inverse of a then defines a Riemannian metric g, cf. Ikeda and Watanabe (1989, Section V.4). As the first part of the above proof shows, fixing the diffusion tensor is a natural constraint in the literature on variational calculus on path spaces. An intuitive reason for it is to ensure that the induced measures are equivalent, which is necessary for Eq. (7.11) to be well-posed, cf. Driver (1992). The assumption that \(\nabla \) is the Levi–Civita connection may be relaxed to requiring that \(\nabla \) be g-compatible with skew-symmetric torsion (Driver 1992, Definition 8.1), in which case the second assertion of Lemma 7.13 must account for the effect of torsion.

  2. (ii)

    One may expect from the limits of (7.15) and (7.16) that there is a “stochastic” Jacobi equation with two boundary values describing the difference between a diffusion and an “infinitesimally close” diffusion, cf. Arnaudon and Thalmaier (1998).
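The first part of the proof of Lemma 7.13 relies on a linear-algebra fact: the exponential of an \(\mathfrak {so}(d)\)-valued matrix is orthogonal, so rotating the increments of \(\xi \) keeps the identity diffusion matrix, which is why the constraint \(QX = {\check{g}}\) survives the variation. A minimal pure-Python check for \(d = 3\) follows. The skew-symmetric matrix `Omega` is a hypothetical stand-in for the curvature integral, and the matrix exponential is computed by scaling and squaring a truncated Taylor series (any reliable matrix exponential would do).

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def expm(A, squarings=10, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    n = len(A)
    scaled = [[a / 2.0 ** squarings for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[t / m for t in row] for row in mat_mul(term, scaled)]   # scaled^m / m!
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical element of so(3), standing in for the curvature integral in d(xi_eps).
Omega = [[0.0, 1.3, -0.4], [-1.3, 0.0, 2.1], [0.4, -2.1, 0.0]]
eps = 0.7
O = expm([[eps * w for w in row] for row in Omega])

# With Q xi = I_3, the rotated increments have diffusion matrix O I_3 O^T = O O^T.
Q_eps = mat_mul(O, transpose(O))
err = max(abs(Q_eps[i][j] - float(i == j)) for i in range(3) for j in range(3))
print(err)   # round-off level
```

The finite-variation term \(\epsilon \, d(r(0)^{-1}v)\) is not represented here, since it does not affect the quadratic variation.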

For a smooth function f on TM, we denote by \(d_{\dot{x}} f\) the differential of f with respect to the coordinates \((\dot{x}^i)\). Since \(T_{(x,\dot{x})} T_x M \cong T_x M\), \(d_{\dot{x}} f\) is treated as a 1-form on \(T_x M\) and

$$\begin{aligned} d_{\dot{x}} f = \frac{\partial f}{\partial \dot{x}^i} d x^i. \end{aligned}$$
(7.19)

We call \(d_{\dot{x}} f\) the vertical differential of f. For the differential with respect to the coordinates \((x^i)\), we introduce the horizontal differential, which depends on the connection \(\nabla \):

$$\begin{aligned} d_x f = \left( \frac{\partial f}{\partial x^i} - \Gamma _{ij}^k \dot{x}^j \frac{\partial f}{\partial \dot{x}^k} \right) d x^i. \end{aligned}$$
(7.20)

It is easy to check that both definitions (7.19) and (7.20) are invariant under changes of coordinates. In fact, by the classical theory (Saunders 1989, Section 3.5 and Example 4.6.7), the connection \(\nabla \) uniquely determines a TTM-valued 1-form on TM, horizontal over M, which is given in local coordinates by

$$\begin{aligned} \Gamma = dx^i \otimes \left( \frac{\partial }{\partial x^i} - \Gamma _{ij}^k \dot{x}^j \frac{\partial }{\partial \dot{x}^k} \right) . \end{aligned}$$

Hence, the horizontal differential is \(d_x f = \Gamma (df)\), where df is the total differential of f. Given a vector field V on M, \(f\circ V: q\mapsto f(V_q)\) is a smooth function on M. Then, it is easy to check that

$$\begin{aligned} d (f\circ V) = d_x f \circ V + (d_{\dot{x}} f \circ V)(\nabla _{\partial _i} V) dx^i. \end{aligned}$$
(7.21)
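The definitions (7.19)–(7.21) can be checked concretely in a chart. The following is a minimal SymPy sketch under our own assumptions: M is the Euclidean plane in polar coordinates \((r,\theta )\), and the test function f and vector field V below are hypothetical choices, not taken from the text. It shows that the kinetic energy \(\frac{1}{2}g(\dot{x},\dot{x})\) has vanishing horizontal differential (because \(\nabla g = 0\)), and it verifies (7.21) componentwise.

```python
import sympy as sp

# Assumed chart: the Euclidean plane in polar coordinates (r, theta), with velocities.
r, th = sp.symbols("r theta", positive=True)
rd, thd = sp.symbols("rdot thetadot", real=True)
x, xd, n = [r, th], [rd, thd], 2
g = sp.diag(1, r**2)          # metric g_ij
ginv = g.inv()

# Levi-Civita Christoffel symbols, Gamma[k][i][j] = Γ^k_ij
Gamma = [[[sum(ginv[k, l] * (sp.diff(g[l, i], x[j]) + sp.diff(g[l, j], x[i])
                              - sp.diff(g[i, j], x[l])) for l in range(n)) / 2
           for j in range(n)] for i in range(n)] for k in range(n)]


def vertical_differential(f):
    """Components of d_xdot f = (∂f/∂xdot^i) dx^i, cf. (7.19)."""
    return [sp.diff(f, xd[i]) for i in range(n)]


def horizontal_differential(f):
    """Components of d_x f = (∂f/∂x^i - Γ^k_ij xdot^j ∂f/∂xdot^k) dx^i, cf. (7.20)."""
    return [sp.simplify(sp.diff(f, x[i])
                        - sum(Gamma[k][i][j] * xd[j] * sp.diff(f, xd[k])
                              for j in range(n) for k in range(n)))
            for i in range(n)]


# Kinetic energy (1/2) g(xdot, xdot): horizontally constant since ∇g = 0.
kinetic = (sp.Matrix(xd).T * g * sp.Matrix(xd))[0, 0] / 2
print(horizontal_differential(kinetic))   # [0, 0]
print(vertical_differential(kinetic))     # [rdot, r**2*thetadot]

# Check (7.21) for a hypothetical f and vector field V.
f = kinetic + r * sp.cos(th) * rd
V = [r * sp.sin(th), sp.cos(th) / r]
on_V = dict(zip(xd, V))
lhs = [sp.diff(f.subs(on_V), x[i]) for i in range(n)]          # d(f o V)
cov_V = [[sp.diff(V[k], x[i]) + sum(Gamma[k][i][j] * V[j] for j in range(n))
          for k in range(n)] for i in range(n)]                 # (∇_{∂_i} V)^k
dxf, dxdf = horizontal_differential(f), vertical_differential(f)
rhs = [dxf[i].subs(on_V) + sum(dxdf[k].subs(on_V) * cov_V[i][k] for k in range(n))
       for i in range(n)]
print([sp.simplify(lhs[i] - rhs[i]) for i in range(n)])
```

The vertical differential of the kinetic energy is \(g_{ij}\dot{x}^j dx^i\), i.e., the momentum, which is the pattern reused in (7.29).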

The following integration-by-parts formula will be used. Its proof follows directly from the definitions of stochastic integrals and mean derivatives, cf. Cruzeiro and Zambrini (1991, Lemma 4.4).

Lemma 7.15

Let \(X = \{X(t)\}_{t\in [0,T]}\) be a real-valued continuous semimartingale such that DX exists, and let f be a real-valued continuous process of finite variation on [0, T]. Then,

$$\begin{aligned} {\textbf{E}}\int _0^T X(t) \dot{f}(t) {\textrm{d}}t = {\textbf{E}}\left[ f(T)X(T)-f(0)X(0) \right] - {\textbf{E}}\int _0^T f(t) DX(t) {\textrm{d}}t. \end{aligned}$$
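A quick Monte Carlo sketch can make Lemma 7.15 tangible. We assume, for illustration only, that X is an Ornstein–Uhlenbeck process \(dX = -\theta X\,dt + \sigma \,dW\), for which the forward mean derivative is \(DX(t) = -\theta X(t)\), and that \(f(t) = \sin t\) is deterministic. The parameter values and the helper name `integration_by_parts_check` are hypothetical. Both sides are estimated from the same Euler–Maruyama paths; pathwise they differ by a stochastic integral, which vanishes in expectation.

```python
import numpy as np


def integration_by_parts_check(theta=1.0, sigma=0.5, x0=1.0, T=1.0,
                               n_steps=1000, n_paths=20000, seed=0):
    """Monte Carlo estimates of both sides of Lemma 7.15 for an Ornstein-Uhlenbeck X.

    X solves dX = -theta X dt + sigma dW, so DX(t) = -theta X(t);
    f(t) = sin(t) is a deterministic path of finite variation.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    f, fdot = np.sin(t), np.cos(t)
    X = np.full(n_paths, x0)
    int_X_fdot = np.zeros(n_paths)   # left Riemann sum for ∫ X(t) f'(t) dt
    int_f_DX = np.zeros(n_paths)     # left Riemann sum for ∫ f(t) DX(t) dt
    for k in range(n_steps):
        int_X_fdot += X * fdot[k] * dt
        int_f_DX += f[k] * (-theta * X) * dt
        # Euler-Maruyama step for the OU dynamics
        X = X - theta * X * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    lhs = int_X_fdot.mean()
    rhs = (f[-1] * X - f[0] * x0).mean() - int_f_DX.mean()
    return lhs, rhs


lhs_mc, rhs_mc = integration_by_parts_check()
print(lhs_mc, rhs_mc)   # both should be close to (1 + e^{-1}(sin 1 - cos 1)) / 2, about 0.555
```

With these defaults, the exact common value is \(\int _0^1 e^{-t}\cos t\,dt\), since \({\textbf{E}}X(t) = e^{-t}\).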

Now we are in a position to present the stochastic version of Hamilton’s principle.

Theorem 7.16

(Stochastic Hamilton’s principle) Let \(L_0\) be a regular Lagrangian on \({\mathbb {R}}\times TM\). A diffusion \(X\in {\mathcal {A}}_g([0,T];q, \mu )\) is a stationary point of \({\mathcal {S}}\), if and only if X satisfies the following stochastic Euler–Lagrange (S-EL) equation

$$\begin{aligned} \frac{\overline{{\textbf{D}}}}{dt} \big ( d_{\dot{x}} L_0\left( t, X(t), D_\nabla X(t) \right) \big ) = d_x L_0\left( t, X(t), D_\nabla X(t) \right) , \end{aligned}$$
(7.22)

where \(\frac{\overline{{\textbf{D}}}}{dt}\) is the damped mean covariant derivative with respect to X.

We remark that since \(QX(t) = {\check{g}}(X(t))\), the operator \(\frac{\overline{{\textbf{D}}}}{dt}\) in (7.22) coincides with the one in (7.7). The unknown in (7.22) is the process X, so the conditions \(X(0) = q\) and \({\textbf{P}}\circ (X(T))^{-1} = \mu \), implied by the assumption \(X\in {\mathcal {A}}_g([0,T];q, \mu )\), can be regarded as boundary conditions for (7.22).

Proof

Denote \(V(t) = \Gamma (X)_0^t v(t)\). It follows from (7.13) and (7.21) that

$$\begin{aligned} \begin{aligned} \frac{{\textrm{d}}}{{\textrm{d}}\epsilon }\bigg |_{\epsilon =0} {\mathcal {S}}[X^v_\epsilon ;0,T]&= {\textbf{E}}\int _0^T \frac{{\textrm{d}}}{{\textrm{d}}\epsilon }\bigg |_{\epsilon =0} L_0\left( t, X^v_\epsilon (t), D_\nabla X^v_\epsilon (t) \right) {\textrm{d}}t \\&= {\textbf{E}}\int _0^T \left[ d_x L_0 \left( \frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0}X^v_\epsilon (t) \right) + d_{\dot{x}} L_0 \left( \frac{D}{d\epsilon }\bigg |_{\epsilon =0} D_\nabla X^v_\epsilon (t) \right) \right] {\textrm{d}}t \\&= {\textbf{E}}\int _0^T \left[ d_x L_0 \left( V(t)\right) + d_{\dot{x}} L_0 \left( \Gamma (X)_0^t \dot{v}(t) \right) \right. \\&\left. \quad + \frac{1}{2} (QX)^{ij}(t) d_{\dot{x}} L_0 \left( R(V(t),\partial _i) \partial _j \right) \right] {\textrm{d}}t. \end{aligned}\nonumber \\ \end{aligned}$$
(7.23)

By Lemmas 7.6.(ii) and 7.15 and the fact that \(v(0) = v(T) = 0\), we have

$$\begin{aligned}&{\textbf{E}}\int _0^T d_{\dot{x}} L_0 \left( \Gamma (X)_0^t \dot{v}(t) \right) {\textrm{d}}t \nonumber \\&\quad = {\textbf{E}}\int _0^T \Gamma (X)_t^0 (d_{\dot{x}} L_0) \left( \dot{v}(t) \right) {\textrm{d}}t \nonumber \\&\quad = -{\textbf{E}}\int _0^T \lim _{\epsilon \rightarrow 0} {\textbf{E}}\left[ \left( \frac{\Gamma (X)_{t+\epsilon }^0 (d_{\dot{x}} L_0) - \Gamma (X)_t^0 (d_{\dot{x}} L_0)}{\epsilon }\right) \left( v(t) \right) \Bigg | {\mathcal {P}}_t \right] {\textrm{d}}t \nonumber \\&\quad = -{\textbf{E}}\int _0^T \lim _{\epsilon \rightarrow 0} {\textbf{E}}\left[ \left( \frac{\Gamma (X)_{t+\epsilon }^t (d_{\dot{x}} L_0) - d_{\dot{x}} L_0}{\epsilon } \right) \left( \Gamma (X)_0^t v(t) \right) \Bigg | {\mathcal {P}}_t \right] {\textrm{d}}t\nonumber \\&\quad = -{\textbf{E}}\int _0^T \lim _{\epsilon \rightarrow 0} {\textbf{E}}\left[ \frac{\Gamma (X)_{t+\epsilon }^t (d_{\dot{x}} L_0) - d_{\dot{x}} L_0}{\epsilon }\Bigg | {\mathcal {P}}_t \right] \left( \Gamma (X)_0^t v(t) \right) {\textrm{d}}t\nonumber \\&\quad = -{\textbf{E}}\int _0^T \frac{{\textbf{D}}}{dt} (d_{\dot{x}} L_0) \left( V(t) \right) {\textrm{d}}t. \end{aligned}$$
(7.24)

Thus, by Lemma 7.8.(iii),

$$\begin{aligned} \begin{aligned} \frac{{\textrm{d}}}{{\textrm{d}}\epsilon }\bigg |_{\epsilon =0} {\mathcal {S}}[X^v_\epsilon ;0,T]&= {\textbf{E}}\int _0^T \left[ d_x L_0 \left( V(t)\right) - \frac{{\textbf{D}}}{dt} (d_{\dot{x}} L_0) \left( V(t) \right) \right. \\&\left. \quad + \frac{1}{2} (QX)^{ij}(t) R(d_{\dot{x}} L_0,\partial _j) \partial _i \left( V(t) \right) \right] {\textrm{d}}t \\&= {\textbf{E}}\int _0^T \left( d_x L_0 - \frac{\overline{{\textbf{D}}}}{dt} (d_{\dot{x}} L_0)\right) \left( V(t) \right) {\textrm{d}}t. \end{aligned} \end{aligned}$$

The arbitrariness of v yields the desired result. \(\square \)

Remark 7.17

  1. (i)

    For a special class of Lagrangians in the Euclidean setting, the stochastic Euler–Lagrange equation (7.22) was established in Cruzeiro and Zambrini (1991, Subsection 5.1), where it was called the stochastic Newton equation; see also Zambrini (2015). For general Lagrangians on Riemannian manifolds, Eq. (7.22) is new, to the best of the authors’ knowledge. See Sect. 7.3 for a discussion of a special case.

  2. (ii)

    The second author and his collaborator formulated a weak stochastic Euler–Lagrange equation in Lassalle and Zambrini (2016). By “weak” they mean that their stochastic Euler–Lagrange equation holds in the sense of stochastic integrals. The main difference between their formulation and ours is that we eliminate the stochastic integral (martingale) part from our equation, since we use mean derivatives instead of stochastic differentials.

7.3 An Inspirational Example: Schrödinger’s Problem

The inspirational example of stochastic Hamiltonian mechanics presented in Sect. 6.3 also provides an example of our stochastic Lagrangian mechanics. Consider the following Lagrangian defined on \({\mathbb {R}}\times TM\):

$$\begin{aligned} L_0(t,x,\dot{x}) = \frac{1}{2} |\dot{x}-b(t,x)|^2 - F(t,x), \end{aligned}$$
(7.25)

where b is a given time-dependent vector field on M. It is related to the second-order Hamiltonian H in (6.26) via the second-order Legendre transform, which will be considered in Sect. 7.4. For such a Lagrangian, we can directly derive the relation between the stochastic Euler–Lagrange equation (7.22) and the Hamilton–Jacobi–Bellman equation. We denote by \(I_0^T(M)\) the set of all M-valued diffusion processes over the time interval [0, T].

Theorem 7.18

(S-EL & HJB) Let \(L_0\) be as in (7.25). If \(X\in I_0^T(M)\) satisfies

$$\begin{aligned} D_\nabla X(t) = \nabla S(t,X(t)) + b(t,X(t)) \end{aligned}$$
(7.26)

for a function \(S:{\mathbb {R}}\times M\rightarrow {\mathbb {R}}\), then X is a solution of the stochastic Euler–Lagrange equation (7.22) if and only if S solves the following Hamilton–Jacobi–Bellman equation

$$\begin{aligned} \frac{\partial S}{\partial t} + \langle b, \nabla S\rangle + \frac{1}{2} |\nabla S|^2 + \frac{1}{2} \Delta S + F = f, \end{aligned}$$
(7.27)

with f a function depending only on t.
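The mechanism behind Theorem 7.18 can be sanity-checked in the simplest setting. The following SymPy sketch assumes M = ℝ with the Euclidean metric, so that Ric = 0 and the damped mean covariant derivative of \(dS = S_x dx\) along a diffusion with \(D_\nabla X = S_x + b\) acts as \(\partial _t + (b + S_x)\partial _x + \frac{1}{2}\partial _x^2\), as in the last display of the proof below. It checks that the x-derivative of the left-hand side of (7.27) equals \(\frac{\overline{{\textbf{D}}}}{dt}(d_{\dot{x}}L_0) - d_x L_0\). The test functions S, b, F and the helper name are hypothetical.

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)


def hjb_sel_residual(S, b, F):
    """d/dx of the HJB left-hand side (7.27) minus the S-EL residual, on flat R."""
    Sx = sp.diff(S, x)
    hjb_lhs = sp.diff(S, t) + b * Sx + Sx**2 / 2 + sp.diff(S, x, 2) / 2 + F
    # Flat R: Ric = 0, so the damped mean covariant derivative of dS = S_x dx
    # along D_∇X = S_x + b is (∂_t + (b + S_x) ∂_x + (1/2) ∂_x^2) S_x.
    mean_deriv = sp.diff(Sx, t) + (b + Sx) * sp.diff(Sx, x) + sp.diff(Sx, x, 2) / 2
    # Horizontal differential (7.30) of L_0 = (1/2)(xdot - b)^2 - F at xdot = S_x + b.
    dx_L0 = -Sx * sp.diff(b, x) - sp.diff(F, x)
    return sp.simplify(sp.diff(hjb_lhs, x) - (mean_deriv - dx_L0))


# Arbitrary (hypothetical) smooth test data.
S = sp.exp(-t) * sp.sin(x) + t * x**3
b = x * sp.cos(t) + x**2
F = t * sp.exp(x)
print(hjb_sel_residual(S, b, F))   # 0
```

Since the residual is identically zero, the S-EL equation holds exactly when the HJB left-hand side has zero x-derivative, i.e., depends on t only.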

Proof

For a function h on \({\mathbb {R}}\times M\), we denote by dh the exterior differential of h on M, i.e., with respect to the coordinates \((x^i)\). Condition (7.26) can be rewritten in local coordinates as

$$\begin{aligned} \dot{x} = \nabla S+ b. \end{aligned}$$
(7.28)

Then, it is clear that

$$\begin{aligned} d_{\dot{x}} L_0 = \frac{\partial L_0}{\partial \dot{x}^i} d x^i = g_{ij}(\dot{x}^j - b^j) d x^i = d S. \end{aligned}$$
(7.29)

Since \(\nabla g = 0\), we use Leibniz’s rule to derive

$$\begin{aligned} d_x L_0(\partial _k)&= \frac{1}{2} d[g(\dot{x}-b,\dot{x}-b)](\partial _k) - d F(\partial _k) = -g\left( \nabla _{\partial _k} b, \dot{x}-b \right) - d F(\partial _k)\nonumber \\&= -dS\left( \nabla _{\partial _k} b \right) - d F(\partial _k). \end{aligned}$$
(7.30)

Now we take the differential of the HJB equation (7.27) with respect to x. Obviously,

$$\begin{aligned} d\frac{\partial S}{\partial t}= \frac{\partial }{\partial t}dS = \frac{\partial }{\partial t} d_{\dot{x}} L_0. \end{aligned}$$

For the second term,

$$\begin{aligned} \begin{aligned} d( \langle b, \nabla S\rangle )(\partial _k)&= d[dS(b)](\partial _k) = \left( \nabla _{\partial _k}dS\right) (b) + dS\left( \nabla _{\partial _k}b\right) \\&= \nabla ^2_{\partial _k,b} S + dS\left( \nabla _{\partial _k}b\right) = \left( \nabla _b dS\right) (\partial _k) + dS\left( \nabla _{\partial _k}b\right) . \end{aligned} \end{aligned}$$

For the third term, we use again \(\nabla g = 0\). Then, we have

$$\begin{aligned} \begin{aligned} \frac{1}{2} d\left( |\nabla S|^2 \right) (\partial _k)&= \frac{1}{2}d[dS\otimes dS({\check{g}})](\partial _k) =\left( (\nabla _{\partial _k} dS) \otimes dS\right) ({\check{g}}) \\&= \left( \nabla _{\partial _k} dS\right) (\nabla S) = \left( \nabla _{\nabla S} dS\right) (\partial _k). \end{aligned} \end{aligned}$$

For the fourth term, in the same way we have

$$\begin{aligned} \begin{aligned}&d(\Delta S)(\partial _k)\\ {}&\quad = d\left( g^{ij} \nabla ^2_{\partial _i,\partial _j} S\right) (\partial _k) = d\left( \nabla ^2 S({\check{g}}) \right) (\partial _k) = \left( \nabla _{\partial _k} \nabla ^2 S\right) ({\check{g}}) = g^{ij} \nabla ^3_{\partial _k,\partial _i, \partial _j} S \\&\quad = g^{ij} \left[ \left( \nabla ^3_{\partial _k,\partial _i, \partial _j} S - \nabla ^3_{\partial _i,\partial _k, \partial _j} S \right) + \left( \nabla ^3_{\partial _i,\partial _k, \partial _j} S - \nabla ^3_{\partial _i,\partial _j, \partial _k} S \right) + \nabla ^3_{\partial _i,\partial _j, \partial _k} S \right] \\&\quad = g^{ij} \left[ \left( \nabla ^2_{\partial _k,\partial _i} dS - \nabla ^2_{\partial _i,\partial _k} dS \right) (\partial _j) + 0 + \nabla ^2_{\partial _i,\partial _j} dS (\partial _k) \right] \\&\quad = g^{ij} \left[ R(\partial _k,\partial _i) d S (\partial _j) + \nabla ^2_{\partial _i,\partial _j} dS (\partial _k) \right] \\&\quad = g^{ij} \left[ - R(d S, \partial _j)\partial _i(\partial _k) + \nabla ^2_{\partial _i,\partial _j} dS (\partial _k) \right] = [\Delta dS - \textrm{Ric}(dS)](\partial _k) \\ {}&\quad = \Delta _{\textrm{LD}} (dS) (\partial _k). \end{aligned} \end{aligned}$$

Combining these together and applying (7.26)–(7.30) as well as (7.7), we obtain

$$\begin{aligned} \begin{aligned}&d\left( \frac{\partial S}{\partial t} + \langle b, \nabla S\rangle + \frac{1}{2} |\nabla S|^2 + \frac{1}{2} \Delta S + F \right) (\partial _k) \\&\quad = \left( \frac{\partial }{\partial t} + \nabla _{b+\nabla S} + \frac{1}{2}\Delta _{\textrm{LD}} \right) (dS)(\partial _k) + dS\left( \nabla _{\partial _k}b\right) + dF(\partial _k) \\&\quad = \frac{\overline{{\textbf{D}}}}{dt} (dS)(\partial _k) + dS\left( \nabla _{\partial _k}b\right) + dF(\partial _k) = \left[ \frac{\overline{{\textbf{D}}}}{dt} (d_{\dot{x}} L_0) - d_x L_0\right] (\partial _k). \end{aligned} \end{aligned}$$

The result follows. \(\square \)
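The geometric core of the proof is the identity \(d(\Delta S) = \Delta dS - \textrm{Ric}(dS) = \Delta _{\textrm{LD}}(dS)\), where \(\Delta \) on 1-forms is \(g^{ij}\nabla ^2_{\partial _i,\partial _j}\). A minimal SymPy sketch checks it on the unit sphere \(S^2\), where \(\textrm{Ric} = g\), assuming spherical coordinates \((\theta ,\phi )\) and an arbitrarily chosen test function S (both our own choices). Because trigonometric simplification can be slow, the residual is evaluated numerically at a few points.

```python
import sympy as sp

th, ph = sp.symbols("theta phi", positive=True)
x, n = [th, ph], 2
g = sp.diag(1, sp.sin(th)**2)    # round metric on the unit sphere S^2
ginv = g.inv()
sqrt_det_g = sp.sin(th)          # valid for 0 < theta < pi

Gamma = [[[sum(ginv[k, l] * (sp.diff(g[l, i], x[j]) + sp.diff(g[l, j], x[i])
                              - sp.diff(g[i, j], x[l])) for l in range(n)) / 2
           for j in range(n)] for i in range(n)] for k in range(n)]


def laplace_beltrami(S):
    return sum(sp.diff(sqrt_det_g * ginv[i, j] * sp.diff(S, x[j]), x[i])
               for i in range(n) for j in range(n)) / sqrt_det_g


def rough_laplacian(w):
    """Components (g^{kj} ∇^2_{k,j} w)_i of a 1-form w."""
    # T[j][i] = (∇_j w)_i = ∂_j w_i - Γ^m_ji w_m
    T = [[sp.diff(w[i], x[j]) - sum(Gamma[m][j][i] * w[m] for m in range(n))
          for i in range(n)] for j in range(n)]
    out = []
    for i in range(n):
        total = 0
        for k in range(n):
            for j in range(n):
                # (∇_k T)_ji = ∂_k T_ji - Γ^m_kj T_mi - Γ^m_ki T_jm
                nabla_T = (sp.diff(T[j][i], x[k])
                           - sum(Gamma[m][k][j] * T[m][i] for m in range(n))
                           - sum(Gamma[m][k][i] * T[j][m] for m in range(n)))
                total += ginv[k, j] * nabla_T
        out.append(total)
    return out


S = sp.sin(th)**2 * sp.cos(ph) + sp.cos(th)**3           # hypothetical test function
dS = [sp.diff(S, xi) for xi in x]
lhs = [sp.diff(laplace_beltrami(S), xi) for xi in x]      # d(ΔS)
# Unit sphere: Ric = g, so Ric(dS) has the same components as dS.
rhs = [c - dS[i] for i, c in enumerate(rough_laplacian(dS))]   # Δ dS - Ric(dS)
for point in [(0.7, 1.3), (1.9, 0.4)]:
    vals = {th: point[0], ph: point[1]}
    print([float((lhs[i] - rhs[i]).subs(vals)) for i in range(n)])   # numerically zero
```

Flipping the sign of the Ricci term makes the residual nonzero, which is a simple way to confirm the sign convention used for \(\Delta _{\textrm{LD}}\).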

Remark 7.19

Equation (7.29) gives the relation between Lagrangians and second-order Hamilton’s principal functions. It is valid for more general Lagrangians, see Remark 7.23.(i).

Theorem 7.18 strongly suggests a relation between stochastic Lagrangian (and also Hamiltonian) mechanics and Schrödinger’s problem as reinterpreted in optimal transport. In the setting of the latter (see, e.g., Cruzeiro et al. 2000; Léonard 2014; Léonard et al. 2014), there is a given reversible positive measure \({\textbf{R}}\) on the path space \({\mathcal {C}}_0^T = C([0,T], M)\), called the reference measure, as well as two probability distributions \(\mu _0,\mu _T\in {\mathcal {P}}(M)\). Schrödinger’s problem consists in minimizing the relative entropy

$$\begin{aligned} H({\textbf{P}}|{\textbf{R}}) = {\left\{ \begin{array}{ll} \int _{{\mathcal {C}}_0^T} \log \left( \frac{{\textrm{d}}{\textbf{P}}}{{\textrm{d}}{\textbf{R}}} \right) {\textrm{d}}{\textbf{P}}, &{}\quad {\textbf{P}}\ll {\textbf{R}}, \\ +\infty , &{}\quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(7.31)

over all probability measures \({\textbf{P}}\) on \({\mathcal {C}}_0^T\) such that \(\mu _0,\mu _T\) are the initial and final time marginal distributions of \({\textbf{P}}\), i.e., \({\textbf{P}}_0 = \mu _0\) and \({\textbf{P}}_T = \mu _T\), where \({\textbf{P}}_t:= {\textbf{P}}\circ (X(t))^{-1}\) is the time marginal distribution of \({\textbf{P}}\) and \(X(t): {\mathcal {C}}_0^T\rightarrow M, X(t,\omega ) = \omega (t)\), is the coordinate mapping. Denote by \(X_{\textbf{R}}\) and \(X_{\textbf{P}}\) the coordinate process X under \({\textbf{R}}\) and \({\textbf{P}}\), respectively. Then, Girsanov’s theorem implies (Léonard 2012a, Theorem 1) that a necessary condition for finite entropy \(H({\textbf{P}}|{\textbf{R}}) < \infty \) is \(QX_{\textbf{P}}= QX_{\textbf{R}}\), \({\textbf{P}}\)-a.s. Furthermore, if \({\textbf{R}}\) is a diffusion measure, i.e., \(X_{\textbf{R}}\) is a diffusion process, then a similar application of Girsanov’s theorem shows that a necessary condition for \(H({\textbf{P}}|\textbf{R}) < \infty \) is that \({\textbf{P}}\) is also a diffusion measure and that there exists a time-dependent vector field w such that

$$\begin{aligned} \left( DX_{\textbf{P}}(t), QX_{\textbf{P}}(t) \right) = \left( DX_{\textbf{R}}(t) + w(t, X(t)), QX_{\textbf{R}}(t) \right) ,\quad \forall t\in [0,T], \text { a.s.}. \end{aligned}$$

The solution \({\textbf{P}}\) of Schrödinger’s problem, i.e., minimizing (7.31), is related to the reference measure \({\textbf{R}}\) by a time-symmetric version of Doob’s h-transform (Léonard 2014, Section 3). Its coordinate process \(X_{\textbf{P}}\) is sometimes called a Schrödinger bridge or Schrödinger process. When the reference measure \({\textbf{R}}\) is Markovian, i.e., the law of a Markov process, the solution process \(X_{\textbf{P}}\) is also called a reciprocal (Bernstein 1932; Jamison 1975) or Bernstein process (Cruzeiro et al. 2000; Cruzeiro and Vuillermot 2015).
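The product structure behind the time-symmetric h-transform, \({\textbf{P}} = f(X(0))\,g(X(T))\,{\textbf{R}}\), has a well-known discrete, static analogue: minimize \(H(\pi |R)\) over couplings \(\pi \) of \(\mu _0\) and \(\mu _T\) on a finite grid, where R is the reference joint weight of \((X(0), X(T))\). The sketch below solves it with Sinkhorn / iterative proportional fitting, a swapped-in numerical technique, not the method of this paper. The grid, the unnormalized heat-kernel reference (Brownian motion started from Lebesgue measure) and the Gaussian-shaped marginals are hypothetical choices.

```python
import numpy as np


def static_schrodinger(mu0, muT, R, max_iter=20000, tol=1e-12):
    """Sinkhorn / IPF for min H(pi | R) over couplings of mu0 and muT.

    The minimizer has the product form pi[i, j] = f[i] * R[i, j] * g[j],
    a discrete analogue of P = f(X(0)) g(X(T)) R.
    """
    f = np.ones_like(mu0)
    g = np.ones_like(muT)
    for _ in range(max_iter):
        f = mu0 / (R @ g)            # match the initial marginal
        g = muT / (R.T @ f)          # match the final marginal
        pi = f[:, None] * R * g[None, :]
        if np.max(np.abs(pi.sum(axis=1) - mu0)) < tol:
            break
    return f, g, pi


def relative_entropy(P, R):
    """Discrete H(P | R) = sum P log(P / R), for strictly positive R."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / R[mask])))


# Hypothetical data: 1D grid, heat-kernel reference over [0, T], Gaussian-shaped marginals.
x = np.linspace(-2.0, 2.0, 41)
T = 1.0
R = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * T))
mu0 = np.exp(-(x + 1.0) ** 2 / 0.5)
mu0 /= mu0.sum()
muT = np.exp(-(x - 1.0) ** 2 / 0.5)
muT /= muT.sum()

f, g, pi = static_schrodinger(mu0, muT, R)
print(relative_entropy(pi, R), relative_entropy(np.outer(mu0, muT), R))
```

The second printed value is the entropy of the independent coupling, which is always admissible, so the optimal value should not exceed it.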

If the manifold M is endowed with a Riemannian metric g, and the reference coordinate process \(X_{\textbf{R}}\) has generator

$$\begin{aligned} A^{X_{\textbf{R}}} = \langle b, \nabla \rangle + \textstyle {{\frac{1}{2}}} \Delta + F, \end{aligned}$$

for some time-dependent vector field b on M, then the density \(\mu (t,x) = \frac{{\textrm{d}}\,{\textbf{P}}^*_t}{{\textrm{d}}\,\textrm{Vol}}(x)\) of the minimizer \({\textbf{P}}^*\) of (7.31) solves the following Kolmogorov forward equation

$$\begin{aligned} \left\{ \begin{aligned}&\frac{\partial }{\partial t} \mu (t,x) + \textrm{div}\,\left[ \mu (\nabla S + b) \right] - \frac{1}{2} \Delta \mu (t,x) = 0, \quad (t,x)\in (0,T]\times M, \\&\mu (0,x) = \mu _0(x), \quad x\in M. \end{aligned}\right. \qquad \quad \end{aligned}$$
(7.32)

where S solves the HJB equation (7.27) with \(f \equiv 0\), or (6.28).

Moreover, an analog of the Benamou–Brenier formula was derived (see Léonard 2014). Consider the problem of minimizing the average action

$$\begin{aligned} \int _0^T \int _{M} \left( \frac{1}{2}|v(t,x) - b(t,x)|^2 - F(t,x) \right) \rho (t,{\textrm{d}}x) {\textrm{d}}t \end{aligned}$$
(7.33)

among all pairs \((\rho ,v)\), where \(\rho =(\rho (t))_{t\in [0,T]}\) is a measurable path in \({\mathcal {P}}(M)\) and \(v=(v(t))_{t\in [0,T]}\) is a measurable time-dependent vector field, subject to the following constraints (in the weak sense of PDEs):

$$\begin{aligned} \left\{ \begin{aligned}&\frac{\partial }{\partial t} \rho + \textrm{div}\,\left( \rho v \right) - \frac{1}{2} \Delta \rho = 0, \\&\rho (0) = \mu _0, \ \rho (T) = \mu _T, \end{aligned}\right. \end{aligned}$$
(7.34)

The relation between \(\rho \) in (7.33) and \({\textbf{P}}\) in (7.31) is just that \(\rho \) is the time marginal of \({\textbf{P}}\), namely,

$$\begin{aligned} \rho (t) = {\textbf{P}}_t = {\textbf{P}}\circ (X(t))^{-1}. \end{aligned}$$
(7.35)

The minimizer of (7.33) is the pair \((\mu ,\nabla S+b)\) where \(\mu \) solves (7.32) and S solves (6.28).

These results are summarized in the following chain of equalities:

$$\begin{aligned} \begin{aligned}&\inf \left\{ H({\textbf{P}}|{\textbf{R}}): {\textbf{P}}\in {\mathcal {P}}\big ({\mathcal {C}}_0^T\big ), {\textbf{P}}_0 = \mu _0, {\textbf{P}}_T = \mu _T \right\} - H\left( \mu _0|{\textbf{R}}_0 \right) \\&\quad = \inf \left\{ \int _0^T \int _{M} \left( \frac{1}{2}|v(t,x) - b(t,x)|^2 - F(t,x) \right) \rho (t,{\textrm{d}}x) {\textrm{d}}t: (\rho ,v) \text { satisfies (7.34)} \right\} \\&\quad = \int _0^T \int _{M} \left( \frac{1}{2}|\nabla S(t,x)|^2 - F(t,x) \right) \mu (t,{\textrm{d}}x) {\textrm{d}}t. \end{aligned}\nonumber \\ \end{aligned}$$
(7.36)

Now suppose that the coordinate process \(X_{\textbf{R}}\) under the reference measure \({\textbf{R}}\) is a nondegenerate, diffusion-homogeneous M-valued diffusion in \(I_0^T(M)\). Then assigning such a reference measure \({\textbf{R}}\) amounts to assigning a pair \((b_{\textbf{R}},g_{\textbf{R}})\in \Gamma (TM \oplus \textrm{Sym}^2(T^* M))\), where \(g_{\textbf{R}}\) is a positive-definite symmetric (0, 2)-tensor, i.e., a Riemannian metric tensor. More precisely, we let \(A^{X_{\textbf{R}}} = ({\mathfrak {b}}, a) + F\) be the generator of \(X_{\textbf{R}}\). Since \(X_{\textbf{R}}\) is nondegenerate and diffusion-homogeneous, a is a time-independent nondegenerate symmetric (2, 0)-tensor field. Let \(g_{\textbf{R}} = {\hat{a}}\) be the inverse of a, so that \(g_{\textbf{R}}\) is a Riemannian metric tensor. We then equip the Riemannian manifold \((M, g_{\textbf{R}})\) with the associated Levi–Civita connection \(\nabla \). The isomorphism (2.19) implies that

$$\begin{aligned} A^{X_{\textbf{R}}} = b_{\textbf{R}}^i \partial _i + \textstyle {{\frac{1}{2}}} g_\textbf{R}^{ij} \nabla ^2_{\partial _i,\partial _j} + F = \langle b_{\textbf{R}}, \nabla \rangle + \textstyle {{\frac{1}{2}}} \Delta + F, \end{aligned}$$

where \(b_{\textbf{R}}\) is the time-dependent vector field given by \(b_{\textbf{R}}^i = ( {\mathfrak {b}}^i + \textstyle {{\frac{1}{2}}} g_{\textbf{R}}^{jk} \Gamma ^i_{jk} )\), and \(\nabla \) and \(\Delta \) are the gradient and Laplace–Beltrami operator with respect to \(g_{\textbf{R}}\), respectively.
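The drift correction \(b_{\textbf{R}}^i = {\mathfrak {b}}^i + \frac{1}{2} g_{\textbf{R}}^{jk}\Gamma ^i_{jk}\) can be checked in a chart. The following SymPy sketch assumes, for illustration, planar Brownian motion written in polar coordinates, whose coordinate generator has \({\mathfrak {b}} = (1/(2r), 0)\) and \(a = \textrm{diag}(1, 1/r^2)\); the potential term F is omitted because it is the same on both sides. The second coefficient field and test function are hypothetical. It confirms that \(b_{\textbf{R}} = 0\) for Brownian motion and that both forms of the generator agree.

```python
import sympy as sp

r, th = sp.symbols("r theta", positive=True)
x, n = [r, th], 2
a = sp.diag(1, 1 / r**2)      # assumed diffusion tensor a^{ij} (planar Brownian motion, polar chart)
g = a.inv()                   # g_R = inverse of a, here diag(1, r**2)
sqrt_det_g = r                # sqrt(det g_R) for r > 0

Gamma = [[[sum(a[k, l] * (sp.diff(g[l, i], x[j]) + sp.diff(g[l, j], x[i])
                           - sp.diff(g[i, j], x[l])) for l in range(n)) / 2
           for j in range(n)] for i in range(n)] for k in range(n)]


def b_R(frak_b):
    """b_R^i = frak_b^i + (1/2) g^{jk} Γ^i_{jk}."""
    return [sp.simplify(frak_b[i] + sum(a[j, k] * Gamma[i][j][k]
                                        for j in range(n) for k in range(n)) / 2)
            for i in range(n)]


def coordinate_generator(frak_b, f):
    """frak_b^i ∂_i f + (1/2) a^{ij} ∂_i ∂_j f."""
    return (sum(frak_b[i] * sp.diff(f, x[i]) for i in range(n))
            + sum(a[i, j] * sp.diff(f, x[i], x[j]) for i in range(n) for j in range(n)) / 2)


def laplace_beltrami(f):
    return sum(sp.diff(sqrt_det_g * a[i, j] * sp.diff(f, x[j]), x[i])
               for i in range(n) for j in range(n)) / sqrt_det_g


def covariant_generator(bR, f):
    """<b_R, ∇f> + (1/2) Δf."""
    return sum(bR[i] * sp.diff(f, x[i]) for i in range(n)) + laplace_beltrami(f) / 2


print(b_R([1 / (2 * r), sp.Integer(0)]))   # [0, 0]: Brownian motion has b_R = 0

frak_b2 = [r * sp.cos(th), sp.sin(th)]      # hypothetical coefficient field
f2 = r**3 * sp.sin(th) + sp.exp(r) * sp.cos(2 * th)
print(sp.simplify(coordinate_generator(frak_b2, f2) - covariant_generator(b_R(frak_b2), f2)))
```

The correction term absorbs the first-order part hidden in \(\frac{1}{2}a^{ij}\partial _i\partial _j\), which is why \({\mathfrak {b}}\) itself is not a vector field while \(b_{\textbf{R}}\) is.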

We assume that \({\textbf{P}}\) is a diffusion measure and \(QX_{\textbf{P}}= QX_{\textbf{R}} = {\check{g}}_{\textbf{R}}\), \({\textbf{P}}\)-a.s., which is a necessary condition for \(H({\textbf{P}}|{\textbf{R}}) < \infty \). Then, by (3.4), the generator of \(X_{\textbf{P}}\) is given by

$$\begin{aligned} (DX_{\textbf{P}}(t), QX_{\textbf{P}}(t) ) = (D_\nabla X_{\textbf{P}})^i(t) \partial _i|_{X(t)} + \textstyle {{\frac{1}{2}}} \Delta |_{X(t)}. \end{aligned}$$

From (7.34) and (7.35), one can see that \(v(t,X(t)) = D_\nabla X_{\textbf{P}}(t)\) and that the action (7.33) equals

$$\begin{aligned} {\textbf{E}}_{\textbf{P}}\int _0^T \left( \frac{1}{2}|D_\nabla X(t) - b_\textbf{R}(t,X(t))|^2 - F(t,X(t)) \right) {\textrm{d}}t. \end{aligned}$$
(7.37)

So the minimization problem becomes that of minimizing the action (7.37) over all diffusion measures \({\textbf{P}}\in {\mathcal {P}}({\mathcal {C}}_0^T)\) with \({\textbf{P}}_0 = \mu _0\), \({\textbf{P}}_T = \mu _T\) and \(QX_{\textbf{P}}= {\check{g}}_\textbf{R}\), \({\textbf{P}}\)-a.s. If \(\mu _0 = \delta _{q}\) and \(\mu _T = \mu \), this brings us back to our stochastic variational problem, that is, to minimize the action functional \({\mathcal {S}}\) in (7.9) over \({\mathcal {A}}_{g_{\textbf{R}}}([0,T];q, \mu )\), with Lagrangian \(L_0(t, x,\dot{x}) = \frac{1}{2} |\dot{x}-b_{\textbf{R}}(t,x)|^2 - F(t,x)\). Note that in this case, since \({\textbf{P}}_0 = \mu _0\) is a Dirac measure, the relative entropy in (7.31) and \(H(\mu _0|{\textbf{R}}_0)\) are always infinite, while their difference \(H({\textbf{P}}|{\textbf{R}}) - H(\mu _0|{\textbf{R}}_0)\) can be finite, as in (7.36). Moreover, by Theorems 7.16 and 7.18, a necessary condition for \(X_{\textbf{P}}\) to be the minimizer of \({\mathcal {S}}\) is that \(X_{\textbf{P}}\) satisfies (7.26) and (7.27), which is consistent with (7.32).

Remark 7.20

  1. (i)

    Compared with the Lagrangian (7.25) used here to address Schrödinger’s problem, another type of Lagrangian is used in the Euclidean version of quantum mechanics in Cruzeiro and Zambrini (1991, Eq. (5.4)). The latter has an additional term involving the divergence of b, which helps to express part of the action functional as a Stratonovich integral. The stochastic Euler–Lagrange equation (7.22) applied to these Lagrangians recovers the equations of motion in Cruzeiro and Zambrini (1991, Theorem 5.3).

  2. (ii)

    In the seminal paper (Otto 2001), F. Otto provided a geometric perspective on numerous PDEs by introducing a Riemannian structure on the Wasserstein space; this is known as Otto’s calculus. A similar idea goes back to V.I. Arnold, who established a geometric framework for hydrodynamics by studying the Riemannian nature of the infinite-dimensional group of diffeomorphisms (Arnold and Khesin 2021). The recent paper (Gentil et al. 2020) formulated Schrödinger’s problem via Otto calculus, where the equation of motion is given by an infinite-dimensional Newton equation, cf. Khesin et al. (2021) and von Renesse (2012) on related matters. All these works can be regarded as a “geometrization” of (stochastic) dynamics. In contrast, the present framework can be regarded as a “stochastization” of geometric mechanics. The differences and relations between our framework and theirs are similar to those between the two ways of producing HJ equations for quantum mechanics mentioned in the introduction. More precisely, while (second-order) HJB equations play a key role in our framework, various HJ equations with density-dependent potential terms were derived in theirs (see Gentil et al. 2020, Corollary 23; Khesin et al. 2021, Proposition 2.4).

7.4 Second-Order Legendre Transform

7.4.1 From \({\mathcal {T}}^{S*} M\) to \({\mathcal {T}}^S M\) and Back

Let us fix a linear connection \(\nabla \) on M. Here, for simplicity, we consider time-independent Hamiltonians and Lagrangians.

We first produce second-order Lagrangians from second-order Hamiltonians. To this end, we begin by reducing the second-order Hamiltonian to a classical one. Given a time-independent second-order Hamiltonian \(H: {\mathcal {T}}^{S*} M\rightarrow {\mathbb {R}}\), its \(\nabla \)-reduction is the classical Hamiltonian \(H_0 = H\circ {\hat{\iota }}^*_\nabla : T^* M\rightarrow {\mathbb {R}}\), as in (6.42). If \(H_0\) is hyperregular (see Abraham and Marsden 1978, Section 3.6), then its fiber derivative \({\textbf {F}}H_0: T^* M\rightarrow TM\), which is given in canonical coordinates by \(\dot{x}^i = \frac{\partial H_0}{\partial p_i}\), is a diffeomorphism and defines the classical Legendre transform (Abraham and Marsden 1978, Section 3.6):

$$\begin{aligned} L_0(x, \dot{x}) = p_i \dot{x}^i - H_0(x, p) = p_i \dot{x}^i - H\left( x, p, {\hat{o}} \right) , \end{aligned}$$
(7.38)

where \(({\hat{o}}_{jk})\) is a family of auxiliary variables introduced in (6.43). Then, we lift \(L_0\) to an admissible second-order Lagrangian \(L: {\mathcal {T}}^S M\rightarrow {\mathbb {R}}\) as in Definition 7.9, that is, \(L = L_0 \circ \varrho _\nabla \). Combining (7.38) with (7.8), the relation between L and H is

$$\begin{aligned} L(x, Dx, Qx)= & {} p_i D_\nabla x^i - H\left( x, p, {\hat{o}} \right) \nonumber \\= & {} p_i D^ix + \textstyle {\frac{1}{2}} {\hat{o}}_{jk} Q^{jk} x - H(x, p, {\hat{o}}). \end{aligned}$$
(7.39)

We call (7.39) the second-order Legendre transform. In particular, if we restrict the admissible second-order Lagrangian L to the subbundle of \({\mathcal {T}}^S M\) with coordinate constraint \(Q^{jk} x = g^{jk}(x)\) for some symmetric (2, 0)-tensor field g [which is just the condition in (7.10)], and let H be \((g,\nabla )\)-canonical, then by (6.45), we have

$$\begin{aligned} L(x, Dx, Qx) = p_i D^ix + \textstyle {\frac{1}{2}} o_{jk} Q^{jk} x - H(x, p, o). \end{aligned}$$
(7.40)

Consequently, we can find the relation between second-order Hamilton’s principal functions and action functionals. By (6.41) and (7.40),

$$\begin{aligned} {\textbf{D}}_tS = L(t, x, Dx, Qx) = L_0(t, x, D_\nabla x). \end{aligned}$$

One concludes, from Dynkin’s formula, that for an M-valued diffusion \(X\in {\mathcal {A}}_g([0,T];q, \mu )\),

$$\begin{aligned} {\textbf{E}}S(T, X(T)) - S(0, q) = {\textbf{E}}\int _0^T L_0\left( t, X(t), D_\nabla X(t) \right) {\textrm{d}}t = {\mathcal {S}}[X;0,T], \end{aligned}$$

and

$$\begin{aligned} S(t,x) - S(0, q)= & {} {\textbf{E}}_{(t,x)} [S(t, X(t)) - S(0, X(0))] \\= & {} {\textbf{E}}_{(t,x)} \int _0^t L_0\left( s, X(s), D_\nabla X(s) \right) {\textrm{d}}s, \end{aligned}$$

where \({\textbf{E}}_{(t,x)}\) is the conditional expectation \({\textbf{E}}(\cdot |X(t)=x)\). These identities mean that the action functional is the expectation of the second-order Hamilton’s principal function (up to an undetermined constant), while the second-order Hamilton’s principal function is a conditional-expectation version of the action functional.
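The identity \({\textbf{E}}S(T,X(T)) - S(0,q) = {\mathcal {S}}[X;0,T]\) can be illustrated by Monte Carlo. We assume the hypothetical case M = ℝ with the Euclidean metric, b = 0, F = 0 and f = 0 in (7.27), for which \(S(t,x) = x^2/(2t+1) - \frac{1}{2}\log (2t+1)\) solves \(\partial _t S + \frac{1}{2}|\nabla S|^2 + \frac{1}{2}\Delta S = 0\). The diffusion X follows (7.26), \(D_\nabla X = \nabla S(t,X)\), with unit diffusion, and \(L_0 = \frac{1}{2}|D_\nabla X|^2\). The starting point, horizon and helper names are our own choices.

```python
import numpy as np


def S(t, x):
    # Hypothetical explicit solution of (7.27) on M = R with b = 0, F = 0, f = 0.
    return x**2 / (2 * t + 1) - 0.5 * np.log(2 * t + 1)


def grad_S(t, x):
    return 2 * x / (2 * t + 1)


def dynkin_check(q=0.5, T=1.0, n_steps=1000, n_paths=20000, seed=1):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, q)
    action = np.zeros(n_paths)
    for k in range(n_steps):
        v = grad_S(k * dt, X)            # D_∇X(t) = ∇S(t, X(t)), cf. (7.26) with b = 0
        action += 0.5 * v**2 * dt        # L_0 = (1/2)|D_∇X|^2 when b = 0 and F = 0
        X = X + v * dt + np.sqrt(dt) * rng.standard_normal(n_paths)   # QX = 1
    lhs = S(T, X).mean() - S(0.0, q)     # E S(T, X(T)) - S(0, q)
    rhs = action.mean()                  # E ∫_0^T L_0 dt
    return lhs, rhs


lhs_mc, rhs_mc = dynkin_check()
exact = 2 * 0.5**2 * 1.0 + 1.0 - 0.5 * np.log(3.0)   # 2 q^2 T + T - (1/2) log(2T + 1)
print(lhs_mc, rhs_mc, exact)
```

The closed-form value follows from the linear SDE \(dX = 2X/(2t+1)\,dt + dW\), whose second moment is \((2t+1)^2 q^2 + t(2t+1)\); both Monte Carlo estimates should agree with it up to sampling error.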

Conversely, suppose we are given an admissible second-order Lagrangian \(L: {\mathcal {T}}^S M\rightarrow {\mathbb {R}}\) which is the \(\nabla \)-lift of a classical Lagrangian \(L_0: T M\rightarrow {\mathbb {R}}\). If \(L_0\) is hyperregular, then its fiber derivative

$$\begin{aligned} {\textbf {F}}L_0: T M\rightarrow T^*M, \quad (x,\dot{x})\mapsto (x,d_{\dot{x}}L_0), \end{aligned}$$
(7.41)

which is written in coordinates as \(p_i = \frac{\partial L_0}{\partial \dot{x}^i}\), is a diffeomorphism and defines the classical inverse Legendre transform:

$$\begin{aligned} H_0(x, p) = p_i \dot{x}^i - L_0(x, \dot{x}). \end{aligned}$$
(7.42)

We replace coordinates \((\dot{x}^i)\) by \((D_\nabla ^i x)\), due to (3.2). Now, given a symmetric (2, 0)-tensor field g, we lift \(H_0\) to the \((g,\nabla )\)-canonical \({\overline{H}}^g_0\) in (6.44). The relation between \({\overline{H}}^g_0\) and L is

$$\begin{aligned} \begin{aligned} {\overline{H}}^g_0(x,p,o)&= p_i D_\nabla ^i x - L_0(x, D_\nabla x) + \textstyle {\frac{1}{2}} g^{jk}(x) \left( o_{jk} - \Gamma ^i_{jk}(x)p_i \right) \\&= p_i D^i x + \textstyle {\frac{1}{2}} o_{jk} Q^{jk}x - L(x, Dx, Qx)\\&\quad + \textstyle {\frac{1}{2}} \left( g^{jk}(x) - Q^{jk}x \right) o^\nabla _{jk}, \end{aligned} \end{aligned}$$
(7.43)

where \((o^\nabla _{jk})\) are the tensorial conjugate diffusivities defined in (5.6). We call (7.43) the \((g,\nabla )\)-canonical inverse second-order Legendre transform. When g is Riemannian and \(\nabla \) is the associated Levi–Civita connection, we call (7.43) the g-canonical inverse second-order Legendre transform. In particular, when restricting L to the subbundle of \({\mathcal {T}}^S M\) with coordinate constraint \(Q^{jk} x = g^{jk}(x)\), we have

$$\begin{aligned} {\overline{H}}^g_0(x,p,o) = p_i D^i x + \textstyle {\frac{1}{2}} o_{jk} Q^{jk}x - L(x, Dx, Qx). \end{aligned}$$
(7.44)

Following the procedure in classical mechanics (Abraham and Marsden 1978, Definition 3.5.11), for a given classical Lagrangian \(L_0:TM\rightarrow {\mathbb {R}}\), we define a function \(A_0:TM\rightarrow {\mathbb {R}}\) by \(A_0(v_x) = \textbf{FL}_0(v_x)\cdot v_x\), and the classical energy \(E_0:TM\rightarrow {\mathbb {R}}\) by \(E_0 = A_0-L_0\). Notice that in local coordinates, \(A_0 = \dot{x}^i \frac{\partial L_0}{\partial \dot{x}^i}\) and \(E_0 = \dot{x}^i \frac{\partial L_0}{\partial \dot{x}^i} - L_0\).

Example 7.21

It is easy to check that the \(\nabla \)-lift of the classical Lagrangian \(L_0\) in (7.25) is the second-order Legendre transform of the second-order Hamiltonian H in (6.26). And conversely, the latter is the g-canonical inverse second-order Legendre transform of the former. The classical energy associated with this Lagrangian is given by

$$\begin{aligned} E_0(t,x,\dot{x}) = \frac{1}{2} |\dot{x}-b(t,x)|^2 + \langle \dot{x}-b(t,x), b(t,x) \rangle + F(t,x). \end{aligned}$$
(7.45)

The three terms on the right-hand side correspond to a kinetic energy, a vector potential energy and a scalar potential energy, respectively.
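The classical part of Example 7.21 can be verified symbolically. The following SymPy sketch assumes Euclidean \({\mathbb {R}}^2\) and freezes b and F at a point (t, x), which is legitimate because the fiber derivative and the Legendre transform act fiberwise. It recomputes \(E_0\) from \(A_0 - L_0\), compares it with (7.45), and forms the classical Hamiltonian through the inverse Legendre transform (7.41)–(7.42). Symbol names are our own.

```python
import sympy as sp

# Euclidean R^2; b and F are frozen at a fixed (t, x).
xd = sp.Matrix(sp.symbols("xdot1:3", real=True))
p = sp.Matrix(sp.symbols("p1:3", real=True))
b = sp.Matrix(sp.symbols("b1:3", real=True))
F = sp.Symbol("F", real=True)

L0 = (xd - b).dot(xd - b) / 2 - F                       # Lagrangian (7.25)
FL0 = sp.Matrix([sp.diff(L0, v) for v in xd])           # fiber derivative (7.41): p = xdot - b
A0 = xd.dot(FL0)                                        # A_0 = xdot^i ∂L_0/∂xdot^i
E0 = sp.expand(A0 - L0)                                 # classical energy E_0 = A_0 - L_0
E0_745 = sp.expand((xd - b).dot(xd - b) / 2 + (xd - b).dot(b) + F)   # right-hand side of (7.45)

# Inverse Legendre transform (7.42): solve p = FL0(xdot) for xdot, then H_0 = p.xdot - L_0.
sol = sp.solve(list(FL0 - p), list(xd), dict=True)[0]
H0 = sp.expand((p.dot(xd) - L0).subs(sol))

print(sp.expand(E0 - E0_745))   # 0
print(H0)                       # should equal |p|^2/2 + <p, b> + F
```

As expected for a hyperregular Lagrangian, \(E_0\) is the pullback of \(H_0\) under the fiber derivative.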

7.4.2 Stochastic Hamiltonian Mechanics on Riemannian Manifolds

Given a reference metric tensor g, i.e., a geodesically complete Riemannian metric as in Sect. 7.2, let \(\nabla \) be the associated Levi–Civita connection. If a second-order Hamiltonian H is the g-canonical lift of a classical Hamiltonian \(H_0\), namely, \(H = {\overline{H}}_0^g\) as in (6.44), then the stochastic Hamilton’s equations (6.17) can reduce to a simpler Hamilton-type system on \(T^*M\), which is exactly equivalent to the stochastic Euler–Lagrange equation (7.22) via the classical Legendre transform (7.41) and (7.42).

Similarly to (7.19) and (7.20), we introduce, for a smooth function f on \(T^* M\), the vertical gradient \(\nabla _p f\) and the horizontal differential \(d_x f\), which are given in local coordinates (x, p) by

$$\begin{aligned} \nabla _p f = \frac{\partial f}{\partial p_i} \frac{\partial }{\partial {x^i}}, \quad d_x f = \left( \frac{\partial f}{\partial x^i} + \Gamma _{ij}^k p_k \frac{\partial f}{\partial p_j} \right) dx^i. \end{aligned}$$

Both are invariant under changes of coordinates. Again by the classical theory, the connection \(\nabla \) uniquely determines a \(TT^*M\)-valued 1-form on \(T^*M\), horizontal over M, given by

$$\begin{aligned} \Gamma ^* = dx^i \otimes \left( \frac{\partial }{\partial x^i} + \Gamma _{ij}^k p_k \frac{\partial }{\partial p_j} \right) . \end{aligned}$$

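Hence, we have \(d_x f = \Gamma ^*(df)\). Given a 1-form \(\eta \) on M, \(f\circ \eta : q\mapsto f(\eta _q)\) is a smooth function on M. Then, it is easy to verify that \(d(f\circ \eta ) = d_x f \circ \eta + (\nabla _{\partial _i} \eta )(\nabla _p f\circ \eta )\, dx^i\). If, moreover, \(\eta \) is closed (e.g., \(\eta = dS\) for a function S), then \(\nabla \eta \) is symmetric because the Levi–Civita connection \(\nabla \) is torsion-free, so that

$$\begin{aligned} d(f\circ \eta ) = d_x f \circ \eta + \nabla _{(\nabla _p f\circ \eta )} \eta . \end{aligned}$$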
(7.46)
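The cotangent-bundle counterparts can be checked in the same way. The following SymPy sketch assumes the Euclidean plane in polar coordinates and a hypothetical function S, and takes the exact 1-form \(\eta = dS\), for which \(\nabla \eta = \nabla ^2 S\) is symmetric. For the kinetic Hamiltonian \(f = \frac{1}{2}g^{ij}p_ip_j\) it shows that \(d_x f = 0\) and verifies \(d(f\circ \eta ) = d_x f\circ \eta + \nabla _{(\nabla _p f\circ \eta )}\eta \), which here reduces to \(d(\frac{1}{2}|\nabla S|^2) = \nabla _{\nabla S} dS\), as used in the proof of Theorem 7.18.

```python
import sympy as sp

r, th = sp.symbols("r theta", positive=True)
pr, pth = sp.symbols("p_r p_theta", real=True)
x, p, n = [r, th], [pr, pth], 2
g = sp.diag(1, r**2)          # assumed chart: Euclidean plane in polar coordinates
ginv = g.inv()

Gamma = [[[sum(ginv[k, l] * (sp.diff(g[l, i], x[j]) + sp.diff(g[l, j], x[i])
                              - sp.diff(g[i, j], x[l])) for l in range(n)) / 2
           for j in range(n)] for i in range(n)] for k in range(n)]


def vertical_gradient(f):
    """Components of ∇_p f = (∂f/∂p_i) ∂/∂x^i."""
    return [sp.diff(f, p[i]) for i in range(n)]


def horizontal_differential(f):
    """Components of d_x f = (∂f/∂x^i + Γ^k_ij p_k ∂f/∂p_j) dx^i."""
    return [sp.simplify(sp.diff(f, x[i])
                        + sum(Gamma[k][i][j] * p[k] * sp.diff(f, p[j])
                              for j in range(n) for k in range(n)))
            for i in range(n)]


H_kin = sum(ginv[i, j] * p[i] * p[j] for i in range(n) for j in range(n)) / 2
print(horizontal_differential(H_kin))   # [0, 0]

S = r**2 * sp.cos(th) + r * sp.sin(2 * th)               # hypothetical function
eta = [sp.diff(S, xi) for xi in x]                       # eta = dS
on_eta = dict(zip(p, eta))
lhs = [sp.diff(H_kin.subs(on_eta), xi) for xi in x]      # d(f o eta)
V = [c.subs(on_eta) for c in vertical_gradient(H_kin)]   # ∇_p f o eta = ∇S
nabla_eta = [[sp.diff(eta[k], x[j]) - sum(Gamma[m][j][k] * eta[m] for m in range(n))
              for k in range(n)] for j in range(n)]      # (∇_j eta)_k
dxf = horizontal_differential(H_kin)
rhs = [dxf[k].subs(on_eta) + sum(V[j] * nabla_eta[j][k] for j in range(n))
       for k in range(n)]
print([sp.simplify(lhs[k] - rhs[k]) for k in range(n)])
```

Note the sign change relative to (7.20): on \(T^*M\) the connection term enters with a plus sign, because covectors transform contragrediently to vectors.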

Theorem 7.22

Let \(H_0: T^*M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be a smooth function.

  1. (i)

    Let \(H = {\overline{H}}_0^g: {\mathcal {T}}^{S*} M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be the g-canonical lift of \(H_0\). Let \({\textbf{X}}\) be the horizontal integral process of the stochastic Hamilton’s equations (6.17) corresponding to H, and let \(X=\tau _M^{S*}({\textbf{X}})\). Define a \(T^*M\)-valued horizontal diffusion by \({\mathbb {X}}:= {\hat{\varrho }}^*({\textbf{X}})\). Then, \({\mathbb {X}}(t) = p(t,X(t))\) solves the following system on \(T^*M\):

    $$\begin{aligned} \left\{ \begin{aligned}&D_\nabla X(t) = \nabla _p H_0 ({\mathbb {X}}(t),t), \\&\frac{\overline{{\textbf{D}}}}{dt} p(t,X(t)) = -d_x H_0 ({\mathbb {X}}(t),t), \end{aligned}\right. \end{aligned}$$
    (7.47)

    subject to \(QX(t) = {\check{g}}(X(t))\), where \(\frac{\overline{{\textbf{D}}}}{dt}\) is the damped mean covariant derivative with respect to X. In this case, we refer to the system (7.47) as the g-canonical reduction of (6.17), or global stochastic Hamilton’s equations.

  2. (ii)

    If \(H_0\) is hyperregular, then the global stochastic Hamilton’s equations (7.47) are equivalent to the stochastic Euler–Lagrange equation (7.22) via the classical Legendre transform \(p=d_{\dot{x}} L_0\) and \(H_0(x,p,t) = p\cdot \dot{x} - L_0(t,x,\dot{x})\).

  3. (iii)

    Let \(S\in C^\infty (M\times {\mathbb {R}})\). Then, the following statements are equivalent:

  1. (a)

    for every M-valued diffusion X satisfying

    $$\begin{aligned} D_\nabla X(t) = \nabla _p H_0 (dS(t, X(t) ),t), \quad QX(t) = \check{g}(X(t)), \end{aligned}$$
    (7.48)

    the \(T^* M\)-valued process \(dS\circ X\) solves the global stochastic Hamilton’s equations (7.47);

  2. (b)

    S satisfies the following Hamilton–Jacobi–Bellman equation

    $$\begin{aligned} \frac{\partial S}{\partial t} + H_0(d S, t) + \frac{1}{2}\Delta S = f(t), \end{aligned}$$
    (7.49)

    for some function f depending only on t.

Proof

  1. (i)

    Since \(H = {\overline{H}}_0^g = H_0 + \textstyle {\frac{1}{2}} g^{jk} (o_{jk} - \Gamma ^i_{jk} p_i )\), we have \((QX)^{jk} = 2\frac{\partial H}{\partial o_{jk}}\) if and only if \(QX(t) = {\check{g}}(X(t))\). Since

    $$\begin{aligned} \frac{\partial H}{\partial p_i} = \frac{\partial H_0}{\partial p_i} - \frac{1}{2} g^{jk} \Gamma ^i_{jk} = dx^i( \nabla _p H_0 ) - \frac{1}{2} (QX)^{jk} \Gamma ^i_{jk}, \end{aligned}$$

    we have \((DX)^i = \frac{\partial H}{\partial p_i}\) if and only if \(D_\nabla X= \nabla _p H_0\), due to (2.20). This proves the first equation of (7.47). Furthermore,

    $$\begin{aligned} \frac{\partial H}{\partial x^i}= & {} \frac{\partial H_0}{\partial x^i} + \frac{1}{2} \partial _i g^{jk} \left( o_{jk} - \Gamma ^l_{jk} p_l \right) - \frac{1}{2} g^{jk} \partial _i \Gamma ^l_{jk} p_l \\= & {} \frac{\partial H_0}{\partial x^i} - g^{jm} \Gamma _{im}^k \left( o_{jk} - \Gamma ^l_{jk} p_l \right) - \frac{1}{2} g^{jk} \partial _i \Gamma ^l_{jk} p_l. \end{aligned}$$

    On the other hand, by applying Lemma 7.8 (ii) and (iv), and the equation \(D_\nabla X= \nabla _p H_0\), we have

    $$\begin{aligned} \begin{aligned} (D (p\circ {\textbf {X}}))_i&= {\textbf{D}}_t p_i = {\textbf{D}}_t [p(\partial _i)] = \frac{\overline{{\textbf{D}}}p}{dt} (\partial _i) + p \left( \frac{\overline{{\textbf{D}}}\partial _i}{dt} \right) \\ {}&\quad + (QX)^{jk} (\nabla _{\partial _j} p) (\nabla _{\partial _k} \partial _i) \\&= \frac{\overline{{\textbf{D}}}p}{dt} (\partial _i) + p \left( \nabla ^{}_{D_\nabla X}\partial _i + \frac{1}{2} g^{jk} \nabla ^2_{\partial _j,\partial _k} \partial _i + \frac{1}{2} g^{jk} R(\partial _i,\partial _j)\partial _k \right) \\&\quad + g^{jk} (\nabla _{\partial _j} p) (\nabla _{\partial _k} \partial _i) \\&= \frac{\overline{{\textbf{D}}}p}{dt} (\partial _i) + p_l \left( \frac{\partial H_0}{\partial p_j} \Gamma _{ij}^l + \frac{1}{2}g^{jk} \partial _i \Gamma _{jk}^l \right) \\ {}&\quad + g^{jk} \Gamma _{ik}^m \left( \partial _j p_m - \Gamma _{jm}^l p_l \right) . \end{aligned}\nonumber \\ \end{aligned}$$
    (7.50)

    Hence,

    $$\begin{aligned} \begin{aligned} (D (p\circ {\textbf {X}}))_i + \frac{\partial H}{\partial x^i} = \frac{\overline{{\textbf{D}}}p}{dt} (\partial _i) + d_x H_0(\partial _i) + g^{jm} \Gamma _{im}^k \left( \partial _j p_k - o_{jk} \right) . \end{aligned} \end{aligned}$$

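    By the stochastic Hamilton’s equations (6.17), the left-hand side vanishes, and by the integrability condition (6.15), \(o_{jk} = \partial _j p_k\) along X, so the last term vanishes as well. This yields the second equation of (7.47).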

  2. (ii)

    The equivalence between (7.47) and (7.22) follows from the following calculations:

    $$\begin{aligned} \nabla _p H_0= & {} \nabla _p (p\cdot \dot{x} - L_0) = \dot{x}, \\ d_x H_0= & {} \left( \frac{\partial H_0}{\partial x^i} + \Gamma _{ij}^k p_k \frac{\partial H_0}{\partial p_j} \right) dx^i \\= & {} \left( -\frac{\partial L_0}{\partial x^i} + \Gamma _{ij}^k \frac{\partial L_0}{\partial \dot{x}^k} \dot{x}^j \right) dx^i = - d_x L_0. \end{aligned}$$
  3. (iii)

    By (7.7), conditions (7.48) and (7.46),

    $$\begin{aligned} \begin{aligned} \frac{\overline{{\textbf{D}}}}{dt} (dS)&= \left( \frac{\partial }{\partial t} + \nabla ^{}_{D_\nabla X} + \frac{1}{2}\Delta _{\textrm{LD}} \right) (dS) \\&= d\frac{\partial S}{\partial t} + \nabla _{(\nabla _p H_0\circ dS)} dS - \frac{1}{2}( dd^*+d^* d)dS \\&= d\frac{\partial S}{\partial t} + d(H_0\circ dS) - d_x H_0 \circ dS - \frac{1}{2} dd^*dS \\&= d\left( \frac{\partial S}{\partial t} + H_0\circ dS + \frac{1}{2} \Delta S \right) - d_x H_0 \circ dS. \end{aligned} \end{aligned}$$

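    Hence \(\frac{\overline{{\textbf{D}}}}{dt} (dS) = -d_x H_0 \circ dS\) along every X satisfying (7.48) if and only if \(d\left( \frac{\partial S}{\partial t} + H_0\circ dS + \frac{1}{2} \Delta S \right) = 0\), that is, if and only if (7.49) holds for some function f depending only on t. The result follows.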

\(\square \)
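A minimal flat-space illustration of assertion (iii), assuming M = \({\mathbb {R}}\) with the Euclidean metric and the hypothetical Hamiltonian \(H_0(x,p) = \frac{1}{2}p^2\). Then \(\nabla _p H_0 = p\), \(d_x H_0 = 0\), all Christoffel symbols vanish, and the damped mean derivative (7.7) acts on \(u = \partial _x S\) as \(\partial _t + u\,\partial _x + \frac{1}{2}\partial _x^2\). For \(S(t,x) = \log (2\cosh x) - t/2\) the left-hand side of (7.49) vanishes; dropping the term \(-t/2\) leaves the constant value \(f \equiv \frac{1}{2}\), yet the damped mean derivative of dS still vanishes. This is why (7.49) determines S only up to a function f(t). Derivatives are central finite differences.

```python
import math

STEP = 1e-3  # finite-difference step (an assumption, adequate for these smooth S)

def d_t(F, t, x):
    return (F(t + STEP, x) - F(t - STEP, x)) / (2 * STEP)

def d_x(F, t, x):
    return (F(t, x + STEP) - F(t, x - STEP)) / (2 * STEP)

def d_xx(F, t, x):
    return (F(t, x + STEP) - 2 * F(t, x) + F(t, x - STEP)) / STEP**2

def hjb_residual(S, t, x):
    """Left side of (7.49) on flat R with H0(p) = p^2/2; equals f(t) for a solution."""
    return d_t(S, t, x) + 0.5 * d_x(S, t, x) ** 2 + 0.5 * d_xx(S, t, x)

def damped_residual(S, t, x):
    """(d/dt + u d/dx + (1/2) d^2/dx^2) u with u = dS/dx, i.e. the damped mean
    derivative of dS along X with drift u; statement (a) needs it to equal -d_x H0 = 0."""
    def u(tt, xx):
        return d_x(S, tt, xx)
    return d_t(u, t, x) + u(t, x) * d_x(u, t, x) + 0.5 * d_xx(u, t, x)

def S_solution(t, x):          # hypothetical solution of (7.49) with f = 0
    return math.log(2 * math.cosh(x)) - 0.5 * t

def S_shifted(t, x):           # same S without -t/2: solves (7.49) with f = 1/2
    return math.log(2 * math.cosh(x))

for x in (-1.0, 0.3, 2.0):
    print(hjb_residual(S_solution, 0.4, x), hjb_residual(S_shifted, 0.4, x),
          damped_residual(S_shifted, 0.4, x))
```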

Remark 7.23

  1. (i)

    Assertions (ii) and (iii) of Theorem 7.22 generalize Theorem 7.18, since, via the Legendre transform \(p=d_{\dot{x}} L_0\), the S-EL equation (7.22) is related to the HJB equation (7.49) through Eq. (7.29). On the other hand, assertion (iii) is a special case of Theorem 6.19, since the HJB equation (7.49) is exactly the one in (6.39) with \(H = {\overline{H}}_0^g\) the g-canonical lift of \(H_0\), owing to the identity \(\overline{H}_0^g(d^2 S, t) = H_0(d S, t) + \frac{1}{2}\Delta S\).

  2. (ii)

    The advantage of Theorem 7.22 is that it formulates stochastic Hamiltonian mechanics globally, in the same way as stochastic Lagrangian mechanics; its disadvantage is that it depends on the choice of a Riemannian structure. However, unlike the stochastic Hamiltonian mechanics of Sect. 6, neither the global S-H equations (7.47) nor the HJB equation (7.49) encodes any new symplectic or contact structure, since the Hamiltonian functions therein are still classical.

  3. (iii)

    By a direct calculation similar to (7.50), one obtains the following local version of the stochastic Euler–Lagrange equation (7.22):

    $$\begin{aligned} {\textbf{D}}_t \left( \frac{\partial L_0}{\partial \dot{x}^i} \right) = \frac{\partial L_0}{\partial x^i} + \frac{1}{2} g^{jk} \partial _i \Gamma ^l_{jk} \frac{\partial L_0}{\partial \dot{x}^l} - \frac{1}{2} \partial _i g^{jk} \left( \frac{\partial ^2 L_0}{\partial x^j \partial \dot{x}^k} - \Gamma ^l_{jk} \frac{\partial L_0}{\partial \dot{x}^l} \right) .\nonumber \\ \end{aligned}$$
    (7.51)

    This local version is related to stochastic Hamilton’s equations (6.10) via the canonical second-order Legendre transform (7.43).

  4. (iv)

    Similarly to Remark 6.20, if we let \({\tilde{H}}_0 = H_0 - f\), whose g-canonical lift is \({\tilde{H}} = H - f\), then Theorem 7.22 holds with \({\tilde{H}}_0\) and the zero function in place of \(H_0\) and f. We will refer to Eq. (7.49) with \(f\equiv 0\) as the HJB equation associated with the Hamiltonian \(H_0\), or the HJB equation associated with the Lagrangian \(L_0\) related to \(H_0\) via the Legendre transform (when \(H_0\) is hyperregular).

On Riemannian manifolds, the canonical transformations of Sect. 6.5 can also be reduced to cotangent bundles. We consider a bundle isomorphism \({\textbf{F}}\) from \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) to \({\mathcal {T}}^{S*} N\times {\mathbb {R}}\), projecting to a time-change map \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\). The transformation \({\textbf{F}}\) is a map from coordinates \((x^i, p_i, o_{jk},t)\) to \((y^i, P_i, O_{jk}, s)\) satisfying \(s=F^0(t)\). Both base manifolds M and N are equipped with Riemannian metrics and the corresponding Levi–Civita connections.

By the inverse second-order Legendre transform (7.44) and the integrability condition (6.15), the action functional in (7.9) can be rewritten as

$$\begin{aligned} \begin{aligned}&{\mathcal {S}} [X;0,T]\\&\quad = {\textbf{E}}\int _0^T \left[ p_i(t,X(t)) (DX)^i(t) + \frac{1}{2} \frac{\partial p_j}{\partial x^k}(t,X(t)) (QX)^{jk}(t) - {\overline{H}}^g_0({\textbf{X}}(t),t) \right] {\textrm{d}}t \\&\quad = {\textbf{E}}\int _0^T \left[ p_i(t,X(t)) \circ dX^i(t) - \overline{H}^g_0({\textbf{X}}(t),t) {\textrm{d}}t \right] , \end{aligned} \end{aligned}$$

where \(\circ \,d\) denotes the Stratonovich stochastic differential and \({\overline{H}}^g_0 = H_0 + \frac{1}{2} g^{jk} (o_{jk} - \Gamma ^i_{jk}p_i )\). For brevity, we write \(x^i = x^i\circ {\textbf{X}}\), \(p_i = p_i\circ {\textbf{X}}\) and \(H= {\overline{H}}^g_0\). Then, \(\mathcal S = {\textbf{E}}\int _0^T (p_i\circ dx^i(t) - H {\textrm{d}}t)\). Now we change coordinates from \((x^i,p_i,t)\) to \((y^i,P_i,s)\) with \(s=F^0(t)\), and write \(y^i = y^i\circ {\textbf{X}}\) and \(P_i = P_i\circ {\textbf{X}}\). We have

$$\begin{aligned} {\mathcal {S}} = {\textbf{E}}\int _{F^0(0)}^{F^0(T)} \left( P_i\circ dy^i(s) - K {\textrm{d}}s \right) = {\textbf{E}}\int _0^T \left( P_i\circ d(y^i\circ F^0)(t) - K \dot{F}^0 {\textrm{d}}t \right) , \end{aligned}$$

where the function K plays the role of the second-order Hamiltonian in the new coordinate system.
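The second equality in the rewriting of \({\mathcal {S}}[X;0,T]\) above is the Itô–Stratonovich conversion. A short derivation, assuming that \(p_i(t,\cdot )\) is smooth, that \((QX)^{jk}\,{\textrm{d}}t = d\langle X^j, X^k\rangle \), and that the martingale part of \(X^i - \int (DX)^i {\textrm{d}}t\) has zero expectation:

$$\begin{aligned} {\textbf{E}}\int _0^T p_i(t,X(t)) \circ dX^i(t)&= {\textbf{E}}\int _0^T p_i(t,X(t))\, dX^i(t) + \frac{1}{2}\, {\textbf{E}}\int _0^T d\langle p_i(\cdot ,X), X^i\rangle _t \\&= {\textbf{E}}\int _0^T \left[ p_i(t,X(t)) (DX)^i(t) + \frac{1}{2} \frac{\partial p_j}{\partial x^k}(t,X(t)) (QX)^{jk}(t) \right] {\textrm{d}}t, \end{aligned}$$

where the bracket term was evaluated with \(d\langle p_j(\cdot ,X), X^j\rangle _t = \frac{\partial p_j}{\partial x^k}(t,X(t))\, (QX)^{jk}(t)\, {\textrm{d}}t\), using the symmetry of QX.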

As in Sect. 6.5, the general condition for a transformation to be canonical is that it preserve the form of the stochastic Hamilton’s system (7.47). By Theorem 7.22 (ii), this is equivalent to preserving the form of the stochastic stationary-action principle (7.12). It follows from \(\delta {\mathcal {S}} = 0\) that

$$\begin{aligned} \delta \,{\textbf{E}}\int _0^T \left( p_i\circ dx^i(t) - H {\textrm{d}}t\right) = \delta \,{\textbf{E}}\int _0^T \left( P_i\circ d(y^i\circ F^0)(t) - K \dot{F}^0 {\textrm{d}}t\right) =0. \end{aligned}$$

Since the underlying process X has zero variation at the endpoints, both equalities will be satisfied if the integrands are related by the following SDE:

$$\begin{aligned} p_i\circ dx^i - H dt = P_i\circ dy^i - K \dot{F}^0 dt + dG, \end{aligned}$$
(7.52)

where G, called the generating function, is a function of the phase space coordinates (x, p, t) or (y, P, s), or of any mixture of them. Note that, in contrast with the classical theory of canonical transformations and with (6.36), Eq. (7.52) is here a stochastic differential equation rather than an equation between differential forms.

Consider the type one generating function \(G_1\), that is, \(G = G_1(x,y,t)\) is a function of the old and new generalized position coordinates (cf. Goldstein et al. 2002, Section 9.1). Then, using Itô’s formula in Stratonovich form, \(dG_1 = \frac{\partial G_1}{\partial t}dt + \frac{\partial G_1}{\partial x^i} \circ dx^i + \frac{\partial G_1}{\partial y^i} \circ dy^i\), and setting to zero the coefficients of each (stochastic) differential \(\circ dx\), \(\circ dy\) and dt in (7.52), we get

$$\begin{aligned} p_i = \frac{\partial G_1}{\partial x^i}, \quad P_i = - \frac{\partial G_1}{\partial y^i}, \quad K \dot{F}^0 - H = \frac{\partial G_1}{\partial t}, \end{aligned}$$

which recovers (6.37). Taking \(F^0 = \textbf{Id}_{\mathbb {R}}\) (i.e., no time change), requiring the new Hamiltonian \(K_0\) to vanish identically, and writing S for \(G_1\), the last equation becomes the following HJB equation

$$\begin{aligned} \frac{\partial S}{\partial t}(x,y,t) + H_0\left( x^i, \frac{\partial S}{\partial x^i}(x,y,t), t \right) + \frac{1}{2}\Delta _x S(x,y,t) + \frac{1}{2}\Delta _y S(x,y,t) = 0, \end{aligned}$$

where (x, y) are regarded as coordinates on the product manifold \(M\times N\) equipped with the direct-sum Riemannian metric and its Levi–Civita connection, and \(\Delta _x\) and \(\Delta _y\) are the Laplacians on M and N, respectively, so that \(\Delta _x + \Delta _y\) is the Laplacian on \(M\times N\) with respect to this metric.
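For completeness, the coefficient matching behind the type one relations and the HJB equation above can be written out as follows. Substituting the Stratonovich chain rule for \(dG_1\) into (7.52) gives

$$\begin{aligned} \left( p_i - \frac{\partial G_1}{\partial x^i}\right) \circ dx^i - \left( P_i + \frac{\partial G_1}{\partial y^i}\right) \circ dy^i + \left( K\dot{F}^0 - H - \frac{\partial G_1}{\partial t}\right) {\textrm{d}}t = 0. \end{aligned}$$

With \(G_1 = S\) and \(\dot{F}^0 = 1\), we have \(p_k = \partial S/\partial x^k\) and \(P_k = -\partial S/\partial y^k\), so the second-order coordinates are \(o_{jk} = \partial ^2 S/\partial x^j\partial x^k\) and \(O_{jk} = \partial P_k/\partial y^j = -\partial ^2 S/\partial y^j\partial y^k\). Writing \(g_M\), \(g_N\) for the two metrics (our notation) and assuming, as in the text, that K is the g-canonical lift of \(K_0 \equiv 0\) on N,

$$\begin{aligned} K&= \frac{1}{2} g_N^{jk}\left( O_{jk} - \Gamma ^i_{jk} P_i\right) = -\frac{1}{2}\Delta _y S, \\ H&= H_0\left( x, \frac{\partial S}{\partial x}, t\right) + \frac{1}{2} g_M^{jk} \left( o_{jk} - \Gamma ^i_{jk} p_i \right) = H_0\left( x, \frac{\partial S}{\partial x}, t\right) + \frac{1}{2}\Delta _x S, \end{aligned}$$

and the relation \(K - H = \frac{\partial S}{\partial t}\) is exactly the HJB equation on \(M\times N\) stated above.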

In contrast to the mixed-order contact approach to canonical transformations of Sect. 6.5, since the changes of coordinates proceed on \(T^* M\), one can easily formulate four types of generating functions that are related to each other through classical Legendre transforms in the same way as in classical mechanics (Goldstein et al. 2002, Section 9.1). For example, the type two generating function takes the form \(G = G_2(x,P,t) - y^i P_i\), for which we have

$$\begin{aligned} p_i = \frac{\partial G_2}{\partial x^i}, \quad y^i = \frac{\partial G_2}{\partial P_i}, \quad K \dot{F}^0 - H = \frac{\partial G_2}{\partial t}. \end{aligned}$$
(7.53)

In this case, since \((x^i)\) and \((y^i)\) are no longer independent variables, the Riemannian structures on M and N should be related through the transformation. In view of this, we only consider point transformations, a subclass of canonical transformations. That is, we assume \(G_2\) to be of the form

$$\begin{aligned} G_2(x,P,t) = f^i(x,t) P_i + h(x,t) \end{aligned}$$

for some time-dependent diffeomorphism \(f(\cdot ,t): M\rightarrow N\) and some smooth function \(h: M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\). The second equation of (7.53) implies

$$\begin{aligned} y^i = f^i(x,t). \end{aligned}$$

We therefore equip N with the (time-dependent) pushforward by f of the Riemannian metric g on M, and with its Levi–Civita connection.

Example 7.24

(Canonical transformations for one-dimensional Bernstein’s reciprocal processes) Consider the scalar case of Example 6.11, that is, the \({\mathbb {R}}\)-valued Brownian reciprocal process with second-order Hamiltonian \(H(x,p,o) = H_0(x,p) + \frac{1}{2}o = \frac{1}{2}|p|^2+ \frac{1}{2}o\). The equations of motion are \(DDX=0\), \(QX = 1\) [cf. (6.33)]. In the following, we consider two canonical transformations which transform Brownian reciprocal processes into reciprocal processes derived from diffusions with linear and quadratic potentials, respectively.

  1. (i)

    Consider the time-dependent change of coordinates from (x, p, t) to (y, P, t) (without time change) induced by \(G_2(x,P,t)= (x + \textstyle {{\frac{1}{2}}} t^2) P -tx\). By (7.53),

    $$\begin{aligned} y = x+ \textstyle {{\frac{1}{2}}} t^2, \quad p = P-t, \quad K = H + Pt-y + \textstyle {{\frac{1}{2}}} t^2. \end{aligned}$$
    (7.54)

    For the latent second-order coordinates, we have

    $$\begin{aligned} O = \frac{\partial P}{\partial y} = \frac{\partial p}{\partial x} = o. \end{aligned}$$

    Hence, by the last equation of (7.53), the new second-order Hamiltonian is

    $$\begin{aligned} K(y,P,O,t) = K_0(y,P,t)+ \frac{1}{2}O= \frac{1}{2}|P|^2 - y + t^2 + \frac{1}{2}O, \end{aligned}$$

    which is still of the form (6.26), with \(b\equiv 0\) and \(F(t,y) = -y+t^2\). The equations of motion in the new coordinates are \(DDY=1\) and \(QY = 1\). By Remark 6.20, K has the same equations of motion as \({\tilde{K}}(y,P,O) = \frac{1}{2}|P|^2 - y + \frac{1}{2}O\). In other words, (7.54) transforms Brownian reciprocal processes into reciprocal processes derived from diffusions with linear potentials. This example is taken from Lescot and Zambrini (2007, Theorem 4.1.(1)), where the authors used (7.54) to transform free heat equations into heat equations with linear potentials. We refer readers to Lescot and Zambrini (2007) for more applications of canonical transformations of contact Hamiltonian systems to the Euclidean quantum mechanics of Example 6.12.
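The computations in (i) can be replayed numerically: the sketch below differentiates \(G_2\) by central finite differences, applies (7.53) with \(F^0 = \textbf{Id}_{\mathbb {R}}\), and compares K with its closed form \(\frac{1}{2}|P|^2 - y + t^2 + \frac{1}{2}O\). The helper names and sample points are our own choices; since \(G_2\) is at most quadratic in each variable, the differences are exact up to rounding.

```python
def partial(F, args, i, h=1e-5):
    """Central finite difference of F(*args) in argument i."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (F(*up) - F(*dn)) / (2 * h)

def G2(x, P, t):
    """Type two generating function of Example 7.24 (i)."""
    return (x + 0.5 * t**2) * P - t * x

def transform(x, P, o, t):
    """Apply (7.53) with F^0 = Id; return (y, p, K, closed form of K)."""
    y = partial(G2, (x, P, t), 1)           # y = dG2/dP
    p = partial(G2, (x, P, t), 0)           # p = dG2/dx
    H = 0.5 * p**2 + 0.5 * o                # H(x, p, o) = |p|^2/2 + o/2
    K = H + partial(G2, (x, P, t), 2)       # K - H = dG2/dt
    O = o                                   # O = o, as computed in (i)
    K_closed = 0.5 * P**2 - y + t**2 + 0.5 * O
    return y, p, K, K_closed

y, p, K, K_closed = transform(x=0.7, P=-1.2, o=0.3, t=0.5)
print(y, p, K - K_closed)   # y = x + t^2/2, p = P - t, and K - K_closed is close to 0
```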

  2. (ii)

    Consider the following change of coordinates from (x, p, t) to (y, P, s) (with time change):

    $$\begin{aligned} x = y\sqrt{1-t^2}, \quad P= p\sqrt{1-t^2} + yt, \quad s=\textrm{arctanh}\,t. \end{aligned}$$
    (7.55)

    Clearly, the map \((x,p)\mapsto (y,P)\) is induced by the type three generating function \(G_3(y,p,t) = -py\sqrt{1-t^2} -\frac{y^2}{2}t\) via relations \(x= -\frac{\partial G_3}{\partial p}\) and \(P= -\frac{\partial G_3}{\partial y}\). The relation between the latent coordinates o and O is

    $$\begin{aligned} O = \frac{\partial P}{\partial y} = \frac{\partial p}{\partial x} \frac{\partial x}{\partial y} \sqrt{1-t^2} + t = (1-t^2) o +t. \end{aligned}$$
    (7.56)

    The new second-order Hamiltonian K satisfies \(K \frac{{\textrm{d}}s}{{\textrm{d}}t} - H = \frac{\partial G_3}{\partial t}\). Hence, combining with (7.55) and (7.56), we obtain

    $$\begin{aligned} K (y,P,O,s)= & {} (1-t^2) \left( \frac{1}{2} |p|^2 + \frac{py t}{\sqrt{1-t^2}} - \frac{1}{2} |y|^2 + \frac{1}{2} o \right) \\= & {} \frac{1}{2} |P|^2 - \frac{1}{2} |y|^2 + \frac{1}{2} O - \frac{1}{2} \tanh s. \end{aligned}$$

    This differs from the second-order Hamiltonian of Euclidean harmonic oscillators in Example 6.12 (ii), i.e., \({\tilde{K}} (y,P,O) = \frac{1}{2} |P|^2 - \frac{1}{2} |y|^2 + \frac{1}{2} O\), only by a term depending on time alone. So, by virtue of Remark 6.20, K and \({\tilde{K}}\) have the same equations of motion \(DDY=Y\), \(QY = 1\). Therefore, (7.55) transforms free reciprocal processes into Euclidean harmonic oscillators.
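A numerical replay of (ii), under the same finite-difference approach: x and P are obtained from \(G_3\) by \(x = -\partial G_3/\partial p\) and \(P = -\partial G_3/\partial y\), the time change \(s = \textrm{arctanh}\,t\) enters through \({\textrm{d}}s/{\textrm{d}}t\), and K from \(K\frac{{\textrm{d}}s}{{\textrm{d}}t} - H = \frac{\partial G_3}{\partial t}\) is compared with \(\frac{1}{2}|P|^2 - \frac{1}{2}|y|^2 + \frac{1}{2}O - \frac{1}{2}\tanh s\). Function names and sample points are illustrative only; the formulas require \(|t| < 1\).

```python
import math

def partial(F, args, i, h=1e-6):
    """Central finite difference of F(*args) in argument i."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (F(*up) - F(*dn)) / (2 * h)

def G3(y, p, t):
    """Type three generating function of Example 7.24 (ii), for |t| < 1."""
    return -p * y * math.sqrt(1 - t**2) - 0.5 * y**2 * t

def transform(y, p, o, t):
    """Return x, P and K from (7.55)-(7.56), plus the closed form of K."""
    x = -partial(G3, (y, p, t), 1)                 # x = -dG3/dp
    P = -partial(G3, (y, p, t), 0)                 # P = -dG3/dy
    s = math.atanh(t)                              # time change s = arctanh t
    ds_dt = partial(math.atanh, (t,), 0)           # = 1 / (1 - t^2)
    O = (1 - t**2) * o + t                         # (7.56)
    H = 0.5 * p**2 + 0.5 * o
    K = (H + partial(G3, (y, p, t), 2)) / ds_dt    # K ds/dt - H = dG3/dt
    K_closed = 0.5 * P**2 - 0.5 * y**2 + 0.5 * O - 0.5 * math.tanh(s)
    return x, P, K, K_closed

x, P, K, K_closed = transform(y=0.8, p=-1.3, o=0.5, t=0.4)
print(x, P, K - K_closed)   # K - K_closed is close to 0
```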

Example 7.25

(Canonical transformations for vanishing potentials) Let (M, g) be a Riemannian manifold. Take \(G_2(x,P,t) = x^i P_i - S(x,t)\) for some function S. Then,

$$\begin{aligned} y = x, \quad p_i = P_i - \frac{\partial S}{\partial x^i}, \quad K = H - \frac{\partial S}{\partial t}. \end{aligned}$$

Since the transformation on the base manifold M is the identity, it does not change the Riemannian metric, and

$$\begin{aligned} o_{ij} = \frac{\partial p_i}{\partial x^j} = \frac{\partial P_i}{\partial y^j} - \frac{\partial ^2 S}{\partial x^i\partial x^j}. \end{aligned}$$
  1. (i)

    We consider the Hamiltonian \(H_0(x, p) = b^i(x) p_i - F(x)\), whose corresponding second-order Hamiltonian \(H = \overline{H}^g_0\) has as solution process a diffusion with generator \(\frac{1}{2} \Delta + b\cdot \nabla - F\) (see Sect. 6.3.1). Then, the new Hamiltonian is

    $$\begin{aligned} K_0(y,P,t)= & {} b^i(y) P_i - \langle b(y), \nabla S(y,t)\rangle \\{} & {} - F(y) - \frac{1}{2} \Delta S(y,t) - \frac{\partial S}{\partial t}(y,t). \end{aligned}$$

    If we choose S solving the backward PDE (6.23), then \(K_0(y,P) = b^i(y) P_i\), whose solution process is a diffusion with generator \(\frac{1}{2} \Delta + \nabla _b\). In particular, such a canonical transformation can transform a diffusion process with a scalar potential into a free motion.

  2. (ii)

    Consider the Hamiltonian \(H_0(x,p,t) = \frac{1}{2}g^{ij}(x) p_i p_j + g^{ij}(x)p_i\frac{\partial S}{\partial x^j}(x,t) + b^i(x) p_i - F(x)\), whose corresponding second-order Hamiltonian \(H = \overline{H}^g_0\) has as solution process a Schrödinger’s bridge with vector potential \((b+\nabla S)\) and scalar potential \(-F\). Then, the new Hamiltonian is

    $$\begin{aligned} K_0(y,P,t)= & {} \frac{1}{2}g^{ij}(y) P_iP_j + b^i(y) P_i - \langle b(y), \nabla S(y,t)\rangle - \frac{1}{2}|\nabla S(y,t)|^2 - F(y) \\{} & {} - \frac{1}{2}\Delta S(y,t) - \frac{\partial S}{\partial t}(y,t). \end{aligned}$$

    To transform \(K_0\) into the standard form \(K_0(y,P,t) = \frac{1}{2}g^{ij}(y) P_iP_j + b^i(y) P_i\), whose solution is a Schrödinger’s bridge with vector potential b, we only need S to solve the HJB equation (6.28). In particular, such a canonical transformation transforms a Schrödinger’s bridge with a scalar potential into a free one.
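The expression for \(K_0\) in (ii) follows by substituting \(p_i = P_i - \partial _i S\) into \(H_0\); here \(\langle \cdot ,\cdot \rangle \) and \(|\cdot |\) denote the pairing and norm induced by g (our shorthand):

$$\begin{aligned} H_0\big (y, P-dS, t\big )&= \frac{1}{2}|P-dS|^2 + \langle P-dS, dS\rangle + b^i (P_i - \partial _i S) - F \\&= \frac{1}{2}|P|^2 + b^i P_i - \frac{1}{2}|\nabla S|^2 - \langle b, \nabla S\rangle - F, \end{aligned}$$

while, by the relation for \(o_{ij}\) above, the second-order part of H equals \(\frac{1}{2}g^{ij}\big (\frac{\partial P_i}{\partial y^j} - \Gamma ^k_{ij}P_k\big ) - \frac{1}{2}\Delta S\), and \(K = H - \frac{\partial S}{\partial t}\). Hence \(K_0\) reduces to the standard form exactly when

$$\begin{aligned} \frac{\partial S}{\partial t} + \frac{1}{2}\Delta S + \frac{1}{2}|\nabla S|^2 + \langle b, \nabla S\rangle + F = 0, \end{aligned}$$

which is the requirement placed on S in (ii).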

Regarding the classical energy introduced at the end of Sect. 7.4.1, for a given classical Lagrangian \(L_0:{\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\), we define its generalized (or deformed) energy \(E:{\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} E(t,x,\dot{x}) = E_0(t,x,\dot{x}) + \textstyle {\frac{1}{2}} \Delta S(t,x), \end{aligned}$$

where S is the solution of the Hamilton–Jacobi–Bellman equation (7.49) associated with \(L_0\) (with \(f\equiv 0\)). The term \(\frac{1}{2}\Delta S\) accounts for the stochastic deformation.

7.4.3 Small-Noise Limits

In this part, we will see, informally, how our stochastic framework degenerates into classical mechanics as the noise goes to zero. Let \(\epsilon >0\) be a small parameter which we refer to as diffusivity. The limit when \(\epsilon \rightarrow 0\) is called the small-noise limit.

Let \({\mathcal {A}}^\epsilon _g([0,T];q, \mu )\) be the small-noise version of the admissible class (7.10), that is, with the constraint \(QX(t) = \epsilon {\check{g}}(X(t))\). The \(\epsilon \)-dependent stochastic variational problem is to minimize the action functional \({\mathcal {S}} [X;0,T]\) in (7.9) among all \(X\in {\mathcal {A}}^\epsilon _g([0,T];q, \mu )\). Then, the same procedure as in Sect. 7.2 yields the following \(\epsilon \)-dependent stochastic Euler–Lagrange equation:

$$\begin{aligned} \frac{\overline{{\textbf{D}}}^\epsilon }{dt} \big ( d_{\dot{x}} L_0\left( t, X_\epsilon (t), D_\nabla X_\epsilon (t) \right) \big ) = d_x L_0\left( t, X_\epsilon (t), D_\nabla X_\epsilon (t) \right) , \end{aligned}$$
(7.57)

which is an equivalent condition for \(X_\epsilon \in {\mathcal {A}}^\epsilon _g([0,T];q, \mu )\) to be a stationary point of \({\mathcal {S}}\). Here \(\frac{\overline{{\textbf{D}}}^\epsilon }{dt}\) is the damped mean covariant derivative with respect to \(X_\epsilon \) so that

$$\begin{aligned} \frac{\overline{{\textbf{D}}}^\epsilon }{dt} = \frac{\partial }{\partial t} + \nabla ^{}_{D_\nabla X_\epsilon } + \frac{\epsilon }{2} \Delta _{\textrm{LD}}. \end{aligned}$$

Now as \(\epsilon \rightarrow 0\), since \(QX_\epsilon \rightarrow 0\), \(X_\epsilon \) tends to some deterministic curve \(\gamma =(\gamma (t))_{t\in [0,T]}\) (in a suitable probabilistic sense), and \(D_\nabla X_\epsilon (t)\) tends to \(\dot{\gamma }(t)\). Thus, we can write informally

$$\begin{aligned}&{\mathcal {A}}^\epsilon _g([0,T];q, \mu ) \rightarrow {\mathcal {A}}^0_g([0,T];q, \mu ) \\&\quad := \left\{ \gamma \text { is adapted with paths in } C^2([0,T],M) : \gamma (0) = q, {\textbf{P}}\circ (\gamma (T))^{-1} = \mu \right\} . \end{aligned}$$

The \(\epsilon \)-dependent stochastic variational problem tends to the following deterministic variational problem

$$\begin{aligned} \min _{\gamma \in {\mathcal {A}}^0_g([0,T];q, \mu )} \int _0^T L_0\left( t, \gamma (t), \dot{\gamma }(t) \right) {\textrm{d}}t. \end{aligned}$$
(7.58)

And the \(\epsilon \)-dependent stochastic Euler–Lagrange equation (7.57) tends to

$$\begin{aligned} \frac{D}{dt} \big ( d_{\dot{x}} L_0\left( t, \gamma (t), \dot{\gamma }(t) \right) \big ) = d_x L_0\left( t, \gamma (t), \dot{\gamma }(t) \right) , \end{aligned}$$
(7.59)

where \(\frac{D}{dt} = \frac{\partial }{\partial t} + \nabla _{\dot{\gamma }}\) is the material derivative along \(\gamma \). This is the classical Euler–Lagrange equation in global form; cf. Villani (2009, p. 153).
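The informal convergence \(X_\epsilon \rightarrow \gamma \) can be visualized with a toy computation. The sketch below freezes a hypothetical drift \(b(x) = -x\) on M = \({\mathbb {R}}\) (in the variational problem the drift would itself depend on \(\epsilon \)), simulates \(dX_\epsilon = b(X_\epsilon )dt + \sqrt{\epsilon }\,dW\) by Euler–Maruyama, and compares \(X_\epsilon (T)\) with the Euler solution of \(\dot{\gamma } = b(\gamma )\). All runs reuse the same random seed and the drift is linear, so the mean-square gap scales exactly like \(\epsilon \); for \(\epsilon = 1\) and T = 1 it should be close to \((1-e^{-2})/2\).

```python
import math
import random

def mean_square_gap(eps, T=1.0, n_steps=100, n_paths=2000, x0=1.0, seed=0):
    """Monte Carlo estimate of E|X_eps(T) - gamma(T)|^2 for the toy model
    dX = -X dt + sqrt(eps) dW versus gamma' = -gamma (both by forward Euler).

    The same seed is used for every eps, so the Gaussian increments coincide
    across runs and only their scale sqrt(eps) changes."""
    rng = random.Random(seed)
    h = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x = gamma = x0
        for _ in range(n_steps):
            x += -x * h + math.sqrt(eps * h) * rng.gauss(0.0, 1.0)
            gamma += -gamma * h
        total += (x - gamma) ** 2
    return total / n_paths

for eps in (1.0, 0.1, 0.01):
    print(eps, mean_square_gap(eps))
```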

We introduce the following \(\epsilon \)-dependent version of the g-canonical lift (6.44):

$$\begin{aligned} H_\epsilon (x,p,o,t):= H_0(x,p,t) + \textstyle {\frac{\epsilon }{2}} g^{jk}(x) \left( o_{jk} - \Gamma ^i_{jk}(x)p_i \right) . \end{aligned}$$

Let \({\textbf{X}}_\epsilon \) be a horizontal integral process of stochastic Hamilton’s equations (6.10) corresponding to \(H_\epsilon \) and \(X_\epsilon =\tau _M^{S*}({\textbf{X}}_\epsilon )\). Since \((Q (x\circ {\textbf {X}}_\epsilon ))^{jk}= 2\frac{\partial H_\epsilon }{\partial o_{jk}} = \epsilon \check{g} \rightarrow 0\) as \(\epsilon \rightarrow 0\), \({\textbf{X}}_\epsilon \) converges to a \(T^* M\)-valued process. And since \(\frac{\partial H_\epsilon }{\partial p_i} \rightarrow \frac{\partial H_0}{\partial p_i}\) and \(\frac{\partial H_\epsilon }{\partial x^i} \rightarrow \frac{\partial H_0}{\partial x^i}\), the limit \(T^* M\)-valued process satisfies classical Hamilton’s equations,

$$\begin{aligned} \left\{ \begin{aligned} \dot{x}^i(t)&= \textstyle {{\frac{\partial H_0}{\partial p_i}}} (x(t), p(t), t), \\ \dot{p}_i(t)&= - \textstyle {{\frac{\partial H_0}{\partial x^i}}} (x(t), p(t), t). \end{aligned}\right. \end{aligned}$$
(7.60)

Let \({\mathbb {X}}_\epsilon := {\hat{\varrho }}^*({\textbf{X}}_\epsilon )\). Then, \(\mathbb X_\epsilon (t) = p(t,X_\epsilon (t))\) solves the system of global stochastic Hamilton’s equations (7.47), with \({\mathbb {X}}_\epsilon \), \(X_\epsilon \) and \(\frac{\overline{{\textbf{D}}}^\epsilon }{dt}\) in place of \({\mathbb {X}}\), X and \(\frac{\overline{{\textbf{D}}}}{dt}\), respectively, subject to \(QX_\epsilon (t) = \epsilon {\check{g}}(X_\epsilon (t))\). As \(\epsilon \) goes to 0, this system tends to the following deterministic system:

$$\begin{aligned} \left\{ \begin{aligned}&\dot{x}(t) = \nabla _p H_0 (x(t), p(t), t), \\&\frac{D}{dt} p(t) = -d_x H_0 (x(t), p(t),t), \end{aligned}\right. \end{aligned}$$
(7.61)

This is the global form of (7.60), which is equivalent to the global Euler–Lagrange equation (7.59) via the classical Legendre transform.

The corresponding \(\epsilon \)-dependent Hamilton–Jacobi–Bellman equation is now

$$\begin{aligned} \frac{\partial S}{\partial t} + H_\epsilon (d^2 S, t) = \frac{\partial S}{\partial t} + H_0(d S, t) + \frac{\epsilon }{2}\Delta S = f(t), \end{aligned}$$

which, as \(\epsilon \rightarrow 0\), goes to the classical Hamilton–Jacobi equation

$$\begin{aligned} \frac{\partial S}{\partial t} + H_0(d S, t) = f(t). \end{aligned}$$

The latter corresponds to (7.59)–(7.61) via classical Hamilton–Jacobi theory (e.g., Abraham and Marsden 1978, Chapter 5).
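In the flat case M = \({\mathbb {R}}\) with the hypothetical choice \(H_0(p) = \frac{1}{2}p^2\) and \(f \equiv 0\), the \(\epsilon \)-dependent HJB equation has explicit solutions \(S_\epsilon = \epsilon \log \varphi \), where \(\varphi \) is an (unnormalized) backward heat kernel solving \(\partial _t\varphi + \frac{\epsilon }{2}\partial _x^2\varphi = 0\). This gives \(S_\epsilon (t,x) = -\frac{(x-c)^2}{2(T-t)} - \frac{\epsilon }{2}\log (T-t)\) for t < T. The sketch checks the HJB residual by finite differences for several \(\epsilon \) (including \(\epsilon = 0\), the classical Hamilton–Jacobi equation) and shows that \(S_\epsilon - S_0 = -\frac{\epsilon }{2}\log (T-t)\) vanishes as \(\epsilon \rightarrow 0\).

```python
import math

def S_eps(t, x, eps, T=1.0, c=0.0):
    """eps * log(phi) for an unnormalised backward heat kernel phi centred at c:
    S_eps = -(x - c)^2 / (2 (T - t)) - (eps / 2) log(T - t), valid for t < T."""
    tau = T - t
    return -(x - c) ** 2 / (2 * tau) - 0.5 * eps * math.log(tau)

def hjb_residual(t, x, eps, ht=1e-5, hx=1e-3):
    """dS/dt + |dS/dx|^2 / 2 + (eps/2) d^2S/dx^2, by central finite differences."""
    St = (S_eps(t + ht, x, eps) - S_eps(t - ht, x, eps)) / (2 * ht)
    Sx = (S_eps(t, x + hx, eps) - S_eps(t, x - hx, eps)) / (2 * hx)
    Sxx = (S_eps(t, x + hx, eps) - 2 * S_eps(t, x, eps) + S_eps(t, x - hx, eps)) / hx**2
    return St + 0.5 * Sx**2 + 0.5 * eps * Sxx

for eps in (1.0, 0.1, 0.0):
    worst = max(abs(hjb_residual(t, x, eps))
                for t in (0.0, 0.3, 0.6) for x in (-1.0, 0.5, 2.0))
    gap = max(abs(S_eps(t, 0.5, eps) - S_eps(t, 0.5, 0.0)) for t in (0.0, 0.3, 0.6))
    # residual is ~0 for every eps; gap = (eps / 2) * max |log(T - t)| shrinks with eps
    print(eps, worst, gap)
```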

We mention here some previous works, of independent interest, on the above small-noise limits in special cases. Time-asymptotic large deviations for the Brownian bridges of Example 6.11 were studied in Hsu (1990). The second author of the present paper and his collaborators proved in Privault et al. (2016) a large deviation result for one-dimensional Bernstein bridges, which are solution processes of the Euclidean quantum mechanics of Example 6.12. Léonard (2012b) proved that the \(\Gamma \)-limit of Schrödinger’s problem of Sect. 7.3 with small variance is the Monge–Kantorovich problem. The latter is the optimal transport problem associated with the classical variational problem (7.58) (Villani 2009, Chapter 7). See Mikami (2021, Section 2.3) for more on small-noise limits of stochastic optimal transport.

Remark 7.26

There are various terminologies in other areas related to the small-noise limit. In thermodynamics (Huang and Zambrini 2023), \(\epsilon \) stands for the Boltzmann constant, which is related to the diffusion coefficient via the Einstein relation, consistent with Schrödinger’s original statistical problem (Schrödinger 1932). When applied to quantum mechanics as in Example 6.12, the small-noise limit is called the semiclassical limit, and the parameter \(\epsilon \) stands for the reduced Planck constant \(\hbar \). When applied to hydrodynamics (cf. Arnaudon et al. 2014; Chen et al. 2023), it is often called the vanishing viscosity limit, and \(\epsilon \) stands for the kinematic viscosity \(\nu \). The latter may be expected to solve Kolmogorov’s conjecture that the “stochastization” of dynamical systems is related to hydrodynamic PDEs as viscosity vanishes (Arnold and Khesin 2021). In physics, diffusivity, the Planck constant and viscosity are indeed related to each other (Trachenko and Brazhkin 2021).

7.4.4 Relations to Stochastic Optimal Control

Following the standard way of converting problems of the classical calculus of variations into optimal control problems (see Fleming and Soner 2006), we can regard the stochastic variational problem of Sect. 7.2 as a stochastic optimal control problem.

Assume, for simplicity, that (M, g) is compact. Consider a stochastic control model in which the state evolves according to an M-valued diffusion X governed by a system of MDEs on the time interval [t, T], of the form

$$\begin{aligned} \left\{ \begin{aligned}&D_\nabla X(s) = U(s), \\&Q X(s) = {\check{g}}(X(s)), \end{aligned} \right. \end{aligned}$$
(7.62)

or equivalently, by an Itô SDE of the form

$$\begin{aligned} dX^i(s) = \left( U^i(s) - \frac{1}{2} g^{jk}(X(s)) \Gamma _{jk}^i(X(s)) \right) ds + \sigma _r^i(X(s)) dW^r(s), \quad s\in [t,T], \end{aligned}$$

where \(\sigma = (\sigma ^i_r)\) is the positive-definite square root of \((g^{ij})\), regarded as a (1, 1)-tensor, i.e., \(\sum _{r=1}^d \sigma ^i_r\sigma ^j_r = g^{ij}\), W is an \({\mathbb {R}}^d\)-valued standard Brownian motion and, most importantly, U is a TM-valued process called the control process. We impose no constraints on U other than admissibility in the sense of Fleming and Soner (2006, Definition 2.1). As the endpoint condition, we require that \(X(t) = x\).
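To see the correction term \(-\frac{1}{2} g^{jk}\Gamma ^i_{jk}\) of the Itô form at work, here is a sketch on \({\mathbb {R}}^2\setminus \{0\}\) in polar coordinates \((r,\theta )\) with the flat metric \(g = dr^2 + r^2 d\theta ^2\) and zero control U (our choice of example). There, \(-\frac{1}{2}g^{jk}\Gamma ^r_{jk} = 1/(2r)\), \(-\frac{1}{2}g^{jk}\Gamma ^\theta _{jk} = 0\) and \(\sigma = \textrm{diag}(1, 1/r)\), so the SDE should describe planar Brownian motion with generator \(\frac{1}{2}\Delta \), for which \({\textbf{E}}|X(T)|^2 = |X(0)|^2 + 2T\). The Monte Carlo estimate below checks this for r(0) = 3 and T = 0.5.

```python
import math
import random

def mean_r_squared(r0=3.0, theta0=0.0, T=0.5, n_steps=100, n_paths=5000, seed=1):
    """Euler-Maruyama for the Ito SDE above with U = 0 on the plane minus the origin,
    in polar coordinates (r, theta) with g = dr^2 + r^2 dtheta^2.

    Ito drift  -(1/2) g^{jk} Gamma^i_{jk} = (1 / (2 r), 0),
    diffusion  sigma = diag(1, 1 / r), so that sigma sigma^T = (g^{ij}).
    Returns the Monte Carlo mean of r(T)^2."""
    rng = random.Random(seed)
    h = T / n_steps
    sqrt_h = math.sqrt(h)
    total = 0.0
    for _ in range(n_paths):
        r, theta = r0, theta0
        for _ in range(n_steps):
            dw1 = sqrt_h * rng.gauss(0.0, 1.0)
            dw2 = sqrt_h * rng.gauss(0.0, 1.0)
            # both updates use the values of r and theta from the start of the step
            r, theta = r + h / (2 * r) + dw1, theta + dw2 / r
        total += r * r
    return total / n_paths

print(mean_r_squared())   # should be close to r0^2 + 2T = 10
```

Dropping the drift \(1/(2r)\) would bias the estimate downward by about 2T/2 = 0.5, which is the Itô correction coming from the Christoffel symbols.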

The control problem on a finite time interval \(s\in [t,T]\) is to choose U to minimize

$$\begin{aligned} J(t,x;X,U):= {\textbf{E}}_{(t,x)} \left[ \int _t^T L_0\left( s, X(s), U(s) \right) {\textrm{d}}s - S_T(X(T)) \right] , \end{aligned}$$
(7.63)

among all pairs (X, U) satisfying the system (7.62) and the endpoint condition, where \(S_T\) is a given smooth function on M. The real-valued smooth function \(L_0\) on \({\mathbb {R}}\times TM\) is called the running cost function, and J the payoff functional. The problem is called a stochastic Bolza problem. When \(S_T\equiv 0\), this stochastic control problem has the same form as our stochastic variational problem of Sect. 7.2; for this reason, we say that the control problem with \(S_T\equiv 0\) is in Lagrange form. By an argument similar to the proof of Theorem 7.16, one can derive the same equation as (7.22), but with boundary conditions \(X(t) = x\) and \(d_{\dot{x}}L_0(T,X(T),D_\nabla X(T)) = dS_T(X(T))\).

The starting point of dynamic programming is to regard the optimal value of J, up to sign, as a function S(t, x) of the initial data:

$$\begin{aligned} S(t,x) = -\inf _{(X,U)} J (t,x;X,U). \end{aligned}$$

Then, Bellman’s principle of dynamic programming (Fleming and Soner 2006, Section III.7) states that for \(t\le t+\epsilon \le T\),

$$\begin{aligned} 0= \inf _{X\in I_0^T(M)} {\textbf{E}}_{(t,x)} \left[ \int _t^{t+\epsilon } L_0\left( s, X(s), D_\nabla X(s) \right) {\textrm{d}}s - S(t+\epsilon ,X(t+\epsilon )) + S(t,x) \right] . \end{aligned}$$

Dividing the equation by \(\epsilon \), letting \(\epsilon \rightarrow 0^+\) and using Dynkin’s formula, we get the dynamic programming equation

$$\begin{aligned} 0 = \inf \left[ L_0(t,x,D_\nabla x) - ({\textbf{D}}_{\textrm{t}} S)(t,x, Dx, Qx) \right] , \end{aligned}$$
(7.64)

subject to the terminal condition \(S(T,x) = S_T(x)\). By (4.5) and (7.62),

$$\begin{aligned} {\textbf{D}}_{\textrm{t}} S = \partial _t S + D^i x \partial _i S + \textstyle {{\frac{1}{2}}} Q^{ij} x \partial _i \partial _j S = \partial _t S + \left( D^i_\nabla x - \textstyle {{\frac{1}{2}}} \Gamma ^i_{jk} g^{jk} \right) \partial _i S + \textstyle {{\frac{1}{2}}} g^{ij} \partial _i \partial _j S. \end{aligned}$$

We let

$$\begin{aligned} H(x,p,o,t) = \sup \left[ \left( D^i_\nabla x - \textstyle {{\frac{1}{2}}} \Gamma ^i_{jk}(x) g^{jk}(x) \right) p_i + \textstyle {{\frac{1}{2}}} g^{ij}(x) o_{ij} - L_0(t,x,D_\nabla x) \right] \end{aligned}$$

where the supremum is taken over \(D_\nabla x\). When \(L_0\) is convex (hyperregular), the supremum is attained, and \(H = {\overline{H}}^g_0\) is exactly the canonical inverse second-order Legendre transform in (7.43). Then, the dynamic programming equation (7.64) can be written as the HJB equation (6.38), cf. Fleming and Soner (2006, Section IV.3).
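A one-dimensional sketch of the supremum defining H, at a fixed point x: for a hypothetical strictly convex Lagrangian \(L_0(v) = \cosh v - V\) and made-up values of \(g^{11}\) and \(\Gamma ^1_{11}\), the numerical supremum over the velocity \(v = D_\nabla x\) coincides with the g-canonical lift \(H_0 + \frac{1}{2}g^{11}(o - \Gamma ^1_{11}p)\), because the Christoffel and o terms do not depend on v. Here \(H_0(p) = p\,\textrm{arcsinh}\,p - \sqrt{1+p^2} + V\) is the classical Legendre transform of \(L_0\); ternary search is used only because the objective is strictly concave in v.

```python
import math

def maximize_concave(f, lo=-20.0, hi=20.0, iters=200):
    """Ternary search for max f on [lo, hi]; assumes f is strictly concave there."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1          # the maximiser lies in [m1, hi]
        else:
            hi = m2          # the maximiser lies in [lo, m2]
    return f(0.5 * (lo + hi))

V = 0.3                      # hypothetical potential value at the fixed point x

def L0(v):                   # hypothetical strictly convex Lagrangian in the velocity v
    return math.cosh(v) - V

def H0_closed(p):            # classical Legendre transform of L0
    return p * math.asinh(p) - math.sqrt(1 + p * p) + V

def H_sup(p, o, g_inv, Gamma):
    """The supremum defining H, in one dimension, computed numerically over v."""
    return maximize_concave(lambda v: (v - 0.5 * Gamma * g_inv) * p + 0.5 * g_inv * o - L0(v))

def H_lift(p, o, g_inv, Gamma):
    """g-canonical lift H0 + (1/2) g^{11} (o - Gamma^1_{11} p)."""
    return H0_closed(p) + 0.5 * g_inv * (o - Gamma * p)

print(H_sup(2.0, 0.5, 1.7, -0.4), H_lift(2.0, 0.5, 1.7, -0.4))  # the two values agree
```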

There is also a stochastic version of Pontryagin’s maximum principle (Yong and Zhou 1999, Theorem 3.3.2). The crucial objects in the stochastic Pontryagin principle are the first- and second-order adjoint processes, p and o, respectively. For the stochastic control problem (7.62)–(7.63), the adjoint processes p and o satisfy the following backward SDEs (Yong and Zhou 1999, Section 3.3.2) (where “backward” is again used in a sense different from ours in Sect. 2),

$$\begin{aligned} \left\{ \begin{aligned} dp_i(t)&= \left[ \frac{1}{2} \left( \partial _i g^{kl} \Gamma _{kl}^j + g^{kl} \partial _i \Gamma _{kl}^j \right) (X(t)) p_j(t) - \sum _{r=1}^d \partial _i \sigma _r^j (X(t)) z_{jr}(t) \right. \\&\quad \left. + \frac{\partial L_0}{\partial x^i} \left( t, X(t), U(t) \right) \right] dt \\&\quad + z_{ir}(t) dW^r(t), \\ p_i(T)&= \partial _i S_T(X(T)), \end{aligned}\right. \end{aligned}$$
(7.65)

and

$$\begin{aligned} \left\{ \begin{aligned} do_{ij}(t)&= - \Bigg [ \frac{\partial ^2 {\overline{H}}^g_0}{\partial x^i \partial x^j} (X(t),p(t),o(t),t) \\&\qquad \ + o_{ik}(t) \left( o_{jl}(t) \frac{\partial ^2 H_0}{\partial p_k \partial p_l} (X(t),p(t),t) + 2\frac{\partial ^2 H_0}{\partial x^j \partial p_k} (X(t),p(t),t) \right) \\&\qquad \ - o_{ik}(t) \left( \partial _j g^{lm} \Gamma _{lm}^k + g^{lm} \partial _j \Gamma _{lm}^k \right) (X(t)) \\&\qquad \ + \sum _{r=1}^d \left( \partial _j \sigma _r^k (X(t)) Z_{ikr}(t) + \partial _j \sigma _r^l (X(t)) Z_{ilr}(t) \right) \Bigg ] dt + Z_{ijr}(t) dW^r(t), \\ o_{ij}(T)&= \partial _i \partial _j S_T(X(T)), \end{aligned}\right. \end{aligned}$$
(7.66)

which are called the first- and second-order adjoint equations, respectively. The unknowns in (7.65) and (7.66) are the pairs (p, z) and (o, Z), respectively. Suppose that \(p_i(t) = p_i(t,X(t))\) and \(o_{ij}(t) = o_{ij}(t,X(t))\) for some time-dependent second-order form (p, o) satisfying the second-order Maxwell relations (6.15). Then,

$$\begin{aligned} z_{ir} = \frac{\partial p_i}{\partial x^j} \sigma ^j_r, \quad Z_{ijr} = \frac{\partial o_{ij}}{\partial x^k} \sigma ^k_r. \end{aligned}$$

Plugging them into (7.65) and (7.66), we get

$$\begin{aligned} D_i p&= \frac{1}{2} \left( \partial _i g^{kl} \Gamma _{kl}^j + g^{kl} \partial _i \Gamma _{kl}^j \right) p_j - \frac{1}{2} \partial _i g^{jk} \frac{\partial p_j}{\partial x^k} + \frac{\partial L_0}{\partial x^i} = -\frac{\partial {\overline{H}}^g_0}{\partial x^i}, \nonumber \\ D_{ij} o&= - \left( \frac{\partial ^2 {\overline{H}}^g_0}{\partial x^i \partial x^j} + \frac{\partial p_k}{\partial x^i} \frac{\partial p_l}{\partial x^j} \frac{\partial ^2 {\overline{H}}^g_0}{\partial p_k \partial p_l} + 2 \frac{\partial p_k}{\partial x^i} \frac{\partial ^2 {\overline{H}}^g_0}{\partial x^j \partial p_k} + 2 \frac{\partial o_{kl}}{\partial x^i} \frac{\partial ^2 {\overline{H}}^g_0}{\partial x^j \partial o_{kl}} \right) . \end{aligned}$$
(7.67)

These coincide with the corresponding equations in the S-H system (6.10) for the second-order Hamiltonian \(\overline{H}^g_0\). The first equality of (7.67) also recovers (7.51).

7.5 Stochastic Variational Symmetries

Definition 7.27

Given an action functional \({\mathcal {S}}\) as in (7.9), a bundle automorphism F on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) projecting to \(F^0\) is called a variational symmetry of \({\mathcal {S}}\) if, whenever \([t_1,t_2]\) is a subinterval of [0, T], we have \({\mathcal {S}}[F\cdot X,F^0(t_1),F^0(t_2)] = {\mathcal {S}}[X,t_1,t_2]\). A \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) is called an infinitesimal variational symmetry of \({\mathcal {S}}\) if its flow consists of variational symmetries of \({\mathcal {S}}\).

Lemma 7.28

The \(\pi \)-projectable vector field V of the form (4.9) is an infinitesimal variational symmetry of \({\mathcal {S}}\) if and only if

$$\begin{aligned} \left[ (j^\nabla V)(L_0) + L_0\dot{V}^0\right] \big (j^\nabla _t X\big ), \quad t\in [0,T] \end{aligned}$$

is a martingale, for all \(X\in I_0^T(M)\).

Proof

As in the proof of Theorem 4.14, we let \(\psi = \{(\psi ^0_\epsilon , {\bar{\psi }}_\epsilon )\}_{\epsilon \in {\mathbb {R}}}\) be the flow generated by V, and denote \({\tilde{X}}_\epsilon = \psi _\epsilon \cdot X\). Then, by a change of variable \(s=\psi ^0_\epsilon (t)\),

$$\begin{aligned} \begin{aligned} {\mathcal {S}}\big [{\tilde{X}}_\epsilon , \psi ^0_\epsilon (t_1), \psi ^0_\epsilon (t_2)\big ]&= {\textbf{E}}\int _{\psi ^0_\epsilon (t_1)}^{\psi ^0_\epsilon (t_2)} L_0\left( s, {\tilde{X}}_\epsilon (s), D_\nabla {\tilde{X}}_\epsilon (s) \right) {\textrm{d}}s \\&= {\textbf{E}}\int _{t_1}^{t_2} L_0\left( \psi ^0_\epsilon (t), {\bar{\psi }}_\epsilon (t,X(t)), D_\nabla {\tilde{X}}_\epsilon \big (\psi ^0_\epsilon (t)\big ) \right) \frac{{\textrm{d}}\psi ^0_\epsilon }{{\textrm{d}}t}(t) {\textrm{d}}t. \end{aligned} \end{aligned}$$

Since \({\mathcal {S}}[{\tilde{X}}_\epsilon , \psi ^0_\epsilon (t_1), \psi ^0_\epsilon (t_2)] = {\mathcal {S}}[X,t_1,t_2]\) for all \([t_1,t_2]\subset [0,T]\) and each \(\epsilon \), the difference

$$\begin{aligned} L_0\left( \psi ^0_\epsilon (t), {\bar{\psi }}_\epsilon (t,X(t)), D_\nabla \tilde{X}_\epsilon (\psi ^0_\epsilon (t)) \right) \frac{{\textrm{d}}\psi ^0_\epsilon }{{\textrm{d}}t}(t) - L_0\left( t, X(t), D_\nabla X(t) \right) \end{aligned}$$

is a martingale (depending on \(\epsilon \)). Differentiating this difference with respect to \(\epsilon \) at \(\epsilon =0\) and recalling that \(j^\nabla V = \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \big |_{\epsilon =0}j^\nabla \psi _\epsilon \), we obtain the desired result. \(\square \)

Definition 7.29

Let \(\Phi :{\mathbb {R}}\times M\rightarrow {\mathbb {R}}\) be a smooth function. A \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) is called an infinitesimal \(\Phi \)-divergence symmetry of \({\mathcal {S}}\) if

$$\begin{aligned} \left[ \big (j^\nabla V\big )(L_0) + L_0\dot{V}^0\right] \big (j^\nabla _t X\big ) = {\textbf{D}}_{\textrm{t}} \Phi \big (j^\nabla _t X\big ), \end{aligned}$$

for all \(X\in I_0^T(M)\) and \(t\in [0,T]\).

Recall that for the \(\pi \)-projectable vector field V of the form (4.9), we denote \({\bar{V}} = V^i \frac{\partial }{\partial {x^i}}\), as in Corollary 4.17.

Proposition 7.30

A vector field V of the form (4.9) is an infinitesimal \(\Phi \)-divergence symmetry of \({\mathcal {S}}\) if and only if

$$\begin{aligned} V^0 \partial _t L_0 + d_x L_0({\bar{V}}) + d_{\dot{x}} L_0 \left( \frac{\overline{{\textbf{D}}}{\bar{V}}}{dt} \right) - \dot{V}^0 E_0 = {\textbf{D}}_{\textrm{t}} \Phi . \end{aligned}$$

Proof

It follows from Corollary 4.17 and (7.19), (7.20) that

$$\begin{aligned} \begin{aligned} {\textbf{D}}_{\textrm{t}} \Phi&= V^0 \partial _t L_0 + V^i \partial _i L_0 \\ {}&\quad + \left[ \left( \partial _t + \dot{x}^j \partial _j \right) V^i + \textstyle {{\frac{1}{2}}} \left( \Delta {\bar{V}} + \textrm{Ric}({\bar{V}}) \right) ^i - \dot{V}^0 \dot{x}^i \right] \partial _{\dot{x}^i} L_0 +\dot{V}^0 L_0 \\&= V^0 \partial _t L_0 + d_x L_0({\bar{V}}) + d_{\dot{x}} L_0 \left( \left( \partial _t + \nabla _{\dot{x}} + \textstyle {{\frac{1}{2}}} \Delta _{\text {LD}} \right) {\bar{V}} \right) - \dot{V}^0 \left( \dot{x}^i \partial _{\dot{x}^i} L_0 - L_0 \right) \\&= V^0 \partial _t L_0 + d_x L_0({\bar{V}}) + d_{\dot{x}} L_0 \left( \frac{\overline{{\textbf{D}}}{\bar{V}}}{dt} \right) - \dot{V}^0 E_0. \end{aligned} \end{aligned}$$

This concludes the proof. \(\square \)

Corollary 7.31

Let \(L_0:{\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\) be a hyperregular Lagrangian.

Let V be a vector field of the form (4.9). Given a smooth function \(\Phi :{\mathbb {R}}\times M\rightarrow {\mathbb {R}}\), define the \(\Phi \)-extension of V by

$$\begin{aligned} V_\Phi = V + \Phi \frac{\partial }{\partial {u}}, \end{aligned}$$
(7.68)

which is a vector field on \({\mathbb {R}}\times M\times {\mathbb {R}}\). Suppose that V satisfies

$$\begin{aligned} \frac{1}{2} \dot{V}^0 \Delta S = g^{ij} \nabla ^2_{\partial _i, \nabla _{\partial _j} {\bar{V}}} S, \end{aligned}$$

where S is the solution of the Hamilton–Jacobi–Bellman equation (7.49) associated with \(L_0\) (for \(f\equiv 0\)). Then, V is an infinitesimal \(\Phi \)-divergence symmetry of \({\mathcal {S}}\) if and only if \(V_\Phi \) is an infinitesimal symmetry of equation (7.49).

Proof

By classical jet bundle theory, \(V_\Phi \) is an infinitesimal symmetry of the Hamilton–Jacobi–Bellman equation (7.49) if and only if, on solutions of (7.49) (Olver 1998, Theorem 2.31),

$$\begin{aligned} j^{1,2} V \left( u_t + H_0(x, (u_i), t) + \textstyle {{\frac{1}{2}}} g^{ij}(x) u_{ij} - \textstyle {{\frac{1}{2}}} g^{ij}(x) \Gamma _{ij}^k(x) u_k \right) = 0, \end{aligned}$$
(7.69)

where

$$\begin{aligned} j^{1,2} V = V^0 \frac{\partial }{\partial {t}} + V^i \frac{\partial }{\partial {x^i}} + \Phi \frac{\partial }{\partial {u}} + V_t \frac{\partial }{\partial u_t} + V_i \frac{\partial }{\partial u_i} + V_{ij} \frac{\partial }{\partial u_{ij}}, \end{aligned}$$

with coefficients given by Olver (1998, Theorem 2.36 or Example 2.38)

$$\begin{aligned} V_t= & {} \frac{\partial \Phi }{\partial t} - \dot{V}^0 u_t - \frac{\partial V^i}{\partial t} u_i, \quad V_i = \frac{\partial \Phi }{\partial x^i} - \frac{\partial V^j}{\partial x^i} u_j, \\ V_{ij}= & {} \frac{\partial ^2 \Phi }{\partial x^i\partial x^j} - \frac{\partial ^2 V^k}{\partial x^i\partial x^j} u_k - \frac{\partial V^k}{\partial x^i} u_{jk} - \frac{\partial V^k}{\partial x^j} u_{ik}. \end{aligned}$$
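These coefficients can be double-checked symbolically. The following is a minimal sketch in one space dimension (n = 1, an assumption made only for brevity), using SymPy and the characteristic form of the prolongation formula (Olver 1998, Theorem 2.36): with characteristic \(Q = \Phi - V^0 u_t - V^1 u_x\), the coefficient of \(\partial /\partial u_J\) is \(D_J Q + V^0 u_{J,t} + V^1 u_{J,x}\). The names `V0`, `V1`, `Phi` and `prolong` are illustrative and not taken from the source; since `u` is declared as a function of (t, x), SymPy's `diff` acts as the total derivative.

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)
u = sp.Function("u")(t, x)        # dependent variable (plays the role of S)
Phi = sp.Function("Phi")(t, x)    # u-component of the Phi-extension V_Phi
V0 = sp.Function("V0")(t)         # time component; depends on t only (projectable field)
V1 = sp.Function("V1")(t, x)      # space component, n = 1

# Characteristic of V_Phi = V0 d/dt + V1 d/dx + Phi d/du.
Q = Phi - V0 * u.diff(t) - V1 * u.diff(x)


def prolong(DJ_Q, u_J):
    """Coefficient of d/du_J: D_J Q + V0 * (u_J)_t + V1 * (u_J)_x."""
    return DJ_Q + V0 * u_J.diff(t) + V1 * u_J.diff(x)


phi_t = prolong(Q.diff(t), u.diff(t))
phi_x = prolong(Q.diff(x), u.diff(x))
phi_xx = prolong(Q.diff(x, 2), u.diff(x, 2))

# Closed forms from the text, specialised to n = 1 (i = j = x).
expected_t = Phi.diff(t) - V0.diff(t) * u.diff(t) - V1.diff(t) * u.diff(x)
expected_x = Phi.diff(x) - V1.diff(x) * u.diff(x)
expected_xx = Phi.diff(x, 2) - V1.diff(x, 2) * u.diff(x) - 2 * V1.diff(x) * u.diff(x, 2)

for name, got, want in [("t", phi_t, expected_t), ("x", phi_x, expected_x), ("xx", phi_xx, expected_xx)]:
    print(name, sp.expand(got - want))
```

In one dimension the two mixed terms \(-\partial _i V^k u_{jk} - \partial _j V^k u_{ik}\) collapse to \(-2\,\partial _x V^1 u_{xx}\), which is why the expected second-order coefficient carries a factor 2.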

Moreover, the jet coordinates \((u_t, u_i, u_{ij})\) satisfy

$$\begin{aligned} (u_t, u_i, u_{ij}) = (\partial _t S, \partial _i S, \partial _{ij} S) = (-E_0- \textstyle {{\frac{1}{2}}}\Delta S, \partial _{\dot{x}^i} L_0, \partial _{ij} S), \end{aligned}$$

where we recall \(dS = d_{\dot{x}}L_0\) from Eq. (7.29) and Remark 7.23, and also that \(\partial _t S = - H_0(dS,t) - \frac{1}{2} \Delta S = - E_0- \frac{1}{2} \Delta S\). Plugging these into (7.69) and using the identities \(\partial _t H_0 = - \partial _t L_0\) and \(\partial _{x^i} H_0 = - \partial _{x^i} L_0\), which follow from the classical Legendre transform, we have

$$\begin{aligned} \begin{aligned} 0=\,&V^0 \partial _t H_0 + V^i \left( \partial _{x^i} H_0 + \textstyle {{\frac{1}{2}}} \partial _i g^{jk} u_{jk} - \textstyle {{\frac{1}{2}}} \partial _i g^{jk} \Gamma _{jk}^l u_l - \textstyle {{\frac{1}{2}}} g^{jk} \partial _i \Gamma _{jk}^l u_l \right) \\&+ \left( \partial _t \Phi - \dot{V}^0 u_t - \partial _t V^i u_i \right) \\&+ \left( \partial _i \Phi - \partial _i V^l u_l \right) \left( \partial _{p_i} H_0 - \textstyle {{\frac{1}{2}}} g^{jk} \Gamma _{jk}^i\right) \\&+ \textstyle {{\frac{1}{2}}} g^{ij} \left( \partial _i \partial _j \Phi - \partial _i \partial _j V^k u_k - \partial _i V^k u_{jk} - \partial _j V^k u_{ik} \right) \\ =\,&V^0 \partial _t H_0 + V^i \partial _{x^i} H_0 - \left( \partial _t + \partial _{p_i} H_0 \partial _j \right) V^i u_i \\&- \textstyle {{\frac{1}{2}}} g^{ij} \left( \partial _i \partial _j V^k - \Gamma _{ij}^l \partial _l V^k + 2 \Gamma _{il}^k \partial _j V^l + \partial _l \Gamma _{ij}^k V^l \right) u_k \\&- \dot{V}^0 u_t - g^{ij} \left( \partial _j V^k + \Gamma _{jm}^k V^m \right) \left( u_{ik} - \Gamma _{ik}^l u_l \right) \\&+ \left[ \partial _t \Phi + \left( \partial _{p_i} H_0 - \textstyle {{\frac{1}{2}}} g^{jk} \Gamma _{jk}^i\right) \partial _i \Phi + \textstyle {{\frac{1}{2}}} g^{ij} \partial _i \partial _j \Phi \right] \\ =\,&- V^0 \partial _t L_0 - V^i \partial _{x^i} L_0 - \left( \partial _t + \dot{x}^j \partial _j \right) V^i \partial _{\dot{x}^i} L_0 - \textstyle {{\frac{1}{2}}} \left[ \Delta {\bar{V}} + \textrm{Ric}({\bar{V}}) \right] ^k \partial _{\dot{x}^k} L_0 \\&+ \dot{V}^0 \left( E_0 + \textstyle {{\frac{1}{2}}} \Delta S\right) - g^{ij} \nabla ^2_{\partial _i, \nabla _{\partial _j} {\bar{V}}} S + \left( \partial _t \Phi + \dot{x}^i \partial _i \Phi + \textstyle {{\frac{1}{2}}} \Delta \Phi \right) \\ =\,&- \left[ V^0 \partial _t L_0 + d_x L_0({\bar{V}}) + d_{\dot{x}} L_0 \left( \frac{\overline{{\textbf{D}}}{\bar{V}}}{dt} \right) - \dot{V}^0 E_0 \right] \\&+ \left( \textstyle 
{{\frac{1}{2}}} \dot{V}^0 \Delta S - g^{ij} \nabla ^2_{\partial _i, \nabla _{\partial _j} {\bar{V}}} S \right) + {\textbf{D}}_{\textrm{t}} \Phi , \end{aligned} \end{aligned}$$
(7.70)

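where, in the last equality, we used \((QX)^{ij}(t) = g^{ij}(X(t))\) to identify the last term with \({\textbf{D}}_{\textrm{t}} \Phi \). By the assumption on V, the term \(\frac{1}{2} \dot{V}^0 \Delta S - g^{ij} \nabla ^2_{\partial _i, \nabla _{\partial _j} {\bar{V}}} S\) in (7.70) vanishes, so (7.69) is equivalent to the identity in Proposition 7.30, and the result follows. \(\square \)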
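As a concrete instance of Corollary 7.31, consider the hypothetical one-dimensional Euclidean case \(M = {\mathbb {R}}\), \(g = 1\), with \(H_0(x,p) = p^2/2\) (i.e., \(L_0 = \frac{1}{2}\dot{x}^2\)), so that (7.49) with \(f \equiv 0\) reads \(u_t + \frac{1}{2} u_x^2 + \frac{1}{2} u_{xx} = 0\). The sketch below is an illustration under these assumptions, not part of the source: it treats \((u_t, u_x, u_{xx})\) as jet coordinates and checks that the scaling field \(V = 2t\,\partial _t + x\,\partial _x\) with \(\Phi = 0\) satisfies both the side condition of Corollary 7.31 and the symmetry criterion (7.69).

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)
u_t, u_x, u_xx = sp.symbols("u_t u_x u_xx", real=True)  # jet coordinates

# Hypothetical 1D Euclidean data: H_0(x, p) = p**2 / 2, so (7.49) with f = 0 reads
# u_t + u_x**2 / 2 + u_xx / 2 = 0.
hjb = u_t + u_x**2 / 2 + u_xx / 2

# Scaling field V = 2t d/dt + x d/dx with Phi = 0 (an illustrative choice).
V0, V1, Phi = 2 * t, x, sp.Integer(0)

# Prolongation coefficients, i.e. the formulas for V_t, V_i, V_ij with n = 1.
c_t = sp.diff(Phi, t) - sp.diff(V0, t) * u_t - sp.diff(V1, t) * u_x
c_x = sp.diff(Phi, x) - sp.diff(V1, x) * u_x
c_xx = sp.diff(Phi, x, 2) - sp.diff(V1, x, 2) * u_x - 2 * sp.diff(V1, x) * u_xx

# j^{1,2}V applied to the HJB operator, as in (7.69).
prolonged = (V0 * sp.diff(hjb, t) + V1 * sp.diff(hjb, x)
             + c_t * sp.diff(hjb, u_t) + c_x * sp.diff(hjb, u_x) + c_xx * sp.diff(hjb, u_xx))

# Restrict to solutions by eliminating u_t with the equation itself.
on_solutions = sp.simplify(prolonged.subs(u_t, -u_x**2 / 2 - u_xx / 2))

# Side condition of Corollary 7.31 in 1D: (1/2) dV0/dt * S_xx = dV1/dx * S_xx.
side_condition = sp.simplify(sp.diff(V0, t) / 2 * u_xx - sp.diff(V1, x) * u_xx)
print(sp.factor(prolonged), on_solutions, side_condition)
```

Here the prolonged operator is \(-2\) times the HJB operator, so it vanishes on solutions; repeating the computation with \(V = \partial _t\) or \(V = \partial _x\) gives zero identically, since this HJB operator has no explicit t or x dependence.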

Theorem 7.32

(Stochastic Noether’s theorem) Let \(L_0:{\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\) be a hyperregular Lagrangian. Suppose that the vector field \(V_\Phi \) in (7.68) is an infinitesimal symmetry of the Hamilton–Jacobi–Bellman equation (7.49) associated with \(L_0\) (with \(f\equiv 0\)). Then, the following stochastic conservation law holds for the stochastic Euler–Lagrange equation (7.22),

$$\begin{aligned} {\textbf{D}}_{\textrm{t}} \left[ V^i \partial _{\dot{x}^i} L_0 - V^0 E - \Phi \right] = 0. \end{aligned}$$
(7.71)

Proof

Recall that \(dS = d_{\dot{x}}L_0\) and \(\partial _t S = - E_0- \frac{1}{2} \Delta S = -E\). By applying Lemma 7.8(iv) and (7.22), as well as the fact that \((QX)^{ij}(t) = g^{ij}(X(t))\), we have

$$\begin{aligned} \begin{aligned} {\textbf{D}}_{\textrm{t}} \left[ d_{\dot{x}} L_0({\bar{V}}) \right]&= d_{\dot{x}}L_0 \left( \frac{\overline{{\textbf{D}}}{\bar{V}}}{dt} \right) + \frac{\overline{{\textbf{D}}}(d_{\dot{x}}L_0)}{dt} ({\bar{V}}) + (QX)^{ij} (\nabla _{\partial _i} (d_{\dot{x}} L_0) ) (\nabla _{\partial _j} {\bar{V}}) \\&= d_{\dot{x}}L_0 \left( \frac{\overline{{\textbf{D}}}{\bar{V}}}{dt} \right) + d_x L_0 ({\bar{V}}) + g^{ij} \nabla ^2_{\partial _i,\nabla _{\partial _j} {\bar{V}}} S. \end{aligned} \end{aligned}$$

Then, we use the HJB equation (7.49) (with \(f\equiv 0\)) and the classical Legendre transform \(H_0 = d_{\dot{x}}L_0\cdot \dot{x} - L_0\) to derive

$$\begin{aligned} \begin{aligned} {\textbf{D}}_{\textrm{t}} E&= - {\textbf{D}}_{\textrm{t}} \partial _t S = - \partial _t \left( \partial _t + \nabla _{\dot{x}} + \textstyle {{\frac{1}{2}}} \Delta \right) S = - \partial _t \left[ dS \cdot \dot{x} + \left( \partial _t + \textstyle {{\frac{1}{2}}} \Delta \right) S \right] \\&= - \partial _t \left( d_{\dot{x}}L_0\cdot \dot{x} - H_0 \right) = - \partial _t L_0. \end{aligned} \end{aligned}$$

Combining these with the S-EL equation (7.22) and the criterion (7.70) for symmetries of the HJB equation (7.49), we have

$$\begin{aligned} \begin{aligned} {\textbf{D}}_{\textrm{t}} \left[ V^i \partial _{\dot{x}^i} L_0 - V^0 E - \Phi \right]&= {\textbf{D}}_{\textrm{t}} \left[ d_{\dot{x}} L_0({\bar{V}}) \right] - \dot{V}^0 E - V^0 {\textbf{D}}_{\textrm{t}} E - {\textbf{D}}_{\textrm{t}} \Phi \\&= d_{\dot{x}}L_0 \left( \frac{\overline{{\textbf{D}}}{\bar{V}}}{dt} \right) + d_x L_0 ({\bar{V}}) + g^{ij} \nabla ^2_{\partial _i,\nabla _{\partial _j} {\bar{V}}} S \\&\quad - \dot{V}^0 \left( E_0 + \textstyle {{\frac{1}{2}}} \Delta S\right) + V^0 \partial _t L_0 - {\textbf{D}}_{\textrm{t}} \Phi \\&= 0. \end{aligned} \end{aligned}$$

The result follows. \(\square \)
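The classical Legendre identities used in the proofs of Corollary 7.31 and Theorem 7.32, namely \(H_0 = d_{\dot{x}}L_0\cdot \dot{x} - L_0\), \(\partial _t H_0 = -\partial _t L_0\) and \(\partial _x H_0 = -\partial _x L_0\) at corresponding points, can be checked on a concrete Lagrangian. The sketch below assumes a hypothetical one-dimensional Lagrangian \(L_0 = \frac{1}{2} m(t) \dot{x}^2 + b(x)\dot{x} - U(t,x)\); the mass m, vector potential b and scalar potential U are illustrative choices, not the Lagrangian (7.25).

```python
import sympy as sp

t, x, v, p = sp.symbols("t x v p", real=True)  # v stands for the velocity xdot
m = sp.Function("m")(t)       # hypothetical time-dependent mass (assumed nonzero)
b = sp.Function("b")(x)       # hypothetical vector potential
U = sp.Function("U")(t, x)    # hypothetical scalar potential

L0 = m * v**2 / 2 + b * v - U

# Legendre transform: solve p = dL0/dv for v, then H0(t, x, p) = p v - L0.
v_of_p = sp.solve(sp.Eq(p, sp.diff(L0, v)), v)[0]
H0 = sp.simplify((p * v - L0).subs(v, v_of_p))

# Partial derivatives: p is held fixed in H0, v is held fixed in L0,
# and both sides are compared at the corresponding point v = v(p).
dt_identity = sp.simplify(sp.diff(H0, t) + sp.diff(L0, t).subs(v, v_of_p))
dx_identity = sp.simplify(sp.diff(H0, x) + sp.diff(L0, x).subs(v, v_of_p))
print(H0, dt_identity, dx_identity)
```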

Remark 7.33

(i) In the stochastic Hamiltonian formalism, (7.71) reads \({\textbf{D}}_{\textrm{t}}\left[ V^i p_i - V^0 H - \Phi \right] = 0\).

(ii) The stochastic conservation law (6.19) of a time-independent g-canonical second-order Hamiltonian \(H = \overline{H}_0^g\) can be regarded as a special case of the above stochastic Noether theorem. Indeed, consider the infinitesimal unit time translation \(V = \frac{\partial }{\partial t}\), i.e., \(V^0 = 1\), \({\bar{V}} = 0\), \(\Phi =0\). Then, the criterion (7.70) reduces to \(0 = \partial _t L_0 = -\partial _t H_0\), which means that \(H = {\overline{H}}_0^g\) is time-independent. The resulting stochastic conservation law is \({\textbf{D}}_{\textrm{t}} E = {\textbf{D}}_{\textrm{t}} H =0\).
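The conservation law (7.71) can also be checked symbolically in a simple setting. The sketch below assumes, for illustration only, \(M = {\mathbb {R}}\), \(g = 1\) and \(H_0(x,p) = p^2/2\), and uses the explicit HJB solution \(S = \log (x^2 + T - t)\), a Cole–Hopf transform of the backward heat solution \(x^2 + T - t\). With \({\textbf{D}}_{\textrm{t}} = \partial _t + \dot{x}\,\partial _x + \frac{1}{2}\partial _{xx}\) and \(\dot{x} = \partial _x S\), it checks that the brackets \(V^i \partial _{\dot{x}^i} L_0 - V^0 E - \Phi \) have vanishing mean derivative for the time translation of Remark 7.33(ii), the space translation \(V = \partial _x\), and the scaling field \(V = 2t\,\partial _t + x\,\partial _x\), all with \(\Phi = 0\); each of these is a symmetry of \(u_t + \frac{1}{2}u_x^2 + \frac{1}{2}u_{xx} = 0\).

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)
T = sp.symbols("T", positive=True)

# Hypothetical data: phi solves phi_t + phi_xx / 2 = 0, hence S = log(phi)
# solves S_t + S_x**2 / 2 + S_xx / 2 = 0, i.e. (7.49) with H_0 = p**2 / 2.
phi = x**2 + T - t
S = sp.log(phi)
hjb_residual = sp.simplify(sp.diff(S, t) + sp.diff(S, x)**2 / 2 + sp.diff(S, x, 2) / 2)

xdot = sp.diff(S, x)   # velocity field, from dS = d_xdot L_0 with L_0 = xdot**2 / 2
p = sp.diff(S, x)      # momentum d_xdot L_0 evaluated along the flow
E = -sp.diff(S, t)     # energy, since partial_t S = -E


def mean_derivative(F):
    """D_t F = (d_t + xdot d_x + (1/2) d_xx) F for dX = xdot dt + dW."""
    return sp.simplify(sp.diff(F, t) + xdot * sp.diff(F, x) + sp.diff(F, x, 2) / 2)


# Brackets V^i p_i - V^0 E - Phi with Phi = 0 for three symmetries.
brackets = {
    "time translation (V^0 = 1)": -E,
    "space translation (V^1 = 1)": p,
    "scaling (V^0 = 2t, V^1 = x)": x * p - 2 * t * E,
}
drifts = {name: mean_derivative(F) for name, F in brackets.items()}
print(hjb_residual, drifts)
```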

Applying the stochastic Noether theorem to Schrödinger’s problem of Sect. 7.3, we obtain the following corollary. Its Euclidean case with zero vector potential (i.e., \(b\equiv 0\)) was already formulated in Thieullen and Zambrini (1997).

Corollary 7.34

(Stochastic Noether’s theorem for Schrödinger’s problem) Let \(L_0\) be the Lagrangian given in (7.25). Suppose that the vector field \(V_\Phi \) in (7.68) is an infinitesimal symmetry of the Hamilton–Jacobi–Bellman equation (7.27) with \(f\equiv 0\). Then, the following stochastic conservation law holds for the coordinate process of the solution of Schrödinger’s problem in (7.33),

$$\begin{aligned} {\textbf{D}}_{\textrm{t}} \left[ g_{ij} \left( D_\nabla ^j x - b^j \right) V^i - V^0 \left( E_0 + \textstyle {{\frac{1}{2}}} \Delta S \right) - \Phi \right] = 0, \end{aligned}$$

where \(E_0\) is the classical energy given in (7.45) and S is the solution of (7.27).
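Each of these conservation laws says that the bracket has zero mean derivative along the process, i.e., zero drift; when it is also bounded, its expectation is constant in time. The sketch below is a rough numerical check of this in a hypothetical setting, not claimed to be the Lagrangian (7.25): \(M = {\mathbb {R}}\), \(g = 1\), \(H_0 = p^2/2\), time translation \(V = \partial _t\), \(\Phi = 0\), and the explicit HJB solution \(S = \log (x^2 + T - t)\). Then \(E = -\partial _t S = 1/(x^2 + T - t)\) and the coordinate process solves \(dX = 2X/(X^2 + T - t)\,dt + dW\). The code uses a plain Euler–Maruyama scheme with NumPy on \([0, T/2]\), where E is bounded by \(2/T\).

```python
import numpy as np


def energy_sample_means(x0=1.0, T=1.0, t_end=0.5, n_steps=500, n_paths=20_000, seed=0):
    """Euler-Maruyama for dX = 2X / (X^2 + T - t) dt + dW on [0, t_end], t_end < T.

    Returns the sample mean of E(t, X_t) = 1 / (X_t^2 + T - t) at every time step.
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    X = np.full(n_paths, x0)
    means = [np.mean(1.0 / (X**2 + T))]
    for k in range(n_steps):
        t = k * dt
        drift = 2.0 * X / (X**2 + T - t)  # xdot = d_x S for S = log(x^2 + T - t)
        X = X + drift * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
        means.append(np.mean(1.0 / (X**2 + T - (t + dt))))
    return np.array(means)


means = energy_sample_means()
print(means[0], means[-1])
```

With the default parameters the initial value is \(1/(x_0^2 + T) = 1/2\), and the later sample means should stay close to it up to Monte Carlo and discretization error.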