Stochastic discrete Hamiltonian variational integrators

Variational integrators are derived for structure-preserving simulation of stochastic Hamiltonian systems with a certain type of multiplicative noise arising in geometric mechanics. The derivation is based on a stochastic discrete Hamiltonian which approximates a type-II stochastic generating function for the stochastic flow of the Hamiltonian system. The generating function is obtained by introducing an appropriate stochastic action functional and its corresponding variational principle. Our approach allows us to recast in a unified framework a number of integrators previously studied in the literature, and provides a general methodology for deriving new structure-preserving numerical schemes. The resulting integrators are symplectic; they preserve integrals of motion related to Lie group symmetries; and they include stochastic symplectic Runge–Kutta methods as a special case. Several new low-stage stochastic symplectic methods of mean-square order 1.0 derived using this approach are presented and tested numerically to demonstrate their superior long-time numerical stability and energy behavior compared to nonsymplectic methods.


Introduction
Stochastic differential equations (SDEs) play an important role in modeling dynamical systems subject to internal or external random fluctuations. Standard references include [5,27-29,42,50]. Within this class of problems, we are interested in stochastic Hamiltonian systems, which take the form (see [6,30,43])

dq = (∂H/∂p) dt + (∂h/∂p) • dW(t),    dp = −(∂H/∂q) dt − (∂h/∂q) • dW(t),    (1.1)

where H = H(q, p) and h = h(q, p) are the Hamiltonian functions, W(t) is the standard one-dimensional Wiener process, and • denotes Stratonovich integration.
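Since most such systems must be simulated numerically, a minimal pathwise reference integrator is useful for later comparison. The sketch below applies the stochastic Heun (predictor-corrector) method, which is consistent with the Stratonovich interpretation of (1.1); the Hamiltonians H = (q² + p²)/2 and h = σq are hypothetical choices for illustration, not taken from the text.

```python
import numpy as np

def heun_step(q, p, dt, dW, dH, dh):
    """One stochastic Heun (predictor-corrector) step for
    dq =  H_p dt + h_p o dW,   dp = -H_q dt - h_q o dW  (Stratonovich)."""
    Hq, Hp = dH(q, p)
    hq, hp = dh(q, p)
    # predictor (Euler-Maruyama)
    qe = q + Hp * dt + hp * dW
    pe = p - Hq * dt - hq * dW
    Hqe, Hpe = dH(qe, pe)
    hqe, hpe = dh(qe, pe)
    # corrector: average the slopes at both ends
    qn = q + 0.5 * (Hp + Hpe) * dt + 0.5 * (hp + hpe) * dW
    pn = p - 0.5 * (Hq + Hqe) * dt - 0.5 * (hq + hqe) * dW
    return qn, pn

# hypothetical example: H = (q^2 + p^2)/2, h = sigma * q
sigma = 0.1
dH = lambda q, p: (q, p)          # (H_q, H_p)
dh = lambda q, p: (sigma, 0.0)    # (h_q, h_p)

rng = np.random.default_rng(0)
dt, T = 1e-3, 1.0
q, p = 1.0, 0.0
for _ in range(int(T / dt)):
    q, p = heun_step(q, p, dt, rng.normal(0.0, np.sqrt(dt)), dH, dh)
```

With σ = 0 the scheme reduces to the deterministic Heun (explicit trapezoidal) method, which makes its accuracy easy to verify on the harmonic oscillator.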
The system (1.1) can be formally regarded as a classical Hamiltonian system with the randomized Hamiltonian given by Ĥ(q, p) = H(q, p) + h(q, p) • Ẇ, where H(q, p) is the deterministic Hamiltonian and h(q, p) is another Hamiltonian, to be specified, which multiplies (in the Stratonovich sense, denoted as •) a one-dimensional Gaussian white noise, Ẇ. Such systems can be used to model, e.g., mechanical systems with uncertainty, or error, assumed to arise from random forcing, limited precision of experimental measurements, or unresolved physical processes on which the Hamiltonian of the deterministic system might otherwise depend. Particular examples include modeling synchrotron oscillations of particles in particle storage rings (see [17,56]) and stochastic dynamics of the interactions of singular solutions of the EPDiff basic fluids equation (see [23]). More examples are discussed in Sect. 4. See also [31,37,46,54,57,58,61].

As occurs for other SDEs, most Hamiltonian SDEs cannot be solved analytically, and one must resort to numerical simulations to obtain approximate solutions. In principle, general-purpose stochastic numerical schemes for SDEs can be applied to stochastic Hamiltonian systems. However, as for their deterministic counterparts, stochastic Hamiltonian systems possess several important geometric features. In particular, their phase space flows (almost surely) preserve the symplectic structure. When simulating these systems numerically, it is therefore advisable that the numerical scheme also preserve such geometric features. Geometric integration of deterministic Hamiltonian systems has been thoroughly studied (see [18,41,55] and the references therein), and symplectic integrators have been shown to demonstrate superior performance in long-time simulations of Hamiltonian systems compared to non-symplectic methods; so it is natural to pursue a similar approach for stochastic Hamiltonian systems. This is a relatively recent pursuit.
Stochastic symplectic integrators were first proposed in [43,44]. Stochastic generalizations of symplectic partitioned Runge-Kutta methods were analyzed in [13,35,36]. A stochastic generating function approach to constructing stochastic symplectic methods, based on approximately solving a corresponding stochastic Hamilton-Jacobi equation satisfied by the generating function, was proposed in [65,66], and this idea was further pursued in [2,4,16]. Stochastic symplectic integrators constructed via composition methods were proposed and analyzed in [45]. A first order weak symplectic numerical scheme and an extrapolation method were proposed and their global error was analyzed in [3]. More recently, an approach based on Padé approximations has been used to construct stochastic symplectic methods for linear stochastic Hamiltonian systems (see [60]). Higher-order strong and weak symplectic partitioned Runge-Kutta methods have been proposed in [67,68]. High-order conformal symplectic and ergodic schemes for the stochastic Langevin equation have been introduced in [25]. Other structure-preserving methods for stochastic Hamiltonian systems have also been investigated, see, e.g., [1,15,26], and the references therein.
Long-time accuracy and near preservation of the Hamiltonian by symplectic integrators applied to deterministic Hamiltonian systems have been rigorously studied using the so-called backward error analysis (see, e.g., [18] and the references therein). To the best of our knowledge, such rigorous analysis has not been attempted in the stochastic context as yet. However, the numerical evidence presented in the papers cited above is promising and suggests that stochastic symplectic integrators indeed possess the property of very accurately capturing the evolution of the Hamiltonian H over exponentially long time intervals (note that the Hamiltonian H in general does not stay constant for stochastic Hamiltonian systems).
An important class of geometric integrators are variational integrators. These numerical schemes are based on discrete variational principles and provide a natural framework for the discretization of Lagrangian systems, including forced, dissipative, or constrained ones. These methods have the advantage that they are symplectic, and, in the presence of a symmetry, satisfy a discrete version of Noether's theorem. For an overview of variational integration for deterministic systems see [40]; see also [21,32,33,47,48,53,63,64]. Variational integrators were introduced in the context of finite-dimensional mechanical systems, but were later generalized to Lagrangian field theories (see [39]) and applied in many computations, for example in elasticity, electrodynamics, or fluid dynamics; see [34,49,59,62].
Stochastic variational integrators were first introduced in [8] and further studied in [7]. However, those integrators were restricted to the special case where the Hamiltonian function h = h(q) is independent of p, and only low-order Runge-Kutta types of discretization were considered. In the present work we extend the idea of stochastic variational integration to general stochastic Hamiltonian systems by generalizing the variational principle introduced in [33] and applying a Galerkin type of discretization (see [32,33,40,47,48]), which leads to a more general class of stochastic symplectic integrators than those presented in [7,8,35,36]. Our approach consists in approximating a generating function for the stochastic flow of the Hamiltonian system, but unlike in [65,66], we construct this discrete approximation by exploiting the variational characterization of the generating function, rather than by solving the corresponding Hamilton-Jacobi equation.
Main content
The main content of the remainder of this paper is as follows.
In Sect. 2 we introduce a stochastic variational principle and a stochastic generating function suitable for considering stochastic Hamiltonian systems, and we discuss their properties. In Sect. 3 we present a general framework for constructing stochastic Galerkin variational integrators, prove the symplecticity and conservation properties of such integrators, show they contain the stochastic symplectic Runge-Kutta methods of [35,36] as a special case, and finally present several particularly interesting examples of new low-stage stochastic symplectic integrators of mean-square order 1.0 derived with our general methodology. In Sect. 4 we present the results of our numerical tests, which verify the theoretical convergence rates and the excellent long-time performance of our integrators compared to some popular non-symplectic methods. Section 5 contains the summary of our work.

Variational principle for stochastic Hamiltonian systems
The stochastic variational integrators proposed in [7,8] were formulated for dynamical systems which are described by a Lagrangian and which are subject to noise whose magnitude depends only on the position q. Therefore, these integrators are applicable to (1.1) only when the Hamiltonian function h = h(q) is independent of p and the Hamiltonian H is non-degenerate (i.e., the associated Legendre transform is invertible). However, in the case of general h = h(q, p) the paths q(t) of the system become almost surely nowhere differentiable, which poses a difficulty in interpreting the meaning of the corresponding Lagrangian. Therefore, we need a different sort of action functional and variational principle to construct stochastic symplectic integrators for (1.1). To this end, we will generalize the approach taken in [33]. To begin, in the next section, we will introduce an appropriate stochastic action functional and show that it can be used to define a type-II generating function for the stochastic flow of the system (1.1).

Stochastic variational principle
Let the Hamiltonian functions H : T*Q −→ R and h : T*Q −→ R be defined on the cotangent bundle T*Q of the configuration manifold Q, and let (q, p) denote the canonical coordinates on T*Q. For simplicity, in this work we assume that the configuration manifold has a vector space structure, Q ≅ R^N. In this case, the natural pairing between one-forms and vectors can be identified with the scalar product on R^N, that is, ⟨(q, p), (q, q̇)⟩ = p · q̇, where (q, q̇) denotes the coordinates on TQ. Let (Ω, F, P) be the probability space with the filtration {F_t}_{t≥0}, and let W(t) denote a standard one-dimensional Wiener process on that probability space (such that W(t) is F_t-measurable). We will assume that the Hamiltonian functions H and h are sufficiently smooth and satisfy all the necessary conditions for the existence and uniqueness of solutions to (1.1), and their extendability to a given time interval [t_a, t_b] with t_b > t_a ≥ 0. One possible set of such assumptions can be formulated by considering the Itô form of (1.1), written as dz = A(z) dt + B(z) dW(t) with z = (q, p), where the drift A contains the Stratonovich correction terms involving the Hessian matrices ∂²h/∂q², ∂²h/∂p², and ∂²h/∂q∂p of h. For simplicity and clarity of the exposition, throughout this paper we assume that (see [5,27-29]):

(H1) H and h are C² functions of their arguments;
(H2) A and B are globally Lipschitz.

These assumptions are sufficient for our purposes, but could be relaxed if necessary.
Let C([t_a, t_b]) denote the space of almost surely continuous, F_t-adapted semimartingales on [t_a, t_b]; it is a vector space (see [50]), so we can identify the tangent space to C([t_a, t_b]) with C([t_a, t_b]) itself. We can now define the following stochastic action functional,

B[q(·), p(·)] = p(t_b) q(t_b) − ∫_{t_a}^{t_b} [ p • dq − H(q, p) dt − h(q, p) • dW ],    (2.4)

where • denotes Stratonovich integration, and we have omitted writing the elementary events ω ∈ Ω as arguments of functions, following the standard convention in stochastic analysis.

Theorem 2.1 (Stochastic variational principle in phase space) Suppose that H(q, p) and h(q, p) satisfy conditions (H1)-(H2). If the curve (q(t), p(t)) in T*Q satisfies the stochastic Hamiltonian system (1.1) for t ∈ [t_a, t_b], where t_b > t_a ≥ 0, then the pair (q(·), p(·)) is a critical point of the stochastic action functional (2.4), that is, δB = 0 (2.5) almost surely for all variations δq(·), δp(·) ∈ C([t_a, t_b]) such that almost surely δq(t_a) = 0 and δp(t_b) = 0.

Proof Under the assumptions (H1)-(H2), the stochastic processes q(t) and p(t) are almost surely continuous, F_t-adapted semimartingales, that is, q(·), p(·) ∈ C([t_a, t_b]) (see [5,50]). We calculate the variation (2.5), where we have used the end point condition δp(t_b) = 0. Since the Hamiltonians are C² and the processes q(t), p(t) are almost surely continuous, in the last two lines we have used a dominated convergence argument to interchange differentiation with respect to the variation parameter and integration with respect to t and W(t). Upon applying the integration by parts formula for semimartingales (see [50]), and substituting and rearranging terms, we obtain (2.8), where we have used δq(t_a) = 0. Since q(t), p(t) satisfy (1.1), then by definition we have that almost surely for all t ∈ [t_a, t_b] the representation (2.9) holds; that is, q(t) can be represented as the sum of two semimartingales M_1(t) and M_2(t), where the sample paths of the process M_1(t) are almost surely continuously differentiable.
Let us calculate (2.10), where in the last equality we have used the standard property of the Riemann-Stieltjes integral for the first term, as M_1(t) is almost surely differentiable, and the associativity property of the Stratonovich integral for the second term (see [27,50]). Substituting (2.10) in (2.8), we show that the second term is equal to zero. By a similar argument we also prove that the first term in (2.8) is zero. Therefore, δB = 0 almost surely.
Remark It is natural to expect that the converse theorem, that is, if q(·), p(·) is a critical point of the stochastic action functional (2.4), then the curve q(t), p(t) satisfies (1.1), should also hold, although a larger class of variations (δq, δ p) may be necessary.
A variant of such a theorem, although for a slightly different variational principle and in a different setting, was proved in Lázaro-Camí and Ortega [30]. Another variant for Lagrangian systems was proved by Bou-Rabee and Owhadi [8] in the special case when h = h(q) is independent of p. In that case, one can assume that q(t) is continuously differentiable. In the general case, however, q(t) is not differentiable, and the ideas of [8] cannot be applied directly. We leave this as an open question. Here, we will use the action functional (2.4) and the variational principle (2.5) to construct numerical schemes, and we will directly verify that these numerical schemes converge to solutions of (1.1).

Stochastic type-II generating function
When the Hamiltonian functions H(q, p) and h(q, p) satisfy standard measurability and regularity conditions [e.g., (H1)-(H2)], then the system (1.1) possesses a pathwise unique stochastic flow F_{t,t_0} : Ω × T*Q −→ T*Q. It can be proved that for fixed t, t_0 this flow is mean-square differentiable with respect to the q, p arguments, and is also almost surely a diffeomorphism (see [5,27-29]). Moreover, F_{t,t_0} almost surely preserves the canonical symplectic form Ω_{T*Q} = ∑_{i=1}^{N} dq^i ∧ dp_i (see [6,30,44]), that is, F*_{t,t_0} Ω_{T*Q} = Ω_{T*Q}, where F*_{t,t_0} denotes the pull-back by the flow F_{t,t_0}. We will show below that the action functional (2.4) can be used to construct a type-II generating function for F_{t,t_0}. Let (q̄(t), p̄(t)) be a particular solution of (1.1). For almost all ω ∈ Ω there exist open sets U(ω) ⊂ Q and V(ω) ⊂ Q*, and an open neighborhood W(ω) ⊂ T*Q of the curve (q̄(ω, t), p̄(ω, t)), such that for all q_a ∈ U(ω) and p_b ∈ V(ω) there exists a pathwise unique solution (q̄(ω, t; q_a, p_b), p̄(ω, t; q_a, p_b)) of (1.1) which satisfies q̄(ω, t_a; q_a, p_b) = q_a, p̄(ω, t_b; q_a, p_b) = p_b, and (q̄(ω, t; q_a, p_b), p̄(ω, t; q_a, p_b)) ∈ W(ω) for t_a ≤ t ≤ t_b. (As in the deterministic case, for t_b sufficiently close to t_a one can argue that such neighborhoods exist; see [38].) Define the function S : Y −→ R as

S(q_a, p_b) = B[ q̄(·; q_a, p_b), p̄(·; q_a, p_b) ],    (2.12)

where the domain Y ⊂ Ω × T*Q is determined by the neighborhoods U(ω) and V(ω) above. Below we prove that S generates the stochastic flow F_{t_b,t_a}.
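A concrete check of the almost-sure symplecticity of the flow: for the Kubo oscillator (H = (q² + p²)/2, h = σ(q² + p²)/2 — a standard example assumed here for illustration, not taken from the text above), the exact flow map is a rotation of phase space by the random angle θ = (t − t₀) + σ(W(t) − W(t₀)), and every realization of this rotation preserves the canonical symplectic form.

```python
import numpy as np

def kubo_flow_matrix(theta):
    # exact flow of the Kubo oscillator: rotation by theta = dt + sigma * dW
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

# matrix of the canonical symplectic form dq ^ dp in 1 degree of freedom
J = np.array([[0.0, 1.0], [-1.0, 0.0]])

# symplecticity: R^T J R = J for any realization of the random angle
for theta in [0.3, -1.7, 2.9]:
    R = kubo_flow_matrix(theta)
    assert np.allclose(R.T @ J @ R, J)
```

Because the random angle only reparametrizes the same rotation family, symplecticity holds pathwise, mirroring the almost-sure statement in the text.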

Theorem 2.2
The function S(q_a, p_b) is a type-II stochastic generating function for the stochastic mapping F_{t_b,t_a}, that is, F_{t_b,t_a} : (q_a, p_a) −→ (q_b, p_b) is implicitly given by the equations

p_a = ∂S/∂q_a (q_a, p_b),    q_b = ∂S/∂p_b (q_a, p_b),    (2.13)

where the derivatives are understood in the mean-square sense.
Proof Under appropriate regularity assumptions on the Hamiltonians [e.g., (H1)-(H2)], the solutions q̄(t; q_a, p_b) and p̄(t; q_a, p_b) are mean-square differentiable with respect to the parameters q_a and p_b, and the partial derivatives are semimartingales (see [5]). We calculate the derivative of S as in (2.14), where for notational convenience we have omitted writing q_a and p_b explicitly as arguments of q̄(t) and p̄(t). Applying the integration by parts formula for semimartingales (see [50]), we find (2.15). Substituting and rearranging terms, we obtain the result.

We can consider S(q_a, p_b) as a function of time if we let t_b vary. Let us denote this function as S_t(q_a, p). Below we show that S_t(q_a, p) satisfies a certain stochastic partial differential equation, which is a stochastic generalization of the Hamilton-Jacobi equation considered in [33].
Theorem Let S_2(q_a, p, t) ≡ S_t(q_a, p), where q̄(τ) ≡ q̄(τ; q_a, p) and p̄(τ) ≡ p̄(τ; q_a, p) as before. Then the function S_2(q_a, p, t) satisfies the stochastic partial differential equation (2.18), where dS_2 denotes the stochastic differential of S_2 with respect to the t variable.
Proof Choose an arbitrary pair (q_a, p_a) ∈ T*Q and define the particular solution (q̄(τ), p̄(τ)) = F_{τ,t_a}(q_a, p_a). Form the function S_2(q_a, p̄(t), t) and consider its total stochastic differential dS_2(q_a, p̄(t), t) with respect to time. On one hand, the rules of Stratonovich calculus imply (2.19), where d̄S_2 denotes the partial stochastic differential of S_2 with respect to the t argument. On the other hand, integration by parts in (2.17) implies (2.20). Comparing (2.19) and (2.20), and using (2.13), we obtain (2.21). This equation is satisfied along a particular path p̄(t); however, as in the discussion preceding Theorem 2.2, we can argue, slightly informally, that for almost all ω ∈ Ω, and for t sufficiently close to t_a, one can find open neighborhoods U(ω) ⊂ Q and V(ω) ⊂ Q* which can be connected by the flow F_{t,t_a}, i.e., given q_a ∈ U(ω) and p ∈ V(ω), there exists a pathwise unique solution (q̄(ω, τ), p̄(ω, τ)) such that q̄(ω, t_a) = q_a and p̄(ω, t) = p. With this assumption, (2.21) can be reformulated as the full-blown stochastic PDE (2.18).
Remark Similar stochastic Hamilton-Jacobi equations were introduced in [65,66], where they were used for constructing stochastic symplectic integrators by considering series expansions of generating functions in terms of multiple Stratonovich integrals. This was a direct generalization of a similar technique for deterministic Hamiltonian systems (see [18]). In this work we explore the generalized Galerkin framework for constructing approximations of the generating function S(q a , p b ) in (2.13) by using its variational characterization (2.12). Our approach is a stochastic generalization of the techniques proposed in [33,48] for deterministic Hamiltonian and Lagrangian systems.
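To illustrate the mechanism of (2.13) in the simplest deterministic setting, take the hypothetical type-II generating function S(q_a, p_b) = q_a p_b + Δt (p_b²/2 + q_a²/2) (an assumption for illustration, not the S of Theorem 2.2). The implicit relations p_a = ∂S/∂q_a and q_b = ∂S/∂p_b then produce a symplectic-Euler-type map whose Jacobian has unit determinant.

```python
import numpy as np

dt = 0.1

def step(qa, pa):
    # map generated by S(qa, pb) = qa*pb + dt*(pb**2/2 + qa**2/2)
    # via  pa = dS/dqa = pb + dt*qa   and   qb = dS/dpb = qa + dt*pb
    pb = pa - dt * qa          # solve pa = pb + dt*qa for pb
    qb = qa + dt * pb
    return qb, pb

# Jacobian by central finite differences; symplectic in 1 dof <=> det = 1
eps = 1e-6
qa, pa = 0.7, -0.3
col_q = (np.array(step(qa + eps, pa)) - np.array(step(qa - eps, pa))) / (2 * eps)
col_p = (np.array(step(qa, pa + eps)) - np.array(step(qa, pa - eps))) / (2 * eps)
Jac = np.column_stack([col_q, col_p])
assert abs(np.linalg.det(Jac) - 1.0) < 1e-8
```

Any sufficiently smooth S(q_a, p_b) generates a symplectic map in this way; the stochastic theory above replaces S by the random action (2.12).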

Stochastic Noether's theorem
Let a Lie group G act on Q by the left action Φ : G × Q −→ Q, and let Φ^{TQ} : G × TQ −→ TQ and Φ^{T*Q} : G × T*Q −→ T*Q denote the tangent and cotangent lifted actions, respectively, given in coordinates by the formulas (see [22,38]) (2.22), where i, j = 1, . . . , N and summation is implied over repeated indices. Let g denote the Lie algebra of G and exp : g −→ G the exponential map (see [22,38]). Each element ξ ∈ g defines the infinitesimal generators ξ_Q, ξ_{TQ}, and ξ_{T*Q}, which are vector fields on Q, TQ, and T*Q, respectively, given by (2.24). The momentum map J : T*Q −→ g* associated with the action Φ^{T*Q} is defined as the mapping such that for all ξ ∈ g the function J^ξ : T*Q −→ R given by J^ξ(q, p) = ⟨J(q, p), ξ⟩ is a Hamiltonian function for the vector field ξ_{T*Q}. The momentum map J can be explicitly expressed as (see [22,38]) (2.25). Noether's theorem for deterministic Hamiltonian systems relates symmetries of the Hamiltonian to quantities preserved by the flow of the system. It turns out that this result carries over to the stochastic case as well. A stochastic version of Noether's theorem was proved in [6,30]. For completeness, and for the benefit of the reader, below we state and provide a less involved proof of Noether's theorem for stochastic Hamiltonian systems.

Theorem 2.3 (Stochastic Noether's theorem) Suppose that the Hamiltonians H : T*Q −→ R and h : T*Q −→ R are invariant under the cotangent lifted action, that is,

H ∘ Φ^{T*Q}_g = H,    h ∘ Φ^{T*Q}_g = h,    (2.26)

for all g ∈ G. Then the cotangent lift momentum map J : T*Q −→ g* associated with Φ^{T*Q} is almost surely preserved along the solutions of the stochastic Hamiltonian system (1.1).
Proof Equation (2.26) implies that the Hamiltonians are infinitesimally invariant with respect to the action of G, that is, for all ξ ∈ g we have (2.27), where dH and dh denote differentials with respect to the variables q and p. Let (q(t), p(t)) be a solution of (1.1) and consider the stochastic process J^ξ(q(t), p(t)), where ξ ∈ g is arbitrary. Using the rules of Stratonovich calculus we can calculate the stochastic differential of J^ξ(q(t), p(t)) and show that it vanishes, where we use (1.1), (2.24), and (2.27). Therefore, J^ξ(q(t), p(t)) = const almost surely for all ξ ∈ g, which completes the proof.
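A sanity check of Theorem 2.3 in a case where the exact stochastic flow is available: take Q = R² and H = h = (|q|² + |p|²)/2 (an illustrative assumption; both Hamiltonians are invariant under the cotangent lift of the SO(2) action). The flow rotates each (q_i, p_i) pair by the same random angle θ = t + W(t), and the momentum map J = q₁p₂ − q₂p₁ (angular momentum) is preserved exactly for every realization of θ.

```python
import numpy as np

def flow(q, p, theta):
    # exact flow for H = h = (|q|^2 + |p|^2)/2:
    # rotation by theta = t + W(t) in each (q_i, p_i) plane
    c, s = np.cos(theta), np.sin(theta)
    return c * q + s * p, -s * q + c * p

def ang_mom(q, p):
    # momentum map of the lifted SO(2) action: J = q1*p2 - q2*p1
    return q[0] * p[1] - q[1] * p[0]

rng = np.random.default_rng(1)
q0 = np.array([1.0, 0.2]); p0 = np.array([-0.5, 0.8])
for theta in rng.normal(size=5):   # sample realizations of the random angle
    q1, p1 = flow(q0, p0, theta)
    assert abs(ang_mom(q1, p1) - ang_mom(q0, p0)) < 1e-12
```

Conservation holds pathwise, i.e., for every sampled θ, which is exactly the almost-sure statement of the theorem.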

Stochastic Galerkin Hamiltonian variational integrators
If the converse to Theorem 2.1 holds, then the generating function S(q_a, p_b) defined in (2.12) could be equivalently characterized by

S(q_a, p_b) = ext B[q(·), p(·)],    (3.1)

where the extremum is taken over curves (q(·), p(·)) ∈ C([t_a, t_b]) satisfying q(t_a) = q_a and p(t_b) = p_b, pointwise in the probability space Ω. This characterization allows us to construct stochastic Galerkin variational integrators by choosing a finite-dimensional subspace of C([t_a, t_b]) and a quadrature rule for approximating the integrals in the action functional B. Galerkin variational integrators for deterministic systems were first introduced in [40], and further developed in [21,32,33,47,48]. In the remainder of the paper, we will generalize these ideas to the stochastic case.

Construction of the integrator
Suppose we would like to solve (1.1) on the interval [0, T] with the initial conditions (q_0, p_0). Divide [0, T] into K subintervals [t_k, t_{k+1}] of equal length Δt = T/K, where t_k = kΔt for k = 0, . . . , K. In order to determine the discrete curve {(q_k, p_k)}_{k=0,...,K} that approximates the exact solution of (1.1) at the times t_k, we need to construct an approximation of the exact stochastic flow F_{t_{k+1},t_k}. For convenience, we will express q(t) in terms of Lagrange polynomials. Consider the control points 0 = d_0 < d_1 < · · · < d_s = 1 and let the corresponding Lagrange polynomials of degree s be denoted by l_{μ,s}(τ), that is, l_{μ,s}(d_ν) = δ_{μν}. A polynomial trajectory q_d(t; q^μ) can then be expressed as a linear combination of these polynomials, where the q^μ, μ = 0, . . . , s, are the control values, q̇_d denotes the time derivative of q_d, and l̇_{μ,s} denotes the derivative of the Lagrange polynomial l_{μ,s} with respect to its argument. The restriction of the action functional (2.4) to the space of such polynomial trajectories yields (3.4). Next we approximate the integrals in (3.4) using numerical quadrature, where 0 ≤ c_1 < · · · < c_r ≤ 1 are the quadrature points, and α_i, β_i are the corresponding weights. At this point we only make the general assumption that for each i we have α_i ≠ 0 or β_i ≠ 0. More specific examples will be presented in Sect. 3.4. The approximate action functional takes the form (3.5), where ΔW = ∫_{t_k}^{t_{k+1}} dW(t) is the increment of the Wiener process over the considered time interval; it is a Gaussian random variable with zero mean and variance Δt. The way of approximating the Stratonovich integral above was inspired by the ideas presented in [8,12,36,43,44]. Note that since we only used ΔW in the above approximation, we can in general expect mean-square convergence of order 1.0 only.
To obtain mean-square convergence of higher order we would also need to include higher-order multiple Stratonovich integrals; e.g., to achieve convergence of order 1.5 we would need to include terms involving the multiple Stratonovich integral Z. Equations (3.6) and (3.7) can be written together as the system (3.8), where for brevity we have introduced the notation used below. In a practical implementation, the unbounded Wiener increments ΔW may be replaced by suitably truncated random variables, as in (3.9), where A > 0 is suitably chosen for the considered problem. See [14,44] for more details regarding schemes with truncated random increments and their convergence. Alternatively, one could employ the techniques presented in, e.g., [51,52,67], where the unbounded increments ΔW are replaced with discrete random variables.
Although the scheme (3.8) formally appears to be a straightforward generalization of its deterministic counterpart, it should be emphasized that the main difference lies in the fact that the increments ΔW are random variables with E(ΔW²) = Δt, which makes the convergence analysis significantly more challenging than in the deterministic case. The main difficulty is in the choice of the parameters s, r, α_i, β_i, c_i, so that the resulting numerical scheme converges in some sense to the solutions of (1.1). The number of parameters and order conditions grows rapidly when terms approximating multiple Stratonovich integrals are added (see Sect. 3.6 and [10-12,14]). In Sects. 3.2 and 3.3 we study the geometric properties of the family of schemes described by (3.8), whereas in Sects. 3.4 and 3.5 we provide concrete choices of the coefficients that lead to convergent methods.

Properties of stochastic Galerkin variational integrators
The key features of variational integrators are their symplecticity and exact preservation of the discrete counterparts of conserved quantities (momentum maps) related to the symmetries of the Lagrangian or Hamiltonian (see [40]). These properties carry over to the stochastic case, as was first demonstrated in [8] for Lagrangian systems.
In what follows, we will show that the stochastic Galerkin Hamiltonian variational integrators constructed in Sect. 3.1 also possess these properties.

Theorem Let F^+_{t_{k+1},t_k} : (q_k, p_k) −→ (q_{k+1}, p_{k+1}) denote the numerical flow defined by (3.6)-(3.7). Then F^+_{t_{k+1},t_k} is almost surely symplectic, that is,

(F^+_{t_{k+1},t_k})* Ω_{T*Q} = Ω_{T*Q},    (3.10)

where Ω_{T*Q} = ∑_{i=1}^{N} dq^i ∧ dp_i is the canonical symplectic form on T*Q.

Proof The proof follows immediately by observing that (see [33]) the two-form identity (3.11) holds, where d in that formula denotes the differential operator with respect to the variables q and p and is understood in the mean-square sense. The result holds almost surely, because Eq. (3.7) holds almost surely.
The discrete counterpart of stochastic Noether's theorem readily generalizes from the corresponding theorem in the deterministic case.
Suppose the discrete scheme satisfies the invariance conditions stated above, where π_{Q*} : Q × Q* −→ Q* is the projection onto Q*. Then the cotangent lift momentum map J associated with Φ^{T*Q} is almost surely preserved, i.e., J(q_{k+1}, p_{k+1}) = J(q_k, p_k) a.s.
Proof See the proof of Theorem 4 in [33].

Suppose that the Hamiltonians are invariant under the cotangent lifted action, that is, H ∘ Φ^{T*Q}_g = H and h ∘ Φ^{T*Q}_g = h, (3.14) for all g ∈ G, and suppose the interpolating polynomial q_d(t; q^μ) is equivariant with respect to G. Then the generalized discrete stochastic Lagrangian is invariant under the action of G.

Proof The proof is similar to the proofs of Lemma 3 in [33] and Theorem 3 in [48]. Let us, however, carefully examine the actions of G on Q, TQ, and T*Q. Let q_k ∈ Q and p_{k+1} ∈ Q*, and let q_{k+1} = D_2 H^+_d(q_k, p_{k+1}). First, note that for the stochastic discrete Hamiltonian (3.6), we have (3.15), where we used (3.8e). Consider q̃_k = Φ_g(q_k) and (q̃_{k+1}, p̃_{k+1}) = Φ^{T*Q}_g(q_{k+1}, p_{k+1}) for g ∈ G, and calculate (3.15) for the transformed values q̃_k and p̃_{k+1}. Let us perform a change of variables with respect to which we extremize. First, define q^μ = Φ_{g^{-1}}(q̃^μ), so that q̃^μ = Φ_g(q^μ) for μ = 0, . . . , s. From (3.13) we have q_d(t_k + c_i Δt; q̃^μ) = Φ_g(q_d(t_k + c_i Δt; q^μ)), which we use to define P̃_i by (q_d(t_k + c_i Δt; q̃^μ), P̃_i) = Φ^{T*Q}_g(q_d(t_k + c_i Δt; q^μ), P_i) for i = 1, . . . , r. Since Φ_g and Φ^{T*Q}_g are bijective, extremization with respect to q^μ and P_i is equivalent to extremization with respect to q̃^μ and P̃_i, and q̃^0 = q̃_k implies q^0 = q_k. Moreover, from (3.13) and (2.22) we have that P̃_i q̇_d(t_k + c_i Δt; q̃^μ) = P_i q̇_d(t_k + c_i Δt; q^μ). Finally, the invariance of the Hamiltonians implies the result.

Stochastic symplectic partitioned Runge-Kutta methods
A general class of stochastic Runge-Kutta methods for Stratonovich ordinary differential equations was introduced and analyzed in [10-12]. These ideas were later used by Ma et al. [35] and Ma and Ding [36] to construct symplectic Runge-Kutta methods for stochastic Hamiltonian systems. An s-stage stochastic symplectic partitioned Runge-Kutta method for (1.1) is defined in [36] by the system (3.18), where Q_i and P_i for i = 1, . . . , s are the position and momentum internal stages, respectively, and the coefficients of the method a_ij, ā_ij, b_ij, b̄_ij, α_i, β_i satisfy the symplectic conditions (3.19a)-(3.19d) for i, j = 1, . . . , s. We now prove that in the special case when r = s, the stochastic Galerkin variational integrator (3.8) is equivalent to the stochastic symplectic partitioned Runge-Kutta method (3.18).
Moreover, let the weights α_i and the coefficients of the method be given by (3.20)-(3.21) for i, j = 1, . . . , s.

Proof
The proof follows the main steps of the proof of Theorem 2.6.2 in [40]. The time derivative q̇_d is a polynomial of degree s − 1. Therefore, it can be uniquely expressed in terms of the Lagrange polynomials l̃_{j,s−1}(τ) as in (3.22). Upon integrating with respect to time, we find (3.23), where we have used q^0 = q_k. For τ = 1 this gives (3.24), where we have used q^s = q_{k+1} and (3.20). Define the internal stages Q_j ≡ q_d(t_k + c_j Δt). Then, upon using (3.8d), Eq. (3.24) becomes (3.18c). For τ = c_i, Eq. (3.23) gives (3.25), where a_ij is defined by (3.21a). Upon substituting (3.8d), Eq. (3.25) yields the corresponding internal-stage equation of (3.18).
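The coefficient formulas (3.20)-(3.21) are elided in the text above. In the deterministic collocation setting, the analogous identification reads a_ij = ∫_0^{c_i} ℓ_j(τ) dτ, with ℓ_j the Lagrange polynomials on the quadrature nodes. The following check (an illustration under that standard identification, not a transcription of (3.21a)) recovers the classical 2-stage Gauss-Legendre coefficients.

```python
import numpy as np

# 2-stage Gauss-Legendre nodes on [0, 1]
c = np.array([0.5 - np.sqrt(3) / 6, 0.5 + np.sqrt(3) / 6])

def lagrange_integral(j, upper):
    # integral from 0 to `upper` of the degree-1 Lagrange polynomial
    # l_j(tau) = (tau - c[k]) / (c[j] - c[k]),  k != j
    k = 1 - j
    return (upper**2 / 2 - c[k] * upper) / (c[j] - c[k])

a = np.array([[lagrange_integral(j, c[i]) for j in range(2)] for i in range(2)])

# classical 2-stage Gauss-Legendre Runge-Kutta coefficients
a_exact = np.array([[0.25, 0.25 - np.sqrt(3) / 6],
                    [0.25 + np.sqrt(3) / 6, 0.25]])
assert np.allclose(a, a_exact)
```

As expected for collocation, the row sums of the coefficient matrix reproduce the nodes c_i.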

Examples
In the construction of the integrator (3.8) we may choose the degree s of the approximating polynomials and the quadrature rules. In the deterministic case, the higher the degree of the polynomials and the higher the order of the quadrature rule, the higher the order of convergence of the resulting integrator (see [48]). In our case, however, as explained in Sect. 3.1, we cannot in general achieve a mean-square order of convergence higher than 1.0, because we only used ΔW in (3.5). Since the system (3.8) requires solving (s + r)N equations for (s + r)N variables, from the computational point of view it makes sense to only consider methods with low values of s and r. In this work we focus on several classical numerical integration formulas (see [18-20]). In [48] the notation PsNrQu was proposed to denote a Galerkin variational integrator based on polynomials of degree s and a quadrature rule of order u with r quadrature points. We adopt a similar notation, keeping in mind that u denotes the classical order of the quadrature rule used; when the rule is applied to a stochastic integral, as in (3.5), its classical order is not attained in general. We also use a three-letter code to identify which integration formula is used. For example, P2N2Q4Gau denotes the integrator defined by (3.8) using polynomials of degree 2 and the Gauss-Legendre quadrature formula of classical order 4 with 2 quadrature points for both the Lebesgue and Stratonovich integrals in (3.5). If two different quadrature rules are used, we first write the rule applied to the Lebesgue integral, followed by the rule applied to the Stratonovich integral, e.g., P1N1Q2GauN2Q2Lob. Below we give several examples of integrators obtained by using polynomials of degree s = 1, 2 and the quadrature rules listed above.

General Hamiltonian function h(q, p)
For a general Hamiltonian h = h(q, p), Eq. (3.8d), which represents the discretization of the Legendre transform, needs to contain both ∂H/∂p and ∂h/∂p terms to correctly approximate the continuous system. Therefore, we only consider methods with α_i ≠ 0 and β_i ≠ 0 for all i = 1, . . . , r. A few examples of interest are listed below.

P1N2Q2Lob (Stochastic trapezoidal method)
This integrator is based on polynomials of degree s = 1 with control points d_0 = 0, d_1 = 1, and the trapezoidal rule. Equations (3.8) then take the form (3.30). This integrator is a stochastic generalization of the trapezoidal method for deterministic systems (see [40]). One may easily verify that if the Hamiltonians are separable, that is, H(q, p) = T_0(p) + U_0(q) and h(q, p) = T_1(p) + U_1(q), then P_1 = P_2 and (3.30) is equivalent to the Störmer-Verlet method (3.29) and is fully explicit.
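The display equations (3.30) are elided in the text above. As a freestanding illustration of a one-stage implicit symplectic step in the same spirit, the sketch below implements the stochastic midpoint rule (an assumed standard scheme for illustration, not a transcription of (3.30)) for the Kubo oscillator H = (q² + p²)/2, h = σ(q² + p²)/2, for which H is a conserved quantity; the midpoint rule preserves it up to solver and roundoff tolerance.

```python
import numpy as np

def midpoint_step(z, dt, dW, sigma, iters=50):
    """Stochastic midpoint step for the Kubo oscillator
    dq = p (dt + sigma o dW),  dp = -q (dt + sigma o dW)."""
    theta = dt + sigma * dW
    f = lambda w: theta * np.array([w[1], -w[0]])
    zn = z.copy()
    for _ in range(iters):                 # fixed-point iteration
        zn = z + f(0.5 * (z + zn))
    return zn

rng = np.random.default_rng(42)
dt, sigma = 0.01, 0.5
z = np.array([1.0, 0.0])
H0 = 0.5 * (z @ z)
for _ in range(1000):
    z = midpoint_step(z, dt, rng.normal(0.0, np.sqrt(dt)), sigma)
assert abs(0.5 * (z @ z) - H0) < 1e-10
```

For this linear system the midpoint update is a Cayley transform of a skew-symmetric matrix, hence exactly norm-preserving, which is why the Hamiltonian is conserved to machine precision for every realization of the noise.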

P1N2Q2Otr
Like the method (3.30), this integrator is based on polynomials of degree s = 1 with control points d_0 = 0, d_1 = 1, but uses the open trapezoidal rule (r = 2, c_1 = 1/3, c_2 = 2/3, α_1 = 1/2, α_2 = 1/2, β_i = α_i). Equations (3.8) take the form (3.33). In general one has to solve the first, third, and fourth equations simultaneously (3N equations for 3N variables). In the case of separable Hamiltonians we have P_1 = P_2 and one only needs to solve N nonlinear equations: P_1 can be explicitly calculated from the first equation and substituted into the third one, and the resulting nonlinear equation then has to be solved for q_{k+1}.

P2N2Q2Otr
If the open trapezoidal rule is used with polynomials of degree s = 2, we obtain yet another partitioned Runge-Kutta method (3.18), with a_{11} = ā_{22} = 1/2, a_{12} = ā_{12} = −1/6, a_{21} = ā_{21} = 2/3. The resulting integrator is also computationally expensive in general, but if the Hamiltonians H and h are separable, then (3.8d) implies P_1 = P_2 = P_3, and the integrator can be rewritten as (3.35), where H(q, p) = T_0(p) + U_0(q) and h(q, p) = T_1(p) + U_1(q). In this case only the first equation needs to be solved for q_{k+1}, and then the second equation is an explicit update.

Hamiltonian function h(q) independent of momentum
In the case where the Hamiltonian function h = h(q) is independent of the momentum variable p, the term ∂h/∂p does not enter Eq. (3.8d), and therefore we can allow quadrature rules with α_i = 0 or β_i = 0 for some i. If α_i = 0, the system (3.8) becomes underdetermined, but at the same time the corresponding P_i does not enter any of the remaining equations, so we can simply ignore it. To simplify the notation, let i_1 < · · · < i_r̄ be the indices such that α_{i_m} ≠ 0, and denote ᾱ_m ≡ α_{i_m}, c̄_m ≡ c_{i_m} for m = 1, . . . , r̄. Similarly, let j_1 < · · · < j_r̃ be the indices such that β_{j_m} ≠ 0, and denote β̄_m ≡ β_{j_m}, c̃_m ≡ c_{j_m} for m = 1, . . . , r̃. In (3.8) we leave out the terms and equations corresponding to α_i = 0 or β_i = 0, and replace α_i, β_i, c_i, and r by ᾱ_i, β̄_i, c̄_i, c̃_i, r̄, and r̃, accordingly. In other words, this is equivalent to using the reduced quadrature rules obtained by discarding the nodes whose weights vanish.
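The re-indexing above amounts to discarding quadrature nodes whose weight vanishes; a minimal sketch (the helper name is ours):

```python
def compress_rule(c, alpha):
    """Drop quadrature nodes whose weight vanishes, keeping the rest in order.

    Given nodes c_1..c_r and weights alpha_1..alpha_r, return the reduced
    nodes and weights corresponding to the indices with alpha_i != 0.
    """
    kept = [(ci, ai) for ci, ai in zip(c, alpha) if ai != 0.0]
    cbar = [ci for ci, _ in kept]
    abar = [ai for _, ai in kept]
    return cbar, abar

# Example: the rectangle rule (one node at c = 1 with weight 1) obtained by
# zeroing the left weight of the trapezoidal rule.
cbar, abar = compress_rule([0.0, 1.0], [0.0, 1.0])
```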

P1N1Q1Rec (Stochastic symplectic Euler method)
The rectangle rule (r̄ = 1, c̄_1 = 1, ᾱ_1 = 1) does not yield a convergent numerical scheme in the general case, but when h = h(q), the Itô and Stratonovich interpretations of (1.1) are equivalent, and the rectangle rule can be used to construct efficient integrators. In fact, applying the rectangle rule to both the Lebesgue and Stratonovich integrals and using polynomials of degree s = 1 yields a method that is a straightforward generalization of the symplectic Euler scheme (see [18,40]). It is particularly computationally efficient, as only the first equation needs to be solved for q_{k+1}, and then the second equation is an explicit update. Moreover, in case the Hamiltonian H is separable, the method becomes explicit.

P1N1Q1RecN2Q2Lob

The accuracy of the stochastic symplectic Euler scheme above can be improved by applying the trapezoidal rule to the Stratonovich integral instead of the rectangle rule.

P1N1Q1RecN1Q2Gau

Similarly, if we apply the midpoint rule instead of the trapezoidal rule, we obtain another modification of the stochastic symplectic Euler method.

P2N2Q2LobN1Q1Rec

A modification of the stochastic Störmer-Verlet method (3.29) is obtained if we use the rectangle rule to approximate the Stratonovich integral. This integrator has a computational cost similar to that of the stochastic Störmer-Verlet method, but it yields a slightly less accurate solution (see Sect. 4). Moreover, in case the Hamiltonian H is separable, the method becomes explicit.
P1N1Q2GauN2Q2Lob

This integrator is a modification of the stochastic midpoint method (3.28). We apply the midpoint rule (r̄ = 1, c̄_1 = 1/2, ᾱ_1 = 1) to the Lebesgue integral in (3.4), and the trapezoidal rule (r̃ = 2, c̃_1 = 0, c̃_2 = 1, β̄_1 = 1/2, β̄_2 = 1/2) to the Stratonovich integral. This method is fully implicit but, similar to (3.28), simplifies when the Hamiltonian H is separable.

P1N2Q2LobN1Q2Gau

If instead we apply the trapezoidal rule to the Lebesgue integral and the midpoint rule to the Stratonovich integral in (3.4), we obtain a modification of the stochastic trapezoidal method (3.30). This method becomes explicit when the Hamiltonian H is separable and the noise is additive, i.e., ∂h/∂q = const.
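In one degree of freedom, symplecticity of the maps above amounts to preservation of phase-space area for each fixed noise realization. The sketch below implements the explicit form that the stochastic symplectic Euler scheme takes for a separable Hamiltonian with h = h(q) (our rendering of the update, not taken verbatim from the text; the pendulum-type system of Sect. 4.1.2 is used for illustration) and checks numerically that the Jacobian determinant of one step equals 1:

```python
import numpy as np

# One step of the stochastic symplectic Euler scheme in the explicit form it
# takes for a separable Hamiltonian H(q,p) = T(p) + U(q) and noise
# Hamiltonian h = h(q): first update q, then apply an explicit momentum kick
# that contains the noise increment.
def sym_euler_step(q, p, dt, dW, dT, dU, dh):
    q_new = q + dt * dT(p)
    p_new = p - dt * dU(q_new) - dW * dh(q_new)
    return q_new, p_new

# Illustrative system (synchrotron-type): H = p**2/2 - cos(q), h = beta*sin(q).
beta = 0.1
dT = lambda p: p
dU = lambda q: np.sin(q)
dh = lambda q: beta * np.cos(q)

def flow(z, dt, dW):
    return np.array(sym_euler_step(z[0], z[1], dt, dW, dT, dU, dh))

# Symplecticity in one degree of freedom = area preservation: the Jacobian of
# (q_k, p_k) -> (q_{k+1}, p_{k+1}) has determinant 1 for every fixed dW.
def jacobian_det(z, dt, dW, eps=1e-6):
    J = np.zeros((2, 2))
    for j in range(2):
        dz = np.zeros(2)
        dz[j] = eps
        J[:, j] = (flow(z + dz, dt, dW) - flow(z - dz, dt, dW)) / (2 * eps)
    return np.linalg.det(J)

det = jacobian_det(np.array([0.3, 0.8]), dt=0.1, dW=0.25)
```

A short calculation confirms the determinant is exactly 1 for this map: the q-update is a shear in p, and the p-kick depends only on the updated q.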

Convergence
Various criteria for the convergence of stochastic schemes have been suggested in the literature (see [28,42]). Some criteria concentrate on pathwise approximations of the exact solutions (mean-square convergence, strong convergence), while others focus on approximations of some functionals instead (weak convergence). We are here primarily interested in mean-square convergence. Let z̄(t) = (q̄(t), p̄(t)) be the exact solution to (1.1) with the initial conditions q_0 and p_0, and let z_k = (q_k, p_k) denote the numerical solution at time t_k obtained by applying (3.8) iteratively k times with the constant time step Δt. The numerical solution is said to converge in the mean-square sense with global order r if there exist δ > 0 and a constant C > 0 such that for all Δt ∈ (0, δ) we have

[ E |z_K − z̄(T)|² ]^{1/2} ≤ C Δt^r,

where T = KΔt, as defined before, and E denotes the expected value. In principle, in order to determine the mean-square order of convergence of the Galerkin variational integrator (3.8), we need to calculate the power series expansions of q_{k+1} and p_{k+1} in terms of the powers of Δt and ΔW, and compare them to the Stratonovich-Taylor expansions of the exact solution q̄(t_k + Δt) and p̄(t_k + Δt) (see [12,28,42]). This is quite a tedious task in the general case, therefore we only discuss the examples presented in Sect. 3.4. For instance, in the case of the stochastic trapezoidal method (3.30), we plug in series expansions for P_1, P_2, q_{k+1}, and p_{k+1}, and determine their coefficients by expanding the derivatives of the Hamiltonians into Taylor series around (q_k, p_k) and comparing the terms corresponding to like powers of Δt and ΔW. We find (3.46), where the derivatives of the Hamiltonians are evaluated at (q_k, p_k). Let q̄(t; q_k, p_k) and p̄(t; q_k, p_k) denote the exact solution of (1.1) such that q̄(t_k; q_k, p_k) = q_k and p̄(t_k; q_k, p_k) = p_k.
Using (1.1) we calculate the Stratonovich-Taylor expansions for q̄(t_{k+1}; q_k, p_k) and p̄(t_{k+1}; q_k, p_k), and comparing them to (3.46) we find (3.47). Using Theorem 1.1 from [42], we conclude that the stochastic trapezoidal method (3.30) has mean-square order of convergence r = 1. In a similar fashion we prove that all methods presented in Sect. 3.4 are convergent with mean-square order 1. We further verify these results numerically in Sect. 4.1.
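The definition of mean-square order also suggests a direct empirical check: measure the root-mean-square error against the exact solution at several step sizes and fit the slope on a log-log scale. The sketch below does this for the Kubo oscillator of Sect. 4.1.1 with a stochastic midpoint discretization (our rearrangement of the implicit step into an explicit 2x2 solve, valid for this linear system; the noise intensity is taken larger than in the experiments so that the stochastic part of the error is visible):

```python
import numpy as np

rng = np.random.default_rng(42)
beta, T, q0, p0 = 0.5, 1.0, 0.0, 1.0   # illustrative parameters, not the paper's

def midpoint_kubo(dW, dt):
    """Stochastic midpoint method for the Kubo oscillator, all paths at once.

    With theta = dt + beta*dW, each step solves the linear system
        q1 = q0 + theta*(p0+p1)/2,   p1 = p0 - theta*(q0+q1)/2.
    dW has shape (paths, steps); returns arrays q, p at time T."""
    M = dW.shape[0]
    q, p = np.full(M, q0), np.full(M, p0)
    for k in range(dW.shape[1]):
        t2 = 0.5 * (dt + beta * dW[:, k])
        b0, b1 = q + t2 * p, p - t2 * q
        den = 1.0 + t2 * t2
        q, p = (b0 + t2 * b1) / den, (b1 - t2 * b0) / den
    return q, p

M = 400
dts, rms = [], []
for n in (64, 128, 256):
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), (M, n))
    q, p = midpoint_kubo(dW, dt)
    phase = T + beta * dW.sum(axis=1)        # exact solution is a rotation
    qe = p0 * np.sin(phase) + q0 * np.cos(phase)
    pe = p0 * np.cos(phase) - q0 * np.sin(phase)
    dts.append(dt)
    rms.append(np.sqrt(np.mean((q - qe) ** 2 + (p - pe) ** 2)))

# Slope of log(rms error) vs log(dt) estimates the mean-square order r.
order = np.polyfit(np.log(dts), np.log(rms), 1)[0]
```

With a few hundred sample paths the fitted slope settles near 1, consistent with the mean-square order established above.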
Remark For simplicity and clarity of the exposition, in this work we are mainly concerned with a one-dimensional noise in (1.1). However, all of the constructions and results presented in Sects. 2 and 3 generalize in a straightforward manner when a multidimensional noise W^1, W^2, . . . , W^M, together with the corresponding Hamiltonian functions h_1, h_2, . . . , h_M, is considered in (1.1), except that the integrators derived in Sect. 3.4 in general do not attain mean-square order 1.0 of convergence, unless the noise is commutative. Indeed, repeating the procedure described above for the stochastic trapezoidal method yields power series expansions in terms of the powers of Δt and the increments ΔW^i that contain commutator terms, and these terms vanish precisely when the noise is commutative. In particular, when h_i = h_i(q) the corresponding noise vector fields commute, so the methods presented in Sect. 3.4.2 retain their mean-square order of convergence for multidimensional noises.

Methods of order 3/2
In order to construct stochastic Galerkin variational integrators of higher order, one needs to include higher-order terms in the discretization of the Stratonovich integral in (3.5). For example, a method of mean-square order 3/2 must include terms involving the increment ΔW and the iterated integral ΔZ = ∫_{t_k}^{t_{k+1}} ∫_{t_k}^{s} dW(u) ds. The random variables ΔW and ΔZ have a Gaussian joint distribution (see [28,44]), and at each time step they can be simulated by two independent N(0, 1)-distributed random variables χ and η as ΔW = χ √Δt and ΔZ = (1/2) Δt^{3/2} (χ + η/√3). In order to achieve mean-square convergence of order 3/2 one needs to determine appropriate values for the parameters s, r, α_i, β_i, γ_i, and c_i. However, we will not attempt to achieve this in the present work. Instead, we will show that some known stochastic symplectic integrators can be derived as stochastic Galerkin variational integrators. Suppose the Hamiltonian is separable, i.e., H(q, p) = T(p) + U(q), and the Hamiltonian function h = h(q) does not depend on the momentum. Consider the discrete Hamiltonian in which different weights ᾱ_i and α_i are applied to the potential U(q) and kinetic T(p) terms, respectively. Similar to (3.8), the corresponding stochastic variational integrator takes an analogous form, where μ = 1, . . . , s − 1 in the second equation, and i = 1, . . . , r in the fourth equation.
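A standard way to generate the pair (ΔW, ΔZ) from two independent standard normals (the construction referenced above, cf. [28,44]; the helper name is ours) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dW_dZ(dt, size):
    """Sample the pair (dW, dZ) needed by order-3/2 schemes, where
    dZ = int_0^dt int_0^s dW(u) ds, from independent N(0,1) draws chi, eta:
        dW = chi * sqrt(dt),
        dZ = 0.5 * dt**1.5 * (chi + eta / sqrt(3)).
    This reproduces the joint Gaussian law with
        E[dW^2] = dt,  E[dZ^2] = dt**3 / 3,  E[dW*dZ] = dt**2 / 2.
    """
    chi = rng.standard_normal(size)
    eta = rng.standard_normal(size)
    dW = chi * np.sqrt(dt)
    dZ = 0.5 * dt**1.5 * (chi + eta / np.sqrt(3))
    return dW, dZ

# Monte Carlo sanity check of the second moments.
dt, N = 0.1, 200_000
dW, dZ = sample_dW_dZ(dt, N)
```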
In the special case when r = s, and for a particular choice of the coefficients in which we assume α_i ≠ 0 and ᾱ_i ≠ 0 for all i, the scheme reduces to a partitioned Runge-Kutta method. Partitioned Runge-Kutta methods of type (3.59) were considered in [44]. In particular, it was shown that for s = 2 a suitable choice of the coefficients gives a method of mean-square order 3/2 (see Theorem 4.3 in [44]).

Numerical experiments
In this section we present the results of our numerical experiments. We verify numerically the convergence results from Sect. 3.5 and investigate the conservation properties of our integrators. In particular, we show that our stochastic variational integrators demonstrate superior behavior in long-time simulations compared to some popular general purpose non-symplectic stochastic algorithms.

Kubo oscillator
In order to test the convergence of the numerical algorithms from Sect. 3.4.1 we performed computations for the Kubo oscillator, which is defined by H(q, p) = (p² + q²)/2 and h(q, p) = β(p² + q²)/2, where β is the noise intensity (see [44]). The Kubo oscillator is used in the theory of magnetic resonance and laser physics. The exact solution is given by

q̄(t) = p_0 sin(t + βW(t)) + q_0 cos(t + βW(t)),
p̄(t) = p_0 cos(t + βW(t)) − q_0 sin(t + βW(t)),   (4.1)

where q_0 and p_0 are the initial conditions. Simulations with the initial conditions q_0 = 0, p_0 = 1 and the noise intensity β = 0.1 were carried out until the time T = 3.2 for a number of time steps Δt = 0.000625, 0.00125, 0.0025, 0.005, 0.01, 0.02. In each case 2000 sample paths were generated. Let z_Δt(t) = (q_Δt(t), p_Δt(t)) denote the numerical solution. We used the exact solution (4.1) as a reference for computing the mean-square error E(|z_Δt(T) − z̄(T)|²), where z̄(t) = (q̄(t), p̄(t)).
The dependence of this error on the time step Δt is depicted in Fig. 1. We verified that our algorithms have mean-square order of convergence 1.0. The integrators P1N3Q4Lob, P1N3Q4Mil, P1N2Q2Lob (the stochastic trapezoidal method), and P2N2Q2Lob (the stochastic Störmer-Verlet method) turned out to be the most accurate, with the latter two having the lowest computational cost.

Synchrotron oscillations of particles in storage rings
We carried out a similar test for the numerical schemes from Sect. 3.4.2. We performed computations for the stochastic Hamiltonian system defined by H(q, p) = p²/2 − cos q and h(q) = β sin q, where β is the noise intensity. Systems of this type are used for modeling synchrotron oscillations of a particle in a storage ring. Due to fluctuating electromagnetic fields, a particle performs stochastically perturbed oscillations with respect to a reference particle which travels with fixed energy along the design orbit of the accelerator; in this description p corresponds to the energy deviation of the particle from the reference particle, and q measures the longitudinal phase difference of both particles (see [17,56] for more details). Simulations with the initial conditions q_0 = 0, p_0 = 1 and the noise intensity β = 0.1 were carried out until the time T = 3.2 for a number of time steps Δt = 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64. In each case 2000 sample paths were generated. The mean-square error was calculated with respect to a high-precision reference solution generated using the order 3/2 strong Taylor scheme (see [28], Chapter 10.4) with a very fine time step Δt = 2 · 10⁻⁶. The dependence of this error on the time step Δt is depicted in Fig. 2. We verified that our algorithms have mean-square order of convergence 1.0.

Fig. 2 The mean-square error at the time T = 3.2 as a function of the time step Δt for the simulations of the synchrotron oscillations of a particle in a storage ring with the initial conditions q_0 = 0, p_0 = 1 and the noise intensity β = 0.1. In each case 2000 sample paths were generated. The tested integrators proved to be convergent of order 1.0 in the mean-square sense. Note that the plots for P1N1Q1Rec, P1N1Q1RecN2Q2Lob, and P1N1Q1RecN1Q2Gau, as well as for P2N2Q2Lob and P1N2Q2LobN1Q2Gau, overlap very closely.

Kubo oscillator
One can easily check that in the case of the Kubo oscillator the Hamiltonian H(q, p) stays constant along almost all sample paths, i.e., H(q̄(t), p̄(t)) = H(q_0, p_0) almost surely. We used this example to test the performance of the integrators from Sect. 3.4.1.
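This pathwise conservation can be verified numerically from the exact solution (4.1), since the flow is a rotation by the random angle t + βW(t); a minimal sketch (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(7)

# Exact solution (4.1) of the Kubo oscillator along one simulated Wiener
# path; H(q,p) = (p**2 + q**2)/2 is constant along the path because the flow
# rotates (q, p) by the angle t + beta*W(t).
beta, q0, p0 = 0.1, 0.0, 1.0
dt, n = 0.01, 320                      # T = 3.2, as in the experiments
W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))
t = dt * np.arange(n + 1)
phase = t + beta * W
q = p0 * np.sin(phase) + q0 * np.cos(phase)
p = p0 * np.cos(phase) - q0 * np.sin(phase)
H = 0.5 * (q**2 + p**2)
```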
Simulations with the initial conditions q 0 = 0, p 0 = 1, the noise intensity β = 0.1, and the relatively large time step t = 0.25 were carried out until the time T = 1000 (approximately 160 periods of the oscillator in the absence of noise) for a single realization of the Wiener process. For comparison, similar simulations were carried out using non-symplectic explicit methods like Milstein's scheme and the order 3/2 strong Taylor scheme (see [28]). The numerical value of the Hamiltonian H (q, p) as a function of time for each of the integrators is depicted in Fig. 3. We find that the non-symplectic schemes do not preserve the Hamiltonian well, even if small time steps are used. For example, we find that Milstein's scheme does not give a satisfactory solution even with t = 0.001, and though the Taylor scheme with t = 0.05 yields a result comparable to the variational integrators, the growing trend of the numerical Hamiltonian is evident. On the other hand, the variational integrators give numerical solutions for which the Hamiltonian oscillates around the true value (one can check via a direct calculation that the stochastic midpoint method (3.28) in this case preserves the Hamiltonian exactly; of course this does not necessarily hold in the general case).
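The exact preservation of H by the stochastic midpoint method for the Kubo oscillator, and the energy drift of a non-symplectic explicit scheme at the same large step, can be illustrated as follows. This is a sketch: the 2x2 solve is our rearrangement of the implicit midpoint equations, valid for the Kubo oscillator (each step is then a Cayley rotation, hence norm-preserving), and Euler-Maruyama applied to the Itô form of (1.1) stands in for the non-symplectic methods used in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, dt, n = 0.1, 0.25, 4000            # T = 1000, as in the experiment

q, p = 0.0, 1.0                          # stochastic midpoint state
qe, pe = 0.0, 1.0                        # Euler-Maruyama state
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt))
    # Stochastic midpoint step, solved exactly as a 2x2 linear system:
    # with theta = dt + beta*dW and t2 = theta/2,
    #   q1 = q + t2*(p + p1),   p1 = p - t2*(q + q1).
    t2 = 0.5 * (dt + beta * dW)
    b0, b1 = q + t2 * p, p - t2 * q
    den = 1.0 + t2 * t2
    q, p = (b0 + t2 * b1) / den, (b1 - t2 * b0) / den
    # Euler-Maruyama on the Ito form: drift J*z - (beta**2/2)*z, noise beta*J*z,
    # where J*z = (p, -q).
    c = dt + beta * dW
    qe, pe = (qe * (1 - 0.5 * beta**2 * dt) + c * pe,
              pe * (1 - 0.5 * beta**2 * dt) - c * qe)

H_mid = 0.5 * (q * q + p * p)
H_em = 0.5 * (qe * qe + pe * pe)
```

The midpoint energy stays at its initial value 0.5 up to roundoff, while the Euler-Maruyama energy grows without bound at this step size.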

Anharmonic oscillator
In general the Hamiltonian H(q, p) does not stay constant for stochastic Hamilton equations. To determine how well our integrators perform in such cases, we considered the anharmonic oscillator defined by H(q, p) = p²/2 + γq⁴ and h(q) = βq, where β is the noise intensity and γ is a parameter. One can calculate the expected value of the Hamiltonian analytically as

E[H(q̄(t), p̄(t))] = H(q_0, p_0) + (β²/2) t,   (4.2)

that is, the mean value of the Hamiltonian grows linearly in time (see [56]). Simulations with the initial conditions q_0 = 0, p_0 = 1, the parameter γ = 0.1, and the noise intensity β = 0.1 were carried out until the time T = 784 (approximately 100 periods of the oscillator in the absence of noise). In each case 10,000 sample paths were generated. The numerical value of the mean Hamiltonian E(H) as a function of time for each of the integrators is depicted in Fig. 4. We see that the variational integrators accurately capture the linear growth of E(H), whereas the Taylor scheme fails to reproduce that behavior even when a smaller time step is used. It is worth noting that the integrators P1N1Q1RecN2Q2Lob and P1N1Q1RecN1Q2Gau prove to be particularly accurate, while having a low computational cost. To avoid numerical difficulties, one could in principle use the truncated increments (3.9) with, e.g., A = (3 − Δt)/(2β) (for Δt < 3). However, given the negligible probability that |ΔW| > A for the parameters used in Sects. 4.1.1 and 4.2.1, we did not use truncated increments and did not observe any numerical issues. In the case of all the other numerical experiments presented in Sect. 4, the applied algorithms either turned out to be explicit, or the corresponding nonlinear systems of equations had solutions for all values of ΔW. Nonlinear equations were solved using Newton's method, and the previous time step values of the position q_k and momentum p_k were used as initial guesses.
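The linear growth (4.2) can be checked with a quick Monte Carlo run. The sketch below uses the explicit stochastic symplectic Euler update for separable Hamiltonians with h = h(q) (our rendering of the scheme from Sect. 3.4.2), a shorter horizon than in the experiments, and loose statistical tolerances:

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of the linear growth (4.2) of E[H] for the anharmonic
# oscillator H = p**2/2 + gamma*q**4, h(q) = beta*q (additive noise, so the
# Ito and Stratonovich forms coincide). The stepper is the explicit
# stochastic symplectic Euler update: shear in q, then a momentum kick.
gamma, beta = 0.1, 0.1
dt, T, M = 0.01, 10.0, 4000            # short horizon, 4000 sample paths
n = int(T / dt)
q = np.zeros(M)
p = np.ones(M)
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt), M)
    q = q + dt * p
    p = p - dt * 4 * gamma * q**3 - beta * dW

H_mean = np.mean(0.5 * p**2 + gamma * q**4)
growth = H_mean - 0.5                  # H(q_0, p_0) = 0.5
# Theory, Eq. (4.2): growth = (beta**2 / 2) * T = 0.05
```

The noise kick adds exactly β²Δt/2 to E[p²/2] per step, so the scheme reproduces the analytic growth rate up to sampling error.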

Summary
In this paper we have presented a general framework for constructing a new class of stochastic symplectic integrators for stochastic Hamiltonian systems. We generalized the approach of Galerkin variational integrators introduced in [33,40,48] to the stochastic case, following the ideas underlying the stochastic variational integrators introduced in [8]. The solution of the stochastic Hamiltonian system was approximated by a polynomial of degree s, and the action functional was approximated by a quadrature formula based on r quadrature points. We showed that the resulting integrators are symplectic, preserve integrals of motion related to Lie group symmetries, and include the stochastic symplectic Runge-Kutta methods introduced in [35,36,44] as a special case when r = s. We pointed out several new low-stage stochastic symplectic methods of mean-square order 1.0 for systems driven by a one-dimensional noise, both for the case of a general Hamiltonian function h = h(q, p) and for a Hamiltonian function h = h(q) independent of p, and demonstrated their superior long-time numerical stability and energy behavior via numerical experiments. We also stated the conditions under which these integrators retain their first order of convergence when applied to systems driven by a multidimensional noise.
Our work can be extended in several ways. In Sect. 3.6 we indicated how higher-order stochastic variational integrators can be constructed and showed that a type of stochastic symplectic partitioned Runge-Kutta methods of mean-square order 3/2 considered in [44] can be recast in that formalism. It would be interesting to derive new stochastic integrators of order 3/2 by choosing appropriate values for the parameters in (3.54) or (3.56). It would also be interesting to apply the Galerkin approach to construct stochastic variational integrators for constrained (see [7]) and dissipative (see [9]) stochastic Hamiltonian systems, and systems defined on Lie groups (see [32]). Another important problem would be stochastic variational error analysis. That is, rather than considering how closely the numerical solution follows the exact trajectory of the system, one could investigate how closely the discrete Hamiltonian matches the exact generating function. In the deterministic setting these two notions of the order of convergence are equivalent (see [40]). It would be instructive to know whether a similar result holds in the stochastic case. A further vital task would be to develop higher-order weakly convergent stochastic variational integrators. As mentioned in Sects. 3.1 and 3.6, higher-order methods require the inclusion of higher-order multiple Stratonovich integrals, which are cumbersome to simulate in practice. In many cases, though, one is only interested in calculating the probability distribution of the solution rather than precisely approximating each sample path. In such cases weakly convergent methods are much easier to use (see [28,42]). Finally, one may extend the idea of variational integration to stochastic multisymplectic partial differential equations such as the stochastic Korteweg-de Vries, Camassa-Holm, or Hunter-Saxton equations. Theoretical groundwork for such numerical schemes has been recently presented in [24].