1 Introduction

We consider systems of stochastic differential equations (SDEs) in \(\mathbb {R} ^d\) subject to a smooth scalar constraint and a Stratonovich noise of the form

$$\begin{aligned} \mathrm{d}X(t)=\Pi _\mathcal {M} (X(t)) f(X(t))\mathrm{d}t +\Pi _\mathcal {M} (X(t)) \Sigma (X(t)) \circ \mathrm{d}W(t),\quad X(0)=X_0\in \mathcal {M}, \nonumber \\ \end{aligned}$$

where \(\Pi _\mathcal {M} :\mathbb {R} ^d\rightarrow \mathbb {R} ^{d\times d}\) is the orthogonal projection on the tangent bundle of the manifold \(\mathcal {M} =\{x\in \mathbb {R} ^d,\zeta (x)=0\}\) of codimension q, \(\zeta :\mathbb {R} ^d\rightarrow \mathbb {R} ^q\) is a given constraint, \(f:\mathbb {R} ^d\rightarrow \mathbb {R} ^d\) is a smooth drift, \(\Sigma :\mathbb {R} ^d\rightarrow \mathbb {R} ^{d\times d}\) is a smooth diffusion coefficient, and W is a standard d-dimensional Brownian motion in \(\mathbb {R} ^d\) on a probability space equipped with a filtration and fulfilling the usual assumptions. For simplicity of the analysis, we assume that \(\mathcal {M} \) is a compact smooth manifold of codimension \(q=1\). The smoothness and compactness of \(\mathcal {M} \) guarantee in particular the existence and uniqueness of a solution to (1.1) with bounded moments for all times \(t>0\). In addition, thanks to the projection operator \(\Pi _\mathcal {M} \), the solution X(t) lies on \(\mathcal {M} \) for all \(t>0\). In the additive noise case where \(\Sigma (x)=\sigma I_d\) with \(\sigma >0\), Eq. (1.1) can also be rewritten equivalently with a Lagrange multiplier (see [44, Sect.] or [45, Sect. 3.3]) as:

$$\begin{aligned} \mathrm{d}X(t)=f(X(t))\mathrm{d}t +\sigma \mathrm{d}W(t)+g(X(t)) \mathrm{d}\lambda _t, \quad \zeta (X(t))=0,\quad X(0)=X_0\in \mathcal {M}, \end{aligned}$$

where \(g=\nabla \zeta \) and \(\lambda \) is an adapted stochastic process determined by the equation \(\zeta (X)=0\).
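As an illustration of the projection \(\Pi _\mathcal {M} \), recall that in codimension one the orthogonal projection on the tangent space at x has the closed form \(I_d-|g(x)|^{-2}g(x)g(x)^T\) with \(g=\nabla \zeta \). The following minimal Python sketch evaluates it with NumPy. The function name `tangent_projection` and the choice of the unit sphere, with \(\zeta (x)=(|x|^2-1)/2\) and hence \(g(x)=x\), are assumptions made for this example only.

```python
import numpy as np

def tangent_projection(g):
    """Return I - g g^T / (g^T g) for a nonzero gradient vector g = grad zeta(x)."""
    g = np.asarray(g, dtype=float)
    gram = g @ g  # |g(x)|^2, a scalar in codimension one
    return np.eye(g.size) - np.outer(g, g) / gram

# Hypothetical example: unit sphere, zeta(x) = (|x|^2 - 1) / 2, so that g(x) = x.
x = np.array([0.0, 0.6, 0.8])
P = tangent_projection(x)
print(P @ x)  # the normal direction g(x) is mapped to (numerically) zero
```

The matrix P is symmetric and idempotent, and it annihilates the normal direction g(x), which is why the projected drift and noise in (1.1) keep the solution on the manifold.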

A major motivation for model (1.1) comes from computational problems in molecular dynamics involving the constrained overdamped Langevin equation (obtained in the particular case where \(\Sigma (x)=\sigma I_d\) is a constant homothety),

$$\begin{aligned} \mathrm{d}X(t)=\Pi _\mathcal {M} (X(t)) f(X(t))\mathrm{d}t +\sigma \Pi _\mathcal {M} (X(t)) \circ \mathrm{d}W(t),\quad X(0)=X_0\in \mathcal {M},\nonumber \\ \end{aligned}$$

with \(\sigma >0\), \(f=-\nabla V\), where \(V:\mathbb {R} ^d\rightarrow \mathbb {R} \) is a smooth potential. The overdamped Langevin equation is widely used to model the motion of a set of particles subject to a potential V in a high-friction regime. The possible constraints can be induced for example by strong covalent bonds between atoms, or fixed angles in molecules. Sampling from the constrained overdamped Langevin equation allows one to compute the so-called free energy, which is a key quantity in thermodynamics (see, for instance, [18, 44, 45] and references therein). Equations of the form (1.1) also appear naturally when studying conservative SDEs, that is, SDEs possessing an invariant H conserved almost surely by all realizations of (1.1). The solution of a conservative SDE is subject to the constraint \(\zeta (X)=0\) with \(\zeta (x)=H(x)-H(X_0)\). Drawing samples on a manifold also has many applications in statistics (see [10, 23] and references therein).

Under regularity conditions on the generator of the SDE and on \(\mathcal {M} \), it was shown in [18, 25] that the solution X(t) of the SDE (1.1) is ergodic, that is, there exists a unique invariant measure \(\mathrm{d}\mu _\infty \) on \(\mathcal {M} \) that has a density \(\rho _\infty \) with respect to \(\mathrm{d}\sigma _\mathcal {M} \), the canonical measure on \(\mathcal {M} \) induced by the Euclidean metric of \(\mathbb {R} ^d\), such that for all test functions \(\phi \in \mathcal {C} ^\infty (\mathbb {R} ^d,\mathbb {R})\),

$$\begin{aligned} \lim _{T\rightarrow \infty }\frac{1}{T}\int _0^T \phi (X(t)) \mathrm{d}t= \int _{\mathcal {M}} \phi (x) \mathrm{d}\mu _\infty (x)\quad \text {almost surely}. \end{aligned}$$

In the case of the overdamped Langevin Eq. (1.2) on \(\mathcal {M} \), the process is naturally ergodic and the invariant measure is given by \(\mathrm{d}\mu _\infty =\rho _\infty \mathrm{d}\sigma _\mathcal {M} =\frac{1}{Z}\exp \left( -\frac{2}{\sigma ^2}V\right) \mathrm{d}\sigma _\mathcal {M} \) with \(Z=\int _{\mathcal {M}}\exp \left( -\frac{2}{\sigma ^2}V\right) \mathrm{d}\sigma _\mathcal {M} \). Approximating the quantity \(\int _{\mathcal {M}} \phi (x) \mathrm{d}\mu _\infty (x)\) is a computational challenge when the dimension d is high, which is the case in the context of molecular dynamics where the dimension is proportional to the number of particles, because a standard quadrature formula becomes prohibitively expensive. We emphasize that \(\mu _\infty \) is singular with respect to the Lebesgue measure on \(\mathbb {R} ^d\). In addition, the integrator samples should remain on the manifold \(\mathcal {M} \). Hence, the order conditions for sampling the invariant measure in the Euclidean context of \(\mathbb {R} ^d\) do not generalize straightforwardly to the manifold case. The main goal of this article is to build and analyse high-order one-step integrators for approximating \(\int _{\mathcal {M}} \phi (x) \mathrm{d}\mu _\infty (x)\) that lie on the manifold \(\mathcal {M} \) and that have the form

$$\begin{aligned} X_{n+1}=\Phi (X_n,h,\xi _n), \end{aligned}$$

where the \(\xi _n\) are standard independent random vectors and h is the numerical step.
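To make the one-step form (1.4) and the ergodic average concrete, here is a minimal Python sketch that accumulates the time average of \(\phi (X_n)\) along a single trajectory of a generic map \(\Phi \). The names `ergodic_average` and `sphere_random_walk` are hypothetical. The example map is a Gaussian move followed by a radial projection onto the unit sphere of \(\mathbb {R} ^3\), chosen only because the uniform measure on the sphere is invariant for it by rotational symmetry. It is not one of the integrators analysed in this paper.

```python
import numpy as np

def ergodic_average(step, phi, x0, h, n_steps, rng):
    """Time average (1/(N+1)) * sum_{n=0}^{N} phi(X_n) along one trajectory."""
    x = np.asarray(x0, dtype=float)
    total = phi(x)
    for _ in range(n_steps):
        xi = rng.standard_normal(x.size)  # xi_n ~ N(0, I_d), independent across steps
        x = step(x, h, xi)                # X_{n+1} = Phi(X_n, h, xi_n)
        total += phi(x)
    return total / (n_steps + 1)

def sphere_random_walk(x, h, xi, sigma=np.sqrt(2.0)):
    """Hypothetical Phi: Gaussian move, then radial projection onto the unit sphere."""
    y = x + sigma * np.sqrt(h) * xi
    return y / np.linalg.norm(y)

rng = np.random.default_rng(0)
avg = ergodic_average(sphere_random_walk, lambda x: x[2] ** 2,
                      np.array([0.0, 0.0, 1.0]), h=0.1, n_steps=20000, rng=rng)
print(avg)  # expected to be close to 1/3, the mean of x_3^2 under the uniform measure
```

Only one trajectory is needed because the estimator is a time average. Its statistical error decreases with the number of steps, while its bias is governed by the step size h.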

There are different ways to approximate the solution of the SDE problem (1.1). A strong approximation focuses on approaching the realization of a single trajectory of (1.1) for a given realization of the Wiener process W. A weak approximation approaches the average of functionals of the solution. We focus here on the approximation for the invariant measure, that is, approaching averages of functionals of the solution in the stationary state. This convergence is the numerical equivalent of (1.3). The integrator (1.4) is said to have order p for the invariant measure if for all \(\phi \in \mathcal {C} ^\infty (\mathbb {R} ^d,\mathbb {R})\), there exists a positive constant \(C(\phi )\) independent of the initial condition \(X_0\) such that

$$\begin{aligned} e(\phi ,h)\le C(\phi )h^p\quad \text {where} \quad e(\phi ,h)=\bigg |\lim _{N\rightarrow \infty } \frac{1}{N+1}\sum _{n=0}^N \phi (X_n)-\int _\mathcal {M} \phi \mathrm{d}\mu _\infty \bigg |.\nonumber \\ \end{aligned}$$
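In practice, the order p in (1.5) is commonly estimated from errors measured for several step sizes by fitting the slope of \(\log e(\phi ,h)\) against \(\log h\). The following Python sketch shows such a fit. The function name `estimate_order` is hypothetical, and the error values are synthetic, generated from the model \(0.5\,h^2\) purely to illustrate the procedure. They are not measurements from any scheme.

```python
import numpy as np

def estimate_order(step_sizes, errors):
    """Least-squares slope of log(error) against log(h)."""
    slope, _intercept = np.polyfit(np.log(step_sizes), np.log(errors), 1)
    return slope

# Synthetic illustration only (not measured data): errors behaving like 0.5 * h^2.
hs = np.array([0.2, 0.1, 0.05, 0.025])
errs = 0.5 * hs ** 2
p_hat = estimate_order(hs, errs)
print(p_hat)  # slope of the fitted line, close to 2 for this synthetic model
```

With real data, the Monte Carlo error of the time averages must be small compared with the bias \(e(\phi ,h)\) for the fitted slope to be meaningful.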

We recall that a scheme of weak order r immediately has order \(p\ge r\) for the invariant measure. For the underdamped and overdamped Langevin dynamics in \(\mathbb {R} ^d\), the articles [3, 4, 9, 40, 41] proposed multiple schemes of high order for the invariant measure with low weak order (typically \(r=1\)). We mention in particular the work [3] that introduced a methodology for the analysis and design of high-order integrators for the invariant measure. This methodology, which relies on Talay–Tubaro expansions [63], backward error analysis and modified differential equations for SDEs [1, 22, 36, 37, 67], is generalized in the context of manifolds in the present paper.

A widely used and simple numerical scheme for sampling the invariant measure on manifolds is the Euler scheme (see [17, 42, 44, 45] for instance). Two variants exist for the overdamped Langevin Eq. (1.2), both of order one in the weak sense and for sampling the invariant measure: the Euler integrator with explicit projection direction

$$\begin{aligned} X_{n+1}=X_n+hf(X_n) +\sigma \sqrt{h} \xi _n + \lambda g(X_n), \quad \zeta (X_{n+1})=0, \end{aligned}$$

and alternatively the Euler integrator with implicit projection direction

$$\begin{aligned} X_{n+1}=X_n+hf(X_n) +\sigma \sqrt{h} \xi _n + \lambda g(X_{n+1}), \quad \zeta (X_{n+1})=0. \end{aligned}$$
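On the unit sphere, with the assumed constraint \(\zeta (x)=(|x|^2-1)/2\) and hence \(g(x)=x\), both Euler variants can be solved in closed form, which gives a convenient sanity check. In the sketch below, (1.6) leads to a quadratic equation in \(\lambda \), and we select the root that vanishes when no move is made. For (1.7), the relation \(X_{n+1}=Y+\lambda X_{n+1}\) shows that \(X_{n+1}\) is the radial projection of the unconstrained move Y. The function names and the drift used in the example are hypothetical.

```python
import numpy as np

def euler_explicit_sphere(x, h, xi, f, sigma):
    """One step of (1.6) on the unit sphere, assuming |x| = 1 and g(x) = x."""
    y = x + h * f(x) + sigma * np.sqrt(h) * xi  # unconstrained Euler move
    a = x @ y
    disc = a ** 2 - (y @ y - 1.0)               # discriminant of |y + lam * x|^2 = 1
    if disc < 0.0:
        raise ValueError("no projection along g(X_n); reduce h or bound the noise")
    lam = -a + np.sqrt(disc)                    # root that vanishes when y = x
    return y + lam * x

def euler_implicit_sphere(x, h, xi, f, sigma):
    """One step of (1.7) on the unit sphere: X_{n+1} = y / |y|."""
    y = x + h * f(x) + sigma * np.sqrt(h) * xi
    return y / np.linalg.norm(y)

# Hypothetical drift f = -grad V with V(x) = x_3, and a fixed noise realization.
f = lambda x: np.array([0.0, 0.0, -1.0])
x0 = np.array([0.0, 0.0, 1.0])
xi = np.array([0.3, -0.2, 0.5])
x_explicit = euler_explicit_sphere(x0, 0.01, xi, f, 1.0)
x_implicit = euler_implicit_sphere(x0, 0.01, xi, f, 1.0)
print(np.linalg.norm(x_explicit), np.linalg.norm(x_implicit))
```

The explicit variant can fail when the noise realization is too large, since the line through Y in the direction \(g(X_n)\) may then miss the sphere. The radial projection of the implicit variant is defined for every nonzero Y, but this is a special feature of the sphere.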

To the best of our knowledge, no high-order numerical integrators for sampling the invariant measure of the overdamped Langevin equation with constraints (1.2) have been proposed in the literature. In [46], an order two discretization based on the RATTLE integrator (see [5, 31, 58]) is applied to the underdamped Langevin equation, rather than to the overdamped Langevin dynamics (1.2). The previously described discretizations can be combined with Metropolis–Hastings rejection procedures [32, 50]. We quote in particular the Markov chain Monte Carlo (MCMC) methods [10, 27, 45] and the hybrid Monte Carlo methods [46, 65], where the need for a reverse projection check is shown to be a key step. We also mention the integrators in [47, 66] that are based on an Euler discretization and present new approaches for projecting on the manifold. The alternative approach of using a Metropolis–Hastings rejection procedure makes it possible to fully remove the bias on the invariant measure. As in the Euclidean case, this procedure does not make high-order discretizations obsolete because, in particular, the rejection rate depends on the quality of the discretization and the dimension of the problem in general, and in the case of stiff problems or problems in high dimension, it suffers from timestep restrictions. Note also that in the specific case where \(\mathcal {M} \) is a Lie group, high-order integrators can be naturally obtained using splitting methods that are, however, typically limited to weak order two of accuracy (see [7] for further details in the context of ODEs).

This article proposes new tools for constructing integrators of any high order for sampling the invariant measure of constrained SDEs of the form (1.1) and relies on the formalism of trees and Butcher-series. Originally introduced by Hairer and Wanner in [30], and based on the work of Butcher [13], B-series have proved to be a powerful standard tool for the numerical analysis of deterministic differential equations, as presented, for instance, in the textbooks [14, 29]. In the last decades, several works extended B-series to the stochastic context. We mention in particular Burrage and Burrage [11, 12] and Komori et al. [35] who first introduced stochastic trees and B-series for studying the order conditions of strong convergence of SDEs, Rößler [53,54,55,56,57] and Debrabant and Kværnø [19,20,21] for the design and analysis of high-order weak and strong integrators on a finite time interval, [6] for creating schemes preserving quadratic invariants, and [38], where tree series were applied to a class of stochastic differential algebraic equations (SDAEs) for the computation of strong order conditions. Finally, we mention the recent work [39], which introduced the exotic aromatic B-series for the computation of order conditions for sampling the invariant measure of ergodic SDEs in \(\mathbb {R} ^d\), and that we extend in this paper to the context of SDEs on manifolds.

This article is organized as follows. Section 2 is devoted to the analysis of the accuracy of integrators for sampling the invariant measure on a manifold \(\mathcal {M} \). In Sect. 3, we apply this methodology to a class of Runge–Kutta methods for solving the constrained overdamped Langevin equation (1.2), in order to derive conditions of arbitrarily high order for the invariant measure, with special emphasis on order two conditions, and to introduce a new order two scheme that uses only a few evaluations of f per step. The detailed calculations of the order conditions for the invariant measure are done in Sect. 4 with the help of an extension of the exotic aromatic B-series formalism [39]. We compare in Sect. 5 the new order two scheme with the Euler scheme (1.7) in numerical experiments on a sphere, a torus and the special linear group \({{\,\mathrm{SL}\,}}(m)\) to confirm its order of convergence for sampling the invariant measure.

2 High Order Ergodic Approximation on a Manifold

In this section, we present a new criterion for building integrators of any order for the invariant measure by extending the \(\mathbb {R} ^d\) results in [3, 22] to the context of manifolds. We first fix some notation and assumptions, and then recall the standard weak expansions of the exact and numerical solutions using the backward Kolmogorov equation. For \(\zeta :\mathbb {R} ^d\rightarrow \mathbb {R} \) a smooth map, we denote by \(g=\nabla \zeta \) its gradient, and by \(G(x)=g^T(x)g(x)=\left| g(x)\right| ^2\) the Gram function related to the manifold \(\mathcal {M} =\{x\in \mathbb {R} ^d,\zeta (x)=0\}\), where \(\left| x\right| =(x^T x)^{1/2}\) denotes the Euclidean norm in \(\mathbb {R} ^d\). We assume in the rest of the article that \(\mathcal {M} \) is a compact and smooth manifold of codimension one embedded in \(\mathbb {R} ^d\). We suppose in addition that the Gram function G is strictly positive on \(\mathcal {M} \), that is, there exists \(\alpha >0\) such that \(G(x)\ge \alpha \) for all \(x\in \mathcal {M} \). With these notations, the projection \(\Pi _\mathcal {M} \) on the tangent bundle is given by \(\Pi _\mathcal {M} (x)=I_d-G(x)^{-1}g(x)g(x)^T\). We denote by \(\mathcal {L} \) the generator of the SDE (1.1). It is given, for \(\phi \in \mathcal {C} ^\infty (\mathbb {R} ^d,\mathbb {R})\), by

$$\begin{aligned} \mathcal {L} \phi =\phi '(\Pi _\mathcal {M} f) +\frac{1}{2}\sum _{i=1}^d \phi '((\Pi _\mathcal {M} \Sigma e_i)'(\Pi _\mathcal {M} \Sigma e_i)) +\frac{1}{2}\sum _{i=1}^d \phi ''(\Pi _\mathcal {M} \Sigma e_i,\Pi _\mathcal {M} \Sigma e_i), \nonumber \\ \end{aligned}$$

where \((e_i)_{i=1, \dots , d}\) is the canonical basis of \(\mathbb {R} ^d\) and, for all vectors \(a^1\), ..., \(a^m\in \mathbb {R} ^d\), we use the following notation for differentials in \(\mathbb {R} ^d\),

$$\begin{aligned} \phi ^{(m)}(a^1,\dots ,a^m) =\sum _{i_1,\dots ,i_m=1}^d \partial _{i_1,\dots ,i_m} \phi a^1_{i_1}\dots a^m_{i_m} =\sum _{i_1,\dots ,i_m=1}^d \frac{\partial ^m\phi }{\partial x_{i_1}\dots \partial x_{i_m}} a^1_{i_1}\dots a^m_{i_m}. \end{aligned}$$
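For \(m=2\), the notation above reads \(\phi ''(a,b)=a^T(\nabla ^2\phi )\,b\), the Hessian of \(\phi \) applied to the pair (a, b). The Python sketch below checks this numerically with a central finite difference. The function name `second_differential` and the test function \(\phi (x)=\exp (c^Tx)\), for which \(\phi ''(a,b)=(c^Ta)(c^Tb)\exp (c^Tx)\), are assumptions chosen for the example.

```python
import numpy as np

def second_differential(phi, x, a, b, eps=1e-4):
    """Central finite-difference approximation of phi''(a, b) at x."""
    return (phi(x + eps * (a + b)) - phi(x + eps * (a - b))
            - phi(x - eps * (a - b)) + phi(x - eps * (a + b))) / (4.0 * eps ** 2)

# Hypothetical test function phi(x) = exp(c . x).
c = np.array([1.0, -2.0, 0.5])
phi = lambda x: np.exp(c @ x)
x = np.array([0.1, 0.2, 0.3])
a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 1.0])

approx = second_differential(phi, x, a, b)
exact = (c @ a) * (c @ b) * np.exp(c @ x)
print(approx, exact)  # the two values should agree up to the finite-difference error
```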

For the overdamped Langevin equation (1.2), the generator (2.1) reduces to

$$\begin{aligned} \mathcal {L} \phi&=\phi 'f -G^{-1}(g,f)\phi 'g -\frac{\sigma ^2}{2}G^{-1}{{\,\mathrm{div}\,}}(g)\phi 'g +\frac{\sigma ^2}{2}G^{-2}(g,g'g)\phi 'g +\frac{\sigma ^2}{2}\Delta \phi \nonumber \\&\quad -\frac{\sigma ^2}{2}G^{-1}\phi ''(g,g) =\frac{\sigma ^2}{2}\exp \Big (\frac{2}{\sigma ^2}V\Big ){{\,\mathrm{div}\,}}_\mathcal {M} \Big (\exp \Big (-\frac{2}{\sigma ^2}V\Big )\nabla _\mathcal {M} \phi \Big ), \end{aligned}$$

where \(\nabla _\mathcal {M} \psi :=\Pi _\mathcal {M} \nabla \psi \) and \({{\,\mathrm{div}\,}}_\mathcal {M} (H):={{\,\mathrm{div}\,}}(H)-G^{-1}(g,H'(g))\). The adjoint \(\mathcal {L} ^*\) of the generator (2.1) in \(L^2(\mathrm{d}\sigma _\mathcal {M})\) for the SDE (1.1), i.e. the operator that satisfies for all test functions \(\phi \), \(\psi \in \mathcal {C} ^\infty (\mathbb {R} ^d,\mathbb {R})\),

$$\begin{aligned} \int _\mathcal {M} (\mathcal {L} \phi ) \psi \mathrm{d}\sigma _\mathcal {M} =\int _\mathcal {M} \phi (\mathcal {L} ^*\psi ) \mathrm{d}\sigma _\mathcal {M}, \end{aligned}$$

is given by

$$\begin{aligned} \mathcal {L} ^*\phi =-{{\,\mathrm{div}\,}}_\mathcal {M} (\phi f)+\frac{1}{2}\sum _{i=1}^d {{\,\mathrm{div}\,}}_\mathcal {M} ({{\,\mathrm{div}\,}}_\mathcal {M} (\phi \Sigma e_i)\Sigma e_i). \end{aligned}$$

Remark 2.1

As \(\mathcal {L} \) is a self-adjoint operator in \(L^2(\mathrm{d}\mu _\infty )\), but not in \(L^2(\mathrm{d}\sigma _\mathcal {M})\) in general, it could be more natural to perform the analysis in the space \(L^2(\mathrm{d}\mu _\infty )\). However, as we allow the substages of our numerical integrators to explore an open neighbourhood of \(\mathcal {M} \) in \(\mathbb {R} ^d\), we shall work in this paper with differential operators that cannot be rewritten in general with intrinsic derivatives on the manifold \(\mathcal {M} \). In addition, directly performing the integration by parts in \(L^2(\mathrm{d}\mu _\infty )\) with such operators is not straightforward, and this motivated the choice of \(L^2(\mathrm{d}\sigma _\mathcal {M})\) for the analysis. A similar choice was made in [39] in the context of \(\mathbb {R} ^d\).

We follow the framework of [25]. In particular, we rely on the construction of the local orthogonal coordinates. In a neighbourhood \(N_\mathcal {M} \) of the manifold \(\mathcal {M} \), there exists an atlas of local orthogonal coordinate systems \((y,z)\in (V\subset \mathbb {R} ^{d-1})\times (-\varepsilon ,\varepsilon )\) for \(\varepsilon >0\), with respect to local charts \(\psi :U\subset N_\mathcal {M} \rightarrow (V\subset \mathbb {R} ^{d-1})\times (-\varepsilon ,\varepsilon )\), such that if \(\psi (x)=(y,z)\), then \(z=\zeta (x)\). We make the following regularity assumption on the generator \(\mathcal {L} \).

Assumption 2.2

On an open neighbourhood \(N_\mathcal {M} \) of \(\mathcal {M} \) in \(\mathbb {R} ^d\), there exists a constant \(C>0\) such that for all \(x \in N_\mathcal {M} \) and \((y,z)=\psi (x)\), and for every one-form field \(v:T\mathcal {M} \rightarrow \mathbb {R} \) on \(\mathcal {M} \) of norm one, we have

$$\begin{aligned} \sum _{i,j=1}^{d-1} \sum _{k=1}^d {\widetilde{\Sigma }}_{ik}(y,z){\widetilde{\Sigma }}_{jk}(y,z) v_i({\widetilde{x}}) v_j({\widetilde{x}}) \ge C, \end{aligned}$$

where \({\widetilde{x}}\in \mathcal {M} \) is such that \(\psi ({\widetilde{x}})=(y,0)\) and, for \(k=1,\dots ,d\), \(({\widetilde{\Sigma }}_{ik}(y,z))_i\in \mathbb {R} ^{d-1}\) is defined as the restriction of the vector \((\Pi _\mathcal {M} ({\widetilde{x}}) \Sigma _{ik}(x))_i\in \mathbb {R} ^d\) to the tangent space \(T_{{\widetilde{x}}}\mathcal {M} \) of \(\mathcal {M} \), rewritten in the local orthogonal coordinate system.

This assumption is a variant in the manifold case of the uniform ellipticity property of the generator \(\mathcal {L} \) in the Euclidean context of \(\mathbb {R} ^d\). In addition, Assumption 2.2 is automatically satisfied for the constrained overdamped Langevin equation (1.2) and yields that the function \(u(x,t)=\mathbb {E} [\phi (X(t))|X(0)=x]\) satisfies the backward Kolmogorov equation (see [25]):

$$\begin{aligned} \frac{\partial u}{\partial t}(x,t) = \mathcal {L} u(x,t), \quad u(x,0)=\phi (x), \quad x\in N_\mathcal {M}, \quad t>0. \end{aligned}$$

We refer to [36, 37] for similar results in the context of \(\mathbb {R} ^d\). The backward Kolmogorov equation (2.3) allows us to write the following expansion of \(u(x,h) =\mathbb {E} [\phi (X(h))|X_0=x]\) for h small enough

$$\begin{aligned} u(x,h) = \phi (x)+\sum _{j=1}^N \frac{h^j}{j!} \mathcal {L} ^j\phi (x)+ h^{N+1} R_N^h(\phi ,x), \quad x\in N_\mathcal {M}, \end{aligned}$$

where \(N_\mathcal {M} \) is an open neighbourhood of \(\mathcal {M} \) in \(\mathbb {R} ^d\) and the remainder satisfies \(\left| R_N^h(\phi ,x)\right| \le C_N(\phi )\) where the constant \(C_N(\phi )\) is independent of h and x.

We now assume the existence and uniqueness of an invariant measure, as well as an additional regularity property on \(\mathcal {L} \), in the spirit of [22, Hypotheses H1-H2] in the context of \(\mathbb {R} ^d\).

Assumption 2.3

There exists an open neighbourhood \(N_\mathcal {M} \) of \(\mathcal {M} \) in \(\mathbb {R} ^d\) and a unique positive function \(\rho _\infty \in \mathcal {C} ^\infty (N_\mathcal {M},\mathbb {R})\) satisfying \(\int _\mathcal {M} \rho _\infty \mathrm{d}\sigma _\mathcal {M} =1\) and \(\mathcal {L} ^*\rho _\infty =0\) on \(N_\mathcal {M} \). Moreover, for all \(\phi \in \mathcal {C} ^\infty (N_\mathcal {M},\mathbb {R})\) such that \(\int _\mathcal {M} \phi \mathrm{d}\sigma _\mathcal {M} =0\), there exists a unique solution \(\rho \in \mathcal {C} ^\infty (N_\mathcal {M},\mathbb {R})\) to the Poisson problem \(\mathcal {L} ^*\rho =\phi \) that satisfies \(\int _\mathcal {M} \rho \mathrm{d}\sigma _\mathcal {M} =0\).

The existence and uniqueness of the invariant measure are in particular satisfied for the constrained overdamped Langevin equation (1.2) (see [25, Sect. 2.3] for further details). Assumption 2.3 yields the ergodicity of the process X(t) solution of (1.1) with the unique invariant measure \(\mathrm{d}\mu _\infty =\rho _\infty \mathrm{d}\sigma _\mathcal {M} \) on \(\mathcal {M} \). To proceed further, we shall assume that the integrator (1.4) is ergodic, that is, there exists a measure \(\mathrm{d}\mu ^h\) that has a density with respect to \(\mathrm{d}\sigma _\mathcal {M} \) such that

$$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{N+1}\sum _{n=0}^N \phi (X_n)=\int _{\mathcal {M}} \phi \mathrm{d}\mu ^h\quad \text {almost surely}. \end{aligned}$$

We refer to [48, 60,61,62] in the Euclidean case, and to [25] in the manifold case, and references therein, for further details on the ergodicity of numerical integrators. In addition, we suppose that \(\mathbb {E} [\phi (X_1)|X_0=x]\), the numerical analog of u(x, h), can be expanded in powers of h as was done, for instance, in [3, 63] in the context of \(\mathbb {R} ^d\).

Assumption 2.4

For all \(\phi \in \mathcal {C} ^\infty (\mathbb {R} ^d ,\mathbb {R})\), the numerical integrator (1.4) has a weak Taylor expansion of the form

$$\begin{aligned} \mathbb {E} [\phi (X_1)|X_0=x] = \phi (x) + \sum _{j=1}^N h^j \mathcal {A} _{j-1}\phi (x)+ h^{N+1}R_N^h(\phi ,x),\quad x\in N_\mathcal {M},\,\,\,\,\, \end{aligned}$$

for all h assumed small enough, and where \(N_\mathcal {M} \) is an open neighbourhood of \(\mathcal {M} \) in \(\mathbb {R} ^d\) and the remainder satisfies \(\left| R_N^h(\phi ,x)\right| \le C_N(\phi )\) where the constant \(C_N(\phi )\) is independent of h and x. The \(\mathcal {A} _j\)’s, \(j=0,1,2,\ldots \) are linear differential operators with coefficients depending smoothly on f, g and their (high order) derivatives (and depending on the choice of the integrator).

Under Assumptions 2.2 and 2.4, by comparing the expansions (2.6) and (2.4), the integrator has at least weak order p if \(\mathcal {A} _{j-1}=\mathcal {L} ^j/j!\) for \(j=1,\dots ,p\). However, as observed already in \(\mathbb {R} ^d\), high order for the invariant measure can be achieved in spite of a low weak order. This is the purpose of Theorem 2.5 where we present a new sufficient condition for a scheme to have order r for the invariant measure. This result, which relies on the powerful tool of backward error analysis for SDEs, is similar to [3, Thm. 3.3] in the context of smooth compact manifolds.

Theorem 2.5

Under Assumptions 2.2, 2.3 and 2.4, if the numerical scheme is consistent (that is, if \(\mathcal {A} _0=\mathcal {L} \)) and ergodic, and if it satisfies in \(L^2(\mathrm{d}\sigma _\mathcal {M})\)

$$\begin{aligned} \mathcal {A} _j^*\rho _{\infty }=0, \quad j=1,\dots ,r-1, \end{aligned}$$

then it has order r for the invariant measure and the numerical error (1.5) satisfies, for \(h\rightarrow 0\),

$$\begin{aligned} e(\phi ,h)&=h^r \int _\mathcal {M} \phi (x) \rho _r(x) \mathrm{d}\sigma _\mathcal {M} (x)+\mathcal {O} (h^{r+1})\\&=h^r\int _0^\infty \int _\mathcal {M} u(x,t)\mathcal {A} _r^* \rho _\infty (x) \mathrm{d}\sigma _\mathcal {M} (x) \mathrm{d}t+\mathcal {O} (h^{r+1}), \end{aligned}$$

where \(\rho _r\in \mathcal {C} ^\infty (N_\mathcal {M},\mathbb {R})\) is the unique solution of the Poisson problem \(\mathcal {L} ^* \rho _r=-\mathcal {A} _r^*\rho _\infty \) in \(N_\mathcal {M} \) that satisfies \(\int _\mathcal {M} \rho _r \mathrm{d}\sigma _\mathcal {M} =0\), with \(N_\mathcal {M} \) an open neighbourhood of \(\mathcal {M} \) in \(\mathbb {R} ^d\).

The proof of Theorem 2.5 is detailed in Appendix A for the sake of completeness. The idea is to write an expansion of the error in the spirit of [63], and to generalize the analysis in [3, 22] on \(\mathbb {T} ^d\) and in [25] to the context of smooth compact manifolds.

Theorem 2.5 states the result for times \(t\rightarrow \infty \). A bound of the error at finite time \(t_n=nh\) is typically given by the following exponential estimate (see [22, 25])

$$\begin{aligned} \bigg |\mathbb {E} [\phi (X_n)]-\int _{\mathcal {M}} \phi (x) \mathrm{d}\mu _\infty (x)\bigg |\le Ke^{-\mu t_n}+Ch^p, \end{aligned}$$

where the constant \(\mu >0\) is in practice the spectral gap of a certain operator that depends on the numerical integrator. Reducing the error term \(Ke^{-\mu t_n}\) is out of the scope of this paper, though the recent works [2, 24, 43] proposed numerical methods in \(\mathbb {R} ^d\) that improve the rate of convergence at infinity, while sometimes also reducing the variance.

Remark 2.6

One can consider possible generalizations of Theorem 2.5 in the case where \(\mathcal {M} \) is not compact, or if \(\mathcal {M} \) is a manifold of any dimension. We refer to [3] for the non-compact extension of Theorem 2.5 in the context of \(\mathbb {R} ^d\).

3 High-Order Integrators for Constrained Langevin Dynamics

In this section, we propose a new class of Runge–Kutta methods for sampling the invariant measure of Eq. (1.2), and present the methodology for deriving the conditions of any order for the invariant measure using Theorem 2.5. In particular, we compute exactly the consistency and order two conditions for the invariant measure as they are the most relevant for the applications.

3.1 Runge–Kutta Methods for Constrained Overdamped Langevin

When Eq. (1.2) is discretized naively, one cannot ensure in general that the integrator stays on \(\mathcal {M} \). It is natural to discretize instead the equivalent formulation with Lagrange multipliers

$$\begin{aligned} \mathrm{d}X=f(X)\mathrm{d}t +\sigma \mathrm{d}W+g(X)\mathrm{d}\lambda _t,\quad \zeta (X)=0,\quad X(0)=X_0\in \mathcal {M}. \end{aligned}$$

The class of numerical schemes we obtain is in the spirit of deterministic Runge–Kutta methods for differential algebraic problems such as the methods SHAKE and RATTLE (see [5, 31, 58]), introduced in the context of constrained Hamiltonian dynamics, or the SPARK class of methods for general DAEs (see [34]). Since evaluating f is in practical applications the most expensive part of the algorithm compared to evaluating g, we propose high-order integrators that are implicit in g and explicit in f in the spirit of implicit-explicit (IMEX) integrators (see, e.g., [31]), so that there are only a few evaluations of f per step. We thus consider the following class of Runge–Kutta integrators

$$\begin{aligned} Y_i&=X_n+h\sum \limits _{j=1}^s a_{ij}f(Y_j) +\sigma \sqrt{h} d_i \xi _n + \lambda _i \sum \limits _{j=1}^s {\widehat{a}}_{ij} g(Y_j), \quad i = 1, \dots ,s, \nonumber \\ \zeta (Y_i)&=0 \quad \text {if} \quad \delta _i=1, \quad i = 1, \dots ,s, \nonumber \\ X_{n+1}&=Y_s, \end{aligned}$$

where \(A=(a_{ij}), {\widehat{A}}=({\widehat{a}}_{ij})\in \mathbb {R} ^{s\times s}\) and \(\delta _i=\sum _{j=1}^s {\widehat{a}}_{ij}\in \{0,1\}\) are the given Runge–Kutta coefficients, and the \(\xi _n\sim \mathcal {N} (0,I_d)\) are independent standard Gaussian random vectors in \(\mathbb {R} ^d\). (An alternative with discrete bounded random variables is discussed in Remark 3.2.) We fix \(\delta _s=1\) so that \(X_{n+1}\in \mathcal {M} \) and we ask that if \(\delta _i=0\), then \({\widehat{a}}_{ij}=0\) for \(j=1,\dots ,s\) (internal stages without projection, \(Y_i\notin \mathcal {M} \) a.s.). Ideally, one aims for IMEX integrators with a low number of evaluations of f; we hence assume in addition that \({\widehat{A}}\) is a lower triangular matrix and A is a strictly lower triangular matrix (in the spirit of DIRK methods). We represent the numerical integrators with their associated Butcher tableau, where \(b=(a_{s,i})_i\), \({\widehat{b}}=({\widehat{a}}_{s,i})_i\), \(c= A\mathbb {1}\) and \(\mathbb {1}=(1,\dots ,1)^T\).

$$\begin{aligned}\begin{array} {c|c||c|c||c} c &{} A &{} \delta &{} {\widehat{A}} &{} d\\ \hline &{} b^T &{} &{} {\widehat{b}}^T &{} \end{array} \end{aligned}$$

For instance, the Euler schemes can be written as Runge–Kutta methods of the form (3.1) with \(s=2\) and the following Butcher tableaux.

$$\begin{aligned} \text {Euler }(1.6): \begin{array} {c|cc||c|cc||c} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 1 &{} 1 &{} 0 &{} 1 &{} 1 &{} 0 &{} 1\\ \hline &{} 1 &{} 0 &{} &{} 1 &{} 0 &{} \end{array} \qquad \text {Euler }(1.7): \begin{array} {c|cc||c|cc||c} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 1 &{} 1 &{} 0 &{} 1 &{} 0 &{} 1 &{} 1\\ \hline &{} 1 &{} 0 &{} &{} 0 &{} 1 &{} \end{array} \end{aligned}$$

Note that the class of methods (3.1) satisfies automatically Assumption 2.4.
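The structural requirements on the coefficients of (3.1) can be collected in a small container, sketched below in Python. The class name `ConstrainedRKTableau` and its `validate` method are hypothetical and serve only to restate the conditions of this subsection: \(\delta _i=\sum _j{\widehat{a}}_{ij}\in \{0,1\}\), \(\delta _s=1\), A strictly lower triangular, \({\widehat{A}}\) lower triangular, and \({\widehat{a}}_{ij}=0\) whenever \(\delta _i=0\). The two instances encode the Euler tableaux for (1.6) and (1.7) given above.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ConstrainedRKTableau:
    """Coefficients (A, A_hat, d) of a method of the form (3.1); hypothetical container."""
    A: np.ndarray
    A_hat: np.ndarray
    d: np.ndarray

    @property
    def delta(self):
        return self.A_hat.sum(axis=1)  # delta_i = sum_j a_hat_ij

    def validate(self):
        delta = self.delta
        is_zero, is_one = np.isclose(delta, 0.0), np.isclose(delta, 1.0)
        if not np.all(is_zero | is_one):
            raise ValueError("each delta_i must be 0 or 1")
        if not is_one[-1]:
            raise ValueError("delta_s must be 1 so that X_{n+1} lies on the manifold")
        if not np.allclose(self.A, np.tril(self.A, k=-1)):
            raise ValueError("A must be strictly lower triangular")
        if not np.allclose(self.A_hat, np.tril(self.A_hat)):
            raise ValueError("A_hat must be lower triangular")
        if np.any(self.A_hat[is_zero] != 0.0):
            raise ValueError("stages with delta_i = 0 must have a_hat_ij = 0")
        return True

A_euler = np.array([[0.0, 0.0], [1.0, 0.0]])
d_euler = np.array([0.0, 1.0])
# Euler (1.6): projection direction g(Y_1) = g(X_n).
euler_explicit = ConstrainedRKTableau(A_euler, np.array([[0.0, 0.0], [1.0, 0.0]]), d_euler)
# Euler (1.7): projection direction g(Y_2) = g(X_{n+1}).
euler_implicit = ConstrainedRKTableau(A_euler, np.array([[0.0, 0.0], [0.0, 1.0]]), d_euler)
print(euler_explicit.validate(), euler_implicit.validate())
```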

Remark 3.1

The class of Runge–Kutta methods (3.1) can be straightforwardly generalized (as done in [39] in the Euclidean case \(\mathbb {R} ^d\)) to study partitioned problems where \(f=f_1+f_2\) and, for example, to create IMEX schemes. In order to improve the order of the method without increasing its cost, one could also apply a postprocessor (in the spirit of [64] in \(\mathbb {R} ^d\)) or use multiple independent noises in (3.1) instead of only one random variable \(\xi _n\sim \mathcal {N} (0,I_d)\). This last extension can increase the number of conditions but may also increase the set of solutions. We refer in particular to [19, 39] in the context of \(\mathbb {R} ^d\), where it is shown for a class of stochastic Runge–Kutta methods that the order conditions for weak order 3 cannot be satisfied in general, unless we use at least two independent noises. In addition, if we rewrite the internal stages of (3.1) as

$$\begin{aligned} Y_i=X_n+h\sum \limits _{j=1}^s a_{ij}f(Y_j) +\sigma \sqrt{h} d_i \xi _n + \bigg (\sum \limits _{j=1}^s {\widehat{a}}_{ij} g(Y_j)\bigg )\lambda _i, \end{aligned}$$

where \(g:\mathbb {R} ^d\rightarrow \mathbb {R} ^{d\times q}\) and \(\lambda _i\in \mathbb {R} ^q\), then the same class of methods is also fit for solving (1.2) with a multidimensional constraint \(\zeta :\mathbb {R} ^d\rightarrow \mathbb {R} ^q\). Note that the coefficients of the method do not depend on the dimension of the space d or the codimension q of the manifold. This will be studied in future work.

Remark 3.2

If \(\xi _n\) is a Gaussian random variable, its realizations can be arbitrarily large, and the existence and uniqueness of the solution of the system (3.1) does not hold in general. A standard remedy to ensure that the projection on \(\mathcal {M} \) always exists for \(h\le h_0\) small enough is to replace the standard Gaussian random vectors \(\xi \) in (3.1) by bounded discrete random vectors \({\widehat{\xi }}\) that have the same first moments in the spirit of [51, Chap. 2]. This way, the order of the method is preserved both in the weak sense and for the invariant measure, and the method is well-posed for all h small enough. For weak/ergodic order two, one can consider, for instance, the random vectors \({\widehat{\xi }}\) with independent components \({\widehat{\xi }}_i\) that satisfy

$$\begin{aligned} \mathbb {P} ({\widehat{\xi }}_i=0)=\frac{2}{3} \quad \text {and} \quad \mathbb {P} ({\widehat{\xi }}_i=\pm \sqrt{3})=\frac{1}{6}, \quad i=1,\dots ,d. \end{aligned}$$
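A short computation shows that this three-point law has mean 0, variance 1, fourth moment 3 and vanishing odd moments, so that its moments up to order five coincide with those of \(\mathcal {N} (0,1)\). The Python sketch below samples such vectors and evaluates the moments exactly. The names `sample_bounded_noise` and `exact_moment` are hypothetical.

```python
import numpy as np

VALUES = np.array([-np.sqrt(3.0), 0.0, np.sqrt(3.0)])
PROBS = np.array([1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0])

def sample_bounded_noise(rng, d):
    """Draw a vector with d independent components following the three-point law."""
    return rng.choice(VALUES, size=d, p=PROBS)

def exact_moment(k):
    """k-th moment of one component, computed from the probabilities."""
    return float(np.sum(PROBS * VALUES ** k))

rng = np.random.default_rng(0)
xi_hat = sample_bounded_noise(rng, 5)
print(xi_hat)
print([round(exact_moment(k), 12) for k in range(1, 6)])  # moments of orders 1 to 5
```

Every component is bounded by \(\sqrt{3}\) in absolute value, which is the property used to guarantee that the projection step is well defined for h small enough.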

The following lemma guarantees the well-posedness of a method of the form (3.1) with bounded random variables \({\widehat{\xi }}_n\). The result is still true when A and \({\widehat{A}}\) are general matrices, but we consider only the lower triangular case for the sake of brevity. This result is in the spirit of [31, Chap. VII] for deterministic DAEs.

Lemma 3.3

For Runge–Kutta methods of the form (3.1) where the \(\xi _n\) are replaced by bounded random variables \({\widehat{\xi }}_n\), there exists \(h_0>0\) such that for all \(h\le h_0\), for any initial condition \(X_n\in \mathcal {M} \), there exists a unique solution \(X_{n+1}\) of (3.1) in a neighbourhood of \(X_n\). Furthermore, the internal stages satisfy \(Y_i=X_n+\mathcal {O} (\sqrt{h})\) and \(\lambda _i=\mathcal {O} (\sqrt{h})\) for \(i=1,\dots ,s\).


Proof

We proceed by induction on i. We assume that for \(j<i\), the \(Y_j\) are already defined and satisfy \(Y_j=X_n+\mathcal {O} (\sqrt{h})\). The result is straightforward if \(\delta _i=0\). We thus assume that \(\delta _i=1\) and prove the existence of a unique solution to the equations of the internal stage i:

$$\begin{aligned} Y_i&=X_n+h\sum _{j=1}^{i-1} a_{ij}f(Y_j) +\sigma \sqrt{h} d_i {\widehat{\xi }}_n + \lambda _i \sum _{j=1}^i {\widehat{a}}_{ij} g(Y_j), \end{aligned}$$
$$\begin{aligned} \zeta (Y_i)&=0. \end{aligned}$$

Using \(\zeta (X_n)=0\), we rewrite Eq. (3.4) as

$$\begin{aligned} \zeta (Y_i)-\zeta (X_n)=\int _0^1 g^T(X_n+\tau (Y_i-X_n))\mathrm{d}\tau (Y_i-X_n)=0. \end{aligned}$$

Inserting (3.3) in (3.5) yields

$$\begin{aligned} \int _0^1 g^T(X_n+\tau (Y_i-X_n))\mathrm{d}\tau \Big [h\sum _{j=1}^{i-1} a_{ij}f(Y_j) +\sigma \sqrt{h} d_i {\widehat{\xi }}_n + \lambda _i \sum _{j=1}^i {\widehat{a}}_{ij} g(Y_j)\Big ]=0.\nonumber \\ \end{aligned}$$

Multiplying both sides of Eq. (3.3) by \(\int _0^1 g^T(X_n+\tau (Y_i-X_n))\mathrm{d}\tau \Big (\sum _{j=1}^i {\widehat{a}}_{ij} g(Y_j)\Big )\), and substituting \(\lambda _i\) in (3.3) with its value from (3.6), we deduce that \(F(Y_i,h)=0\), where the function \(F:\mathbb {R} ^d\times \mathbb {R} \rightarrow \mathbb {R} ^d\) is given by

$$\begin{aligned} F(y,t)&=\int _0^1 g^T(X_n+\tau (y-X_n))\mathrm{d}\tau \bigg [ \Big (t\sum _{j=1}^{i-1} a_{ij}f(Y_j) +\sigma \sqrt{t} d_i {\widehat{\xi }}_n\Big )\\&\quad \Big (\sum _{j=1}^{i-1} {\widehat{a}}_{ij} g(Y_j)+{\widehat{a}}_{ii} g(y)\Big )\\&\quad +\Big (\sum _{j=1}^{i-1} {\widehat{a}}_{ij} g(Y_j)+{\widehat{a}}_{ii} g(y)\Big ) \Big (y-X_n-t\sum _{j=1}^{i-1} a_{ij}f(Y_j) -\sigma \sqrt{t} d_i {\widehat{\xi }}_n\Big ) \bigg ]. \end{aligned}$$

As \(F(X_n,0)=0\) and the partial differential \(\partial _y F(X_n,0)=G(X_n) I_d\) is invertible, the implicit function theorem yields the existence and uniqueness of \(Y_i\) in a ball of centre \(X_n\) for \(h\le h_0\) small enough. As \({\widehat{\xi }}_n\) is bounded and \(\mathcal {M} \) is compact, there exists a deterministic \(h_0\) that works for every initial condition \(X_n\in \mathcal {M} \). Now that \(Y_i\) is well posed, we deduce from the identity \(F(Y_i,h)=0\) that \(Y_i=X_n+\mathcal {O} (\sqrt{h})\) and we derive from (3.6) that \(\lambda _i\) is well posed for h small enough and satisfies \(\lambda _i=\mathcal {O} (\sqrt{h})\). Finally, we observe that \((Y_i,\lambda _i)\) is indeed a solution to (3.3) and (3.4). \(\square \)

Remark 3.4

In practice, one can solve numerically each internal stage of the set of equations (3.1) with fixed point iterations or a Newton method starting from \(Y_i=X_n\) and \(\lambda _i=0\). As \(\mathcal {M} \) is compact, if the \(\xi _n\) are replaced by bounded random variables, these two methods converge for \(h\le h_0\) where \(h_0\) is small enough and independent of the initial condition. It is crucial to initialize the \(Y_i\) in a neighbourhood of \(X_n\) as (3.1) has multiple solutions in general. For example, the Euler scheme (1.7) always has two solutions if \(\mathcal {M} \) is a sphere (the two intersections of \(\mathcal {M} \) and a straight line going through the centre of \(\mathcal {M} \)).
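The role of the initial guess can be illustrated on the unit sphere. The sketch below assumes that the Euler scheme (1.7), which is not restated in this section, reads \(X_{n+1}=a+\lambda g(X_{n+1})\) with \(a=X_n+hf(X_n)+\sigma \sqrt{h}\xi _n\) and \(g(x)=x\), so that \(X_{n+1}=a/(1-\lambda )\). The vector a and the function names are hypothetical. A Newton iteration on \(\lambda \) started from \(\lambda =0\) finds the solution close to a, while a distant starting value finds the antipodal one.

```python
import math

def multiplier_roots(a):
    """Both multipliers lam for which a / (1 - lam) lies on the unit sphere."""
    r = math.sqrt(sum(x * x for x in a))
    return 1.0 - r, 1.0 + r

def newton_multiplier(a, lam0=0.0, tol=1e-14, max_iter=50):
    """Newton iteration on phi(lam) = zeta(a / (1 - lam)) = (|a|^2 / (1 - lam)^2 - 1) / 2."""
    r2 = sum(x * x for x in a)
    lam = lam0
    for _ in range(max_iter):
        u = 1.0 - lam
        phi = 0.5 * (r2 / u**2 - 1.0)
        dphi = r2 / u**3          # derivative of phi with respect to lam
        step = phi / dphi
        lam -= step
        if abs(step) < tol:
            break
    return lam

def project(a, lam):
    """Point a / (1 - lam) associated with the multiplier lam."""
    return [x / (1.0 - lam) for x in a]

# Hypothetical value of a = X_n + h f(X_n) + sigma sqrt(h) xi_n, close to the sphere.
a = [1.03, 0.04, -0.02]
lam_near = newton_multiplier(a, lam0=0.0)   # initial guess recommended in the remark
lam_far = newton_multiplier(a, lam0=2.0)    # poor initial guess
print(lam_near, project(a, lam_near))
print(lam_far, project(a, lam_far))
```

Both outputs lie on the sphere, but only the first one is close to a, which is why the iteration should start from \(\lambda _i=0\).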

Before looking at the consistency and the order conditions of the class of methods (3.1), we introduce a concise notation for multiplying vectors component-wise.

Definition 3.5

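For \(y, y^{(1)}, \dots , y^{(n)}\in \mathbb {R} ^d\) and \(m\ge 0\), we define the diamond product and the diamond power as the vectors in \(\mathbb {R} ^d\),

$$\begin{aligned} (y^{(1)}\diamond \dots \diamond y^{(n)})_i=y^{(1)}_i\cdots y^{(n)}_i, \quad (y^{\diamond m})_i=y_i^m, \quad i=1,\dots ,d. \end{aligned}$$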

We present below the detailed calculation of the consistency conditions of the class of methods (3.1) for the constrained overdamped Langevin equation (1.2). Similar proofs can be found in [44, Prop. 3.24] for the Euler schemes (1.6) and (1.7), and in [3, 39] for Runge–Kutta methods in \(\mathbb {R} ^d\).

Proposition 3.6

For a Runge–Kutta method of the form (3.1), the operator \(\mathcal {A} _0\) in (2.6) is given for \(\phi \in \mathcal {C} ^\infty (\mathbb {R} ^d,\mathbb {R})\) by

In particular, if


then the method is consistent, that is, \(\mathcal {A} _0=\mathcal {L} \).


Proof

If we apply one step of a method of the form (3.1) with the initial condition \(X_0=x\), then the internal stages \(Y_i\) satisfy the following expansion

$$\begin{aligned} Y_i&=x+\sigma \sqrt{h}d_i \xi +hc_i f(x) +R^h,\quad \text {if} \quad \delta _i=0,\\ Y_i&=x+\sqrt{h}\left[ \sigma d_i \xi +\lambda _{1/2,i}(x) g(x)\right] \\&\qquad +h\Big [c_i f(x)+\lambda _{1,i}(x) g(x)+\sigma \lambda _{1/2,i}(x)\sum _{j=1}^s {\widehat{a}}_{ij}d_j g'(x)\xi \\&\qquad +\lambda _{1/2,i}(x)\sum _{j=1}^s {\widehat{a}}_{ij} \lambda _{1/2,j}(x)\delta _j g'(x)g(x) \Big ] +R^h,\quad \text {if} \quad \delta _i=1, \end{aligned}$$

where the remainder satisfies \(\left| R^h\right| \le Ch^{3/2}\), and where we used that \(\lambda _{i}\) can be developed in powers of \(\sqrt{h}\) as \(\lambda _{i}=\sqrt{h}\lambda _{1/2,i}+h\lambda _{1,i}+\dots \) in the spirit of [44, Lemma 3.25]. If \(\delta _i=1\), \(\zeta (Y_i)\) can also be expanded as

$$\begin{aligned} \zeta (Y_i)&=\zeta (x) +\sqrt{h}\left[ \sigma d_i (g,\xi )+ \lambda _{1/2,i} G \right] \\&\qquad +h\Big [c_i (g,f)+\lambda _{1,i} G +\lambda _{1/2,i}\sum _{j=1}^s {\widehat{a}}_{ij} \lambda _{1/2,j}\delta _j (g,g'g)\\&\qquad +\sigma \lambda _{1/2,i}\sum _{j=1}^s {\widehat{a}}_{ij}d_j (g,g'\xi )+\frac{1}{2}\sigma ^2 d_i^2 (\xi ,g'\xi )\\&\qquad +\sigma \lambda _{1/2,i} d_i (g,g'\xi ) + \frac{1}{2} \lambda _{1/2,i}^2 (g,g'g) \Big ]+\dots \end{aligned}$$

where we omitted the dependency in x of G, g, \(g'\) and the \(\lambda _{k/2,j}\)’s for brevity. We have \(\zeta (Y_i)=\zeta (x)=0\) (since \(x\in \mathcal {M} \)), thus by identifying each term of the expansion with zero, we get

$$\begin{aligned} \lambda _{1/2,i}&=-\sigma \delta _i d_i G^{-1}(g,\xi ),\\ \lambda _{1,i}&=- \delta _i c_i G^{-1}(g,f) +\sigma ^2 \delta _i\left( \sum _{j=1}^s {\widehat{a}}_{ij} d_i d_j+d_i^2\right) G^{-2}(g,\xi )(g,g'\xi )\\&\quad -\sigma ^2 \delta _i\left( \sum _{j=1}^s {\widehat{a}}_{ij} \delta _j d_i d_j+\frac{1}{2} d_i^2\right) G^{-3}(g,\xi )^2 (g,g'g) -\frac{\sigma ^2}{2} \delta _i d_i^2 G^{-1}(\xi ,g'\xi ). \end{aligned}$$
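These formulas can be checked numerically in a simple setting where the multiplier is known in closed form. The sketch below is an illustration under assumptions that are ours, not the paper's: a one-stage method with \(\delta _1={\widehat{a}}_{11}=d_1=1\) and \(f=0\) on the unit sphere, where \(g(x)=x\), \(G=1\) and \(g'=I\). The stage \(Y=x+\sigma \sqrt{h}\xi +\lambda Y\) then gives \(\lambda =1-|x+\sigma \sqrt{h}\xi |\), and the truncated expansion \(\sqrt{h}\lambda _{1/2}+h\lambda _1\) should differ from it by \(\mathcal {O} (h^{3/2})\).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_multiplier(x, xi, sigma, h):
    """Multiplier of the stage Y = x + sigma sqrt(h) xi + lam Y with |Y| = 1 (root near 0)."""
    a = [xk + sigma * math.sqrt(h) * zk for xk, zk in zip(x, xi)]
    return 1.0 - math.sqrt(dot(a, a))

def expanded_multiplier(x, xi, sigma, h, a_hat=1.0, d=1.0, delta=1.0):
    """sqrt(h) lam_{1/2} + h lam_1 from the formulas above, on the unit sphere with f = 0.
    There g = x and g' = I, and all powers of G equal one."""
    g_xi = dot(x, xi)    # (g, xi), which also equals (g, g' xi)
    g_g = dot(x, x)      # (g, g' g) = G = 1 for x on the sphere
    lam_half = -sigma * delta * d * g_xi
    lam_one = (sigma**2 * delta * (a_hat * d * d + d * d) * g_xi * g_xi
               - sigma**2 * delta * (a_hat * delta * d * d + 0.5 * d * d) * g_xi**2 * g_g
               - 0.5 * sigma**2 * delta * d * d * dot(xi, xi))
    return math.sqrt(h) * lam_half + h * lam_one

# Hypothetical data: a point of the sphere and one realization of the noise.
x = [1.0, 0.0, 0.0]
xi = [math.sqrt(3.0), 0.0, math.sqrt(3.0)]

def truncation_error(h, sigma=1.0):
    return abs(exact_multiplier(x, xi, sigma, h) - expanded_multiplier(x, xi, sigma, h))

for h in (1e-4, 2.5e-5):
    print(h, truncation_error(h))
```

Dividing h by four should divide the error by about eight, consistent with a remainder of size \(h^{3/2}\).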

For \(\phi \) a test function, the operator \(\mathcal {A} _0\phi \) satisfies

$$\begin{aligned} \mathbb {E} [\phi (X_1)]=\mathbb {E} [\phi (Y_s)]=\phi (x)+h\mathcal {A} _0\phi (x)+h^2\mathcal {A} _1\phi (x)+\cdots \end{aligned}$$

By replacing \(Y_s\) with its expansion in powers of \(h^{1/2}\), and by identifying the first terms, we deduce that

where we used that \(\delta _s=1\) and that all the terms containing an odd number of \(\xi \) vanish since odd moments of \(\xi \) are zero. Distributing the expectation on each term and using \(c_s=b^T\mathbb {1}\) yield the desired expression of \(\mathcal {A} _0\phi \). We deduce the consistency conditions \(b^T\mathbb {1}=d_s=1\) and in order to get \(\mathcal {A} _0=\mathcal {L} \). \(\square \)

Remark 3.7

The analysis presented in Sect. 3.1 is conducted for the overdamped Langevin dynamics (1.2). It would be interesting to consider extensions with multiplicative noise or a non-gradient vector field f. The calculations would likely become more involved, and we may get more order conditions (see, for instance, [3, Thm. 3.3] and [39, Remark 5.1 and Sect. 5.5] in the context of \(\mathbb {R} ^d\), where many additional terms arise, in particular for the integration by parts calculations). This will be studied in future work.

3.2 Order Conditions for the Invariant Measure on Manifolds

We now derive the methodology for getting the conditions of arbitrarily high order for sampling the invariant measure of the constrained overdamped Langevin equation (1.2). In particular, the following theorem presents the Runge–Kutta conditions for order two for the invariant measure on \(\mathcal {M} \). Note that the number of conditions does not depend on the dimension of the space d.

Theorem 3.8

(Runge–Kutta conditions for order two for the invariant measure) We consider a Runge–Kutta method of the form (3.1) and assume the consistency condition (3.7). If the method is ergodic and if the following conditions are satisfied, then the integrator has order two for the invariant measure:

In the particular case where we set \(\delta =\mathbb {1}\), the order two conditions reduce to the following:

For simplicity, we used in Theorem 3.8 the notation  of Definition 3.5. For instance, the condition rewrites into

$$\begin{aligned} \sum _{i,j=1}^s {\widehat{b}}_i d_i^2 {\widehat{a}}_{ij} d_j=\sum _{i,j=1}^s {\widehat{b}}_i d_i {\widehat{a}}_{ij} d_j+\frac{1}{2}\Big (\sum _{i=1}^s {\widehat{b}}_i d_i\Big )^2. \end{aligned}$$
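To make the link between the compact notation and the scalar sums concrete, the sketch below evaluates the displayed condition in two ways: with component-wise products of vectors, and with explicit double sums. It assumes that all sums run over the s stages and that the diamond product is the component-wise product announced before Definition 3.5. The coefficients are hypothetical and are not those of Appendix C.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def componentwise(*vectors):
    """Component-wise product of vectors of equal length."""
    out = list(vectors[0])
    for v in vectors[1:]:
        out = [o * x for o, x in zip(out, v)]
    return out

def residual_vector_form(b_hat, A_hat, d):
    """Left-hand side minus right-hand side, written with component-wise products."""
    Ad = matvec(A_hat, d)
    lhs = dot(b_hat, componentwise(d, d, Ad))
    rhs = dot(b_hat, componentwise(d, Ad)) + 0.5 * dot(b_hat, d) ** 2
    return lhs - rhs

def residual_sum_form(b_hat, A_hat, d):
    """Same residual, written with the explicit sums over the stages."""
    s = len(d)
    lhs = sum(b_hat[i] * d[i] ** 2 * A_hat[i][j] * d[j]
              for i in range(s) for j in range(s))
    rhs = sum(b_hat[i] * d[i] * A_hat[i][j] * d[j]
              for i in range(s) for j in range(s))
    rhs += 0.5 * sum(b_hat[i] * d[i] for i in range(s)) ** 2
    return lhs - rhs

# Hypothetical coefficients of a three-stage method (lower triangular A_hat).
b_hat = [0.2, 0.3, 0.5]
A_hat = [[1.0, 0.0, 0.0], [0.4, 0.6, 0.0], [0.2, 0.3, 0.5]]
d = [0.5, 0.8, 1.0]
print(residual_vector_form(b_hat, A_hat, d), residual_sum_form(b_hat, A_hat, d))
```

A residual equal to zero means that the condition holds for the chosen coefficients; the values above are arbitrary and are not expected to satisfy it.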

The order conditions of Theorem 3.8 can be obtained from straightforward calculations with the following methodology. We compute the operator \(\mathcal {A} _1\) with the same method used for \(\mathcal {A} _0\) in Proposition 3.6. It is a differential operator of order four with the following first terms

$$\begin{aligned} \mathcal {A} _1\phi =\frac{\sigma ^4}{8}\Delta ^2\phi -\frac{\sigma ^4}{4}G^{-1}\Delta \phi ''(g,g)+\frac{\sigma ^4}{8}G^{-2}\phi ^{(4)}(g,g,g,g)+\mathcal {B} \phi , \end{aligned}$$

where \(\mathcal {B} \) is a differential operator of order three. We present the complete expansion of \(\mathcal {A} _1\) in Sect. 4 by using a B-series approach. If we assume that \({\widehat{b}}^T d=b^T d\), then we can integrate by parts to transform \(\int _\mathcal {M} \mathcal {A} _1\phi \mathrm{d}\mu _\infty \) into an integral of the form \(\int _\mathcal {M} \mathcal {A} _1^0\phi \mathrm{d}\mu _\infty \) where \(\mathcal {A} _1^0\phi \) is a differential operator of order one in \(\phi \) (in the spirit of [3, 39]). On a manifold, the integration by parts is a corollary of the Green theorem (see, for instance, [59, Chap. II]). As we shall see below, it proves to be a crucial tool for deriving order conditions for the invariant measure. To perform the calculations in a systematic manner, a formalization of the integration by parts process with trees and B-series is presented in Sect. 4.

Lemma 3.9

(Integration by parts on \(\mathcal {M} \)) If \(\psi :\mathbb {R} ^d\rightarrow \mathbb {R} \) and \(H:\mathbb {R} ^d\rightarrow \mathbb {R} ^d\) are smooth functions, then

$$\begin{aligned} \int _{\mathcal {M}}(\nabla _\mathcal {M} \psi , H) \mathrm{d}\sigma _\mathcal {M} =-\int _\mathcal {M} \psi {{\,\mathrm{div}\,}}_\mathcal {M} (\Pi _\mathcal {M} H)\mathrm{d}\sigma _\mathcal {M}, \end{aligned}$$

where \(\nabla _\mathcal {M} \psi :=\Pi _\mathcal {M} \nabla \psi \) and \({{\,\mathrm{div}\,}}_\mathcal {M} (H):={{\,\mathrm{div}\,}}(H)-G^{-1}(g,H'g)\). In addition, with the invariant measure \(\mathrm{d}\mu _\infty =\rho _\infty \mathrm{d}\sigma _\mathcal {M} \) and \(k\ge 0\), we obtain

$$\begin{aligned}&\int _{\mathcal {M}} \Big [G^{-k} \psi 'H -G^{-(k+1)}(g,H)\psi 'g\Big ] \mathrm{d}\mu _\infty \nonumber \\&\quad = \int _\mathcal {M} \Big [G^{-(k+1)}(g,H'g)\psi \nonumber \\&\qquad -(2k+1)G^{-(k+2)}(g,g'g)(g,H)\psi \nonumber \\&\qquad -G^{-k}{{\,\mathrm{div}\,}}(H)\psi +2kG^{-(k+1)}(g,g'H)\psi \nonumber \\&\qquad +G^{-(k+1)}{{\,\mathrm{div}\,}}(g)(g,H)\psi +\frac{2}{\sigma ^2}G^{-(k+1)}(g,f)(g,H)\psi \nonumber \\&\qquad -\frac{2}{\sigma ^2}G^{-k}(f,H)\psi \Big ]\mathrm{d}\mu _\infty . \end{aligned}$$
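The first identity of Lemma 3.9 can be verified numerically on a simple manifold. The sketch below takes the unit circle in \(\mathbb {R} ^2\) (so \(g(x)=x\)), hypothetical smooth functions \(\psi \) and H chosen only for illustration, a trapezoidal quadrature in the angle, and central finite differences for the Jacobian of \(\Pi _\mathcal {M} H\). It is a consistency check under these assumptions, not part of the proof.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def g(x):
    return [x[0], x[1]]          # gradient of zeta(x) = (|x|^2 - 1) / 2

def proj(x, v):
    """Pi_M(x) v = v - G^{-1} (g, v) g."""
    gx = g(x)
    c = dot(gx, v) / dot(gx, gx)
    return [vi - c * gi for vi, gi in zip(v, gx)]

def psi(x):                      # hypothetical test function
    return x[0] + x[0] * x[1]

def grad_psi(x):
    return [1.0 + x[1], x[0]]

def H(x):                        # hypothetical vector field
    return [x[1] ** 2, x[0] + 1.0]

def jacobian(F, x, eps=1e-5):
    """Central finite differences: J[i][j] approximates dF_i / dx_j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        cols.append([(p - m) / (2.0 * eps) for p, m in zip(F(xp), F(xm))])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def div_M(F, x):
    """div_M(F) = div(F) - G^{-1} (g, F' g)."""
    J = jacobian(F, x)
    gx = g(x)
    Jg = [dot(row, gx) for row in J]
    return sum(J[i][i] for i in range(len(x))) - dot(gx, Jg) / dot(gx, gx)

def both_sides(n=400):
    lhs = rhs = 0.0
    w = 2.0 * math.pi / n        # quadrature weight on the unit circle
    for k in range(n):
        x = [math.cos(w * k), math.sin(w * k)]
        lhs += w * dot(proj(x, grad_psi(x)), H(x))
        rhs -= w * psi(x) * div_M(lambda y: proj(y, H(y)), x)
    return lhs, rhs

print(both_sides())
```

For this choice of \(\psi \) and H, a direct computation in the angle gives the common value \(5\pi /4\) for both sides.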

For example, let us integrate by parts the terms of order four w.r.t. \(\phi \) of the operator \(\mathcal {A} _1\phi \) in Eq. (3.8). Applying identity (3.9) with \(\psi =\frac{\sigma ^4}{8}\Delta \phi '(e_i)\), \(H=e_i\) and \(k=0\), and then summing over \(i=1,\dots ,d\) yields

$$\begin{aligned}&\int _{\mathcal {M}} \Big [\frac{\sigma ^4}{8}\Delta ^2\phi -\frac{\sigma ^4}{8}G^{-1}\Delta \phi ''(g,g)\Big ] \mathrm{d}\mu _\infty = \int _\mathcal {M} \Big [-\frac{\sigma ^4}{8} G^{-2}(g,g'g)\Delta \phi 'g \nonumber \\&\quad +\frac{\sigma ^4}{8} G^{-1}{{\,\mathrm{div}\,}}(g)\Delta \phi 'g +\frac{\sigma ^2}{4} G^{-1}(g,f)\Delta \phi 'g -\frac{\sigma ^2}{4} \Delta \phi 'f\Big ]\mathrm{d}\mu _\infty . \end{aligned}$$

We apply again (3.9) with \(\psi =\frac{\sigma ^4}{8}\phi ^{(3)}(g,g,e_i)\), \(H=e_i\) and \(k=1\), and then sum over \(i=1,\dots ,d\) to get

$$\begin{aligned}&\int _{\mathcal {M}} \Big [\frac{\sigma ^4}{8}G^{-1}\Delta \phi ''(g,g) -\frac{\sigma ^4}{8}G^{-2}\phi ^{(4)}(g,g,g,g)\Big ] \mathrm{d}\mu _\infty \nonumber \\&\quad = \int _\mathcal {M} \Big [-\frac{\sigma ^4}{4}G^{-1}\sum _i \phi ^{(3)}(g,\partial _i g,e_i) +\frac{\sigma ^4}{2}G^{-2}\phi ^{(3)}(g,g,g'g) \nonumber \\&\qquad -\frac{3\sigma ^4}{8}G^{-3}(g,g'g)\phi ^{(3)}(g,g,g) +\frac{\sigma ^4}{8}G^{-2}{{\,\mathrm{div}\,}}(g)\phi ^{(3)}(g,g,g) \nonumber \\&\qquad +\frac{\sigma ^2}{4}G^{-2}(g,f)\phi ^{(3)}(g,g,g) -\frac{\sigma ^2}{4}G^{-1}\phi ^{(3)}(g,g,f)\Big ]\mathrm{d}\mu _\infty . \end{aligned}$$

Subtracting (3.11) from (3.10) allows us to express \(\int _\mathcal {M} \mathcal {A} _1\phi \mathrm{d}\mu _\infty \) with derivatives of \(\phi \) of order strictly less than 4. We iterate this method to obtain \(\int _\mathcal {M} \mathcal {A} _1\phi \mathrm{d}\mu _\infty =\int _\mathcal {M} \mathcal {A} _1^0\phi \mathrm{d}\mu _\infty \) where \(\mathcal {A} _1^0\) is an operator of order one in \(\phi \) and then find sufficient conditions such that \(\mathcal {A} _1^0=0\). This implies that \(\mathcal {A} _1^*\rho _\infty =0\), and Theorem 2.5 then gives the order two for the invariant measure. The computation of \(\mathcal {A} _1^0\) is further detailed in Sect. 4.

Although constructing methods of high weak order is not the main focus of this paper, considering the explicit formula for \(\mathcal {A} _1\) and comparing with \(\mathcal {L} ^2/2\) (see Sect. 4 for their detailed expansion in B-series), one immediately obtains the following theorem for weak order two of accuracy.

Theorem 3.10

(Runge–Kutta conditions for weak order two) We consider a Runge–Kutta method of the form (3.1) and assume that it satisfies (3.7). If the following conditions are satisfied, then the integrator has weak order two:

Remark 3.11

For \(\delta =\mathbb {1}\), the weak order two conditions of Theorem 3.10 have no solution, which is in contrast with the invariant measure case presented in Theorem 3.8. Indeed, the condition  cannot be fulfilled if we fix \(\delta =\mathbb {1}\).

3.3 Illustrative Examples of High Order Runge–Kutta Methods on Manifolds

In this section, we present several examples of high-order Runge–Kutta methods of the form (3.1). The purpose of these examples is to illustrate our analysis. Deriving new integrators with small error constant, favourable stability properties, small variance and fast convergence to equilibrium is a challenging open question that is not addressed in the present paper. First, we introduce a method that has order two for sampling the invariant measure of the constrained Langevin dynamics (1.2). Since there are many solutions to the order conditions, we obtain this integrator by solving numerically an optimization problem: we minimize the absolute values of the coefficients of the method under the constraints given by the order conditions of Theorem 3.8. This method is explicit in f and uses only three evaluations of f per step. It is defined by the following Butcher tableau

$$\begin{aligned} \begin{array} {c|cccc||c|cccc||c} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} d_1\\ c_2 &{} c_2 &{} 0 &{} 0 &{} 0 &{} 1 &{} {\widehat{a}}_{21} &{} {\widehat{a}}_{22} &{} 0 &{} 0 &{} d_2\\ c_3 &{} 0 &{} c_3 &{} 0 &{} 0 &{} 1 &{} {\widehat{a}}_{31} &{} {\widehat{a}}_{32} &{} {\widehat{a}}_{33} &{} 0 &{} d_3\\ 1 &{} {\widehat{a}}_{41} &{} {\widehat{a}}_{42} &{} {\widehat{a}}_{43} &{} 0 &{} 1 &{} {\widehat{a}}_{41} &{} {\widehat{a}}_{42} &{} {\widehat{a}}_{43} &{} 0 &{} 1\\ \hline &{} {\widehat{a}}_{41} &{} {\widehat{a}}_{42} &{} {\widehat{a}}_{43} &{} 0 &{} &{} {\widehat{a}}_{41} &{} {\widehat{a}}_{42} &{} {\widehat{a}}_{43} &{} 0 &{} \end{array} \end{aligned}$$

or by the associated set of equations

$$\begin{aligned} \begin{aligned} Y_1&=X_n+\sigma \sqrt{h} d_1 \xi _n + \lambda _1 g(Y_1), \\ Y_2&=X_n+hc_2 f(Y_1)+\sigma \sqrt{h} d_2 \xi _n + \lambda _2 \left[ {\widehat{a}}_{21} g(Y_1)+{\widehat{a}}_{22}g(Y_2)\right] , \\ Y_3&=X_n+hc_3 f(Y_2)+\sigma \sqrt{h} d_3 \xi _n + \lambda _3 \left[ {\widehat{a}}_{31} g(Y_1)+{\widehat{a}}_{32} g(Y_2)+{\widehat{a}}_{33}g(Y_3)\right] , \\ X_{n+1}&=X_n+h\sum _{j=1}^3 {\widehat{a}}_{4j} f(Y_j)+\sigma \sqrt{h} \xi _n + \lambda _4 \sum _{j=1}^3 {\widehat{a}}_{4j} g(Y_j), \\ \text {where }&\lambda _1,\lambda _2,\lambda _3,\lambda _4 \text { are such that }\zeta (Y_1)=\zeta (Y_2)=\zeta (Y_3)=\zeta (X_{n+1})=0, \end{aligned}\qquad \end{aligned}$$

and with the values of \(c_i\), \(d_i\), \({\widehat{a}}_{ij}\) given in Appendix C. To implement one step of this scheme, we apply a few iterations of the Newton method to find the projections on \(\mathcal {M} \). We emphasize that if the stepsize h is not small enough, the fixed point problems of finding \(\lambda _i\) such that \(\zeta (Y_i)=0\) may not be well defined, leading to diverging Newton iterations. Following Remark 3.2, we replace the standard Gaussian random vectors \(\xi \) in (3.1) by independent bounded discrete random vectors \({\widehat{\xi }}\) that satisfy (3.2). This way, the order two for the invariant measure is preserved and the method is well posed for h small enough.

With the same methodology we used to obtain the order conditions of Theorem 3.8 and Theorem 3.10, and with the expressions of \(\mathcal {A} _1\phi \) and \(\mathcal {A} _1^0\phi \) (see Sect. 4 for further details), we also get classes of Runge–Kutta integrators and their order conditions for the following specific subproblems.

Euclidean case \(\mathbb {R} ^d\). Fixing \(g=0\) in the expressions of \(\mathcal {A} _1\phi \) and \(\mathcal {A} _1^0\phi \) yields the order two conditions in the weak sense and for the invariant measure in \(\mathbb {R} ^d\) as given in [39, Tables 1-2].

Deterministic case. Fixing \(\sigma =0\) in the expression of \(\mathcal {A} _1\phi \) yields the order conditions for approximating the solution of ODEs of the form \(\dot{x}=\Pi _\mathcal {M} (x)f(x)\), where f is a gradient. Note that this equation can be rewritten as the following differential algebraic equation (DAE) of index two (see [31, Chap. VII]):

$$\begin{aligned} {\dot{x}}&=f(x)+\lambda g(x),\nonumber \\ 0&=\zeta (x). \end{aligned}$$

We obtain a class of deterministic Runge–Kutta methods for solving DAEs of the form (3.13) by setting \(\sigma =0\) in (3.1). A Runge–Kutta method of this form is consistent if \(b^T \mathbb {1}=1\), and has order two if . For instance, an order two method for solving ODEs of the form (3.13) is

$$\begin{aligned} X_{n+1}=X_n+h\frac{f(X_n)+f(X_{n+1})}{2}+ \lambda \frac{g(X_n)+g(X_{n+1})}{2},\quad \zeta (X_{n+1})=0. \end{aligned}$$
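A minimal sketch of this deterministic scheme on the unit circle is given below, under assumptions that are ours. With \(g(x)=x\) and \(|X_n|=1\), the constraint yields the multiplier explicitly as \(\lambda =(1-|w|^2)/(1+(w,X_n))\), where \(w=X_n+h(f(X_n)+f(X_{n+1}))/2\), and the dependence of w on \(X_{n+1}\) is handled with fixed point iterations. The test problem \(f=-\nabla V\) with \(V(x)=x_2\) is hypothetical and chosen because the projected ODE reduces to \(\dot{\theta }=-\cos \theta \) in the polar angle, which has a closed-form solution.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def trapezoidal_step(x, h, f, tol=1e-14, max_iter=100):
    """One step of X1 = x + h (f(x) + f(X1)) / 2 + lam (x + X1) / 2 with |X1| = 1,
    assuming |x| = 1 and g(x) = x (unit sphere)."""
    fx = f(x)
    z = list(x)                                  # initial guess X1 = x
    for _ in range(max_iter):
        fz = f(z)
        w = [xk + 0.5 * h * (p + q) for xk, p, q in zip(x, fx, fz)]
        lam = (1.0 - dot(w, w)) / (1.0 + dot(w, x))
        z_new = [(wk + 0.5 * lam * xk) / (1.0 - 0.5 * lam) for wk, xk in zip(w, x)]
        change = max(abs(p - q) for p, q in zip(z_new, z))
        z = z_new
        if change < tol:
            break
    return z

def f(x):
    return [0.0, -1.0]                           # f = -grad V with V(x) = x_2

def exact(t):
    """Solution of x' = Pi_M(x) f(x) on the circle starting from (1, 0)."""
    theta = 2.0 * math.atan(math.exp(-t)) - 0.5 * math.pi
    return [math.cos(theta), math.sin(theta)]

def error(h, T=1.0):
    x = [1.0, 0.0]
    for _ in range(round(T / h)):
        x = trapezoidal_step(x, h, f)
    return math.dist(x, exact(T))

for h in (0.1, 0.05, 0.025):
    print(h, error(h))
```

Halving h should divide the error by about four, which is the behaviour expected from an order two method.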

Spherical case. In the simple case where \(\mathcal {M} \) is the unit sphere in \(\mathbb {R} ^d\) (that is, when the constraint is of the form \(\zeta (x)=(\left| x\right| ^2-1)/2\) and \(g(x)=x\)), the consistency conditions (3.7) reduce to \(b^T \mathbb {1}=d_s=1\). The weak order two conditions of Theorem 3.10 reduce to the following conditions:

On the other hand, the order two conditions for the invariant measure of Theorem 3.8 on the sphere are the following:

For example, the following integrator has order two for the invariant measure if \(\mathcal {M} \) is a sphere:

$$\begin{aligned} Y_1&=X_n+h\left( \frac{3}{2}-\sqrt{2}\right) f(Y_2) +\sigma \sqrt{h} \left( 1-\frac{\sqrt{2}}{2}\right) \xi _n + \lambda _1 (2 Y_1-Y_2),\\ Y_2&=X_n+h f(Y_1)+\sigma \sqrt{h} \xi _n + \lambda _2 Y_1, \quad \zeta (Y_1)=\zeta (Y_2)=0,\\ X_{n+1}&=Y_2. \end{aligned}$$

Brownian motions on manifolds. Runge–Kutta methods of the form (3.1) can also be used for simulating a Brownian motion on a manifold (see [33, Chap. III]) by solving numerically

$$\begin{aligned} \mathrm{d}X=\Pi _\mathcal {M} (X) \circ \mathrm{d}W,\quad X(0)=X_0\in \mathcal {M}. \end{aligned}$$

We recall that in the context of \(\mathbb {R} ^d\), the Euler–Maruyama integrator is exact for approximating a Brownian motion in law. However, in the context of manifolds, there are no exact Runge–Kutta integrators for simulating a Brownian motion on \(\mathcal {M} \) in general. In particular, the Euler scheme (1.7) only has weak order one for solving (3.14) in general. Fixing \(f=0\) in (3.1) yields a class of Runge–Kutta methods for solving (3.14). The consistency conditions are \(d_s=1\) and . The conditions for order two for the invariant measure (respectively for weak order two) of such a Runge–Kutta method are obtained by deleting the order conditions in Theorem 3.8 (respectively, in Theorem 3.10) that involve A, b or c. In the specific case where \(\mathcal {M} \) is a sphere, the consistency conditions (3.7) become \(d_s=1\) and the weak order two conditions of Theorem 3.10 reduce to the two following conditions:

For example, a weak order two method for simulating a Brownian motion on a sphere is

$$\begin{aligned} X_{n+1}=X_n+\sqrt{h}\xi _n+ \lambda \frac{3 X_n +\sqrt{h}\xi _n + X_{n+1}}{4},\quad \zeta (X_{n+1})=0. \end{aligned}$$
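One step of this scheme can be sketched as follows on the unit sphere, with \(\zeta (x)=(|x|^2-1)/2\). Writing \(a=X_n+\sqrt{h}\xi _n\) and \(b=(3X_n+\sqrt{h}\xi _n)/4\), the step reads \(X_{n+1}=(a+\lambda b)/(1-\lambda /4)\), and the constraint becomes a quadratic equation in \(\lambda \). Selecting the root closest to zero follows Remark 3.4, and the bounded discrete random vectors (3.2) replace the Gaussian ones. The function names, the seed and the step size are our choices for illustration.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def sample_xi_hat(d, rng):
    """Bounded discrete random vector with the law (3.2)."""
    return rng.choices((0.0, SQRT3, -SQRT3), weights=(4, 1, 1), k=d)

def brownian_sphere_step(x, h, xi):
    """One step of X1 = x + sqrt(h) xi + lam (3 x + sqrt(h) xi + X1) / 4 with |X1| = 1."""
    sh = math.sqrt(h)
    a = [xk + sh * zk for xk, zk in zip(x, xi)]
    b = [(3.0 * xk + sh * zk) / 4.0 for xk, zk in zip(x, xi)]
    # |a + lam b|^2 = (1 - lam / 4)^2  <=>  A lam^2 + B lam + C = 0
    A = dot(b, b) - 1.0 / 16.0
    B = 2.0 * dot(a, b) + 0.5
    C = dot(a, a) - 1.0
    disc = B * B - 4.0 * A * C
    if disc < 0.0 or B <= 0.0:
        raise ValueError("step size too large: no projection close to x")
    lam = -2.0 * C / (B + math.sqrt(disc))       # root of smallest magnitude
    return [(ak + lam * bk) / (1.0 - 0.25 * lam) for ak, bk in zip(a, b)]

def simulate(x0, h, n_steps, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    path = [x]
    for _ in range(n_steps):
        x = brownian_sphere_step(x, h, sample_xi_hat(len(x), rng))
        path.append(x)
    return path

path = simulate([0.0, 0.0, 1.0], h=0.01, n_steps=200)
print(path[-1], dot(path[-1], path[-1]))
```

The quadratic is solved in the form \(\lambda =-2C/(B+\sqrt{B^2-4AC})\), which avoids cancellation when \(\lambda \) is small.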

In addition, there are no additional order two conditions for the invariant measure, that is, any consistent integrator, such as the Euler scheme (1.7), has at least order two for the invariant measure on the sphere.

4 Exotic Aromatic B-Series for Computing Order Conditions

As described in the introduction, B-series were introduced to tackle the calculations of order conditions of ODEs by representing Taylor expansions with trees. In [16], an extension of the original B-series, called aromatic B-series, was used to study volume-preserving integrators. It made it possible, in particular, to represent the divergence of a B-series. B-series and aromatic B-series were also studied later in [26, 49, 52] for their geometric properties, and in [8, 15] for their algebraic structure of Hopf algebras. In [39], a new formalism of B-series, called exotic aromatic B-series, was introduced for computing order conditions for sampling the invariant measure of SDEs in \(\mathbb {R} ^d\). It added a new kind of edge, called a liana, to the aromatic trees in order to represent new terms such as the Laplacian of an aromatic B-series. In this section, we extend the formalism of exotic aromatic B-series by allowing the representation of scalar products, and show that the operators \(\mathcal {L} ^j\) and \(\mathcal {A} _j\) can be represented conveniently in the form of B-series. We also rewrite the integration by parts formula (3.9) as a straightforward process on graphs, and apply it to compute \(\mathcal {A} _1^0\).

We consider graphs \(\gamma =(V,E,L)\) where V is the set of nodes, E the set of edges and L the set of lianas. We split the set of edges into \(E=E_0\cup E_S\) where \(E_0\) are the standard oriented edges as defined in [39], and where \(E_S\) is a new set of non-oriented edges represented as double horizontal straight lines. If \((v,w)=(w,v)\in E_S\), we consider this edge as an outgoing edge for both v and w, but v and w are not predecessors of each other. If \((v,w)\in E_S\), we denote \(S(v)=w\) and \(S(v)=v\) otherwise. We consider graphs where each node has exactly one outgoing edge, except exactly one node, called the root r, that has none. If we consider only the graph (V, E), where we erase the lianas, it can be decomposed into two kinds of connected components: the one that contains the root, which we call the rooted tree, and the other components, which we call aromas. We decompose the set of nodes into \(V=V_f\cup V_g\cup \{r\}\) where \(V_f\) are the nodes representing the function f and are represented with black disks (respectively, \(V_g\) represent the function g and are drawn with white disks). We write \(N_f(\gamma )\) for the number of elements of \(V_f\) (respectively, \(N_g(\gamma )\) for the number of elements of \(V_g\)) and \(N_l(\gamma )\) for the number of lianas. The order of a directed graph \(\gamma =(V,E,L)\) is defined as:

$$\begin{aligned} \left| \gamma \right| =N_f(\gamma ) +N_l(\gamma ) +\frac{N_g(\gamma )}{2} -\left| E_S\right| . \end{aligned}$$

For instance, the graph \(\gamma =(V,E,L)\) with

$$\begin{aligned} V_f= & {} \{v_2,v_5,v_6\}, \quad V_g=\{v_1,v_3,v_4,v_7\}, \quad E_S=\{(v_6,v_7)\},\nonumber \\ E_0= & {} \{(v_1,r),(v_2,v_1),(v_3,r),(v_4,v_4),(v_5,v_4)\}, \quad L=\{(v_2,v_2),(v_3,v_5),(v_5,v_6)\},\nonumber \\ \end{aligned}$$

satisfies \(\left| \gamma \right| =7\) and is represented as

We say that two directed graphs \((V^1,E^1,L^1)\) and \((V^2,E^2,L^2)\) are equivalent if there exists a bijection \(\varphi :V^1\rightarrow V^2\) such that

$$\begin{aligned} \varphi (V_f^1)= & {} V_f^2, \quad \varphi (V_g^1)=V_g^2, \quad (\varphi \times \varphi )(E^1)=E^2, \\ (\varphi \times \varphi )(E_S^1)= & {} E_S^2, \quad (\varphi \times \varphi )(L^1)=L^2. \end{aligned}$$

We call exotic aromatic forests the equivalence classes of these directed graphs \(\gamma =(V,E,L)\), and we denote \(\mathcal {E} \mathcal {A} \mathcal {T} \) the set of exotic aromatic forests. In addition, we need a different set of rooted forests where the root is in \(V_f\) or \(V_g\). We call them exotic aromatic vector fields and gather them together in the set \(\mathcal {E} \mathcal {A} \mathcal {V} \). The elementary differential associated with an exotic aromatic forest is given by the following definition:

Definition 4.1

Let \(\gamma =(V,E,L)\in \mathcal {E} \mathcal {A} \mathcal {T} \), and let f, \(g:\mathbb {R} ^d \rightarrow \mathbb {R} ^d\) and \(\phi :\mathbb {R} ^d\rightarrow \mathbb {R} \) be smooth functions. We denote \(l_1,\dots ,l_s\) the elements of L, \(v_1,\dots ,v_m\) the elements of \(V\smallsetminus \{r\}\) and \(\delta _{i,j}\) the Kronecker symbol (\(\delta _{i,j}=1\) if \(i=j\), \(\delta _{i,j}=0\) otherwise). We use the notation, for \(v\in V\), \(I_{\pi (v)}=(i_{q_1},\dots ,i_{q_s})\) where \(\pi (v)=\{q_1,\dots ,q_s\}\) are the predecessors of v, and \(J_{\Gamma (v)}=(j_{l_{x_1}},\dots ,j_{l_{x_t}})\) where \(\Gamma (v)=\{l_{x_1},\dots ,l_{x_t}\}\) are the lianas linked to v. Then, \(F(\gamma )\) is defined as

$$\begin{aligned} F(\gamma )(f,g,\phi )&=\sigma ^{2(\left| \gamma \right| -N_f(\gamma ))} G^{-N_g(\gamma )/2}\\&\quad \sum _{i_{v_1},\dots ,i_{v_m}=1}^d \sum _{j_{l_1},\dots ,j_{l_s}=1}^d \left( \prod _{v\in V_f} \delta _{i_v,i_{S(v)}} \partial _{I_{\pi (v)}} \partial _{J_{\Gamma (v)}} f_{i_v}\right) \\&\quad \cdot \left( \prod _{v\in V_g} \delta _{i_v,i_{S(v)}} \partial _{I_{\pi (v)}} \partial _{J_{\Gamma (v)}} g_{i_v}\right) \partial _{I_{\pi (r)}} \partial _{J_{\Gamma (r)}} \phi . \end{aligned}$$

For example, the differential associated with the exotic aromatic forest \(\gamma \) given by (4.1) is

$$\begin{aligned} F(\gamma )(f,g,\phi )&=\sigma ^{8} G^{-2}\sum _{i_{v_1},\dots ,i_{v_7}=1}^d \sum _{j_{l_1},\dots ,j_{l_3}=1}^d \partial _{j_{l_1}j_{l_1}}f_{i_{v_2}} \partial _{j_{l_2}j_{l_3}}f_{i_{v_5}} \delta _{i_{v_6},i_{v_7}} \partial _{j_{l_3}}f_{i_{v_6}}\\&\qquad \cdot \partial _{i_{v_2}}g_{i_{v_1}} \partial _{j_{l_2}}g_{i_{v_3}} \partial _{i_{v_4}i_{v_5}}g_{i_{v_4}} \delta _{i_{v_7},i_{v_6}} g_{i_{v_7}} \partial _{i_{v_1}i_{v_3}}\phi . \end{aligned}$$
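The bookkeeping behind this example can be reproduced mechanically from the graph data (4.1). The sketch below encodes the sets \(V_f\), \(V_g\), \(E_0\), \(E_S\) and L as plain Python lists (a representation chosen here for illustration), computes the order \(\left| \gamma \right| \) and the exponents of \(\sigma \) and G, and lists for each node the predecessor indices and liana indices that appear as derivatives in \(F(\gamma )\). A liana that starts and ends at the same node contributes its index twice.

```python
from fractions import Fraction

V_f = ["v2", "v5", "v6"]
V_g = ["v1", "v3", "v4", "v7"]
E_0 = [("v1", "r"), ("v2", "v1"), ("v3", "r"), ("v4", "v4"), ("v5", "v4")]
E_S = [("v6", "v7")]
L = [("v2", "v2"), ("v3", "v5"), ("v5", "v6")]

def order():
    """|gamma| = N_f + N_l + N_g / 2 - |E_S|."""
    return Fraction(len(V_f)) + len(L) + Fraction(len(V_g), 2) - len(E_S)

def sigma_power():
    """Exponent 2 (|gamma| - N_f) of sigma in F(gamma)."""
    return 2 * (order() - len(V_f))

def G_power():
    """Exponent -N_g / 2 of G in F(gamma)."""
    return -Fraction(len(V_g), 2)

def predecessors(v):
    """Nodes w with an oriented edge (w, v) in E_0."""
    return [a for a, b in E_0 if b == v]

def lianas(v):
    """Labels of the lianas attached to v, repeated for a liana from v to itself."""
    labels = []
    for k, (a, b) in enumerate(L, start=1):
        labels += [f"l{k}"] * ((a == v) + (b == v))
    return labels

print("order:", order(), "sigma power:", sigma_power(), "G power:", G_power())
for v in V_f + V_g + ["r"]:
    print(v, "predecessors:", predecessors(v), "lianas:", lianas(v))
```

The output reproduces the prefactor \(\sigma ^{8}G^{-2}\) and the derivative indices of each factor in the displayed formula, for instance the two predecessors \(v_4,v_5\) of \(v_4\) and the repeated liana index on \(v_2\).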

We extend the definition of F on \({{\,\mathrm{Span}\,}}(\mathcal {E} \mathcal {A} \mathcal {T})\) by linearity and write, for the sake of simplicity, \(F(\gamma )(\phi )\) instead of \(F(\gamma )(f,g,\phi )\). An exotic aromatic B-series is a formal series indexed over \(\mathcal {E} \mathcal {A} \mathcal {T} \) of the form

$$\begin{aligned} B(a)(\phi )=\sum _{\gamma \in \mathcal {E} \mathcal {A} \mathcal {T}} h^{\left| \gamma \right| }a(\gamma )F(\gamma )(\phi ). \end{aligned}$$

Remark 4.2

As we assumed that the functions f and g are gradients, multiple exotic aromatic forests can represent the same differential. We do not detail here the method to identify two such forests as it is similar to [39, Prop. 4.7] in the context of \(\mathbb {R} ^d\).

The following result states that the operators \(\mathcal {L} ^j/j!\) and \(\mathcal {A} _j\) can be written with exotic aromatic forests. We omit the proof for the sake of brevity as it is similar to [39, Thm. 4.1].

Proposition 4.3

Take a Runge–Kutta method of the form (3.1), then the expansions (2.4) and (2.6) can be formally written with exotic aromatic B-series, that is, there exist two maps e and a over \(\mathcal {E} \mathcal {A} \mathcal {T} \) such that

$$\begin{aligned} \mathbb {E} [\phi (X(h))|X(0)=x]=B(e)(\phi )(x), \quad \mathbb {E} [\phi (X_1)|X_0=x]=B(a)(\phi )(x), \end{aligned}$$

and where the operators are given by

$$\begin{aligned} \frac{\mathcal {L} ^j}{j!}=F\bigg (\sum _{\gamma \in \mathcal {E} \mathcal {A} \mathcal {T},\left| \gamma \right| =j} e(\gamma )\gamma \bigg ), \quad \mathcal {A} _{j-1}=F\bigg (\sum _{\gamma \in \mathcal {E} \mathcal {A} \mathcal {T},\left| \gamma \right| =j} a(\gamma )\gamma \bigg ). \end{aligned}$$

If \(e(\gamma )=a(\gamma )\) for all \(\gamma \in \mathcal {E} \mathcal {A} \mathcal {T} \) with \(1\le \left| \gamma \right| \le p\), then the integrator has at least weak order p.

For example, the operator \(\mathcal {L} \) in (2.2) can be rewritten with exotic aromatic forests as:

We present in Table 2 (see Appendix D) the decomposition in exotic aromatic forests of the operators \(\mathcal {L} ^2\phi /2=\mathcal {L} (\mathcal {L} \phi )/2\) and \(\mathcal {A} _1\phi \) under the consistency condition (3.7).

Remark 4.4

If we replace the functions g and \(\phi \) by f and fix \(\sigma =G=1\), the newly obtained exotic aromatic B-series satisfy an isometric equivariance property, that is, they stay unchanged when applying an isometric coordinate transformation. It was proved in [52] that under a condition of locality, aromatic B-series are exactly the affine equivariant methods, that is, the maps that stay unchanged when applying an affine coordinate transformation. Analogously, it would be interesting to make a link between the isometric equivariant maps and the exotic aromatic B-series.

In the spirit of the Butcher product on trees [29, Chap. III], we introduce a few notations for writing with ease different operations on forests.


Let \(\gamma \) be an exotic aromatic forest/vector field, \(\tau \) be an exotic aromatic vector field and v a node of \(\gamma \), then we define the following operators on forests.

  1. : sum of all exotic aromatic forests/vector fields obtained by linking the root of \(\tau \) to a node of \(\gamma \) with a new edge in \(E_0\)

  2. (resp. ): aroma obtained by linking the root of \(\tau \) to a white node (resp. a black node) with a new edge in \(E_S\)

  3. : sum of all aromas obtained by linking the root of \(\tau \) to a node of \(\tau \) with a new edge in \(E_0\)

  4. : sum of all exotic aromatic forests/vector fields obtained by linking the node v to a node of \(\gamma \) with a new liana

  5. : forest obtained by linking the root of \(\tau \) to the node v of \(\gamma \) with a new edge in \(E_0\)

For simplicity, we combine multiple operations on the same forest as in and , where operation 1 is always applied first.

For example, let  and \(v=r\) the root of \(\gamma \), then we get

The integration by parts (3.9) can be rewritten conveniently with exotic aromatic forests.

Lemma 4.5

Let \(\gamma \in \mathcal {E} \mathcal {A} \mathcal {T} \) and \(\tau \in \mathcal {E} \mathcal {A} \mathcal {V} \), then the process of integration by parts rewrites into


We write \(\gamma \sim {\widetilde{\gamma }}\) if it is possible to go from \(\gamma \in \mathcal {E} \mathcal {A} \mathcal {T} \) to \({\widetilde{\gamma }}\in {{\,\mathrm{Span}\,}}(\mathcal {E} \mathcal {A} \mathcal {T})\) with the processes of integration by parts (4.2) or (4.3). We extend this relation by linearity on \({{\,\mathrm{Span}\,}}(\mathcal {E} \mathcal {A} \mathcal {T})\) and make it symmetric so that \(\sim \) becomes an equivalence relation on \({{\,\mathrm{Span}\,}}(\mathcal {E} \mathcal {A} \mathcal {T})\). For example, the integrations by parts (3.10) and (3.11) can be rewritten with exotic aromatic B-series by using (4.3) with  and . It yields

For the sake of completeness, we present in Appendix B the integrations by parts for the order 3 terms of \(\mathcal {A} _1\phi \). The computations are similar for the terms of order two in \(\phi \).

Remark 4.6

In the Euclidean case \(\mathbb {R} ^d\), that is, for a forest \(\gamma \in \mathcal {E} \mathcal {A} \mathcal {T} \) and a vector field \(\tau \in \mathcal {E} \mathcal {A} \mathcal {V} \) with \(N_g(\gamma )=N_g(\tau )=0\) and \(g=0\), Lemma 4.5 reduces to the following two equations:

We recover the process of integration by parts described in [39, Thm. 4.4] in the context of exotic aromatic B-series in \(\mathbb {R} ^d\).

We can now revisit the statement of Theorem 2.5 in terms of B-series.

Theorem 4.7

Take a consistent ergodic Runge–Kutta method of the form (3.1). We denote \(\mathcal {A} _{i}=F(\gamma _i)\) with \(\gamma _i\in \mathcal {E} \mathcal {A} \mathcal {T} \). If \(\gamma _i\sim \gamma _i^0\) and \(F(\gamma _i^0)=0\) for \(1\le i< r\), then the method has at least order r for the invariant measure.

By applying repeatedly the process of integrations by parts described in Lemma 4.5, one can simplify the operator \(\mathcal {A} _{i}=F(\gamma _i)\) into an operator of the form \(\mathcal {A} _{i}^0=F(\gamma _i^0)\) such that \(\gamma _i\sim \gamma _i^0\). The complete decomposition of \(\mathcal {A} _1^0\) into exotic aromatic forests is detailed in Table 3 (see Appendix D). According to Theorem 4.7, choosing the coefficients of the Runge–Kutta method such that \(\gamma _1^0=0\) yields the order two conditions for the invariant measure, as stated in Theorem 3.8.

Remark 4.8

We call \(\mathcal {E} \mathcal {A} \mathcal {T} ^0\) the subset of exotic aromatic forests whose root has only one predecessor (that is, the forests associated with an order one operator) or that have a rooted tree of the form , ... Then, if \(\gamma \in \mathcal {E} \mathcal {A} \mathcal {T} \), there exists \(\gamma ^0 \in \mathcal {E} \mathcal {A} \mathcal {T} ^0\) such that \(\gamma \sim \gamma ^0\). For instance, for a consistent method of the form (3.1), the operator \(\mathcal {A} _1^0=F(\gamma _1^0)\) has the form

so that \(\gamma _1^0 \in \mathcal {E} \mathcal {A} \mathcal {T} ^0\), and \(\mathcal {A} _1^0\) is a differential operator of order one if the condition \(b^T d={\widehat{b}}^T d\) holds.

5 Numerical Experiments

In this section, we perform numerical experiments to confirm the theoretical findings, first on a sphere and a torus in \(\mathbb {R} ^3\), and then on the special linear group.

5.1 Invariant Measure Approximation on a Sphere and a Torus

To check the numerical order two of the Runge–Kutta integrator (3.12) presented in Sect. 3.3, we first compare it with the Euler scheme (1.7) on the unit sphere in \(\mathbb {R} ^3\), where the constraint is given by \(\zeta (x)=(x_1^2+x_2^2+x_3^2-1)/2\). We choose the potential \(V(x)=25(1-x_1^2-x_2^2)\), with \(\sigma =\sqrt{2}\), \(\phi (x)=x_3^2\), \(f=-\nabla V\), \(g=\nabla \zeta \), \(M=10^7\) independent trajectories to have a small Monte Carlo error, and a final time \(T=20\). Observe that for the smaller final time \(T=10\) (not included in the figures for conciseness), the convergence curves are nearly identical to those of the case \(T=20\) shown in Fig. 1, which suggests that the numerical solutions are already very close to equilibrium at these final times. Following Remark 3.2 and Lemma 3.3, we use discrete bounded random variables satisfying (3.2) in the implementation of the integrators. For both integrators, we compute the Monte Carlo estimator \(\bar{J}=\frac{1}{M}\sum _{m=1}^M \phi (X_N^{(m)}) \simeq \mathbb {E} [\phi (X_N)]\), where \(X_n^{(m)}\) is the m-th realization of the integrator at time \(t_n=nh\), and N is an integer satisfying \(Nh=T\). We compare this approximation with a reference value of \(\int _\mathcal {M} \phi \mathrm{d}\mu _\infty \) computed via a standard quadrature formula, and we plot the error for the invariant measure (1.5) versus the timestep h. We also plot an estimate of the Monte Carlo error by using the standard error of the mean estimator \(\big (\sum _{m=1}^M (\phi (X_N^{(m)})-\bar{J})^2\big )^{1/2}/\sqrt{M(M-1)}\). We observe in all convergence plots that the Monte Carlo error dominates for small values of the timestep h. In Fig. 1, we observe, as expected, order one for the Euler scheme (1.7) and order two for the Runge–Kutta scheme (3.12).
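The estimator \(\bar{J}\) and its standard error of the mean can be sketched in a few lines of Python. The function names `monte_carlo_estimate`, `trapezoid` and `sphere_reference` are hypothetical. The reference value relies on an assumption that is not restated in this section, namely that the invariant measure has a density proportional to \(\exp (-2V/\sigma ^2)\) with respect to the surface measure on the sphere. Under that assumption, \(V=25x_3^2\) on the unit sphere and \(x_3\) is uniformly distributed on \([-1,1]\) under the surface measure, so the reference integral reduces to a one-dimensional quadrature.

```python
import numpy as np


def monte_carlo_estimate(phi_samples):
    """Return (J_bar, SEM) from the values phi(X_N^(m)), m = 1..M."""
    phi_samples = np.asarray(phi_samples, dtype=float)
    M = phi_samples.size
    j_bar = phi_samples.mean()
    # Standard error of the mean, following the formula quoted in the text.
    sem = np.sqrt(np.sum((phi_samples - j_bar) ** 2)) / np.sqrt(M * (M - 1))
    return j_bar, sem


def trapezoid(y):
    """Trapezoidal sum on a uniform grid, up to the constant grid spacing."""
    return y.sum() - 0.5 * (y[0] + y[-1])


def sphere_reference(beta=25.0, n=200_001):
    """Quadrature value of int x_3^2 dmu_inf on the unit sphere.

    Assumption: density proportional to exp(-beta * x_3^2) with respect to
    the surface measure, i.e. exp(-2V/sigma^2) with sigma = sqrt(2).
    """
    z = np.linspace(-1.0, 1.0, n)
    weight = np.exp(-beta * z**2)
    # The grid spacing cancels in the ratio of the two trapezoidal sums.
    return trapezoid(z**2 * weight) / trapezoid(weight)
```

Under the stated assumption, the reference value is very close to \(1/(2\cdot 25)=0.02\), since the Gaussian weight is negligible near \(x_3=\pm 1\).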

Fig. 1

A trajectory of the order two method (left) and the convergence curve for the sphere for the invariant measure (right) with the potential \(V(x)=25(1-x_1^2-x_2^2)\), \(\phi (x)=x_3^2\), a final time \(T=20\) and \(M=10^7\) trajectories
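To make the sphere experiment concrete, the following is a minimal sketch of an Euler-type discretization of the Lagrange-multiplier formulation on the unit sphere. Several points are assumptions: that this discretization coincides with the Euler scheme (1.7), that the three-point random variable used here is an admissible choice of discrete bounded variable (its compatibility with (3.2) is not checked), and the function name `euler_sphere` and its default parameters, which are purely illustrative. On the sphere, \(g(x)=x\), so the multiplier solves a scalar quadratic equation and no Newton iteration is needed.

```python
import numpy as np


def euler_sphere(M=2000, T=1.0, h=0.005, sigma=np.sqrt(2.0), seed=0):
    """Simulate M trajectories on the unit sphere and return the final states.

    Assumed step: X_{n+1} = X_n + h f(X_n) + sigma sqrt(h) xi + lam * g(X_n),
    with lam chosen such that |X_{n+1}| = 1, f = -grad V, g(x) = x.
    """
    rng = np.random.default_rng(seed)
    X = np.tile([1.0, 0.0, 0.0], (M, 1))
    N = int(round(T / h))
    # Three-point variable with mean 0 and variance 1 (illustrative choice).
    values = np.array([-np.sqrt(3.0), 0.0, np.sqrt(3.0)])
    probs = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]
    for _ in range(N):
        xi = rng.choice(values, size=X.shape, p=probs)
        # f = -grad V with V(x) = 25 (1 - x_1^2 - x_2^2).
        f = np.column_stack([50.0 * X[:, 0], 50.0 * X[:, 1], np.zeros(M)])
        Y = X + h * f + sigma * np.sqrt(h) * xi
        # Solve |Y + lam X|^2 = 1 with |X| = 1, keeping the root near zero.
        b = np.sum(X * Y, axis=1)
        disc = b**2 - np.sum(Y * Y, axis=1) + 1.0
        lam = -b + np.sqrt(disc)
        X = Y + lam[:, None] * X
    return X


if __name__ == "__main__":
    X_N = euler_sphere()
    print("estimate of E[x_3^2]:", np.mean(X_N[:, 2] ** 2))
```

The discriminant stays positive for the small timestep used here because the increments are bounded; for larger timesteps, the projection step may fail.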

We then apply the Euler scheme (1.7) and the Runge–Kutta integrator (3.12) on a torus defined by the constraint \(\zeta (x)=(x_1^2+x_2^2+x_3^2+R^2-r^2)^2-4R^2(x_1^2+x_2^2)\) with \(R=3\) and \(r=1\). The potential is \(V(x)=25(x_3-r)^2\), and we choose \(\sigma =\sqrt{2}\), \(\phi (x)=x_3^2\), \(f=-\nabla V\), \(g=\nabla \zeta \), a final time \(T=20\) and \(M=10^7\) independent trajectories. In Fig. 2, we plot the error for the invariant measure versus the timestep h, using a reference value for \(\int _\mathcal {M} \phi \mathrm{d}\mu _\infty \) obtained with a standard quadrature formula. As expected, we observe order two for the proposed integrator. These curves confirm the theoretical findings presented in Sect. 3. In particular, the scheme (3.12) has order two of accuracy for the invariant measure on manifolds, in agreement with Theorem 3.8. Note that if we had chosen a very short final time T, we would have observed the weak order one instead of the order two for the invariant measure, as we would not have reached equilibrium.
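The observed order of convergence can be read off as the slope of the error curve in a log-log plot. A minimal sketch follows; the function name `estimate_order` is hypothetical, and the errors are synthetic values of the form \(Ch^2\) chosen for illustration, not the data behind Fig. 1 or Fig. 2.

```python
import numpy as np


def estimate_order(timesteps, errors):
    """Least-squares slope of log(error) against log(h)."""
    slope, _ = np.polyfit(np.log(timesteps), np.log(errors), 1)
    return slope


# Synthetic errors behaving like C * h^2 (illustrative values only).
h = 2.0 ** -np.arange(3, 9)
synthetic_errors = 0.7 * h**2

if __name__ == "__main__":
    print("fitted order:", estimate_order(h, synthetic_errors))
```

With measured errors, the fit should be restricted to timesteps for which the Monte Carlo error does not dominate, otherwise the slope flattens artificially.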

Fig. 2

A trajectory of the order two method (left) and the convergence curve for the torus for the invariant measure (right) with the potential \(V(x)=25(x_3-r)^2\), \(\phi (x)=x_3^2\), a final time \(T=20\) and \(M=10^7\) trajectories
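As a sanity check on the torus constraint, the sketch below evaluates \(\zeta \) and \(g=\nabla \zeta \) and verifies that the usual angular parametrization of the torus satisfies \(\zeta =0\). The function names and the parametrization \(((R+r\cos \theta )\cos \varphi ,(R+r\cos \theta )\sin \varphi ,r\sin \theta )\) are introduced here for illustration and do not appear in the text.

```python
import numpy as np

R, r = 3.0, 1.0


def zeta(x):
    """Torus constraint: (|x|^2 + R^2 - r^2)^2 - 4 R^2 (x_1^2 + x_2^2)."""
    s = x @ x + R**2 - r**2
    return s**2 - 4.0 * R**2 * (x[0] ** 2 + x[1] ** 2)


def grad_zeta(x):
    """Gradient of zeta, obtained by differentiating the formula above."""
    s = x @ x + R**2 - r**2
    return 4.0 * s * x - 8.0 * R**2 * np.array([x[0], x[1], 0.0])


def torus_point(theta, varphi):
    """Point of the torus for the angles (theta, varphi)."""
    rho = R + r * np.cos(theta)
    return np.array([rho * np.cos(varphi), rho * np.sin(varphi), r * np.sin(theta)])


if __name__ == "__main__":
    p = torus_point(0.4, 1.3)
    print("zeta on the torus:", zeta(p))
    print("gradient:", grad_zeta(p))
```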

5.2 Invariant Measure Approximation on the Special Linear Group

Sampling on a manifold \(\mathcal {M} \) is especially useful to compute integrals of the form \(\int _\mathcal {M} \phi (x) \mathrm{d}\mu _\infty \) when \(\mathcal {M} \) is a manifold of high dimension. The class of methods (3.1) is convenient as the number of order conditions does not increase with the dimension of the space. We apply Method (3.12) on a Lie group (in the spirit of [65, 66]) to see how it performs in high dimension. We choose the special linear group \({{\,\mathrm{SL}\,}}(m)=\{M\in \mathbb {R} ^{m\times m},\det (M)=1\}\), seen as a submanifold of \(\mathbb {R} ^{m^2}\) of codimension 1. As explained in Remark 2.6, our analysis still applies to \({{\,\mathrm{SL}\,}}(m)\) if we choose a potential V with appropriate growth assumptions, even though it is not a compact manifold. We compare the Euler scheme (1.7) and the Runge–Kutta integrator (3.12) on \(\mathcal {M} ={{\,\mathrm{SL}\,}}(m)\) for different m (that is, with the constraint \(\zeta (x)=\det (x)-1\)), where the implementation uses discrete random variables satisfying (3.2). We choose the potential

$$\begin{aligned} V(x)=25{{\,\mathrm{Tr}\,}}((x-I_{m^2})^T(x-I_{m^2})) \end{aligned}$$

and the parameters \(\sigma =\sqrt{2}\), \(\phi (x)={{\,\mathrm{Tr}\,}}(x)\) and \(M=10^6\) trajectories. Each trajectory is an approximation of the solution of Eq. (1.2) at time \(T=10\) with a timestep \(h=T/N\) and \(N=2^{12}\) steps. With this timestep h, the Newton method used in the Euler scheme (1.7) does not converge for approximately \(0.005\%\) of the trajectories for \(m=4\). We choose to discard these trajectories, which induces a negligible bias in the expectation. This does not occur for the Runge–Kutta integrator (3.12). We recall that for a small enough timestep h, the Newton method would always converge (see also Remark 3.4). The reference solution for \(J(m)=\int _{{{\,\mathrm{SL}\,}}(m)} \phi (x) \mathrm{d}\mu _\infty (x)\) is computed with the Runge–Kutta method (3.12) with \(h_{\text {ref}}=2^{-14}T\). With the factor 25 in the potential (5.1), the solution of (1.2) stays close to \(I_{m^2}\), and J(m) is close to \(\phi (I_{m^2})=m\). This choice of factor makes it possible to explore a reasonably small area of \({{\,\mathrm{SL}\,}}(m)\) with moderate manifold curvature. We observe numerically that replacing the factor 25 by 1 in (5.1) induces a severe timestep restriction (results not included for conciseness). The computation of J(m) could also be done via the parametrization given by the Iwasawa decomposition for \({{\,\mathrm{SL}\,}}(m)\) (see, for instance, [28, Chap. 1]) and the use of standard quadrature methods, but these methods have prohibitive costs in high dimension. We gather the numerical results in Table 1 and observe that the Runge–Kutta method (3.12) performs significantly better than the Euler scheme (1.7).

Table 1 Numerical approximation of the integral \(J(m)=\int _{{{\,\mathrm{SL}\,}}(m)} \phi (x) \mathrm{d}\mu _\infty \) for \(2\le m \le 4\) with the estimator \(\bar{J}=M^{-1}\sum _{k=1}^M \phi (X_N^{(k)})\) where \((X_n)\) is given by the Euler scheme (1.7) for \(\bar{J}_\text {Euler}\) and by the Runge–Kutta integrator (3.12) for \(\bar{J}_2\), with their respective errors
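The ingredients of the \({{\,\mathrm{SL}\,}}(m)\) experiment can be written down explicitly. The sketch below is an illustration under stated assumptions: elements of \(\mathbb {R} ^{m^2}\) are stored as \(m\times m\) arrays, \(I_{m^2}\) is interpreted as the \(m\times m\) identity matrix, and all function names are hypothetical. The gradient of the constraint uses Jacobi's formula \(\nabla \det (x)=\det (x)\,x^{-T}\), and both gradients are compared with central finite differences.

```python
import numpy as np


def zeta(x):
    """Constraint defining SL(m): det(x) - 1."""
    return np.linalg.det(x) - 1.0


def grad_zeta(x):
    """Jacobi's formula: the gradient of det at x is det(x) * x^{-T}."""
    return np.linalg.det(x) * np.linalg.inv(x).T


def potential(x):
    """V(x) = 25 Tr((x - I)^T (x - I)), with I the m x m identity (assumed)."""
    d = x - np.eye(x.shape[0])
    return 25.0 * np.trace(d.T @ d)


def grad_potential(x):
    """Gradient of V with respect to the entries of x."""
    return 50.0 * (x - np.eye(x.shape[0]))


def finite_difference_grad(fun, x, eps=1e-6):
    """Entrywise central finite differences, used only as a check."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        e = np.zeros_like(x)
        e[idx] = eps
        grad[idx] = (fun(x + e) - fun(x - e)) / (2.0 * eps)
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
    print(np.allclose(grad_zeta(x), finite_difference_grad(zeta, x), atol=1e-6))
    print(np.allclose(grad_potential(x), finite_difference_grad(potential, x), atol=1e-6))
```

The drift \(f=-\nabla V\) and the direction \(g=\nabla \zeta \) both cost at most one matrix inversion per evaluation, which is consistent with the remark that the approach scales to higher dimension more easily than quadrature.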