A Geometric Framework for Stochastic Shape Analysis

We introduce a stochastic model of diffeomorphisms, whose action on a variety of data types descends to stochastic models of shapes, images and landmarks. The stochasticity is introduced in the vector field which transports the data in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework for shape analysis and image registration. The stochasticity thereby models errors or uncertainties of the flow in following the prescribed deformation velocity. The approach is illustrated in the example of finite dimensional landmark manifolds, whose stochastic evolution is studied both via the Fokker-Planck equation and by numerical simulations. We derive two approaches for inferring parameters of the stochastic model from landmark configurations observed at discrete time points. The first of the two approaches matches moments of the Fokker-Planck equation to sample moments of the data, while the second approach employs an Expectation-Maximisation based algorithm using a Monte Carlo bridge sampling scheme to optimise the data likelihood. We derive and numerically test the ability of the two approaches to infer the spatial correlation length of the underlying noise.


Introduction
In this work, we aim at modelling variability of shapes using a theory of stochastic perturbations consistent with the geometry of the diffeomorphism group corresponding to the Large Deformation Diffeomorphic Metric Mapping framework (LDDMM, see [You10]). In applications, such variability arises and can be observed, for example, when human organs are influenced by disease processes, as analysed in computational anatomy [YAM09]. Spatially independent white noise contains insufficient information to describe these large-scale variabilities of shapes. In addition, the coupling of the spatial correlations of the noise must be adapted to a variety of transformation properties of the shape spaces. The theory developed here addresses this problem by introducing spatially correlated noise which respects the geometric structure of the data. This method provides a new way of characterising stochastic variability of shapes using spatially correlated noise in the context of the standard LDDMM framework.
We will show that this specific type of noise can be used for all the data structures that the LDDMM framework applies to. The LDDMM theory was initiated by [Tro95,CRM96,DGM98,MTY02,BMTY05] building on the pattern theory of [Gre94]. LDDMM models the dynamics of shapes by the action of diffeomorphisms on shape spaces. It gives a unified approach to shape modelling and shape analysis, valid for a range of structures such as landmarks, curves, surfaces, images, densities or even tensor-valued images. For any such data structure, the optimal shape deformations are described via the Euler-Poincaré equation of the diffeomorphism group, usually referred to as the EPDiff equation [HMR98,HM05,YAM09]. In this work, we will show how to obtain a stochastic EPDiff equation valid for any data structure, and in particular for the finite dimensional representation of images based on landmarks. For this, we will follow the LDDMM derivation of [BGBHR11] based on geometric mechanics [MR99,Hol11] and the existence of momentum maps to represent images and shapes. The momentum maps are the key structures to retain after the introduction of noise into the EPDiff equation, as they will allow us to use most of the technology developed for shape analysis in the deterministic context and in computational anatomy.
This work is not the first to consider stochastic evolutions in LDDMM. Indeed, [TV12,Via13] and more recently [MS16] already investigated the possibility of stochastic perturbations of landmark dynamics. In these works, the noise is introduced into the momentum equation, as though it were an external random force acting on each landmark independently. In [MS16], an extra dissipative force was added to balance the energy input from the noise and to make the dynamics correspond to a certain type of heat bath used in statistical physics. Here, we will instead introduce an Eulerian noise directly into the reconstruction relation used to find the deformation flows from the velocity fields, which are solutions of the EPDiff equation [HM05,You10]. As we will see, this derivation of stochastic models is compatible with variational principles, preserves the momentum map structure and yields a stochastic EPDiff equation with a novel type of multiplicative noise, depending on the gradient of the solution, as well as its magnitude. This model is based on the previous works [Hol15,ACH16], where, respectively, stochastic perturbations of infinite and finite dimensional mechanical systems were considered. The Eulerian nature of the noise discussed here implies that the noise correlation depends on the image position and not, as for example in [TV12,MS16], on the landmarks themselves. This property will explain why the noise is compatible with any data structure while retaining the freedom in the choice of its spatial correlation. The radius for the landmark kernel is twice their average initial distances. In blue is the stochastic perturbation developed in this paper. The black dots represent the J Eulerian noise fields arranged in a grid configuration. In magenta is the evolution resulting from additive noise in the momentum equation, different for each landmark but with the same amplitude as the Eulerian noise. We run three initial value simulations to compare the limit of a large number of landmarks and small noise correlation. The Eulerian noise model (blue) is robust to the continuum limit and can reproduce the general behaviour of the additive noise model. Furthermore, the choice of the noise fields provides an additional freedom in parameterisation which will be studied and exploited in this work.
To illustrate this framework and give an early demonstration of the stochastic landmark dynamics, we display in Figure 1 three experiments which compare the proposed model with a stochastic forcing model, such as studied for example by [TV12]. The proposed model is the stochastic Hamiltonian system for the positions of the landmarks q i and their canonical momenta p i (1.1) Here, the σ i are given functions of space which represent the spatial correlations of the noise. In Figure 1, the σ i fields are Gaussian fields whose variance is equal to twice their separation distance and locations are indicated by black dots. We where σ is a constant. In this case, the noise corresponds to a stochastic force acting on the landmarks, whose corresponding Brownian motion is different for each landmark. We show on the first panel of Figure 1 that for a small number of landmarks and a large range of spatial correlations of the noise, both types of stochastic deformations in (1.1) and (1.2) visually coincide. This is shown for a simple experiment in translating a circle (from the black circle to the black dashed circle). By doubling the number of landmarks (middle panel), the dynamics of (1.2) results in small-scale noise correlation (magenta), whereas the proposed model (blue) remains equivalent to the first experiment. This figure illustrates shape evolution when the noise is Eulerian and independent of the data structure. Indeed, the limit of a large number of landmarks corresponds to a certain continuum limit, in this case corresponding to curve dynamics. Finally, in the right-most panel, we reduce the range of the spatial correlation of the noise by adding more noise fields. This arrangement allows us to qualitatively reproduce the dynamics of the equation (1.2) with the same number of landmarks.
Modelling large-scale shape variability with noise is of interest for applications in computational anatomy where sources of variability include natural ageing, the influence of diseases such as Alzheimer's disease, and intra-subject population scale variations. In the LDDMM context, these effects are sometimes modelled using the random orbit model [MBC + 97]. The random orbit approach models variability in the observed data by using an ensemble of initial velocities in matching a template to a set of observations via geodesic flows, see [VMYT04]. The randomness is confined to the initial velocity as opposed to the evolving stochastic processes that we will use here. A prior can be defined by assuming a distribution of the initial velocities, and Bayesian approaches can then be used for inference of the template shape as well as additional unknown parameters [AAT07,MMTY08,ZSF13]. The stochastic model developed here can also be applied to model random warps and generate distributions used in Bayesian shape modelling, and for coupling warps and functional variations such as for example in [RSM14,KSPR17]. Indeed, because the proposed probabilistic approach assigns a likelihood to random deformations, the model can be used for general likelihood-based inference tasks.
In the present model, the observed shape variability indicates the spatial correlation of the noise. As this correlation is generally unknown, estimating the parameters of the correlation structure becomes an important part of the framework. We will address this problem by considering two different methods. One is based on the estimation of the time evolution of the probability distribution of each landmark. For this method, we will derive a set of differential equations approximating the time evolution of the complete distribution via its first moments. We can then solve the inverse problem of estimating the noise correlation from a known initial and final distribution of landmarks by minimization of a certain cost function, solved using a genetic algorithm. The other method is based on an Expectation-Maximisation (EM) algorithm which can infer unknown parameters for a parametric statistical model from observed data. In this context, since only initial and final landmarks positions are observed, the full stochastic trajectories are considered missing information. For this algorithm, we need to estimate the likelihood of stochastic paths connecting sets of observed landmarks. We achieve this by adapting the theory of diffusion bridges to the stochastic landmark equation.
Plan of this work. We begin by developing a general theory of stochastic perturbations for inexact matching in section 2. We then focus on exact landmark matching in section 3, which is the simplest example of this theory. In particular, we derive the Fokker-Planck equation in section 3.2 and diffusion bridge simulation in section 3.3. In section 4, we describe the two methods for estimating parameters of the noise from observations. The Fokker-Planck based method is in section 4.2 and the Expectation-Maximisation algorithm in section 4.3. We end the paper with numerical examples in section 5 in which we investigate the effect of the noise on landmark dynamics and compare the two methods for estimating the noise amplitude.

Stochastic Large Deformation Matching
In this section, we will first review the geometrical framework of LDDMM, following [BGBHR11], and then introduce noise following [Hol15] to preserve the geometrical structure of LDDMM. The key ingredient for both topics is the momentum map, which we will use as the main tool for reducing the infinite dimensional equation on the diffeomorphism group to equations on shape spaces.
2.1. The Deterministic LDDMM Model. Here, we will briefly review the theory of reduction by symmetry, as applied to the LDDMM context, following the presentation of [BGBHR11]. We detail the proof of the formulas below in the next section when we include noise. Define an energy functional E by where I 0 , I 1 ∈ V are shapes represented in a vector space V on which the diffeomorphism group Diff(R d ) acts, u t is a time-dependent vector field, and λ is a weight, or tolerance, which allows the matching to be inexact. The flow g t ∈ Diff(R d ) corresponding to u t is found by solving the reconstruction relation and I 0 is matched against I 1 through the action g 1 .I 0 of g 1 on s 0 . The vector field u t can be considered an element of the Lie algebra X(R d ). In the case of I 0 , I 1 being images I : R d → R, the action is by push-forward, g.I = I • g −1 , and when I represents N landmarks with positions q i ∈ R d , the action is by evaluation g.q = (g(q 1 ), . . . , g(q N )) (see [BGBHR11] for more details). The group elements can act on various additional shape structures such as tensor fields.
Remark 2.1 (Nonlinear shape structures). This framework can be extended to structures that are not represented by a vector space V , such as curves or surfaces. We leave this extension for future work.
Using the calculus of variations for the functional (2.1) results in the equation of motion for u t of the form d dt which is called the Euler-Poincaré equation. The operation ad * is the coadjoint action of the Lie algebra of vector fields associated to the diffeomorphism group.
The operation ad * acts on the variations δl/δu, which are 1-form densities, in the dual of the Lie algebra of vector fields, under the L 2 pairing. When l(u) is a norm, this equation is the geodesic equation for that norm, in the case that λ = ∞; that is, with exact matching. We will focus on this case later in section 3 when discussing landmark dynamics. Here, the inexact matching term constrains the form of the momentum m = ∂l ∂u to depend on the geodesic path. Following the notation of [BGBHR11], the momentum map is defined as where g t,s is the solution of (2.2) at time t with initial conditions at time s, while J 0 t = g t,0 I 0 and J 1 t = g t,1 I 1 . The value J 0 1 corresponds to the initial image, pushed forward to time t = 1, and J 1 1 = I 1 is the target image. The operations ⋄ and ♭ in the momentum map formula (2.4) are defined, as follows. The Lagrangian l in (2.1) may be taken as kinetic energy, which defines a scalar product and norm l(u) = u, Lu L 2 = u 2 L 2 on the space of vector fields X(R d ). The quantity Lu = δl/δu may then be regarded as the momentum conjugate to the velocity u. Similarly, for the image data space V , we define the dual space V * with the L 2 pairing f, I = Ω f (x)I(x)dx, where f ∈ V * and Ω is the image domain Ω ∈ R d . This identification defines the ♭ operator as ♭ : V → V * . When an element g t of the diffeomorphism group acts on V by push-forward, I t = g t .I 0 = (g t ) * I 0 , the corresponding infinitesimal action of the velocity u in the Lie algebra of vector fields u ∈ X(R d ) is given by u.I := [g * t d dt (g t ) * I 0 ] t=0 . In terms of this infinitesimal action, we can then define the operation ⋄ : V × V * → g * as I ⋄ f, u g×g * := f, u.I V ×V * . (2.5) A detailed derivation of this formula for the momentum map can be found in [BGBHR11].
Remark 2.2 (Solving this equation). We will just add here the important remark that the relation (2.4) introduces nonlocality into the problem, as the momentum implicitly depends on the value of the group at later times. This is exactly what is needed in order to solve the boundary value problem coming from the matching of images I 1 and I 0 . The optimal vector field can be found with a shooting method or a gradient descent algorithm on the energy functional (2.1), see [BMTY05]. For more information about the relation of the momentum map approach of [BGBHR11] to the LDDMM approach of [BMTY05], see [BH15].
2.2. Stochastic Reduction Theory. The aim here is to introduce noise in the Euler-Poincaré equation (2.3) while preserving the momentum map (2.4) such that the noise descends to the shape spaces. Following [Hol15], we introduce noise in the reconstruction relation (2.2) and proceed with the theory of reduction by symmetry. We will focus on a noise described by a set of J real-valued independent Wiener processes W i t together with J associated vector fields σ i ∈ X(R d ) on the data domain. We will later discuss particular forms of these fields and methods for estimating unknown parameters of the fields in the context of landmark matching.
Remark 2.3 (Dimension of the noise). We proceed here with a finite number of J associated vector fields and finite dimensional noise, while leaving possible extension to infinite dimensional noise such as done by [Via13] for later works.
We replace the reconstruction relation (2.2) by the following stochastic process Proof. We first show that the momentum map formula (2.4) persists in the presence of noise. The key step in its computation is to prove the formula in the lemma 2.5 of [BGBHR11] which is given by ∂ t (g −1 δg) = Ad g δu, where Ad is the adjoint action on the diffeomorphism group on its Lie algebra. We first compute the variations of (2.6) and then prove this formula by a direct computation This key formula is the same as in [BMTY05] and [BGBHR11] for the deterministic case. In particular, it does not explicitly depend on the Wiener processes W i t . This ensures that the momentum map formula (2.4) remains the same as in the deterministic case. The last step of the proof is to derive the stochastic Euler-Poincaré equation (2.7). This is done by computing the stochastic evolution of the momentum, given by δl δu = Ad * g −1 (I 0 ⋄ (g −1 1 π)), where π = 1 λ 2 (g i I 0 − I 1 ) ♭ .
The only time dependence is in the coadjoint action, and, by the standard formula where we have used the stochastic reconstruction relation (2.6) in the form In summary, this stochastic perturbation of the LDDMM framework preserves the form of momentum map (2.4), although it does affect the reconstruction relation (2.6) and the Euler-Poincaré equation (2.7). As shown in [BGBHR11], various data structures fit into this framework including landmarks, images, shapes, and tensor fields. In practice, for inexact matching, a gradient descent algorithm can be used to minimise the energy functional (2.1). The noise will only appear in the evaluation of the matching cost via the reconstruction relation. The algorithm of [BMTY05] then directly applies, provided the stochastic reconstruction relation can be integrated with enough accuracy. We will not treat the full inexact matching problem here. Instead, we will study the simpler case of exact matching, where the energy functional consists only of the kinetic term.

Exact stochastic Landmark Matching
In this section, we apply the previous ideas of stochastic deformation of LDDMM to exact matching with landmark dynamics. This is the simplest data structure in the LDDMM framework, and it will serve to give interesting insights into the effect of the noise in this context. Since exact matching means that the energy functional contains only a kinetic energy, the optimal vector field is found from a boundary value problem with the Euler-Poincaré equation (2.3). For exact matching, the momentum map for landmarks takes the simple familiar form for the reduction of the EPDiff equation (see [CH93,HM05]) for N landmarks with momenta p i and positions q i , with i = 1, 2, . . . , N. A direct substitution of u = K * m into the stochastic Euler-Poincaré equation (2.7) gives the stochastic landmark equations in (3.6). Here, K is a given kernel corresponding to the Green's function of the differential operator L used to construct the Lagrangian. Below, we take a different approach and proceed from a variational principle in which the stochastic landmark dynamics is constrained. We refer the interested reader to for example [JS14] for a detailed exposition of this derivation in the deterministic context.

Stochastic Landmarks dynamics.
Recall that for N landmarks in R d , the diffeomorphism group elements g act on the landmarks by evaluation of their position g.q = (g(q 1 ), . . . , g(q N )), and the associated momentum map is (3.1). The original action functional (2.1) can be equivalently written as a constrained variational principle where the p i play the role of Lagrange multipliers enforcing the stochastic reconstruction relation (2.6). This procedure is based on the Clebsch action principle, which for landmark dynamics has been studied for one dimensional motion of landmarks on the real line in [HT16b] Notice that only the Lagrangian depends on the spatial (Eulerian) variable x on the image domain. We now use the momentum map (3.1) which provides us with the relation which reduces this action functional to the finite dimensional space of landmarks. We arrive at the action integral where the Hamiltonian only depends on the landmark variables, as The action integral in (3.3) involves the phase space Lagrangian (3.4) and the stochastic potential, given by Taking free variations of (3.3) gives the stochastic Hamilton equations in the form (3.6) Explicitly, we have (3.7) In coordinates, the stochastic equations (3.6) become (3.8) where α, β run through the domain directions, α, β = 1, . . . , d.
The particular form of the stochastic potential in (3.5) arises from the Legendre transformation of (3.2). The solutions of (3.8) represent the singular solutions of the stochastic EPDiff equation, corresponding to a stochastic path in the diffeomorphism group. In previous works such as [TV12,Via13,MS16], noise has been introduced additively and only in the momentum equation, corresponding to a stochastic force. Also, the noise has typically been taken to be different for each landmark, and one can interpret it having been attached to each landmark. In the present case, the noise is not additive and the Wiener processes are not related to the landmarks, but to the domain of the image. Nearby landmarks will thus be affected by a similar noise, controlled by the spatial correlations of the noise. We refer to Figure 1 in the Introduction for a numerical experiment demonstrating this effect.
Remark 3.1 (Geometric noise). The geometric origin of our noise deserves more explanation. In the position equation (3.6), the noise arises as the infinitesimal transformation by the action of the stochastic vector field in (2.6), namely dgg −1 = udt + i σ i • dW t i , on the manifold of positions of the landmarks, which is generated by the J stochastic potentials, Φ l (q i , p i ) := p i · σ l (q i )). Since this stochastic Hamiltonian is linear in the canonical momenta, the noise perturbing the evolution of the landmark positions is independent of the landmark momenta. On the other hand, the noise in the momentum equations arises as the cotangent lift of the action of the stochastic vector field dgg −1 on the positions of the landmarks. This cotangent lift determines the action on the momentum fibres attached to the perturbed position of each of the landmarks in phase space. The cotangent lift transformation is given explicitly by the product of the momentum and the gradient of the spatial fields σ l with respect to the position q i of the i-th landmark. This difference increases the effect of the noise in regions where the σ l fields have large spatial gradients, provided the landmarks are moving rapidly enough for their momenta to be non-negligible. We will see in the example that in certain cases this dependence on the momentum can significantly affect the dynamics of the landmarks.

The Fokker-Planck Equation.
In this section, we study the evolution of the probability density function of the stochastic landmarks by using the Fokker-Planck equation. This study is possible in the case of landmarks because the associated phase space is finite dimensional.
We will denote the probability density by P(q, p, t), on the phase space R 2dN at time t. The Fokker-Planck equation can be computed using standard procedures and is given in the following proposition.
Proposition 3.2. The Fokker-Planck equation associated to the stochastic process (3.6) for the probability distribution P : R 2dN × R → R is given by where {F, G} can = ∇F T J∇G is the canonical bracket with J = ( 0 1 −1 0 ) and φ l (q, p) = i p i · σ l (q i ) are the stochastic potentials. This formula also defines the forward Kolmogorov operator L * .
Proof. The proof follows the standard derivation of the Fokker-Planck equation, by taking into account the geometrical structure of the stochastic process (3.6). The time evolution of an arbitrary function f : R 2dN → R can be written We then take the expectation of this stochastic process. Thus, we first compute the Itô correction, which reads 1 2 l {{f, φ l } can , φ l } can dt, as a double Poisson bracket. The expectation of the Itô process then removes the noise term and defines the forward Kolmogorov operator such thatḟ = L * f . By pairing this formula with the density function P(q, p, t) over the phase space (q, p) with the usual L 2 pairing as we obtain the Fokker-Planck equationṖ = L * P that is explicitly given by (3.9) as the double bracket term is self-adjoint and the advection term anti-self-adjoint.
Of course, the direct study of this equation is not possible, even numerically, due to its high dimensionality. The main use here of the Fokker-Planck equation will be to understand the time evolution of uncertainties around each landmark. Indeed, for each landmark q i , the corresponding marginal distribution (integrating P over all the other variables) will represent the time evolution of the error on the mean trajectory of this landmark. We will show in the next section how to approximate the Fokker-Planck equation with a finite set of ordinary differential equations which describe the dynamics of the first moments of the distribution P. This will then be used to estimate parameters of the noise fields σ l for given sets of initial and final landmarks.
Remark 3.3 (On ergodicity). The question of ergodicity of the process (3.6) is not relevant here, as we will only consider this process for finite times, usually between t = 0 and t = 1. The existence of stationary measures of the Fokker-Planck equation via Hörmander's theorem is thus not needed here. Nevertheless, we will need a notion of reachability in the landmark position in the next section, where we will show how to sample diffusion bridges for landmarks with fixed initial and final positions. This ensures that there exists a noise realisation which can bring a any set of landmark to any other set of landmarks. This property is weaker than the Hörmander condition and was introduced by [Sus73].
3.3. Diffusion Bridges. The transition probability and solution to the Fokker-Planck equation P(q, p, t) can also be estimated by Monte Carlo sampling of diffusion bridges. This approach will, in particular, be natural for maximum likelihood estimation of parameters of landmark processes using the Expectation-Maximisation (EM) algorithm that will involve expectation over unobserved landmark trajectories. This estimation approach will be used in section 4.3, and we here develop a theory of conditioned bridge processes for landmark dynamics which we will employ in the estimation. The approach is based on the method of [DH06] with two main modifications. The scheme and its modifications will be detailed after a short description of the approach of [DH06].
In [DH06], a Girsanov formula [Gir60], generalized to account for unbounded drifts, is used to show that when the diffusion field Σ(x, t) of an R d -valued diffusion process is uniformly invertible, the corresponding process conditioned on hitting a point v ∈ R d at time T > 0 are absolutely continuous with respect to an explicitly constructed unconditioned process y that will hit v at time T a.s.. The modified process y is constructed by adding a guiding drift term that forces the process towards the target v. In [DH06], this process is constructed as a modification of (3.10) Letting P x|v denote the law of x conditioned on hitting v with corresponding expectation E x|v , the Cameron-Martin-Girsanov theorem implies that P x|v is absolutely continuous with respect to P y , see for example [Øk03] and the discussion in [PR12]. An explicit expression for the Radon-Nikodym derivative dP x|v /dP y can be computed, and this derivative is central for using simulations of the process y to compute expectations over the conditioned process x|v. In particular, as shown in [DH06], the conditioned process x|v and the modified process y are related by where ϕ(y) is a correction factor applied to each stochastic bridge y. Notice here that f is a real-valued function of the stochastic path from t = 0 to t = T and that the formula does not depend on time.
Returning to landmark evolutions in the phase space R 2dN , the process (3.6) has two vector variables (q, p) that typically will be conditioned on hitting a fixed set of landmark positions v at time T . The conditioning thus happens only in the q variables by requiring q T = v. To construct bridges with an approach similar to the scheme of [DH06], we need to find an appropriate guiding term and handle the fact that the diffusion field may not be invertible in general. Recall first that the landmark process (3.6) has diffusion field where σ j (q) denotes the vector (σ j (q 1 ), . . . , σ j (q N )) T . Notice that this matrix is not square and has dimension 2dN × J so that Σ(q, p) • dW t with dW t a J-vector corresponds to the stochastic term of (3.6). Though Σ(q, p) couples the q and p equation, when the number of noise fields J is sufficiently large, the q part Σ q (q) will often be surjective as a linear map R J → R dN . In this situation, by letting Σ q (q) † denote the Moore-Penrose pseudo-inverse of Σ q (q), we can construct a guiding drift term as (3.14) This term, when added to the process (3.6), ensures that the modified process hits q T a.s. at time T . The drift term (3.14) is a direct generalization of the term added in (3.11). If Σ had been invertible then ΣΣ † = Id resulting in the guiding term of [DH06] used in equation (3.11). In the current non-invertible case, ΣΣ † q (q − v) uses the difference q − v which only involves the landmark position but affects both the position and the momentum equations. We stress here the fact that the introduction of noise in the q equation in (3.6) is essential for this approach. When conditioning on the q variable, a guided process could not directly be constructed in this way if the noise was instead introduced in the p equation as in [TV12,Via13,MS16]. The fact that this term is weighted by ΣΣ † is also important as it allows the guiding term to be more efficient in the noisy regions of the image, where there is more freedom to deviate from the deterministic path. The guiding term can be interpreted as originating from a time-rescaled gradient flow, and with the guiding term added, the diffusion process can be see as a stochastically perturbed gradient flow, see [AHPS17].
The guiding term (3.14) is, in practice, not always appropriate for landmarks. Because the correction is dependent only on the difference to the target in the position equation, a phenomenon of over-shooting is often observed. In such cases, the landmarks travel too fast initially due to a large momentum, strengthened by the guiding term forcing the landmarks towards v. The high initial speed is only corrected when the time approaches T and the guiding term brings the landmark back to their final position. This effect is illustrated in Figure 4 in section 5.2 and results in low values of the correction factor ϕ(q, p) used to compute the expectation in (3.12). This results in inefficient samples when approximating (3.12) by Monte Carlo sampling. Instead, letting b(q, p) denote the drift term of (3.6) with Itô correction, we use a guided diffusion process of the form for some appropriately chosen function φ t,T : R 2dN → R dN that gives an estimate of the value ofq T using the value of the modified stochastic process (q t ,p t ) at time t. The hat denotes the solution of the process (3.15), which is different from the original dynamics of the process (3.6) written without the hats. The choice φ t,T (q,p) :=q recovers the guiding term (3.14). It would be natural to define φ t,T (q,p) := E (q,p) (q T |(q t , p t ) = (q,p)). The resulting guiding term will only be driven by the expected amount needed at the endpoint, not from the value at time t. A similar choice but easier to handle is to let φ t,T (q,p) be the solution at time T of the original deterministic landmark dynamics (2.3), obtained from the initial conditions (q t ,p t ) = (q,p). We will use this latter choice, which is visualised in Figure 4, in the rest of the paper. We note that to ensure convergence, a bounded approximationb will replace the original unbounded drift b in (3.15), but this fact has little influence in practice.
The matrix Σ(q,p)Σ q (q) † in 3.15 only accounts for the q dynamics in the pseudoinverse Σ q (q) † . When the momentum is high and the noise fields σ j have high gradients, this fact can again lead to improbable sample paths. In such cases, the scheme can be further generalised by using a guiding term of the form 1 The matrix D h φ t,T (Σ(q,p)h) | h=0 is a linear approximation of the expected endpoint dynamics as a function of the noise vector h ∈ R J . Again, with φ t,T (q,p) :=q, the original guiding term (3.14) is recovered, and the term is close to the guiding term of (3.15) when the momentum or gradients of σ j are low. We use this term for the experiments in section 5.2 involving high momentum dynamics, e.g. Figure 6.
The Theorem 3.4. Assume Σ q (q) : R J → R N d is surjective for all q with Σ q (q) † bounded, and thatb and Σ are C 1,2 , bounded, and with bounded derivatives. Let P (q,p)|v be the law of (q, p) | q T = v, and let (q,p) be solution to (3.15), (q 0 ,p 0 ) = (q 0 , p 0 ) with φ t,T : R 2dN → R dN given by the solution of (2.3). Then, with a positive measurable f : W (R 2dN ) → R, In the Theorem, [·, ·] is the quadratic variation of semi-martingales. As mentioned above, a bounded approximationb must be used to replace the original drift term b in (3.15). The last integral in the expression for log ϕ(q,p, t) is a result of this approximation.
The result is proved in Appendix A. Using the guidance scheme (3.11), the right hand side limit of (3.17) is in [DH06] shown to equal when Σ is invertible. Extending the convergence argument to the present noninvertible case is non-trivial, and we postpone investigating this to future work.
The second order term − 1 2 Tr Σ q (q) T (H (q,p)qq ) of the correction factor is generally small. Though it can be computed during the simulation, we ignore it in the numerical experiments. The termŝ can be computed by finite difference approximation as noted in [DH06].

Estimating the Spatial Correlation of the Noise
We now assume a set of n observed landmark configurations q 1 , . . . , q n at time T , i.e. the observations are considered realisations of the stochastic process at some positive time T . From this data, we aim at inferring parameters of the model. This can be both parameters of the noise fields σ i and parameters for the initial configuration (q(0), p(0)). The initial configuration can be deterministic with fixed known or unknown parameters, or it can be randomly chosen from a distribution with known or unknown parameters. We develop two different strategies for performing the inference. The first inference method in section 4.2 is a shooting method based on solving the evolution of the first moments of the probability distribution of the landmark positions while the second method in section 4.3 is based on the Expectation-Maximisation (EM) algorithm. The discussion is here in the context of landmarks, although these ideas may also apply in the more general context of section 2.
4.1. The Noise Fields. We start by discussing the form of the unknown J noise fields σ l . To estimate them from a finite amount of observed data, we are forced to require the fields to be specified by a finite number of parameters. A possible choice for a family of noise fields is to select J linearly independent elements σ 1 , . . . , σ J from a dense subset of C 1 (R d , R d ). We here use a kernel k with length-scale r l and a noise amplitude λ l ∈ R d , that is where δ l denotes the kernel positions. Possible choices for the kernel include Gaussians k r l (x) = e −x 2 /(2r 2 l ) , or cubic B-splines k r l (x) = S 3 (x/r l ). The Gaussian kernel has the advantage of simplifying calculations of the moment equations whereas the B-spline representation is compactly supported and gives a partition of unity when used in a regular grid. Other interesting choices may include a cosine or a polynomial basis of the image domain.
In principle, the methods below allow all parameters of the noise fields to be estimated given sufficient amount of data. However, for simplicity, we will fix the length-scale and the position of the kernels. The unknown parameters for the noise can then be specified in a single vector variable θ = (λ 1 , . . . , λ K ). The aim of the next sections will be to estimate this vector, possibly in addition to the initial configuration (q(0), p(0)), from data using the method of moments in 4.2 and EM in 4.3, respectively. 4.2. Method of moments. We describe here our first method for estimating the parameters θ by solving a shooting problem on the space of first and second order moments. Given an estimate of the endpoint distributions P(q, p, T ), we will solve the inverse problem which consists in using the Fokker-Planck equation (3.9) to find the values of θ such that we can reproduce the observed final distribution. Solving the Fokker-Planck equation directly is infeasible due to its high dimensionality, so we will derive a set of finite dimensional equations approximating the Fokker-Planck equation (3.9). One very successful method is to approximate the probability distribution P by its first moments. This has been developed in the context of plasma physics for the Liouville equation, an equation similar to the Fokker-Planck equation (3.9).
Remark 4.1 (Geometric moment equation). As the Fokker-Planck (3.9) is written in term of the canonical bracket, we could expect to be able to apply a geometrical version of the method of moments such as the one developed by [HPT07]. Although this method seems to fit the present geometric derivation of the stochastic equations, we will not use it as it is not in our case practically useful. Indeed, it requires the expansions of the Hamiltonian functions in term of the moments, but we cannot obtain here a valid expansion with a finite number of terms. This is due to the fact that the LDDMM kernel and the noise kernels cannot generally be globally approximated by finite polynomials with bounded approximation error for large distances. This would, in turn, produce spurious strong interactions between distant landmarks.
The method for approximating the Fokker-Planck that we will use here is the following. We first define the moments where we have written only two possible moments, although any combinations of p and q at any order are possible. In this work we will only consider moments up to the second order, that is the moments q α i , p α i , q α i q β j , q α i p β j and p α i p β j . Notice that the first moment are (1, 1)-tensors, and the second moments are (2, 2)-tensors, although we will only use index notation here.
We illustrate this method with the first moment q α i , which represents the mean position of the landmarks. We compute its time derivative and use the property of the Kolmogorov operator L defined in (3.9) to obtain d dt q α i = q α i L * P θ dqdp = L q α i P θ dqdp = L q α i .

(4.4)
We thus first need to apply the Kolmogorov operator L to q α i to obtain (4.5) which corresponds to the q part of the drift of the stochastic process with Itô correction. Similarly, for the momentum evolution, we obtain (4.6) Most of the terms on the right hand side of (4.5) and (4.6) are nonlinear; so their expected value cannot be written in terms of only the first moments. This is the usual closure problem of moment equations, such as the BBGKY problem arising in many-body problems in quantum mechanics. The solution to this problem is to truncate the hierarchy of moments at a given order and consider the system of ODEs as an approximation of the complete Fokker-Planck solution. Here we will apply the so-called cluster expansion method described in [KK11]. We refer to the Appendix B.1 for more details about this method.
Apart from the first approximation q α i q β j → q α i q β j , the next order of approximation is to keep track of the correlations This quantity is also called a centred second moment as for i = j it corresponds to the covariance of the probability distribution for the landmark i. In general, it corresponds to the correlation between the positions of two landmarks. The dynamical equation for this correlation is found from the equation of the second moment, which gives where T stands for the transpose of the previous term. This equation is interesting to study in more detail, as it already gives us information about the nature of the dynamics for the spatial covariance of landmarks. Indeed, we have three types of terms with the following effects.
(1) The σ l -dependent terms. This first term is quadratic in the σ's, not proportional to any linear or quadratic polynomial in q or p. This term is a direct contribution from the noise in the q equation and will have the effect of almost linearly increasing the centred covariance, wherever a σ l > 0.
(2) The h-dependent terms. From the form of this term, we expect it to be proportional to a correlation. It will thus have an exponential effect on the dynamics, triggered by the linear contribution of the first term. Notice that this term only depends on the Hamiltonian, and, thus, on the interaction between landmarks. If two landmarks interact, we expect their covariance to be averaged. This term will capture their averaged covariance. (3) The ∇ q σ l -dependent terms. These terms are related to the noise in the p equation and will account for the effect on the landmark position of the interaction of the momentum of the landmark with the gradients of the noise.
Notice that the last two types of terms describe second order effects with respect to the spatial covariance of the landmarks, as they depend linearly on the correlations. In the expansion of these nonlinear terms, the other correlations involving p will appear. This means that all of the possible second order correlations must be computed. This computation is done in Appendix B, where we also approximate the expected value of the kernels as K(q) ≈ K( q ). As we will see in the numerical examples in section 5, these approximations can give a reliable estimate of the landmark covariance, but this should be rigorously justified to obtain a precise estimate of the errors. Such a study is beyond the scope of this work and is left open.
Given the equations for the moment evolution, we can estimate the parameters θ by minimising the cost function where γ 1 and γ 2 are weights. We denote by q and ∆ 2 qq the target first and second moments and by q (T ) and ∆ 2 qq (T ) the estimated moments which implicitly depend on the parameters of the noise and the initial momentum. The choice of the norm is free here, and we chose a norm which only considers i = j and normalises each term to 1 so that all the covariance of the landmarks contribute equally to the cost. Other choices could be made, depending on applications. Also, the cost function may depend on other parameters, but this would make its minimisation more difficult.
To minimise the cost (4.8), we can use gradient based methods such as the BFGS algorithm. Such methods require the evaluation of the Jacobian of C with respect to all of its arguments. Usually, for the estimation of the initial momentum, a linear adjoint equation is used. However, the derivative with respect to the parameters of the noise cannot be computed in this way. We will evaluate the gradients symbolically by using the Theano library in Python [The16]. To improve the efficiency of the algorithm, we first match the mean final position, by only updating the initial momentum. Then, with this initial condition, we match for both first and second moments and update the initial momentum as well as the parameters λ l . As we will see in the numerical experiments in section 5, gradient-based methods are not optimal, and genetic algorithms, such as the differential evolution algorithm of [SP97] designed for global minimizations, turn out to perform better. 4.3. Maximum Likelihood and Expectation-Maximization. We now describe how to estimate the unknown parameters collected in the variable θ by a maximum likelihood estimation based on the expectation-maximisation (EM) algorithm of [DLR77]. The likelihood of n independent observations (q 1 , . . . , q n ) at time T given parameters θ takes the form (4.9) The parameters θ can be estimated by maximizing the likelihood, i.e. lettinĝ θ ∈ argmax θ L(θ; q 1 , . . . , q n ) .
For this, the likelihood could be directly computed by numerical approximation of P θ (q i , T ) using an approximation of the Fokker-Planck equation (3.9). Alternatively, the fact that the stochastic process is only sampled at time T suggests a missing data approach that marginalises out the unobserved trajectories up to time T . Let (q, p; θ) denote the stochastic landmark process with parameters θ, and let P (q, p; θ) denote its law. Let L(q, p; θ) denotes the likelihood of the entire stochastic path for a given realisation of the noise, and computed with respect to the parameter θ. Notice that this likelihood is only defined for finite time discretizations of the process and there is no notion of path density for the infinite dimensional process. We thus proceed formally, while noting that the approach can be justified rigorously, see e.g. [DS08].
The EM algorithm finds a sequence of parameter estimates {θ l } converging to â θ by iterating over the following two steps: (1) Expectation: Compute the expected value of the log-likelihood given the previous parameter estimate θ l−1 : (4.10) The expectation (4.10) over the process conditioned on the observations q i integrates the likelihood over all sample paths reaching q i . Expectation over conditioned diffusion processes can be approximated by Monte Carlo simulation of diffusion bridges, see for example [DH06,PR12,BFS16]. Here, we employ the bridge simulation approach developed in section 3.3. For each q i , we thus exchange (q t , p t ; θ) with a guided process (q,p; θ, q i ) and use the equality (3.17) from Theorem 3.4. The expectation on the right-hand side of (3.17) can be approximated by drawing samples from the guided process. Note that the correction factor ϕ(q, p|θ l−1 , q i ) makes the approach equal to importance sampling of the conditioned process with the guided process as proposal distribution.
(2) Maximisation: Find the new parameter estimate θ l = argmax θ Q(θ|θ l−1 ) . (4.11) The maximisation step can be approximated by updating θ l such that it increases Q(θ|θ l−1 ) instead of maximising it. This is the approach of the generalised EM algorithm [NH98]. The update of θ is thus computed by taking a gradient step where ǫ > 0. The gradient which is evaluated for each of the sampled paths can be computed symbolically using the Theano library of [The16].
The resulting estimation algorithm is listed in Algorithm 1. For each q i , the expectation E θ l−1 (log P θ (q, p|q i )) is estimated by sampling N bridges bridges. The algorithm can perform a fixed number L of updates to the estimate θ l or stop at convergence.

Numerical examples
We now present several numerical tests of the stochastic perturbation of the landmark dynamics. In particular, we want to illustrate aspects of the effect of the noise on the landmarks and test the algorithms for estimation of the spatial correlation of the noise. We will focus here on synthetic examples and refer to [AHPS17] for an application of the methods on a dataset of Corpora Callosa shapes represented by 77 landmarks. The numerical simulations of this work have been done in Python, using the symbolic computation framework Theano [The16]. The code is available from the public repository https://bitbucket.org/stefansommer/stochlandyn.

Solution of the Fokker-Planck equation.
We first consider a simple experiment with a single landmark, subjected a square array of noise fields with Gaussian noise kernel. To a first order approximation, the mean trajectory of the landmark is a straight line with constant momenta as the Hamiltonian is a pure kinetic energy. This experiment is displayed in Figure 2(a) where we used two arrays of four by four noise fields with either λ l = (0.08, 0) or λ l = (0, 0.08) and three values of the noise radius r l = 0.5, 0.05, 0.03. For large values of r l , the noise is mostly uniform and the gradients of the σ l are negligible. The only term contributing to the final covariance of the landmark is therefore σ α l (q i )σ β l (q j ) . This term only has a linear effect on the covariance which is thus an ellipse proportional to the noise fields. Here the noise has equal strength in both the x and y coordinate thus we observe a circle. For smaller values of r l , the gradient of σ l is large enough for the other term in the momentum equation which couple the momentum and the gradient of σ to affect the moment dynamics. This effect is shown in Figure 2(a) where the covariance has a larger value in the direction of the gradient of σ l than in the other directions. This is explained by the fact that this coupling is of the form ∂ ∂q i (σ l (q i ) · p i ), thus the ellipse is in the direction of the gradient, not the momenta. Notice that there should be some noise in the direction of the momenta for this term to have an effect. Using the same experiment, we compared the estimation of the covariance from the moment equation with a direct sampling obtained by solving the stochastic landmarks equations. We did this experiment for r l = 0.5 and r l = 0.03 in Figure 2(b),2(c). The left panel with r l = 0.5 shows an excellent agreement between the two methods but the right panel with r l = 0.03 shows differences. This type of error in the estimation of the covariance is explained by the fact that the final distribution has a large skewness. This effect is not captured by the moment equations as we neglected the effects of order higher than 2, and the skewness is a third order effect described by terms such as ∆ 3 q α i q β j q γ k . Nevertheless, the final covariance is close enough to the correct one to be able to use it in the estimation of the noise fields. This demonstrates that even in rather extreme cases, which are not realistic for applications, the second order approximation used to derive the moment equation still produces reliable results.  We did a similar experiment but with 5 interacting landmarks arranged in an ellipse configuration and with initial conditions obtained from the deterministic shooting method such that the endpoint of the deterministic landmark equations match another ellipse. We display these experiments in Figure 3 with the same noise as in the previous tests and with r l = 0.2. We modified here the landmark interaction length scale α from α = 0.02 (no-interactions) to α = 0.2 (neighbours interactions) to see the effect of the noise with the landmark interactions. Due to the different length scales, the trajectories to the target ellipse are slightly different so the landmarks will be subject to different noise. The larger length scale has the effect of reducing the differences between the covariances of interacting landmarks.

Bridge Sampling.
Here, we aim at visualising the effect of the constructed bridge sampling scheme. In Figure 4, the effect of the guiding term is visualized on a sample path. At t = T /2, the predicted endpoint φ t,T (q(t),p(t)) is calculated and the difference φ t,T (q,p) − v is used to guide the evolution of the path towards the target v. The guiding term ensures thatq will hit v almost surely at time T . Notice that the difference φ t,T (q,p)−v is generally much smaller than the differencê q − v. The introduction of φ t,T therefore implies that the process is modified less giving more likely bridges. Without φ t,T , the process is generally attracted to quickly towards the target as can be seen by the landmarks at t = 0.5 being almost at their final positions in Figure 4 (b). The path thus overshoots the target. This effect is not present when using φ t,T in Figure 4 (a).
(a) guided process using φ t,T (b) guided process without φ t,T Figure 4. (a) Visualization of the process (3.15). From the initial landmark configuration q(0) (blue crosses), the target v (blue dots) is hit using the modified process (q,p) (black lines:q). At time t = T /2, φ t,T (q(t),p(t)) is calculated (green dots) and the process is guided by −(T − t) −1 ΣΣ † q (φ t,T (q,p) − v) (q part: green arrows, length doubled for visualization). The use of φ t,T implies small guiding and high probability sample bridges. (b) Similar setup but using the guiding term −(T − t) −1 ΣΣ † q (q − v) without φ t,T . The momentum couples with the guiding term, and, intuitively, the path travels too fast towards the target (q at t = T /2 much closer than halfway towards v) and overshoots. This effect gives low probability sample bridges and the guiding term (green arrows) is much larger than in (a).

5.3.
Estimating the noise amplitudes. We here aim at estimating the noise amplitude from sampled data using both the method of moments and maximum likelihood.
We first use the genetic algorithm of [SP97] called differential evolution algorithm to minimise the cost function C in (4.8). This algorithm has in experiments proven successful in avoiding local minima during the optimisation. We compared it with the standard BFGS gradient descent algorithm with a single landmark in Figure 5. This algorithm relies on the Jacobian of the cost functional computed symbolically using the Theano package of [The16]. It is able to estimate the noise amplitude along the trajectory of the landmark where the signal from the gradient of C is  the strongest. For the other regions of the image, the algorithm cannot detect any signal to update the noise fields. The genetic algorithm can overcome this issue as it is based on evolving a population of solution which randomly spans the entire parameter space. In this way, the solution obtained is a better approximation of the global minimum of C. It is interesting that even if the final moment of figure 5 are well matched with the genetic algorithm, but the noise amplitude is not perfectly recovered. This illustrates the expected degeneracy of this model for a low number of landmarks. When more landmarks are added, the noise amplitude estimation is closer to the expected one, see figure 7 below.
In Figure 6, the same experiment is performed with MLE and the bridge sampling scheme. The noise kernels are in this experiment cubic B-splines placed in a grid providing a partition of unity. In the optimisation, λ l are fixed to be equal for all l = 1, . . . , J implying that the total noise variance will be uniform at each point of the domain. The figure shows the experiment performed with low momentum (Figure 6 (a)) and high momentum (Figure 6 (b)). In the low momentum case, the noise parameters are estimated correctly and the sample covariance with the estimated parameters matches the covariance of the original samples. The SDE (3.15) is here used for the bridge sampling scheme. In contrast to the previous method, the algorithm is now optimising for the maximum likelihood of the samples and not directly for matching the final covariance. A higher difference in the endpoint covariance is, therefore, to be expected.
With higher initial momentum, the coupling between the guidance and noise makes the scheme (3.15) overestimate the variance. Instead, the guidance term (3.16) is used. Notice that even though the sample covariance with the estimated parameters matches the covariance of the original samples, the estimated λ l are   Figure 9 show the result of noise estimation using different configurations of the ellipse using both the method of moments and MLE. The noise parameters λ l are allowed to vary with l in both cases giving spatially non-uniform noise amplitude. The algorithms find the correct noise parameters in the areas covered by the landmark trajectories.

Discussion and Outlook
As the first topic of this work, we attempted to answer the question of how to include stochasticity and uncertainty in the framework of large deformation matching in a systematic and geometrically consistent way. In section 2, we exposed a general theory of stochastic deformations in the LDDMM framework, based on the momentum map representation of images of [BGBHR11], by introducing spatially correlated time-dependent noise in the reconstruction relation used to compute the deformation map from its velocity field. By doing so, most of the advantages of the theory of reduction by symmetry remain, in particular, the possibility of applying this generic stochastic model to many data structures. The general dynamical equation is the stochastic EPDiff equation, and the noise appears in a particular  multiplicative form with spatial correlation encoded in a set of spatially dependent functions σ l . The key feature of this noise is that it preserves the structure of the original equation provided by the reduction by symmetry and in particular the momentum map allowing for both exact and inexact matching.
The question of local in time existence and uniqueness of this equation is important but not treated in this work. We refer to [BFM16] for such a study for the 2d Euler equation and to [CFH17] for the 3d case. Another possible extension is to consider an infinite number of σ l fields with an infinite dimensional Wiener process for the stochastic EPDiff equation as investigated by [Via13], also in the context of stochastic shape analysis. We considered time-independent σ l fields. However, there are several approaches for making these fields time dependent beside simply prescribing them as functions of time. Some of these other approaches were derived by [GBH17] in the context of stochastic fluid dynamics. In particular, the idea of having the noise fields being carried by the deformation could be of interest in this context as well. Yet another possibility could be to have two different types of noise fields, one modelling small-scale noise correlation and the other larger scale noise correlations. In this case, it would make sense for the small scale variability to be advected by the large-scale deformation, similarly to the multi-scale model of [HT12].
After defining the general model in section 2, we applied it to exact landmark matching in section 3, which is the simplest, yet non-trivial application of the LD-DMM framework. This approach allowed investigation of the effects of the noise on large deformation matching in a finite dimensional model. Introducing the noise in both the momentum and the position equations of the landmarks made the landmark trajectories rougher than they would have been, had the noise been only in the momentum equation. The noise in the position equation also increased the flexibility for controlling the landmark trajectories. This property was used to derive a scheme for simulating diffusion bridges with corresponding sampling correction factor that allowed evaluation of expectations with respect to the original conditioned landmark dynamics. In addition, we used the finite dimensionality of the system to derive the Fokker-Planck equation.
Some modifications to the standard theory of diffusion bridges were made to fit the case of landmark dynamics and to improve the speed and accuracy of the estimation of expectations over conditioned landmarks trajectories. The landmarks represent the simplest cases for numerical shape analysis, especially in the context of stochastic systems. We used a simple Heun method to solve the stochastic landmark equations. Higher order integration schemes could be used such as the stochastic variational integrators of [HT16a]. The next step in extending the landmark example is to allow for inexact matching and to study the trade-off between the effect of noise and the tolerance of the matching. Several issues regarding ergodicity and other properties of the Kolmogorov operator were left open in this paper, whose future treatments could add to the theoretical understanding of the model. Finally, the stochastic LDDMM framework can be applied to other types of data structures, in particular to images with inexact matching as originally done in [BMTY05]. Studying the effects of the stochastic model on other nonlinear data structures such as curves or surfaces would also be of great interest for future works.
As a second topic, we raised the question of determining the noise correlation from data sets which would allow the theory of stochastic deformations to be used with observed data. We developed two independent methods which we implemented and applied to several test examples. First, the moment equation allows matching of the sample moments. It is deterministic, making optimisation of the noise parameters stable and efficient, and it does not require special conditions on the noise fields. Its accuracy depends on the approximation order in the moment equation. Scaling the moment equation to a large number of landmarks or continuous shapes such as curves may be challenging as well as optimising for a high number of unknown parameters. In the presented landmark experiments, the approach allowed us to reliably estimate the underlying noise.
The second method is the MLE optimisation, a Monte Carlo method which evaluates expectations over conditioned stochastic trajectories. The bridge sampling scheme we used requires the noise fields to span the entire q-space to allow guiding the landmarks towards their target. With high nonlinearity as may happen with large initial momentum and high gradients of the noise fields, guiding the trajectories towards their target with high-probability bridges can be challenging. In general, the stochastic nature of the algorithm makes it harder to control than the matching provided by the moment equation. The bridge sampling scheme can be interpreted as a gradient flow, as discussed in [AHPS17], when applied to images. It allows the likelihood of observed images to be evaluated without a prior image registration step. The method may thus be applicable to image analysis problems, and more generally for inexact matching of shapes in which case the requirement of the noise to span the q-space may be relaxed.
The inference of noise parameters treated here can be extended to more general statistical inference problems on shape spaces. Inferring the initial q 0 positions can be regarded as estimating a most-likely mean, thereby drawing similarities to the Frechét mean [Fré48] and to means defined by maximum likelihood of probability distributions in nonlinear spaces [Som15]. When generalised to images, the approach can be used for simultaneous estimation of template images [JDJG04], possible time-dependent transformations in the momentum as caused for example by disease processes [MF12], and population variation in the spatial noise correlation. We let the function φ t,T : R 2dN → R dN be the solution of the deterministic landmark dynamics (2.3) at time T and use the notation of Theorem 3.4. Because the drift b(q,p, t) in (3.6) is unbounded, we letb be an approximation of b so that the q-partb q is bounded on R dN andb is in the range of Σ(q, p) for all (q, p). We letφ t,T be given by the corresponding deterministic solution usingb in 2.3. Then ∂ tφt,T (q,p) = −b(q,p, t) is bounded and the process differs from the SDE (3.15) by As argued for [Mar11], A.1 has a unique solution satisfying lim t→Tq = v a.s., and the processes A.1 and (3.15) are absolutely continuous with respect to each other.
In fact, the correction term ϕ(q, p) can be derived from [Mar11, Theorem 3] and the difference A.2. Instead, we below give a direct derivation in the landmark case that proves Theorem 3.4.
Appendix B. Moment equation for stochastic landmark B.1. Cluster expansion method. We explain the basics of this method, which can be find in more details in, for example, [KK11] with application in the context of semiconductor physics. This method is used when one seek the dynamics of the expected value of N particles that we will write here N . One cannot solve the complete system, especially if the number of particle is large, thus we want to approximate the expected value of products in term of only a few independent variables. For this we apply the cluster expansion, which begin by writting 2 = 2 s + ∆ 2 2 := 1 1 + ∆ 2 2 , (B.1) The next decomposition is 3 = 3 s + 1 ∆ 2 2 + ∆ 3 3 , (B.2) and so on and so forth. We then only compute the dynamics for the singlets 1 and the correlations, up to some chosen order. In the sequel, we will only consider the doublet correlations ∆ 2 , and in this case we have the general decomposition N = N s + N − 2 s ∆ 2 2 + N − 4 s ∆ 2 2 2 + i N − 2i s ∆ i 2 2 + O(∆ 3 ) .
In the context of quantum mechanics, where the particle operators do not commute, extra care is needed especially for the sign of the term. Here we will consider q α i and p α i as our particles, and as they commute, the expansions are simpler than in [KK11]. We directly compute two of them for illustration, up to quadratic order, q α i p β j = q α i p β j + ∆ 2 q α i p β j q α i q β j p γ k ≈ q α i q β j p γ k + q α i ∆ 2 q β j p γ k + q β j ∆ 2 q α i p γ k + p γ k ∆ 2 q α i q β j . This sort of expansion can fit a more geometrical framework, where the final equations for the first moment will preserve the original structure of the equations. This was developed first in [HLS90] and later in [HPT07,HPT10]. We will not use this method here for a good reason related to the form of the equations. A key step in these papers is to expand the expected value of the Hamiltonian in terms of a finite number of moments, to enable computation of the equation of motion. In our case, the Hamiltonian has a kernel function, which generally cannot be expanded in a finite sum of polynomial terms. By doing the computations directly, we will be able to do another approximation for the kernels, that is, we will assume that they commute with the operation of expectation. A more subtle approximation can be done using the heaviside function, but will give a much larger number of terms in the expansion, see appendix B.6. B.3. qq correlation. Recall the formula of the Kolmogorov operator applied to q α i q β j , L (q α i q β j ) = q α i ∂h ∂p β j + l σ α l (q i )σ β l (q j ) + 1 2 l q α i σ γ l (q j ) ∂σ β l (q j ) ∂q γ j + T , (B.6) which together with (4.5) gives the time evolution of the position correlation in the form We will denote by A the terms corresponding to the drift, by B the terms which are not present in the first moments equation, and by C the other terms which only depend on the noise and the derivative of the noise fields. We proceed by first approximating the expectation of the kernels to get B qq ≈ l σ α l ( q i )σ β l ( q j ) C qq ≈ − 1 2α 2 l l,γ ∆ 2 q α i q γ j σ γ l ( q j )σ β l ( q j ) + T .
where we also used the explicit form of σ l as a Gaussian and its derivative. We will now approximate the A qq term to get It is now clear that the B term will linearly increase the position correlation, which will then exponentially increase by the C term and be affected by the momentumposition correlation by the A term. We now proceed by computing the momentum correlation.
B.4. pp correlation. We compute the Kolmogorov operator on p α We proceed with the B pp term, which is also symmetric under the transpose operation, thus giving the approximation