# Diffeomorphic Random Sampling Using Optimal Information Transport

## Abstract

In this article we explore an algorithm for diffeomorphic random sampling of nonuniform probability distributions on Riemannian manifolds. The algorithm is based on *optimal information transport* (OIT), an analogue of optimal mass transport (OMT). Our framework uses the deep geometric connections between the Fisher–Rao metric on the space of probability densities and the right-invariant *information metric* on the group of diffeomorphisms. The resulting sampling algorithm is a promising alternative to OMT, in particular as our formulation is semi-explicit, free of the nonlinear Monge–Ampère equation. Compared to Markov Chain Monte Carlo methods, we expect our algorithm to stand up well when a large number of samples from a low-dimensional nonuniform distribution is needed.

## Keywords

Density matching, Information geometry, Fisher–Rao metric, Optimal transport, Image registration, Diffeomorphism groups, Random sampling

## MSC2010

58E50, 49Q10, 58E10

## 1 Introduction

We construct algorithms for random sampling, addressing the following problem.

### Problem 1

Let \(\mu \) be a probability distribution on a manifold *M*. Generate *N* random samples from \(\mu \).

The classic approach to sampling from a probability distribution on a higher-dimensional space is to use Markov Chain Monte Carlo (MCMC) methods, for example the Metropolis–Hastings algorithm [6]. An alternative idea is to use diffeomorphic density matching between the density \(\mu \) and a standard density \(\mu _0\) from which samples can be drawn easily. Standard samples are then transformed by the diffeomorphism to generate non-uniform samples. In Bayesian inference, for example, the distribution \(\mu \) would be the posterior distribution and \(\mu _0\) would be the prior distribution. In case the prior itself is hard to sample from, the uniform distribution can be used. For *M* being a subset of the real line, the standard approach is to use the cumulative distribution function to define the diffeomorphic transformation. If, however, the dimension of *M* is greater than one, there is no obvious change of variables to transform the samples to the distribution of the prior. We are thus led to the following matching problem.

### Problem 2

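Given a probability distribution \(\mu \) on *M*, find a diffeomorphism \(\varphi \) such that

\[ \varphi _*\mu _0 = \mu . \]

Here, \(\mu _0\) denotes a standard distribution on *M* from which samples can be drawn, and \(\varphi _*\) is the push-forward of \(\varphi \) acting on densities, *i.e.*,

\[ \varphi _*\mu _0 = |D\varphi ^{-1}|\, \mu _0\circ \varphi ^{-1}, \]

where \(|D\varphi ^{-1}|\) is the Jacobian determinant of \(\varphi ^{-1}\).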

A benefit of transport-based methods over traditional MCMC methods is cheap computation of additional samples; it amounts to drawing uniform samples and then evaluating the transformation. On the other hand, transport-based methods scale poorly with increasing dimensionality of *M*, contrary to MCMC.
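For *M* a subset of the real line, the cumulative-distribution construction mentioned in the Introduction makes the pattern "draw uniform samples, then evaluate the transformation" concrete. The following is a minimal Python sketch, not the MATLAB code accompanying this paper; the function name `inverse_cdf_sampler`, the grid size, and the test density proportional to \(1+0.9\cos x\) on \([-\pi ,\pi ]\) are illustrative assumptions.

```python
import numpy as np

def inverse_cdf_sampler(density, a, b, grid_size=2049):
    """Return a map sending uniform samples on [0, 1] to samples of `density` on [a, b].

    `density` may be unnormalized but is assumed strictly positive on [a, b].
    """
    x = np.linspace(a, b, grid_size)
    p = density(x)
    # Cumulative trapezoidal rule: unnormalized CDF evaluated on the grid.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (p[1:] + p[:-1]) * np.diff(x))))
    cdf /= cdf[-1]  # normalize so that the CDF equals 1 at b
    # A strictly increasing CDF can be inverted by swapping the roles of x and cdf.
    return lambda u: np.interp(u, cdf, x)

rng = np.random.default_rng(0)
transport = inverse_cdf_sampler(lambda x: 1.0 + 0.9 * np.cos(x), -np.pi, np.pi)

# Building `transport` is the one-off cost; further samples only require evaluating it.
samples = transport(rng.uniform(size=100_000))
```

Once `transport` is stored, additional samples cost one call to `rng.uniform` and one interpolation, which is the one-dimensional counterpart of saving the diffeomorphism and reusing it.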

The action of the diffeomorphism group on the space of smooth probability densities is transitive (Moser’s lemma [13]), so existence of a solution to Problem 2 is guaranteed. However, if the dimension of *M* is greater than one, there is an infinite-dimensional space of solutions. Thus, one needs to select a specific diffeomorphism within the set of all solutions. Moselhy and Marzouk [12] and Reich [15] proposed to use optimal mass transport (OMT) to construct the desired diffeomorphism \(\varphi \), thereby enforcing \(\varphi = \nabla c\) for some convex function *c*. The OMT approach implies solving, in one form or another, the heavily non-linear Monge–Ampère equation for *c*. A survey of the OMT approach to random sampling is given by Marzouk *et al.* [9].

In this article we pursue an alternative approach to diffeomorphism-based random sampling, replacing OMT by *optimal information transport* (OIT), which is diffeomorphic transport based on the Fisher–Rao geometry [11]. Building on deep geometric connections between the Fisher–Rao metric on the space of probability densities and the right-invariant *information metric* on the group of diffeomorphisms [7, 11], we developed in [3] an efficient numerical method for density matching. The efficiency stems from a solution formula for \(\varphi \) that is explicit up to inversion of the Laplace operator, thus avoiding the solution of a nonlinear PDE such as the Monge–Ampère equation. In this paper we explore this method for random sampling (the initial motivation in [3] is medical imaging, although other applications, including random sampling, are also suggested). The resulting algorithm is implemented in a short MATLAB code, available under MIT license at https://github.com/kmodin/oit-random.

## 2 Density Transport Problems

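Let *M* be a *d*-dimensional orientable, compact manifold equipped with a Riemannian metric \(g=\langle .,.\rangle \). The volume density induced by *g* is denoted \(\mu _0\) and without loss of generality we assume that the total volume of *M* with respect to \(\mu _0\) is one, *i.e.*, \(\int _M \mu _0=1\). Furthermore, the space of smooth probability densities on *M* is given by

\[ \mathrm {Prob}(M) = \Big\{ \mu \in \Omega ^d(M) \;\Big|\; \mu > 0,\ \int _M \mu = 1 \Big\}, \]

where \(\Omega ^d(M)\) denotes the space of smooth *d*-forms. The group of smooth diffeomorphisms \({\text {Diff}}(M)\) acts on the space of probability densities via push-forward:

\[ {\text {Diff}}(M)\times \mathrm {Prob}(M) \rightarrow \mathrm {Prob}(M), \qquad (\varphi ,\mu ) \mapsto \varphi _*\mu . \]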

In the following we will focus our attention on the diffeomorphic density matching problem (Problem 2). A common approach to overcome the non-uniqueness in the solution is to add a regularization term to the problem, that is, to search for a minimum energy solution that has the required matching property, for some energy functional *E* on the diffeomorphism group. Following ideas from mathematical shape analysis [10], it is natural to define this energy functional using the geodesic distance function \({\text {dist}}\) of a Riemannian metric on the diffeomorphism group. Then the regularized diffeomorphic matching problem can be written as follows.

### Problem 3

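Given a probability distribution \(\mu \in \mathrm {Prob}(M)\), find a diffeomorphism \(\varphi \in {\text {Diff}}(M)\) that minimizes the energy functional

\[ E(\varphi ) = {\text {dist}}^2({\text {id}},\varphi ) \]

over all diffeomorphisms \(\varphi \) with \(\varphi _*\mu _0=\mu \).

The choice of Riemannian metric on \({\text {Diff}}(M)\) determines the solution. The non-invariant \(L^2\) metric corresponds to *optimal mass transport* (OMT), which induces the Wasserstein \(L^2\) distance on \(\mathrm {Prob}(M)\), see, for example, [8, 14, 16].

In this article we instead use a right-invariant metric \(G^{I}\) on \({\text {Diff}}(M)\) whose definition involves an orthonormal basis of the harmonic vector fields on *M*. Because of the Hodge decomposition theorem, \(G^{I}\) is independent of the choice of orthonormal basis \(\xi _1,\ldots ,\xi _k\) for the harmonic vector fields. This construction is related to the Fisher–Rao metric on the space of probability densities [2, 4], which is predominant in the field of information geometry [1]. We call \(G^{I}\) the *information metric*. See [3, 7, 11] for more information on the underlying geometry.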

The connection between the information metric and the Fisher–Rao metric allows us to construct almost explicit solution formulas for Problem 2 using the explicit formulas for the geodesics of the Fisher–Rao metric.
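The explicit Fisher–Rao geodesics referred to here rest on a standard fact: the square-root map \(\mu \mapsto \sqrt{\mu /\mu _0}\) sends probability densities to the unit sphere of \(L^2(M,\mu _0)\), where geodesics are great circles. The following Python sketch illustrates this on a grid; it is not taken from the MATLAB code accompanying this paper. The function name `fisher_rao_geodesic`, the representation of a density by its grid values relative to \(\mu _0\), and the test density are assumptions made for illustration.

```python
import numpy as np

def fisher_rao_geodesic(rho1, t):
    """Point at time t on the Fisher-Rao geodesic from the uniform density to rho1.

    rho1 holds grid values of the target density relative to mu_0, so that its
    grid mean (the discrete integral against mu_0) equals one.
    """
    root = np.sqrt(rho1)
    # Angle between sqrt(rho1) and the constant function 1 in L^2(mu_0).
    theta = np.arccos(np.clip(root.mean(), -1.0, 1.0))
    if theta < 1e-12:  # rho1 is numerically uniform: the geodesic is constant
        return np.ones_like(rho1)
    # Great-circle interpolation on the unit sphere of L^2(mu_0).
    w0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    w1 = np.sin(t * theta) / np.sin(theta)
    return (w0 + w1 * root) ** 2

n = 64
x = np.linspace(-np.pi, np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
rho1 = np.exp(np.cos(X) + np.sin(2 * Y))  # hypothetical target on the flat 2-torus
rho1 /= rho1.mean()                       # normalize to total mass one w.r.t. mu_0
rho_half = fisher_rao_geodesic(rho1, 0.5)
```

Because the interpolation stays on the unit sphere, every intermediate `rho_half` is again a positive density of total mass one, with no renormalization step needed.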

### Theorem 1

Let \(\mu \in \mathrm {Prob}(M)\) be a smooth probability density. The diffeomorphism \(\varphi \in \mathrm {Diff}(M)\) minimizing \({\text {dist}}_{G^I}({\text {id}},\varphi )\) under the constraint \(\varphi _*\mu _0=\mu \) is given by \(\varphi (1)\), where \(\varphi (t)\) is obtained as the solution to the problem

The algorithm for diffeomorphic random sampling, described in the following section, is directly based on solving Eq. (8).

## 3 Numerical Algorithm

In this section we explain the algorithm for random sampling using optimal information transport. It is a direct adaptation of [3, Algorithm 1]. If needed, one may also compute the inverse by \(\varphi _{k+1}^{-1} = \varphi _k^{-1} + \varepsilon v\circ \varphi _k^{-1}\).

The algorithm generates *N* random samples \(y_1,\ldots ,y_N\) from the distribution \(\mu \). One can save \(\varphi _K\) and repeat steps 8–9 whenever additional samples are needed.

The computationally most intensive part of the algorithm is the solution of Poisson’s equation at each time step. Notice, however, that we do not need to solve nonlinear equations, such as the Monge–Ampère equation, as is necessary in OMT.
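On a flat torus, Poisson’s equation can be solved by dividing by the Fourier symbol of the Laplacian. The following Python sketch shows one such solve under stated assumptions: a square periodic grid on \([-\pi ,\pi )^2\), a zero-mean normalization of the solution, and a hypothetical function name `solve_poisson_torus`. It is an illustration only and not the MATLAB implementation accompanying this paper.

```python
import numpy as np

def solve_poisson_torus(s):
    """Solve Laplace(f) = s on the flat torus [-pi, pi)^2, returning zero-mean f.

    s is an n-by-n array of grid values; its mean is ignored, since the equation
    is only solvable for right-hand sides with zero mean.
    """
    n = s.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)   # integer wave numbers for period 2*pi
    KX, KY = np.meshgrid(k, k, indexing="ij")
    symbol = -(KX**2 + KY**2)          # Fourier symbol of the Laplacian
    symbol[0, 0] = 1.0                 # placeholder to avoid dividing by zero
    f_hat = np.fft.fft2(s) / symbol
    f_hat[0, 0] = 0.0                  # fix the free additive constant
    return np.real(np.fft.ifft2(f_hat))

# Check against a known solution: Laplace(sin(x) cos(2y)) = -5 sin(x) cos(2y).
n = 64
x = np.linspace(-np.pi, np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
f_exact = np.sin(X) * np.cos(2 * Y)
f = solve_poisson_torus(-5.0 * f_exact)
max_error = np.abs(f - f_exact).max()
```

Each solve costs two FFTs, so the work per time step grows as \(n^2\log n\) on an \(n\times n\) mesh, and no nonlinear iteration is involved.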

## 4 Example

We draw \(10^5\) samples from this distribution using a MATLAB implementation of our algorithm, available under MIT license at https://github.com/kmodin/oit-random.

The implementation can be summarized as follows. To solve the lifting Eq. (8) we discretize the torus by a \(256\times 256\) mesh and use the fast Fourier transform (FFT) to invert the Laplacian. We use 100 time steps. The resulting diffeomorphism is shown as a mesh warp in Fig. 2. We then draw \(10^5\) uniform samples on \([-\pi ,\pi ]^2\) and apply the diffeomorphism on each sample (applying the diffeomorphism corresponds to interpolation on the warped mesh). The resulting random samples are depicted in Fig. 1 (right). Drawing new samples is very efficient. For example, another \(10^7\) samples can be drawn in less than a second on a standard laptop.
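Applying a mesh-represented diffeomorphism to samples can be sketched as follows in Python. This is an illustration under assumptions and not the MATLAB implementation: the map is stored through its periodic displacement \(d(x)=\varphi (x)-x\) on the grid (a choice made here because the displacement, unlike \(\varphi \) itself, is periodic), bilinear interpolation is used, and the function name `apply_warp` and the test warp are hypothetical.

```python
import numpy as np

def apply_warp(disp, pts):
    """Evaluate phi(x) = x + d(x) at sample points by periodic bilinear interpolation.

    disp: array of shape (2, n, n), displacement components on a grid over [-pi, pi)^2.
    pts:  array of shape (N, 2), sample points.
    """
    n = disp.shape[1]
    h = 2 * np.pi / n
    g = (pts + np.pi) / h                  # fractional grid coordinates
    i0 = np.floor(g).astype(int)
    w = g - i0                             # bilinear weights in [0, 1)
    i0 %= n                                # periodic wrap-around of indices
    i1 = (i0 + 1) % n
    ix0, iy0, ix1, iy1 = i0[:, 0], i0[:, 1], i1[:, 0], i1[:, 1]
    wx, wy = w[:, 0], w[:, 1]
    d = ((1 - wx) * (1 - wy) * disp[:, ix0, iy0] + wx * (1 - wy) * disp[:, ix1, iy0]
         + (1 - wx) * wy * disp[:, ix0, iy1] + wx * wy * disp[:, ix1, iy1])
    y = pts + d.T
    return (y + np.pi) % (2 * np.pi) - np.pi   # wrap images back onto the torus

rng = np.random.default_rng(1)
n = 256
x = np.linspace(-np.pi, np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
disp = np.stack([0.3 * np.sin(Y), 0.2 * np.sin(X)])   # hypothetical smooth warp
samples = rng.uniform(-np.pi, np.pi, size=(100_000, 2))
warped = apply_warp(disp, samples)
```

The cost per sample is a handful of array lookups and multiplications, independent of how many time steps were used to compute the warp, which is consistent with new samples being cheap once the diffeomorphism is stored.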

## 5 Conclusions

In this paper we explore random sampling based on the optimal information transport algorithm developed in [3]. Given the semi-explicit nature of the algorithm, we expect it to be an efficient competitor to existing methods, especially for drawing a large number of samples from a low dimensional manifold. However, a detailed comparison with other methods, including MCMC methods, is outside the scope of this paper and left for future work.

We provide an example of a complicated distribution on the flat 2-torus. It is straightforward to extend the method to more elaborate manifolds, *e.g.*, by using finite element methods for Poisson’s equation on manifolds. For non-compact manifolds, most importantly \(\mathbb {R}^n\), one might use standard techniques, such as Box–Muller, to first transform the required distribution to a compact domain.

## References

- 1. Amari, S., Nagaoka, H.: Methods of information geometry. American Mathematical Society, Providence (2000)
- 2. Bauer, M., Bruveris, M., Michor, P.W.: Uniqueness of the Fisher-Rao metric on the space of smooth densities. Bull. Lond. Math. Soc. **48**(3), 499–506 (2016)
- 3. Bauer, M., Joshi, S., Modin, K.: Diffeomorphic density matching by optimal information transport. SIAM J. Imaging Sci. **8**(3), 1718–1751 (2015)
- 4. Friedrich, T.: Die Fisher-information und symplektische strukturen. Math. Nachr. **153**(1), 273–296 (1991)
- 5. Hamilton, R.S.: The inverse function theorem of Nash and Moser. Bull. Am. Math. Soc. (N.S.) **7**(1), 65–222 (1982)
- 6. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika **57**(1), 97–109 (1970)
- 7. Khesin, B., Lenells, J., Misiołek, G., Preston, S.C.: Geometry of diffeomorphism groups, complete integrability and geometric statistics. Geom. Funct. Anal. **23**(1), 334–366 (2013)
- 8. Khesin, B., Wendt, R.: The Geometry of Infinite-dimensional Groups. A Series of Modern Surveys in Mathematics, vol. 51. Springer, Berlin (2009)
- 9. Marzouk, Y., Moselhy, T., Parno, M., Spantini, A.: Sampling via measure transport: An introduction. In: Ghanem, R., Higdon, D., Owhadi, H. (eds.) Handbook of Uncertainty Quantification, pp. 1–14. Springer International Publishing, Cham (2016). doi: 10.1007/978-3-319-11259-6_23-1
- 10. Miller, M.I., Trouvé, A., Younes, L.: On the metrics and euler-lagrange equations of computational anatomy. Annu. Rev. Biomed. Eng. **4**, 375–405 (2002)
- 11. Modin, K.: Generalized Hunter-Saxton equations, optimal information transport, and factorization of diffeomorphisms. J. Geom. Anal. **25**(2), 1306–1334 (2015)
- 12. Moselhy, T.A.E., Marzouk, Y.M.: Bayesian inference with optimal maps. J. Comput. Phys. **231**(23), 7815–7850 (2012)
- 13. Moser, J.: On the volume elements on a manifold. Trans. Am. Math. Soc. **120**, 286–294 (1965)
- 14. Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Diff. Eqn. **26**(1–2), 101–174 (2001)
- 15. Reich, S.: A nonparametric ensemble transform method for Bayesian inference. SIAM J. Sci. Comput. **35**(4), A2013–A2024 (2013)
- 16. Villani, C.: Optimal Transport: Old and New, Grundlehren der Mathematischen Wissenschaften, vol. 338. Springer, Berlin (2009)