Abstract
Combining the classical theory of optimal transport with modern operator splitting techniques, we develop a new numerical method for nonlinear, nonlocal partial differential equations, arising in models of porous media, materials science, and biological swarming. Our method proceeds as follows: first, we discretize in time, either via the classical JKO scheme or via a novel Crank–Nicolson-type method we introduce. Next, we use the Benamou–Brenier dynamical characterization of the Wasserstein distance to reduce computing the solution of the discrete time equations to solving fully discrete minimization problems, with strictly convex objective functions and linear constraints. Third, we compute the minimizers by applying a recently introduced, provably convergent primal dual splitting scheme for three operators (Yan in J Sci Comput 1–20, 2018). By leveraging the PDEs’ underlying variational structure, our method overcomes stability issues present in previous numerical work built on explicit time discretizations, which suffer due to the equations’ strong nonlinearities and degeneracies. Our method is also naturally positivity and mass preserving and, in the case of the JKO scheme, energy decreasing. We prove that minimizers of the fully discrete problem converge to minimizers of the spatially continuous, discrete time problem as the spatial discretization is refined. We conclude with simulations of nonlinear PDEs and Wasserstein geodesics in one and two dimensions that illustrate the key properties of our approach, including higher-order convergence of our novel Crank–Nicolson-type method, when compared to the classical JKO method.
Introduction
Gradient flow methods are classical techniques for the analysis and numerical simulation of partial differential equations. Historically, such methods were exclusively based on gradient flows arising from a Hilbert space structure, particularly \(L^2({{\mathbb {R}}^d})\), but since the work of Jordan, Kinderlehrer, and Otto in the late 90’s [75, 93, 94], interest has emerged in a range of nonlinear, nonlocal partial differential equations that are gradient flows in the Wasserstein metric,
When \(\varOmega \ne {{\mathbb {R}}^d}\), we consider no-flux boundary conditions.
Equations of this form arise in a number of physical and biological applications, including models in granular media [12, 45, 46, 102], material science [71], and biological swarming [6, 39, 77]. Furthermore, many well-known equations may be written in this way: when \(V = W = 0\) and \(\alpha =1\), Eq. (1) reduces to the heat equation (\(m=1\)), porous medium equation (\(m>1\)), and fast diffusion equation (\(m<1\)) [103]. In the presence of a drift potential V, it becomes a Fokker–Planck equation (\(m=1\)) or nonlinear Fokker–Planck equation (\(m>1\)), as used in models of tumor growth [96, 100]. When the interaction potential W is given by a repulsive–attractive Morse or power-law potential,
we recover a range of nonlocal interaction models, which are repulsive at short length scales and attractive at long length scales [4, 5, 34, 101]. When \(W = (\varDelta )^{-1}\), the Newtonian potential, we have the Keller–Segel equation and its nonlinear diffusion variants [17, 19, 25, 26, 32, 41, 76]. Finally, as the diffusion exponent \(m \rightarrow +\infty \), we recover congested aggregation and drift equations arising in models of pedestrian crowd dynamics and shape optimization problems [23, 58, 67, 84, 90, 91].
In order to describe the gradient flow structure of equation (1), we begin by rewriting it as a continuity equation in \(\rho (x,t)\) for a velocity field v(x, t),
In this form, two key properties of the equation become evident: it is positivity preserving and conserves mass. In what follows, we will always consider nonnegative initial data, and we will typically renormalize so that the mass of the initial data equals one, i.e., \(\rho _0 \in {{\mathcal {P}}}_{ac}(\varOmega )\), where \( {{\mathcal {P}}}_{ac}(\varOmega )\) is the set of probability measures on \(\varOmega \) that are absolutely continuous with respect to Lebesgue measure. Furthermore, as our objective is to develop a numerical method for these equations, we will exclusively consider the case when \(\varOmega \) is a bounded domain. Throughout, we commit a mild abuse of notation and identify all such probability measures with their densities, \(d \rho (x) = \rho (x) \mathrm {d}x\).
As discovered by Otto [93], given an energy \({\mathcal {E}}: {{\mathcal {P}}}_{ac}(\varOmega ) \rightarrow {\mathbb {R}}\cup \{ +\infty \}\), we may formally define its gradient with respect to the Wasserstein metric \(d_{\mathcal {W}}\) using the formula
(See Sect. 2.1 for the definition of the Wasserstein metric \(d_{\mathcal {W}}\).) In this way, gradient flows of \({\mathcal {E}}\), \(\partial _t \rho = - \nabla _{d_{\mathcal {W}}} {\mathcal {E}}(\rho )\), correspond to solutions of the continuity equation with velocity \(v = - \nabla \frac{\delta {\mathcal {E}}}{ \delta \rho }\). In particular, Eq. (3) is the gradient flow of the energy
Differentiating the energy (4) along solutions of (3), one formally obtains that the energy is decreasing along the gradient flow
which coincides with the theoretical interpretation of gradient flows as solutions that evolve in the direction of steepest descent of an energy, where the notion of steepest descent is induced by the Wasserstein metric structure.
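As a worked illustration, the formal computation behind this dissipation property can be sketched as follows. The sketch assumes that the energy has the standard internal, drift, and interaction form, with \(U_m(\rho ) = \rho \log \rho \) for \(m=1\) and \(U_m(\rho ) = \rho ^m/(m-1)\) for \(m>1\), and that W is even; these explicit expressions are assumptions chosen to be consistent with the diffusion, drift, and interaction terms named in the text.

\[
{\mathcal {E}}(\rho ) = \int _\varOmega \big ( \alpha U_m(\rho ) + V \rho \big ) \, \mathrm {d}x + \frac{1}{2} \int _{\varOmega \times \varOmega } W(x-y) \rho (x) \rho (y) \, \mathrm {d}x \, \mathrm {d}y , \qquad \frac{\delta {\mathcal {E}}}{\delta \rho } = \alpha U_m'(\rho ) + V + W*\rho .
\]

With \(v = -\nabla \frac{\delta {\mathcal {E}}}{\delta \rho }\) and no-flux boundary conditions, integrating by parts along a smooth solution of the continuity equation \(\partial _t \rho + \nabla \cdot (\rho v) = 0\) gives

\[
\frac{\mathrm {d}}{\mathrm {d}t} {\mathcal {E}}(\rho ) = \int _\varOmega \frac{\delta {\mathcal {E}}}{\delta \rho } \, \partial _t \rho \, \mathrm {d}x = - \int _\varOmega \frac{\delta {\mathcal {E}}}{\delta \rho } \, \nabla \cdot (\rho v) \, \mathrm {d}x = \int _\varOmega \nabla \frac{\delta {\mathcal {E}}}{\delta \rho } \cdot v \, \rho \, \mathrm {d}x = - \int _\varOmega |v|^2 \rho \, \mathrm {d}x \le 0 .
\]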
A key feature of equations of the form (3) is the competition between repulsive and attractive effects. For repulsive–attractive interaction kernels W, as in equation (2), these effects can arise purely through nonlocal interactions, leading to rich structure of the steady states [4, 13, 14, 34, 65]. For purely attractive interaction kernels W, as in the Keller–Segel equation, the competition instead arises from the combination of nonlocal interaction with diffusion. In this case, different choices of interaction kernel W, diffusion exponent m, and initial data \(\rho _0\) can lead to widely different behavior—from bounded solutions being globally well posed to smooth solutions blowing up in finite time [17, 19, 25, 26, 32, 41].
Summary of Numerical Approach
The goal of the present work is to develop a new numerical approach for partial differential equations of the form (1) that combines gradient flow methods with modern operator splitting techniques. Our approach applies to equations of this form with any combination of diffusion \(\alpha U'_m(\rho )\) \((\alpha \ge 0\)), drift V, or interaction \(W*\rho \) terms. In particular, it is not necessary for diffusion to be present in order for our scheme to converge.
The main idea of our approach is to discretize the PDE/Wasserstein gradient flow at two levels. First, we consider a time discretization of the gradient flow with time step \(\tau \) (see Fig. 1b), either given by the classical JKO scheme (Eq. (6) below) or a new Crank–Nicolson inspired variant (Eq. (7) below). This reduces computation of the gradient flow to solving a sequence of infinite-dimensional minimization problems. Then, we consider a dynamical reformulation of these minimization problems, stemming from Benamou and Brenier’s dynamic characterization of the Wasserstein metric, by which the problem becomes the minimization of a strictly convex integral functional subject to a linear PDE constraint (see Fig. 1c). At this level, the problem remains continuous in space and time. We conclude by considering a further discretization of the problem, with inner time step \((\varDelta t)\) and spatial discretization \((\varDelta x)\), by taking piecewise constant approximations of the functions and using a finite difference approximation of the PDE constraint (see Fig. 1d). In this final, fully discrete form, we then compute the minimizer using modern operator splitting techniques, applying Yan’s recent extension of the classical primal dual algorithm for minimizing sums of three convex functions [106].
Our paper is organized as follows. In Sect. 1.2, we discuss the relationship between our numerical approach and previous work. In Sect. 1.3, we summarize our contribution. In Sect. 2, we describe the details of our numerical method. Along with numerically simulating Wasserstein gradient flows, our method also provides, as a special case, a new method for computing Wasserstein geodesics and the Wasserstein distance between probability densities; see Remark 1. In Sect. 3, we prove that, provided a smooth, positive solution of the continuum JKO scheme exists and the energy corresponding to the PDE is sufficiently regular, then minimizers of the fully discrete problem exist (Theorem 1), the objective functions of the discrete problems \(\varGamma \)-converge to the objective function of the continuum problem (Theorem 2), and thus, solutions of the fully discrete scheme converge, up to a subsequence, to a solution of the continuum scheme (Theorem 3). As a special case, we also recover convergence of a numerical method for computing Wasserstein geodesics, similar to that introduced by Papadakis, Peyré, and Oudet [95]. Finally, in Sect. 4, we provide several numerical simulations illustrating our approach in both one and two dimensions, computing Wasserstein geodesics, nonlinear Fokker–Planck equations, aggregation diffusion equations, and other related equations.
Details of Approach and Comparison with Previous Work
Classical Numerical PDE Methods
We now compare our approach to existing numerical methods. Perhaps the most common numerical approach for equations of the form (1) is to consider the equation as an advection–diffusion equation and apply classical finite difference, finite volume, or Galerkin discretizations [3, 29, 54, 66, 85]. However, when such methods are based on explicit time discretizations, they suffer from stability constraints due either to the degeneracy of the diffusion (when \(m>1\)) or the nonlocality from the interaction potential W. (See for instance the mesa problem [83].) Implicit time discretizations, on the other hand, are computationally intensive, due to the difficulty of matrix inversion, even when the implicit steps are solved by smart iterative methods to avoid the high computation cost of convolution [3].
Another common approach is to leverage structural similarities between (3) and equations from fluid dynamics to develop particle methods [14, 27, 30, 36, 43, 48, 57, 60, 88, 92]. Until recently, the key limitation of such methods has been developing approaches to incorporate diffusion. Following the analogy with the Navier–Stokes equations, stochastic particle methods have been proposed in the case of linear diffusion (\(m=1\)) [72,73,74, 86]. More recently the first two authors and Patacchini developed a deterministic blob method for linear and nonlinear diffusion (\(m \ge 1\)) [31]. On the one hand, particle methods naturally conserve mass and positivity, and they can also be designed to respect the underlying gradient flow structure of the equation, including the energy dissipation property (5). On the other hand, a large number of particles are often required to resolve finer properties of solutions.
In contrast with such classical methods, our method introduces an auxiliary momentum variable m and an additional inner layer of time discretization, which enlarges the dimension of the problem. However, as later pointed out in [80], the inner layer of time can be discretized with just one step without violating the overall first-order accuracy, thereby completely eliminating the additional cost introduced by the inner layer. Another major advantage of our approach is that, by reformulating the PDE problem as an optimization problem, we obtain unconditional stability (for the JKO discretization, see Eq. (6) below) while avoiding the inversion of a full matrix in the general implicit setting, which is extremely expensive, especially in higher dimensions; see for instance [3]. Finally, compared to other implicit methods, such as the backward Euler method, the suboptimization problems can be solved independently at each gridpoint, and therefore are massively parallelizable and suitable for high-dimensional problems.
Variational Methods
Compared to the classical numerical PDE approaches described in the previous section, a more modern class of numerical methods leverages the gradient flow structure of (1) to approximate solutions of the PDE by solving a sequence of minimization problems. This is the approach we take in the present work. Originally introduced by Jordan, Kinderlehrer, and Otto as a technique for computing solutions of the Fokker–Planck equation (Eq. (1), \(W=0\), \(m=1\)) [75], this scheme approximates the solution \(\rho (x,t)\) at time t by solving the following sequence of n minimization problems with time step \(\tau = t/n\),
The JKO scheme is precisely the analogue of the implicit Euler method in the infinite-dimensional Wasserstein space. The constraint \(\rho \in {{\mathcal {P}}}_{ ac}(\varOmega )\) ensures that the method is positivity and mass preserving, and the fact that \(d_{\mathcal {W}}^2(\rho ,\rho ^n) \ge 0\) ensures that the energy decreases along the scheme, \({\mathcal {E}}(\rho ^{n+1}_\tau ) \le {\mathcal {E}}(\rho ^{n}_\tau )\).
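The analogy with the implicit Euler method can be illustrated in the simplest Euclidean setting. The sketch below is not the Wasserstein scheme itself: it replaces the Wasserstein distance by the distance on the real line and uses the hypothetical model energy \(E(x) = x^2/2\), for which the proximal minimization has a closed form. It shows that each minimization step coincides with an implicit Euler step and that the energy is nonincreasing along the iterates.

```python
import math


def energy(x):
    """Quadratic model energy E(x) = x^2 / 2 (illustrative choice)."""
    return 0.5 * x * x


def proximal_step(x_prev, tau):
    """Minimizer of |x - x_prev|^2 / (2 tau) + E(x) for E(x) = x^2 / 2.

    Setting the derivative to zero gives (x - x_prev) / tau + x = 0,
    which is exactly one implicit Euler step for x' = -E'(x) = -x.
    """
    return x_prev / (1.0 + tau)


def minimizing_movement(x0, t_final, n_steps):
    """Iterate the proximal step n_steps times with tau = t_final / n_steps."""
    tau = t_final / n_steps
    xs = [x0]
    for _ in range(n_steps):
        xs.append(proximal_step(xs[-1], tau))
    return xs


xs = minimizing_movement(x0=1.0, t_final=1.0, n_steps=10)
energies = [energy(x) for x in xs]
error = abs(xs[-1] - math.exp(-1.0))  # the exact flow is x(t) = exp(-t)
```

In this toy setting, the energy decrease follows from the same comparison argument as in the text: the previous iterate is an admissible competitor with zero distance cost.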
Under sufficient assumptions on the underlying domain \(\varOmega \), drift potential V, interaction potential W, and initial data \(\rho _0\) (see Sect. 2.1), the solution of the JKO scheme \(\rho ^n_\tau \) converges to the solution \(\rho (x,t)\) of the partial differential equation (1), with a first-order rate in terms of the time step \(\tau = t/n\) [2, Theorem 4.0.4],
In our numerical simulations, we observe that this discretization error dominates other errors in our numerical method; see Sects. 4.2.1 and 4.2.2. Consequently, we also introduce a new time discretization, in analogy with the Crank–Nicolson method
The connection between the above scheme and the classical Crank–Nicolson discretization can be seen by considering the optimality conditions for (7):
Like the JKO scheme, our Crank–Nicolson inspired method is also positivity and mass preserving, though it is not energy decreasing. In Figs. 7, 8, and 10 of our numerics section, we conduct a preliminary analysis of the rate of convergence of this method, which verifies that it is indeed higher order than the JKO scheme. As the goal of the present work is primarily the development of fully discrete numerical schemes, we leave a thorough analysis of the rate of convergence of our Crank–Nicolson inspired method as \(\tau \rightarrow 0\) to future work.
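The difference in order between the two time discretizations can be seen in a Euclidean stand-in, which is an illustration only and not the Wasserstein scheme (7). For the hypothetical model energy \(E(x) = x^2/2\), the classical Crank–Nicolson step solves \((x^{n+1} - x^n)/\tau = -\tfrac{1}{2}(E'(x^{n+1}) + E'(x^n))\), while the implicit Euler step solves \((x^{n+1} - x^n)/\tau = -E'(x^{n+1})\). The sketch below compares how the error at time \(t = 1\) shrinks when the step is halved.

```python
import math


def implicit_euler(x0, t_final, n_steps):
    """Implicit Euler for x' = -x: x_new = x / (1 + tau)."""
    tau = t_final / n_steps
    x = x0
    for _ in range(n_steps):
        x = x / (1.0 + tau)
    return x


def crank_nicolson(x0, t_final, n_steps):
    """Crank-Nicolson for x' = -x: (x_new - x) / tau = -(x_new + x) / 2."""
    tau = t_final / n_steps
    x = x0
    for _ in range(n_steps):
        x = x * (1.0 - 0.5 * tau) / (1.0 + 0.5 * tau)
    return x


def error_ratio(scheme, n_steps, t_final=1.0):
    """Error with n_steps divided by error with 2 * n_steps at time t_final."""
    exact = math.exp(-t_final)
    coarse = abs(scheme(1.0, t_final, n_steps) - exact)
    fine = abs(scheme(1.0, t_final, 2 * n_steps) - exact)
    return coarse / fine


ratio_ie = error_ratio(implicit_euler, 10)  # close to 2 for a first-order scheme
ratio_cn = error_ratio(crank_nicolson, 10)  # close to 4 for a second-order scheme
```

A ratio near 2 indicates first-order convergence and a ratio near 4 indicates second-order convergence; whether the same rates carry over to the Wasserstein setting is precisely the question the text leaves to future work.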
Our Crank–Nicolson inspired method (7) is not the first higher-order method proposed for metric space gradient flows: Matthes and Plazotta developed a provably second-order scheme for general metric space gradient flows by generalizing the backward differentiation formula [89]. The Matthes–Plazotta method, however, requires two evaluations of the Wasserstein distance at each outer time step and thus is less practical for our purpose of numerically computing gradient flows in higher dimensions. Another method was introduced by Legendre and Turinici [79] based on the midpoint method. This method can be reformulated as the classical JKO step with half time step followed by an extrapolation. This extrapolation step could be implemented by solving the corresponding continuity equation either explicitly or implicitly; however, solving the equation explicitly could potentially violate conservation of positivity, while solving it implicitly would require an additional matrix inversion. Another higher-order variational method was also proposed in [78], which resembles explicit Runge–Kutta methods and, again, requires two or more evaluations of the Wasserstein distance at each outer time step.
Numerical Methods for the Wasserstein Distance
To use either the classical JKO scheme (6) or our new Crank–Nicolson inspired scheme (7) as a basis for numerical simulations, one must first develop a fully discrete approximation of the minimization problem at each step of the scheme. Here, the main numerical difficulty arises in approximating the Wasserstein distance, and there are several different approaches for dealing with this term. First, one can reformulate the Wasserstein distance in terms of a Monge–Ampère equation with nonstandard boundary conditions [11, 68], though difficulties arise due to the lack of a comparison principle [70]. Second, one can reframe the problem as a classical \(L^2({{\mathbb {R}}^d})\) gradient flow at the level of diffeomorphisms [16, 37, 47, 49, 69], but to pursue this approach, one has to overcome complications arising from the underlying geometry and the structure of the PDE system for the diffeomorphisms. Third, one can discretize the Wasserstein distance term as a finite-dimensional linear program, overcoming the lack of strict convexity of the objective function by adding a small amount of entropic regularization [8, 55, 61]. (For a detailed survey of computational optimal transport, we refer the reader to the recent book by Peyré and Cuturi [97].)
A fourth approach for computing the Wasserstein distance, and the one which we develop in the present work, is to consider a dynamic formulation due to Benamou and Brenier [7]. This reframes the problem as a strictly convex optimization problem with linear PDE constraints, which can be discretized using Benamou and Brenier’s original augmented Lagrangian method ALG2 or, more generally, a range of modern proximal splitting methods, as shown by Papadakis, Peyré, and Oudet [95]. (See also [21, 22] for related work on mean field games.) Adding an additional Fisher information term in this dynamic formulation (in analogy with entropic regularization) has also been explored in [82].
Only recently have these above approaches for computing the Wasserstein distance been integrated with the JKO scheme (6) in order to simulate partial differential equations of the form (1). The Monge–Ampère approach extends naturally, though the presence of a diffusion term \(\alpha U'_m(\rho )\) for \(\alpha >0\) is required to enforce convexity constraints at the discrete level [10]. Similarly, entropic regularization (or the addition of a Fisher information term) vastly accelerates the computation of gradient flows, but at the level of the partial differential equation, this corresponds to introducing numerical diffusion, which may disrupt the delicate balance between aggregation and diffusion inherent in PDEs of this type [28, 55, 82]. Finally, Benamou and Brenier’s dynamic reformulation of the Wasserstein distance has also been adopted in recent work to approximate gradient flows [9]. A key benefit of this last approach when compared to entropic regularization is that it leads to an optimization problem in \(N_x^d \times N_t\) variables, where \(N_x\) and \(N_t\) are the number of spatial and temporal gridpoints, whereas entropic regularization leads to an optimization problem in \(N_x^{2d}\) variables.
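The gap between these two problem sizes can be made concrete with a short calculation. The grid sizes below are hypothetical values chosen only for illustration, and the count for the dynamic formulation follows the \(N_x^d \times N_t\) scaling quoted in the text (it counts the density unknowns; the momentum adds a factor that depends only on d).

```python
def dynamic_unknowns(n_x, n_t, d):
    """Density values on a space-time grid: N_x^d * N_t."""
    return n_x ** d * n_t


def coupling_unknowns(n_x, d):
    """Entries of a transport plan between two spatial grids: N_x^(2d)."""
    return n_x ** (2 * d)


# Hypothetical example: d = 2, 100 points per spatial direction, 10 inner time steps.
n_dynamic = dynamic_unknowns(n_x=100, n_t=10, d=2)
n_coupling = coupling_unknowns(n_x=100, d=2)
ratio = n_coupling // n_dynamic
```

For these illustrative values, the dynamic formulation has \(10^5\) density unknowns while a transport plan has \(10^8\) entries, and the gap widens as d grows.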
In the present work, we further develop this last approach, using Benamou and Brenier’s dynamic reformulation of the Wasserstein distance to simulate Wasserstein gradient flows, via both the classical JKO scheme (6) and our new Crank–Nicolson inspired scheme (7). This leads to a sequence of minimization problems (Fig. 1C), which we discretize (Fig. 1D) and then solve using a modern primal dual three operator splitting scheme due to Yan [106], instead of the classical ALG2 method. See Sect. 2 for a detailed description of our approach.
Because we use operator splitting methods to compute the minimizer in Benamou and Brenier’s dynamic formulation of the Wasserstein distance, our work can be seen as an extension of previous work by Papadakis, Peyré, and Oudet [95], which applied similar two operator splitting schemes to compute the Wasserstein distance. However, there are a few key differences between our approach and previous work. First, we are able to implement the primal dual splitting scheme in a manner that does not require matrix inversion of the finite difference operator, which reduces the computational cost. Second, we succeed in obtaining the exact expression for the proximal operator, which allows our method to be truly positivity preserving, while other similar methods are only positivity preserving in the limit as \(\varDelta x, \varDelta t \rightarrow 0\); see Remark 5. Third, instead of imposing the linear PDE constraint in Benamou and Brenier’s dynamic reformulation exactly, via a finite difference approximation, we allow the linear PDE constraint to hold up to an error of order \(\delta >0\), which can be tuned according to the spatial discretization \((\varDelta x)\), the inner temporal discretization \((\varDelta t)\), and the outer time step \(\tau \) to respect the order of accuracy of the finite difference approximation; see Remark 3. Numerically, this allows our method to converge in fewer iterations, without any reduction in accuracy, as demonstrated in Fig. 3. From a theoretical perspective, the fact that we only require the PDE constraint to hold up to an error of order \(\delta > 0\) makes it possible to prove convergence of minimizers of the fully discrete problem to minimizers of the JKO scheme (6), since minimizers of the fully discrete problem always exist for \(\delta >0\), which is not the case when the PDE constraint is enforced exactly (\(\delta = 0\)); see Remark 8 and Theorem 1.
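To illustrate how a primal dual splitting method handles a linear constraint that is only required to hold up to a tolerance \(\delta \), the following toy sketch may help. It is not Yan's three operator scheme and not the discretization used in this paper: it applies the classical two operator primal dual iteration of Chambolle–Pock type to the hypothetical problem of minimizing \(\tfrac{1}{2}|x-b|^2\) over \(x \in {\mathbb {R}}^2\) subject to \(|x_1 + x_2 - c| \le \delta \). All names, step sizes, and data are assumptions made for the example. Note that the iteration only applies the operator \(A = [1, 1]\) and its transpose, and never inverts it.

```python
def clip(value, low, high):
    """Projection of a scalar onto the interval [low, high]."""
    return max(low, min(high, value))


def solve_relaxed_constraint(b, c, delta, n_iter=200, tau=0.5, sigma=0.5):
    """Minimize 0.5 * |x - b|^2 subject to |x[0] + x[1] - c| <= delta.

    Primal dual iteration for g(x) + h(A x) with A = [1, 1],
    g(x) = 0.5 * |x - b|^2, and h the indicator of [c - delta, c + delta].
    The step sizes satisfy tau * sigma * ||A||^2 < 1, since ||A||^2 = 2.
    """
    x = [0.0, 0.0]
    y = 0.0
    for _ in range(n_iter):
        # Primal step: proximal map of tau * g evaluated at x - tau * A^T y.
        x_new = [(xi - tau * y + tau * bi) / (1.0 + tau) for xi, bi in zip(x, b)]
        # Dual step evaluated at the extrapolated point 2 * x_new - x.
        w = y + sigma * sum(2.0 * xn - xo for xn, xo in zip(x_new, x))
        # Proximal map of sigma * h^* via the Moreau identity.
        y = w - sigma * clip(w / sigma, c - delta, c + delta)
        x = x_new
    return x, y


x_relaxed, _ = solve_relaxed_constraint(b=(0.0, 0.0), c=2.0, delta=0.5)
x_exact, _ = solve_relaxed_constraint(b=(0.0, 0.0), c=2.0, delta=0.0)
```

With \(\delta = 0.5\) the iterates approach the nearest point of the slab \(1.5 \le x_1 + x_2 \le 2.5\) to the origin, while \(\delta = 0\) recovers the projection onto the line \(x_1 + x_2 = 2\).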
Contribution
The main components of our numerical method for computing solutions to (1) are:

(a) an outer time discretization, of either JKO (6) or Crank–Nicolson type (7) (Fig. 1B);

(b) a dynamic interpretation of the Wasserstein distance (Fig. 1C), which when discretized via finite difference approximations leads to a sequence of constrained optimization problems (Fig. 1D);

(c) an application of modern three operator splitting schemes for solving these optimization problems.
Our main contributions are:

Unlike classical explicit methods, our JKO-type method is unconditionally stable. Unlike classical implicit methods, it achieves this stability without an expensive matrix inversion.

In practice, we observe that our Crank–Nicolson-type method performs even better than our JKO-type method, in terms of rate of convergence with respect to the outer time step (see Figs. 7, 8, and 10). We leave a thorough analysis of the rate of convergence of this method to future work.

By formulating our optimization problem with a linear inequality constraint instead of a linear equality constraint, our algorithm converges in fewer iterations when compared to related algorithms for Wasserstein geodesics; see Remark 3 and Fig. 3.

We prove convergence of our fully discrete method (Fig. 1D) to the JKO scheme (Fig. 1B, C) as the spatial discretization and inner time discretization go to zero.
Numerical Method
Dynamic Formulation of JKO Scheme
As described in the previous section, our numerical method for computing the JKO scheme is based on the following dynamic reformulation of the Wasserstein distance due to Benamou and Brenier [7]:
where \((\rho ,v)\in AC(0,1;{{\mathcal {P}}}(\varOmega ) )\times L^1(0,1; L^2(\rho ))\) belongs to the constraint set \({\mathcal {C}}_0\) provided that
where \(\nu \) is the outer unit normal on the boundary of the domain \(\varOmega \). A curve \(\rho \) in \({{\mathcal {P}}}(\varOmega )\) is absolutely continuous in time, denoted \(\rho \in AC(0,1;{{\mathcal {P}}}(\varOmega ) )\), if there exists \(w \in L^1(0,1)\) so that \(d_{\mathcal {W}}(\rho (\cdot , t_0), \rho (\cdot , t_1)) \le \int _{t_0}^{t_1} w(s) \mathrm {d}s\) for all \(0< t_0 \le t_1 < 1\). The PDE constraint (9)–(10) holds in the duality with smooth test functions on \({{\mathbb {R}}^d}\times [0,1]\), i.e., for all \(f \in C^\infty _c({{\mathbb {R}}^d}\times [0,1])\),
This dynamic reformulation reduces the problem of finding the Wasserstein distance between any two measures to identifying the curve in \({{\mathcal {P}}}(\varOmega )\) that connects them with minimal kinetic energy. However, the objective function (8) is not strictly convex, and the PDE constraint (9) is nonlinear. For these reasons, in Benamou and Brenier’s original work, they restrict their attention to the case \(\rho (\cdot , t) \in {{\mathcal {P}}}_{ac}(\varOmega )\) and introduce the momentum variables \(m = v \rho \), in order to rewrite (8) as
where
and \((\rho ,m) \in AC(0,1; {{\mathcal {P}}}_{ac}(\varOmega )) \times L^1(0,1;L^2(\rho ^{-1}))\) belong to the constraint set \({\mathcal {C}}_1\) provided that
After this reformulation, the integral functional
is strictly convex along linear interpolations and lower semicontinuous with respect to weak* convergence [1, Example 2.36], and the PDE constraint is linear. As an immediate consequence, one can conclude that minimizers are unique. Furthermore, for any \(\rho _0, \rho _1 \in {{\mathcal {P}}}_{ac}(\varOmega )\), a direct computation shows that the minimizer \(({{\bar{\rho }}}, {{\bar{m}}})\) is given by the Wasserstein geodesic from \(\rho _0\) to \(\rho _1\),
where T is the optimal transport map from \(\rho _0\) to \(\rho _1\). (See [2, 98, 105] for further background on optimal transport.) Consequently, given any minimizer \(({{\bar{\rho }}}, {{\bar{m}}})\) of (12), we can recover the optimal transport plan T via the following formula:
Building upon Benamou and Brenier’s dynamic reformulation of the Wasserstein distance, one can also consider a dynamic reformulation of the JKO scheme (6). In particular, substituting (12) in (6) leads to the following dynamic JKO scheme:
Problem 1
(Dynamic JKO) Given \(\tau >0\), \({\mathcal {E}}\), and \(\rho _0\), solve the constrained optimization problem,
where \((\rho ,m) \in AC(0,1; {{\mathcal {P}}}_{ac}(\varOmega )) \times L^1(0,1;L^2(\rho ^{-1}))\) belong to the constraint set \({\mathcal {C}}\) provided that
We emphasize that the requirement \(\rho (x,t) \in {{\mathcal {P}}}_{ac}(\varOmega )\) for all \( t\in [0,1]\) ensures that \(\rho (x,t) \ge 0\).
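The kinetic energy term in the dynamic formulation can be checked on a case where the answer is known. The sketch below assumes the standard convention \(\varPhi (\rho ,m) = |m|^2/\rho \) for \(\rho >0\), \(\varPhi (0,0) = 0\), and \(+\infty \) otherwise, normalized so that the infimum equals the squared Wasserstein distance; this explicit expression is stated here as an assumption. For a one-dimensional density translated by a distance a, the curve \(\rho (x,t) = \rho _0(x - ta)\) with \(m = a\rho \) is the Wasserstein geodesic, and its action equals \(a^2\). The Gaussian profile, grid sizes, and function names are hypothetical.

```python
import math


def kinetic_density(rho, m):
    """Phi(rho, m): m^2 / rho if rho > 0, 0 if (rho, m) = (0, 0), +inf otherwise."""
    if rho > 0.0:
        return m * m / rho
    if rho == 0.0 and m == 0.0:
        return 0.0
    return math.inf


def gaussian(x, mean, sigma):
    """Gaussian probability density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def translation_action(shift, n_x=400, n_t=20, sigma=0.5, half_width=6.0):
    """Riemann sum of Phi over cell centers for rho(x, t) = rho_0(x - t * shift)."""
    dx = 2.0 * half_width / n_x
    dt = 1.0 / n_t
    total = 0.0
    for k in range(n_t):
        t = (k + 0.5) * dt
        for j in range(n_x):
            x = -half_width + (j + 0.5) * dx
            rho = gaussian(x, -0.5 * shift + t * shift, sigma)
            total += kinetic_density(rho, shift * rho) * dx * dt
    return total


action = translation_action(shift=2.0)  # squared Wasserstein distance is shift^2
```

Because \(\varPhi \) takes the value \(+\infty \) whenever \(\rho < 0\), any curve with finite action is automatically nonnegative, which mirrors the role of the constraint \(\rho (x,t) \in {{\mathcal {P}}}_{ac}(\varOmega )\).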
Remark 1
(Wasserstein geodesics) Note that for any \(\rho _1 \in {{\mathcal {P}}}_{ac}(\varOmega )\), we may take
in which case Problem 1 reduces to the Benamou–Brenier formulation of the Wasserstein distance (12). Consequently, the numerical method we develop for Problem 1 offers, as a particular case, a provably convergent numerical method for computing the Wasserstein geodesic and Wasserstein distance between \(\rho _0\) and \(\rho _1\). On the one hand, there are many alternative methods for computing Wasserstein geodesics in Euclidean space. Indeed, the many algorithms described in the introduction for computing the Wasserstein distance also provide an optimal transport plan, which can be linearly interpolated to give the Wasserstein geodesic [8, 11, 55, 61, 68, 97]. On the other hand, our method is distinguished because it could be more naturally extended to variants of the Wasserstein metric built on the Benamou–Brenier formulation [33, 64, 87], as well as to Wasserstein geodesics on non-Euclidean manifolds, where the geodesic equations on the underlying manifold may no longer be explicit, so that one cannot pass directly from the optimal transport plan to the Wasserstein geodesic.
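The alternative route mentioned above, in which an optimal transport plan is linearly interpolated to give the geodesic, is easiest to see on the real line. The sketch below is an illustration for empirical measures with equally many, equally weighted atoms, where the optimal plan is the monotone rearrangement that pairs sorted points; it is not the method developed in this paper, and the function names are hypothetical.

```python
def wasserstein2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two empirical measures on the
    real line with equally many, equally weighted atoms."""
    if len(xs) != len(ys):
        raise ValueError("both point clouds must have the same number of atoms")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)


def displacement_interpolation(xs, ys, t):
    """Atoms of the geodesic at time t: each sorted x moves linearly to its partner."""
    return [(1.0 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


source = [0.0, 1.0, 2.0]
target = [3.0, 1.0, 2.0]
distance_squared = wasserstein2_squared_1d(source, target)
midpoint = displacement_interpolation(source, target, 0.5)
```

On a manifold without explicit geodesics, the linear interpolation in `displacement_interpolation` has no direct counterpart, which is the situation in which a dynamic formulation is more natural.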
Remark 2
(existence and uniqueness of minimizers) If the underlying domain \(\varOmega \) is convex and the energy \({\mathcal {E}}\) is proper, lower semicontinuous, coercive, and \(\lambda \)-convex along generalized geodesics, and also satisfies \( \{ \mu : {\mathcal {E}}(\mu ) < +\infty \} \subseteq {{\mathcal {P}}}_{ac}(\varOmega )\), then, for \(\tau >0\) sufficiently small, there exists a unique solution to Problem 1 [2, Theorem 4.0.4, Theorem 8.3.1]. In particular, these assumptions are satisfied by the energy \({\mathcal {G}}_{\rho _1}\) (18), as well as by the drift–diffusion interaction energy from the introduction (4), for U as in Eq. (3), \(V, W \in C^2(\varOmega )\). (See, for example, [2, Section 9.3] or [56] for more general conditions on U, V, W.)
Thus, if we denote by \(({\bar{\rho }},{\bar{m}})\) the minimizer of Problem 1, then for \(\tau >0\) sufficiently small, the proximal map,
is well defined for all \(\rho _0 \in D({\mathcal {E}})\). Furthermore, the energy decreases under the proximal map,
which can be seen by comparing the value of the objective function at the minimizer \(({\overline{\rho }},{\overline{m}})\) to the value of the objective function at \((\rho (x,0), 0) \in {\mathcal {C}}\) and using that \(\varPhi (\rho ,m) \ge 0\).
Given \(\rho _0 \in D({\mathcal {E}})\), if we recursively define the discrete time gradient flow sequence
then, taking \(\tau = t/n\), \(\rho ^n_\tau \) converges to \(\rho (x,t)\), the gradient flow of the energy \({\mathcal {E}}\) with initial data \(\rho _0\) at time t, and under mild regularity assumptions on \(\rho _0\), we have
In this way, the classical JKO scheme provides a first-order approximation of the gradient flow [2, Theorem 4.0.4]. In our numerical simulations, we observe that this discretization error dominates other errors in our numerical method; see Sects. 4.2.1 and 4.2.2. For this reason, we introduce the following new scheme, inspired by the Crank–Nicolson method.
Problem 2
(Crank–Nicolson Inspired Dynamic JKO) Given \(\tau >0\), \({\mathcal {E}}\), and \(\rho _0\), solve the constrained optimization problem,
where \((\rho ,m) \in AC(0,1; {{\mathcal {P}}}_{ac}(\varOmega )) \times L^1(0,1;L^2(\rho ^{-1}))\) belong to the constraint set \({\mathcal {C}}\) provided that
In Sect. 4.2.2, we provide numerical examples comparing the above method to the classical JKO scheme from Problem 1, illustrating that it achieves a higher-order rate of convergence in practice (see Figs. 7, 8, and 10), in spite of the fact that it lacks the energy decay property of Problem 1. Under what conditions a higher-order analogue of inequality (21) holds for the new scheme is an interesting open question that we leave to future work, as the main goal of the present work is the development of fully discrete numerical methods for computing minimizers of Problems 1 and 2. By iterating either of these minimization problems, as in Eq. (20), we obtain a numerical method for simulating Wasserstein gradient flows.
Fully Discrete JKO
We now turn to the discretization of the dynamic JKO scheme, Problem 1, and the Crank–Nicolson inspired scheme, Problem 2. We begin by noting that the Crank–Nicolson inspired Problem 2 can be rewritten in the same form as Problem 1 by considering the energy
Using this observation, we will now describe our discretization of both problems simultaneously.
Discretization of Functions and Domain
Given an n-dimensional hyperrectangle \(S = \varPi _{l=1}^{n} [a_l,b_l] \subseteq {\mathbb {R}}^{n}\), we discretize it as a union of cubes \(Q_i\), \(i \in {\mathbb {N}}^n\), where in the \(l\)th direction, we suppose there are \(N_l\) intervals of spacing \((\mathbf{\varDelta z})_l = (b_l-a_l)/N_l\):
Piecewise constant functions with respect to this discretization are given by
To discretize Problem 1, we take \(S = {\overline{\varOmega }} \times [0,1] \subseteq {\mathbb {R}}^{d +1}\), where \({\overline{\varOmega }} = \varPi _{l=1}^{d} [a_l,b_l] \). For any \(i \in {\mathbb {N}}^{d+1}\), write \(i = (j,k)\), for the spatial index \(j \in {\mathbb {N}}^d\) and the temporal index \(k \in {\mathbb {N}}\). We let \( N_x \in {\mathbb {N}} \) denote the number of intervals in each spatial direction and \(N_t \in {\mathbb {N}}\) denote the number of intervals in the temporal direction. Take \(\mathbf{\varDelta z} = (\mathbf{\varDelta x}, \varDelta t)\) for \((\mathbf{\varDelta x})_l = (\varDelta x) >0\) for all \(l = 1, \dots , d\) and \(\varDelta t >0\).
We consider piecewise constant approximations \((\rho ^h,m^h)\) of the functions \((\rho ,m)\), with coefficients denoted by \((\rho _{j,k}, m_{j,k})\). For any \( (\rho ,m) \in C({\overline{\varOmega }} \times [0,1])\), one such approximation is the pointwise piecewise constant approximation \(({\hat{\rho }}^h, {\hat{m}}^h)\), obtained by defining the coefficients \(({\hat{\rho }}_{j,k}, {\hat{m}}_{j,k})\) to be the value of \((\rho ,m)\) on a regular grid of spacing \((\varDelta x) \times (\varDelta t)\):
where \({\mathbf {1}} = [1,1, \dots , 1]^t \in {\mathbb {N}}^d\). Note that, whenever \((\rho ,m) \in C({\overline{\varOmega }} \times [0,1])\), we have that \(({\hat{\rho }}^h, {\hat{m}}^h)\) converges to \((\rho ,m)\) uniformly.
Discretization of Energy Functionals
Next, we approximate the energy functionals by discrete energies \({\mathcal {E}}^h\), beginning with energies of the form (4). Given a piecewise constant function \(\rho ^h\) with coefficients \(\rho _j\),
where \(V^h(x) = \sum _{j} V_j 1_{Q_j}(x)\) is a piecewise constant approximation of V(x) and \(W^h(x,y) = \sum _{j,l} W_{j,l} 1_{Q_j}(x)1_{Q_l}(y) \) is a piecewise constant approximation of \(W(x-y)\). Here, \(W_{j,l} = W(x_j - x_l)\) is symmetric, i.e., \(W_{j,l} = W_{l,j}\).
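As a concrete illustration, a discrete energy of the form stated in hypothesis (H2a), together with its gradient in the coefficients \(\rho _j\), can be evaluated in one dimension as in the following minimal sketch. The entropy \(U(\rho ) = \rho \log \rho \), the quadratic drift, the Gaussian kernel, the cell-centred grid, and all function names are illustrative assumptions rather than choices made in the text.

```python
import numpy as np

def discrete_energy(rho, V, W, dx, U=lambda r: r * np.log(r)):
    """F^h(rho) = sum_j (U(rho_j) + V_j rho_j) dx + sum_{j,l} W_{j,l} rho_j rho_l dx^2 (d = 1)."""
    local = np.sum(U(rho) + V * rho) * dx
    interaction = rho @ (W @ rho) * dx**2
    return local + interaction

def discrete_energy_grad(rho, V, W, dx, dU=lambda r: np.log(r) + 1.0):
    """Gradient of F^h with respect to rho_j; relies on the symmetry W_{j,l} = W_{l,j}."""
    return (dU(rho) + V) * dx + 2.0 * (W @ rho) * dx**2

# Hypothetical 1D setup: cell centres on [-1, 1], quadratic V, Gaussian W.
N = 50
dx = 2.0 / N
x = -1.0 + (np.arange(N) + 0.5) * dx
V = 0.5 * x**2
W = np.exp(-(x[:, None] - x[None, :])**2)
rho = 1.0 + 0.3 * np.cos(np.pi * x)   # strictly positive, so rho*log(rho) is defined

energy = discrete_energy(rho, V, W, dx)
grad = discrete_energy_grad(rho, V, W, dx)
```

Both routines act componentwise apart from the product `W @ rho`, which is the only dense operation in this sketch.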
Likewise, for energies of the form (4), we consider the following discretization of the energy \({\mathcal {H}}_{\rho _0}\) from Eq. (22) for the Crank–Nicolson inspired scheme, Problem 2,
Finally, to compute Wasserstein geodesics between two measures \(\rho _0, \rho _1 \in {{\mathcal {P}}}_{ac}(\varOmega )\), we consider a discretization of the energy \({\mathcal {G}}_{\rho _1}\) from Eq. (18). Given a piecewise constant approximation \(\rho _1^h\) of \(\rho _1\) and \(\delta \ge 0\), define
Discretization of Derivative Operators
Let \(D^h_t \rho ^h \) and \(D^h_x m^h\) denote the discrete time derivative and spatial divergence on \(\varOmega \times [0,1]\) and let \(\nu ^h\) denote the discrete outer unit normal of \(\varOmega \). (See hypothesis (H3) for the precise requirements we impose on each of these discretizations.) For example, in one dimension we may choose a centered difference in space and a forward Euler method in time,
or a Crank–Nicolson method,
We compute these discretizations of the derivatives at the boundary by extending \(m_{j,k} \) to be zero in the direction of the outer unit normal vector. As we can only expect these approximations of the temporal and spatial derivatives to hold up to an error term, we relax the equality constraints from (17) in the following discrete dynamic JKO scheme.
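For illustration, a forward difference in time and a centered difference in space, with \(m\) extended by zero outside the domain, might be sketched in one dimension as follows. The array layout `rho[j, k]`, the cell-centred grid, the test fields, and the function names are assumptions made for this sketch; they are not taken from the displayed formulas.

```python
import numpy as np

def D_t(rho, dt):
    """Forward difference in time: (rho[j, k+1] - rho[j, k]) / dt for k = 0..Nt-1."""
    return (rho[:, 1:] - rho[:, :-1]) / dt

def D_x(m, dx):
    """Centered difference in space, with m extended by zero outside the domain."""
    padded = np.pad(m, ((1, 1), (0, 0)))   # zero ghost values in the normal direction
    return (padded[2:, :] - padded[:-2, :]) / (2.0 * dx)

# Hypothetical smooth pair solving d_t rho + d_x m = 0 on [0,1] x [0,1], m = 0 on the boundary.
Nx, Nt = 64, 100
dx, dt = 1.0 / Nx, 1.0 / Nt
x = (np.arange(Nx) + 0.5) * dx
t = np.arange(Nt + 1) * dt
X, T = np.meshgrid(x, t, indexing="ij")
m = 0.1 * np.sin(np.pi * X) * T
rho = 1.0 - 0.05 * np.pi * np.cos(np.pi * X) * T**2

# Discrete residual of the continuity equation at time levels k = 0..Nt-1.
residual = D_t(rho, dt) + D_x(m, dx)[:, :-1]
```

Even for this exact continuum solution the residual is nonzero, which is the reason for relaxing the equality constraint. In this naive sketch the rows adjacent to the boundary see the zero ghost values and are noticeably less accurate than the interior rows.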
Discrete Dynamic JKO
The discretizations described in the previous sections lead to a fully discrete dynamic JKO problem:
Problem \(1_{j,k}\) (Discrete Dynamic JKO) Fix \(\tau , \delta _1, \delta _2, \delta _3, \delta _4 >0\), \({\mathcal {E}}^h\), and \(\rho _0^h\). Solve the constrained optimization problem,
where \((\rho _{j,k},m_{j,k})\) belong to the constraint set \({\mathcal {C}}^h\) provided that for all j, k,
The inequalities (30) enforce the PDE constraint and the boundary condition; the inequalities (31) enforce the mass constraint and the initial conditions. Recall that, by definition of \(\varPhi \) in Eq. (13), \(\varPhi (\rho _{j,k}, m_{j,k})< +\infty \) only if \(\rho _{j,k}\) is nonnegative. Consequently, if a minimizer \(\rho _{j,k}\) exists, it must be nonnegative.
Remark 3
(relaxation of PDE constraints) A key element of our numerical method is that we relax the equality constraint (17) at the fully discrete level. This reflects the fact that even an exact solution of the continuum PDE will only satisfy the discrete constraints (30)–(31) up to an error term depending on the order of the finite difference operators.
We allow the choice of \(\delta _i\) to vary for each of the above constraints. However, when the desired exact solution is sufficiently smooth, the optimal choice of \(\delta _i\) for a second-order discretization of the spatial and temporal derivatives is
where \(\tau >0\) is the size of the timestep in the outer time discretization; see equations (67). As we will demonstrate in Fig. 3 of our numerics section, relaxing the PDE constraint accelerates convergence to a minimizer of the fully discrete Problem \(1_{j,k}\) without any loss of accuracy with respect to the exact continuum solution.
Finally, note that while the discrete PDE constraint (30) automatically enforces the mass constraint up to order \(\delta _1^2 + \delta _2^2\), we choose to impose the mass constraint separately via the first inequality in (31). This improves performance in examples where the exact solution is not smooth enough to satisfy the discrete PDE constraint to a high order of accuracy, but where a stricter mass constraint still yields a higher quality numerical solution; see Fig. 4.
Under sufficient hypotheses on the discrete energy \({\mathcal {E}}^h\) and the initial data \(\rho _0^h\), minimizers of Problem \(1_{j,k}\) exist; see Theorem 1. Furthermore, this discrete dynamic JKO scheme preserves the energy decreasing property of the original JKO scheme. To see this, note that, given an energy \({\mathcal {E}}^h\), time step \(\tau >0\), and initial data \((\rho _0^h)_j\) we may define the fully discrete proximal map by
where \(({\overline{\rho }}_{j,k}, {\overline{m}}_{j,k})\) is any minimizer of Problem \(1_{j,k}\). Independently of which minimizer is chosen, we have
which can be seen by comparing the value of the objective function at the minimizer \(({\overline{\rho }}_{j,k},{\overline{m}}_{j,k})\) to the value of the objective function at \((\rho _{j,k}, m_{j,k}) = ((\rho _0)_j, 0) \in {\mathcal {C}}\) and using the fact that \(\varPhi \ge 0\). Furthermore, by iterating the fully discrete proximal map, we may construct a fully discrete gradient flow sequence
In analogy with the continuum case, we will use this fully discrete JKO scheme to simulate gradient flows. (See Algorithm 3.)
Primal Dual Algorithms for Fully Discrete JKO
In order to find minimizers of Problem \(1_{j,k}\), we apply a primal dual operator splitting method. Since the constraints in Problem \(1_{j,k}\) are linear inequality constraints, we may rewrite them in the form \(\Vert {{\tilde{{\mathsf {A}}}}}_i {u} - {{\tilde{b}}}_i\Vert _2 \le \delta _i\) for \(i = 1,2,3,4\), where \({u} = (\mathbf {\rho }, \mathbf {m})\), and \(\mathbf {\rho }\) and \(\mathbf {m}\) are vector representations of the matrices \(\rho _{j,k}\) and \(m_{j,k}\). (See Appendix A for explicit formulas for \(\tilde{\mathsf {A}}_i\) and \({{\tilde{b}}}_i\) in one spatial dimension.) Similarly, we may rewrite the first term of the objective function (29) in terms of u, defining
We consider two cases for the energy term in the objective function. When the energy is of the form \({\mathcal {G}}^h_{\rho _1}\), as in Eq. (26), we reframe the problem by removing the energy from the objective function and adding \( \sum _j |\rho _{j, N_t} - (\rho _1^h)_j| ^2 (\varDelta x)^d \le \delta _5^2 \) to the constraints (30) and (31), denoting \(\Vert {\mathsf {A}}_i {u} - b_i\Vert _2 \le \delta _i\), for \(i = 1,2,3,4,5\), as the modified constraints. On the other hand, when the energy is of the form (24) or (25), we rewrite it in terms of u as
In particular, if we let \({\mathsf {S}}\) be the selection matrix
then \(F({u}) = {\mathcal {F}}^h({\mathsf {S}}{u})\) and \(H(u) = {\mathcal {H}}_{\rho _0}^h({\mathsf {S}}{u})\), where \({\mathcal {F}}^h\) and \({\mathcal {H}}_{\rho _0}^h\) are defined in (24) and (25), respectively.
This leads to the following two optimization problems:
Problem 3(a)
\(\min _{{u}} \varPhi ({u}) + {\mathfrak {i}}_{ {\varvec{\delta }}}({\mathsf {A}}{u}),\) \({\mathfrak {i}}_{ {\varvec{\delta }}} ( {\mathsf {A}}{u}) = \left\{ \begin{array}{cl} 0 &{} \Vert {\mathsf {A}}_i {u} - b_i\Vert _{2} \le \delta _i , \ i = 1, \dots ,5 \\ +\infty , &{} \text { otherwise.} \end{array} \right. \)
Problem 3(b)
\(\min _{{u}} \varPhi ({u}) + 2 \tau E({u}) + {\mathfrak {i}}_{\tilde{\varvec{\delta }}} ({{\tilde{{\mathsf {A}}}}} {u})\), \({\mathfrak {i}}_{\tilde{\varvec{\delta }}} ({{\tilde{{\mathsf {A}}}}} {u}) = \left\{ \begin{array}{cl} 0 &{} \Vert {{\tilde{{\mathsf {A}}}}}_i {u} - \tilde{b}_i\Vert _{2} \le {{\tilde{\delta }}}_i , \ i = 1,\dots ,4 \\ +\infty , &{} \text { otherwise.} \end{array} \right. \)
To compute the Wasserstein distance, we solve Problem 3(a), and to compute the gradient flow of an energy, we iterate Problem 3(b) \(O(\frac{1}{\tau })\) times, for either \(E(u) = F(u)\) (classical JKO) or \(E(u) = H(u)\) (Crank–Nicolson inspired scheme).
Primal-dual methods for solving optimization problems in which the objective function is the sum of two convex functions, as in Problem 3(a), are widely available [52]. However, analogous methods for optimization problems in which the objective function is the sum of three convex functions, as in Problem 3(b), have only recently emerged [62, 106]. In particular, in Algorithm 1, for Problem 3(a), we use Chambolle and Pock’s well-known primal dual algorithm, and in Algorithm 2, for Problem 3(b), we use Yan’s recent extension of this algorithm to objective functions with three convex terms. Both algorithms offer an extended range of primal and dual step sizes \(\lambda \) and \(\sigma \) and low per-iteration complexity, due to the sparseness of \({\mathsf {S}}\), \({\mathsf {A}}\), and \({\tilde{{\mathsf {A}}}}\). Note specifically that the success of Algorithm 1 depends on the ease of computing the proximal operators related to \(\varPhi \) and \({\mathfrak {i}}_{\varvec{\delta }}\); if we simply grouped the additional energy term in Problem 3(b) with either \(\varPhi \) or \({\mathfrak {i}}_{\varvec{\delta }}\), this property would be lost. Instead, we treat E(u) as a separate term and take advantage of its smoothness, as shown in Algorithm 2. Finally, in Algorithm 3, we describe how Algorithm 2 can be iterated to approximate the full JKO sequence and, consequently, solutions of a range of nonlinear partial differential equations of Wasserstein gradient flow type.
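To make the three-term structure concrete, the following minimal sketch implements a generic three-operator primal-dual iteration for \(\min _u f(u) + g(u) + h({\mathsf {A}}u)\) with smooth f, in the form commonly stated for Yan's method [106]. It is not a reproduction of Algorithm 2: the toy objective (a quadratic f, a nonnegativity indicator g, and a single relaxed "mass" constraint for h), as well as every name in the code, is a hypothetical stand-in chosen so that the minimizer is known in closed form.

```python
import numpy as np

def pd3o(grad_f, prox_g, prox_hconj, A, z0, s0, lam, sigma, iters=2000):
    """Three-operator primal-dual iteration for min_u f(u) + g(u) + h(A u), f smooth."""
    z, s = z0.copy(), s0.copy()
    for _ in range(iters):
        u = prox_g(z, lam)                       # primal proximal step
        gf = grad_f(u)                           # explicit step on the smooth term
        s = prox_hconj(s - lam * sigma * (A @ (A.T @ s))
                       + sigma * (A @ (2 * u - z - lam * gf)), sigma)
        z = u - lam * gf - lam * (A.T @ s)
    return prox_g(z, lam), s

# Hypothetical toy problem: f(u) = |u - c|^2 / 2, g = indicator{u >= 0},
# h = indicator{|y - b| <= delta} composed with A u = u_1 + u_2.
c = np.array([2.0, 1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
delta = 0.1

def grad_f(u):
    return u - c

def prox_g(z, lam):
    return np.maximum(z, 0.0)

def prox_hconj(y, sigma):
    # Moreau identity: prox of sigma*h^* is y - sigma * Proj_ball(y / sigma).
    v = y / sigma - b
    scale = min(1.0, delta / max(np.linalg.norm(v), 1e-300))
    return y - sigma * (b + scale * v)

lam, sigma = 0.5, 0.9      # lam * sigma * lambda_max(A A^T) = 0.9 < 1 and lam < 2
u_star, s_star = pd3o(grad_f, prox_g, prox_hconj, A, np.zeros(2), np.zeros(1), lam, sigma)
```

For this toy problem the minimizer is the projection of c onto the half-space \(u_1 + u_2 \le 1.1\), namely (1.05, 0.05), which gives a direct check on the iteration. Each pass uses only products with \({\mathsf {A}}\) and \({\mathsf {A}}^t\), one gradient evaluation, and two proximal maps.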
To initialize both algorithms, we choose \(\phi ^0\) and \(m^0\) to be zero vectors, and for \(\rho ^0\), we let its components at the initial time (i.e., \(k = 0\)) be \(\rho _0(x)\) evaluated on an equally spaced grid of width \(\varDelta x\), and other times to be zero. The stopping criteria consist of checking the PDE constraint (30)–(31) along with the convergence monitors:
The proximal operator, which appears in Algorithms 1 and 2, is defined by
For both \(h = \sigma i_{\varvec{\delta }}^*\) and \(h = \lambda \varPhi \), there are explicit formulas for the proximal operators. By Moreau’s identity, we may write \(\text {Prox}_{\sigma i_{\varvec{\delta }}^*}(x)\) in terms of projections onto balls of radius \(\delta _i\) centered at \(b_i\) for the ith portion of vector x:
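To illustrate the Moreau identity step, the following minimal sketch evaluates the blockwise expression \(x_i - \sigma \, \text {Proj}_{B(b_i,\delta _i)}(x_i/\sigma )\), which is what the identity \(\text {Prox}_{\sigma h^*}(x) = x - \sigma \text {Prox}_{h/\sigma }(x/\sigma )\) yields when h is the indicator function of a product of balls. The function names and the list-of-blocks layout are illustrative assumptions.

```python
import numpy as np

def project_ball(y, center, radius):
    """Euclidean projection of y onto the closed ball B(center, radius)."""
    v = y - center
    nv = np.linalg.norm(v)
    return y.copy() if nv <= radius else center + (radius / nv) * v

def prox_conj_indicator(x_blocks, b_blocks, deltas, sigma):
    """Blockwise prox of sigma * i_delta^*: x_i - sigma * Proj_{B(b_i, delta_i)}(x_i / sigma)."""
    return [x - sigma * project_ball(x / sigma, b, d)
            for x, b, d in zip(x_blocks, b_blocks, deltas)]

# Hypothetical single block.
rng = np.random.default_rng(0)
x = 3.0 * rng.normal(size=4)
b = rng.normal(size=4)
out = prox_conj_indicator([x], [b], [0.5], sigma=0.7)[0]
```

Each block is handled independently, so the cost is linear in the length of x.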
For the proximal operator of \(\varPhi \), as shown by Peyré, Papadakis, and Oudet [95, Proposition 1],
where \(\rho ^*\) is the largest real root of the cubic polynomial equation \(P(x) := (x-\rho )(x+\lambda )^2-\frac{\lambda }{2}|m|^2 = 0,\) and \(m^*\) can be obtained by \(m^* = \rho ^*m/(\rho ^*+\lambda )\). By computing the proximal operator exactly, our primal dual method is positivity preserving, respecting a key property of the original Problems 1 and \(1_{j,k}\).
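A minimal single-cell sketch of this computation is given below. It assumes the per-cell normalization \(|m|^2/(2\rho )\), which is the function whose proximal optimality conditions reduce to the cubic \((x-\rho )(x+\lambda )^2-\frac{\lambda }{2}|m|^2 = 0\); whether this matches the scaling in Eq. (13) exactly is an assumption here. The sketch uses a generic polynomial root finder rather than the closed-form cubic formula, and it returns (0, 0) when the largest real root is not positive, following the convention in [95]. The function name is hypothetical.

```python
import numpy as np

def prox_phi(rho, m, lam):
    """Prox of lam * |m|^2 / (2 rho) at one cell (rho scalar, m vector)."""
    m = np.atleast_1d(np.asarray(m, dtype=float))
    m2 = float(m @ m)
    # P(x) = (x - rho)(x + lam)^2 - (lam / 2)|m|^2, expanded in powers of x.
    coeffs = [1.0,
              2.0 * lam - rho,
              lam**2 - 2.0 * lam * rho,
              -rho * lam**2 - 0.5 * lam * m2]
    roots = np.roots(coeffs)
    # Keep the (numerically) real roots; a real cubic always has at least one.
    real = roots.real[np.abs(roots.imag) <= 1e-8 * (1.0 + np.abs(roots))]
    rho_star = real.max()
    if rho_star <= 0.0:
        return 0.0, np.zeros_like(m)
    return rho_star, rho_star * m / (rho_star + lam)

rho_s, m_s = prox_phi(1.0, [0.5, -0.2], 0.3)
```

Because the returned density is either a positive root or exactly zero, the output is nonnegative by construction, in line with the positivity preserving property noted above.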
As the computations of both proximal operators (35), (36) are componentwise, they can easily be parallelized. Likewise, the computation of the gradient \(\nabla E\) is also componentwise:
Remark 4
(discrete convolution) As written, the above functionals involve computing the convolutions \(\sum _{l } W_{j,l} \rho _{l, N_t}\) and \(\sum _{l } W_{j,l} \rho _{l,0}\), which can be done efficiently using the fast Fourier transform. Since the product of the discrete Fourier transforms of two vectors is the Fourier transform of their circular convolution, and the interaction potential \(W_{jk}=W(x_j-x_k)\) is not a periodic function, we need zero-padding to compute the convolution. For the 1D case, we can first use the fast Fourier transform to compute the circular convolution of \(\mathbf {W}=(W_j)_{j=-N_x+2}^{N_x-2}\) and \((\mathbf {\rho }, ~ (\mathbf {0})_{N_x-2})\), and then extract the last \(N_x-1\) elements, which are the desired convolution \(\sum _{k } W_{jk} \rho _k\) for \(1\le j \le N_x-1\).
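The zero-padding argument can be checked with a short sketch. Here n plays the role of the number of grid values of \(\rho \), the kernel is sampled at all 2n - 1 index differences, and the last n entries of the circular convolution are free of wrap-around. The function name, the zero-based indexing, and the sample kernel are assumptions of this sketch.

```python
import numpy as np

def interaction_sum_fft(W_func, rho, dx):
    """Return c_j = sum_l W((j - l) dx) rho_l for j = 0..n-1 via a zero-padded FFT."""
    n = rho.size
    L = 2 * n - 1
    offsets = np.arange(-(n - 1), n)                  # all index differences j - l
    w = W_func(offsets * dx)                          # kernel samples, length 2n - 1
    rho_pad = np.concatenate([rho, np.zeros(n - 1)])  # pad rho to the same length
    c = np.fft.irfft(np.fft.rfft(w) * np.fft.rfft(rho_pad), n=L)
    return c[n - 1:]                                  # last n entries: no wrap-around

# Hypothetical data; the kernel is deliberately non-even to exercise the index order.
n, dx = 33, 0.1
x = np.arange(n) * dx
rho = 1.0 + np.sin(x) ** 2
W_func = lambda r: np.exp(-r**2) + 0.1 * r
conv = interaction_sum_fft(W_func, rho, dx)
```

This replaces the \(O(n^2)\) matrix-vector product by \(O(n \log n)\) work per evaluation.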
Embedding Algorithm 2 in the JKO iteration, we have the following algorithm for Wasserstein gradient flows.
Line 6 in Algorithm 3 constructs a better initial guess for \(\rho \) at each JKO iteration by extrapolation.
Remark 5
(Comparison of our numerical method to previous work) Our definition of the indicator function in Problems 3(a) and 3(b) differs from previous work, and as a result, our primal-dual algorithm does not require the inversion of the matrix \({\mathsf {A}}{\mathsf {A}}^T\) [7, 95], which makes it quite efficient in high dimensions thanks to the sparsity of \({\mathsf {A}}\). A similar approach is taken in a recent preprint [81] to compute the earth mover’s distance \(W_1\), though, in this context, the earth mover’s distance is dissimilar from the Wasserstein distance, since it does not require an extra time dimension and is thus a lower-dimensional problem.
A second difference between our method and the approach in previous works is that, since P(x) has at most one strictly positive root, it can be obtained by the general solution formula for cubic polynomials with real coefficients. Therefore, in our numerical simulations, we may compute the proximal operator \(\text {Prox}_{\lambda \varPhi }(u)\) by using this general solution formula, rather than via Newton iteration [95]. As a consequence, our method is truly positivity preserving, as opposed to positivity preserving in the limit as \(\varDelta x, \varDelta t \rightarrow 0\).
We close this section by recalling sufficient conditions on the primal and dual step sizes \(\sigma \) and \(\lambda \) that ensure Algorithms 1 and 2 converge to minimizers of Problems 3(a) and 3(b), respectively.
Proposition 1
(Convergence of Algorithm 1, cf. [52]) Suppose \(\sigma \lambda <1/\lambda _{max}({{\mathsf {A}}}{{\mathsf {A}}}^t)\) and a minimizer of Problem 3(a) exists. Then, as \(\mathrm {Iter_{max}} \rightarrow +\infty \) and \(\epsilon _1\), \(\epsilon _2 \rightarrow 0\) in the stopping criteria (33)–(34), the output \({u}^*\) of Algorithm 1 converges to a minimizer of Problem 3(a).
Proposition 2
(Convergence of Algorithm 2, cf. [106]) Suppose that the discrete energy E(u) defined in Eq. (32) is proper, lower semicontinuous, convex, and there exists \(\beta >0\) such that \(\left\langle u_1-u_2, \nabla _u E(u_1) - \nabla _u E(u_2)\right\rangle \ge \beta \Vert \nabla E(u_1) - \nabla E(u_2)\Vert ^2\). Suppose further that \(\sigma \lambda <1/\lambda _{max}({\tilde{{\mathsf {A}}}}{\tilde{{\mathsf {A}}}}^t)\), \(\lambda <2\beta \), and a minimizer of Problem 3(b) exists. Then, as \(\mathrm{Iter_{max}} \rightarrow +\infty \) and \(\epsilon _1\), \(\epsilon _2 \rightarrow 0\), the output \(u^*\) converges to a minimizer of Problem 3(b).
Note here that, for convex E, the cocoercivity requirement on \(\nabla E\) in the above proposition is equivalent to requiring the Lipschitz continuity of \(\nabla E\), i.e., \(\Vert \nabla _u E(u_1) - \nabla _u E(u_2)\Vert \le \frac{1}{\beta } \Vert u_1 - u_2\Vert \). For the energy of the form (4), this requirement reduces to the boundedness of \(U''(\rho )\) and W, which can be satisfied independent of the numerical resolution if we consider bounded solutions (no finite time blow up in \(\rho \)) and a nonsingular interaction kernel. In the case when W is singular, for example when W is a Newtonian interaction potential, we approximate W by a continuous function via convolution with a mollifier; see Remark 7.
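The step-size conditions in Propositions 1 and 2 require an estimate of \(\lambda _{max}({\mathsf {A}}{\mathsf {A}}^t)\). One standard way to obtain it using only products with \({\mathsf {A}}\) and \({\mathsf {A}}^t\) is power iteration, sketched below. The function names, the iteration count, and the safety factor are assumptions of this sketch and are not prescribed in the text.

```python
import numpy as np

def lambda_max_AAt(A, iters=200, seed=0):
    """Estimate lambda_max(A A^T) by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    est = 0.0
    for _ in range(iters):
        w = A @ (A.T @ v)
        est = float(v @ w)               # Rayleigh quotient (v has unit norm)
        v = w / np.linalg.norm(w)
    return est

def dual_step(A, lam, safety=0.9):
    """Pick sigma so that sigma * lam < 1 / lambda_max(A A^T)."""
    # The Rayleigh quotient never exceeds lambda_max, hence the safety factor.
    return safety / (lam * lambda_max_AAt(A))

# Hypothetical small operator with lambda_max(A A^T) = 4.
A_demo = np.array([[2.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
lam_demo = 0.5
sigma_demo = dual_step(A_demo, lam_demo)
```

Given a primal step \(\lambda \) satisfying \(\lambda < 2\beta \), this yields an admissible dual step \(\sigma \) without forming \({\mathsf {A}}{\mathsf {A}}^t\) explicitly.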
Convergence
We now prove the convergence of solutions of the fully discrete JKO scheme, Problem \(1_{j,k}\), to a solution of the continuum JKO scheme, Problem 1. We begin, in Sect. 3.1, by describing the hypotheses we place on the underlying domain \(\varOmega \), the energy \({\mathcal {E}}\), the initial data \(\rho _0\), and the discretization operators. Then, in Sect. 3.2, we show that minimizers of Problem \(1_{j,k}\) exist, provided the discretization is sufficiently refined. Finally, in Sect. 3.3, we prove that any sequence of minimizers of Problem \(1_{j,k}\) has a subsequence that converges to a minimizer of Problem 1. In order for our finite difference approximation to converge, we assume throughout that a smooth, positive minimizer of the continuum JKO scheme Problem 1 exists. See hypothesis (H6) and Remark 9 for further discussion of this assumption.
Hypotheses
We impose the following hypotheses on the underlying domain, energy, and discretization operators.

(H1)
\(\varOmega = \varPi _{i=1}^d (a_i,b_i) \subseteq {{\mathbb {R}}^d}\), for \(a_i< b_i \in {\mathbb {R}}\). We assume that the spacing of the spatial discretization \((\varDelta x) >0 \) and the temporal discretization \((\varDelta t)>0\) are both functions of h satisfying \(\lim _{h \rightarrow 0} (\varDelta x) = \lim _{h \rightarrow 0} (\varDelta t) = 0\).

(H2)
For any piecewise constant function \(\rho ^h\) on \(\varOmega \), the discrete energy functional \({\mathcal {E}}^h\) has one of the following forms, as described in Sect. 2.2.2:

(a)
\( {\mathcal {F}}^h(\rho ^h) = \sum _{j} \left( U( \rho _j) + V_j \rho _j \right) {{(\varDelta x)^d}}+ \sum _{j,l} W_{j, l} \rho _j \rho _{l} (\varDelta x)^{2d} \)

(b)
\( {\mathcal {H}}^h_{\rho _0}(\rho ^h) = \frac{1}{2} {\mathcal {F}}^h(\rho ^h) + \frac{1}{2} \sum _{j} \left( U'( (\rho _0)_j) + V_j +\sum _l W_{j, l} (\rho _0)_l (\varDelta x)^{d} \right) \rho _j (\varDelta x)^{d} \)

(c)
\({{\mathcal {G}}}_{\rho _1}^h(\rho _j) := {\left\{ \begin{array}{ll} 0 &{}\text { if } \sum _j |\rho _j - ( {\rho }^h_1)_j| ^2 (\varDelta x)^d \le \delta _5^2 \\ +\infty &{}\text { otherwise.} \end{array}\right. } \)
We place the following assumptions on U, V, and W and the target measure \(\rho _1\):

(i)
Either \(U \equiv 0\) or \(U \in C([0, +\infty ))\) is convex, \(U \in C^1((0,+\infty ))\), \(\lim _{r \rightarrow +\infty } \frac{U(r)}{r} = +\infty \), and \(U(0) = 0\);

(ii)
\( V^h(x) := \sum _{j \in {\mathbb {Z}}^d} V_{j} 1_{Q_{j}}(x) \) and \(W^h(x,y) := \sum _{(j,l) \in {\mathbb {Z}}^d \times {\mathbb {Z}}^d} W_{j,l} 1_{Q_{j}}(x) 1_{Q_{l}}(y)\) are piecewise constant approximations of \(V,W \in C( {\overline{\varOmega }})\) converging uniformly on \({\overline{\varOmega }}\).

(iii)
\(\rho _1 \in C^1({\overline{\varOmega }})\) and \(\rho _1^h\) is a pointwise piecewise constant approximation of \(\rho _1\).


(H3)
\(D^{h}_t\) and \(D^{h}_x\) are finite difference approximations of the time derivative and spatial divergence. We assume that \(D^h_t\) is a forward Euler method in time, whereas \(D^h_x\) can be given by an explicit or implicit scheme of first or higher order. We denote by \(D^{-h}_t\) and \(D^{-h}_x\) the dual operators with respect to the \(\ell ^2\) inner product, and we assume the following integration by parts formulas hold for all piecewise constant functions \(\rho ^h,f^h: [0,1] \rightarrow {\mathbb {R}}\),
$$\begin{aligned} \int _0^1 D^{h}_t \rho ^h f^h \mathrm {d}t = \left( \left. \rho ^h f^h \right| _{0}^1 \right) - \int _0^1 \rho ^h D^{-h}_t f^h \mathrm {d}t \end{aligned}$$ and if \(m^h:\varOmega \rightarrow {\mathbb {R}}^d\), \(f^h : \varOmega \rightarrow {\mathbb {R}}\),
$$\begin{aligned} \int _{\varOmega } D^{h}_x m^h f^h \mathrm {d}x = \int _{\partial \varOmega } f^h m^h \cdot \nu ^h \mathrm {d}x - \int _{\varOmega } m^h D^{-h}_x f^h \mathrm {d}x , \end{aligned}$$ where \(\nu ^h: {\overline{\varOmega }} \rightarrow {{\mathbb {R}}^d}\) is the discrete outer unit normal of \(\varOmega \). Finally, we assume there exists \(C>0\) depending on the domain \({\overline{\varOmega }} \times [0,1]\), so that, for any \(f \in C^1({\overline{\varOmega }} \times [0,1]; {\mathbb {R}})\) and \(v \in C^1({\overline{\varOmega }} \times [0,1]; {\mathbb {R}}^d)\), if \(({f}^h,{v}^h)\) are pointwise piecewise constant approximations,
$$\begin{aligned} \Vert D^{h}_t {f}^h - \partial _t f\Vert _\infty&\le C\Vert \partial _t^2 f \Vert _\infty (\varDelta t) ,&\Vert D^{-h}_t {f}^h - \partial _t f \Vert _\infty&\le C\Vert \partial _t^2 f \Vert _\infty (\varDelta t) \\ \Vert D^{h}_x {v}^h - \nabla \cdot v\Vert _\infty&\le C \Vert D^2 v\Vert _\infty (\varDelta x),&\Vert D^{-h}_x {f}^h - \nabla f \Vert _\infty&\le C \Vert D^2 f \Vert _\infty (\varDelta x) \\ \Vert {v}^h \cdot \nu ^h - v \cdot \nu \Vert _\infty&\le C \Vert v\Vert _\infty (\varDelta x) . \end{aligned}$$ (See Sect. 2.2.3 for finite difference approximations satisfying these hypotheses.)

(H4)
The constraint relaxation parameters \(\delta _1, \delta _2, \delta _3, \delta _4 \ge 0\) are functions of h with \( \lim _{h \rightarrow 0} \delta _i = 0\), for all i. If the energy is of the form (H2c), we require that \(\delta _5\) is a function of h satisfying \( \lim _{h \rightarrow 0} \delta _5 = 0\) and \( \lim _{h \rightarrow 0} \left( \varDelta x + \varDelta t\right) /\delta _5 = 0\).

(H5)
The initial data of the continuum problem satisfy \(\rho _0 \in C^1({\overline{\varOmega }})\) and \( {\rho }_0^h\) is a pointwise piecewise constant approximation of \(\rho _0\).

(H6)
Given the domain, energy, and initial data described in the previous hypotheses, there exists a minimizer \(( {\rho }, {m})\) of the continuum Problem 1 satisfying \( {\rho } \in C^2([0,1]; C^1({\overline{\varOmega }}))\), \( {\rho } >0\), and \( {m} \in C^1([0,1]; C^2({\overline{\varOmega }}))\).
To ease notation in the following convergence proof, we observe that Problem \(1_{j,k}\) may be rewritten as follows in terms of \((\rho ^h, m^h)\), the piecewise constant functions on \(\varOmega \times [0,1]\) corresponding to the coefficients \((\rho _{j,k}, m_{j,k})\).
Problem \({1^h}\) (Discrete Dynamic JKO) Fix \(\tau , \delta _1, \delta _2, \delta _3, \delta _4 >0\), \({\mathcal {E}}^h\), and \(\rho _0^h\). Solve the constrained optimization problem,
where \((\rho ^h,m^h)\) belong to the constraint set \({\mathcal {C}}^h\) provided that they are piecewise constant functions on \(\varOmega \times [0,1]\) and the following inequalities hold
Similarly, we may rewrite the definition of the discrete energies in hypothesis (H2) in terms of a piecewise constant function \(\rho ^h\) on \(\varOmega \) corresponding to \(\rho _j\),
Recall that, by definition of \(\varPhi \) in equation (13), \(\varPhi (\rho ^h, m^h)< +\infty \) only if \(\rho ^h\) is nonnegative. Consequently, if a minimizer \(\rho \) exists, it must be nonnegative.
We conclude this section with several remarks on the sharpness of the preceding hypotheses.
Remark 6
(assumption on domain \(\varOmega )\) In hypothesis (H1), we assume that \(\varOmega \) is a d-dimensional hyperrectangle. We impose this assumption for simplicity, as it provides a natural interpretation of the discretized outer unit normal \(\nu ^h\), which is essential in imposing the boundary conditions for our PDE constraint at the discrete level. More generally, our convergence result can be extended to any Lipschitz domain, as long as sufficient care is taken to define the discrete outer unit normal and the corresponding no flux boundary conditions.
Remark 7
(assumption on energy) As described in hypothesis (H2), our convergence result applies to internal U, drift V, and interaction W potentials that are sufficiently regular on \({\overline{\varOmega }}\). Our assumptions on U are classical and ensure that the internal energy is lower semicontinuous with respect to weak* convergence [2, Remark 9.3.8]. Our assumptions on V and W, on the other hand, are somewhat stronger, and in practice, one often encounters partial differential equations for which the corresponding choices of V and W are not continuous. However, there are robust methods for approximating these potentials by continuous functions that ensure convergence of the gradient flows. For example, the second author and Topaloglu provide sufficient conditions on discontinuous interaction potentials W for which gradient flows of the regularized interaction potential, \(W_\varepsilon := W*\varphi _\varepsilon \) for a smooth mollifier \(\varphi _\varepsilon \), converge to gradient flows of the original interaction potential W, as well as conditions that ensure minimizers of \(W_\varepsilon \) converge to minimizers of W [59]. (The convergence of general stationary points of \(W_\varepsilon \) that are not global minimizers to stationary points of W remains open.)
Remark 8
(assumption on \(\delta _5)\) In hypothesis (H4), it is essential that \(\delta _5\) not vanish too quickly with respect to other parameters in the discretization. A simple illustration of this fact arises in the case that \(\delta _1 \equiv \delta _2 \equiv \delta _3 \equiv \delta _4 \equiv 0\). In this case, we cannot choose \(\delta _5 \equiv 0\), since our pointwise piecewise approximation of the initial data \(\rho _0^h\) will not generally have the same mass as our pointwise piecewise approximation of the target measure \(\rho _1^h\), and if they do not have the same mass, minimizers of the discrete problem do not exist. Consequently, it would be impossible to prove that minimizers of the fully discrete problem converge to minimizers of the continuum problem. On the one hand, this does not greatly impact the performance of our numerical method, as can be seen by considering previous work by Papadakis, Peyré, and Oudet, which numerically implements this approach [95]. On the other hand, our numerical simulation in Fig. 3 indicates that a poor choice of the relaxation parameters can cause the method to iterate longer than necessary, without any improvement in accuracy.
Our requirement that \(\lim _{h \rightarrow 0} (\varDelta x + \varDelta t)/\delta _5 =0\) is sufficient to fix this problem and ensure convergence of the method, and this requirement is nearly sharp. To see this, note that, for an arbitrary pointwise piecewise approximation \(\rho _0^h\) of a continuous function \(\rho _0\), we cannot in general achieve accuracy of \( |\int _\varOmega \rho _0^h - \int _\varOmega \rho _0|\) better than \(O (\varDelta x)\). If either \(\delta _1\) or \(\delta _3\), the parameters for the PDE constraint and the mass constraint, is chosen arbitrarily small, then \(|\int _\varOmega \rho ^h(\cdot ,1) - \int _\varOmega \rho _0^h|\) can likewise be made arbitrarily small. Thus, since \(\rho _0, \rho _1 \in {{\mathcal {P}}}_{ac}(\varOmega )\),
so we must have \(\delta _5 \ge O(\varDelta x)\). While a CFL-type condition is not necessary for the stability of our discretization of the PDE constraint, since \(\rho \) and m indeed become coupled in the continuum limit (see Eqs. (8) and (12)), one should expect \((\varDelta t) \le O(\varDelta x)\) to give the best balance between computational accuracy and cost, and we indeed observe this numerically. Combining these facts shows that enforcing that \(\delta _5\) cannot decay faster than \(O(\varDelta x + \varDelta t)\) by assuming \(\lim _{h \rightarrow 0} (\varDelta x + \varDelta t)/\delta _5 =0\) is nearly optimal.
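The \(O(\varDelta x)\) mass discrepancy of a pointwise approximation is easy to observe numerically. The sketch below samples a hypothetical density \(\rho _0(x) = \frac{1}{2} + x\) on [0, 1], which has unit mass, at the left endpoint of each cell; both the density and the left-endpoint sampling rule are assumptions chosen for illustration, and other sampling rules can behave differently.

```python
import numpy as np

def mass_error(N):
    """|int rho_0^h - int rho_0| for left-endpoint sampling of rho_0(x) = 1/2 + x on [0, 1]."""
    dx = 1.0 / N
    x_left = np.arange(N) * dx
    rho_h = 0.5 + x_left                 # pointwise piecewise constant coefficients
    return abs(np.sum(rho_h) * dx - 1.0)

# Halving dx halves the error: first-order accuracy in the total mass.
errors = {N: mass_error(N) for N in (10, 20, 40, 80)}
```

For this choice the error equals \(\varDelta x/2\), so no relaxation parameter \(\delta _5\) decaying faster than \(O(\varDelta x)\) could absorb the mass mismatch between two such approximations.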
Remark 9
(assumption on existence of smooth, positive minimizer) In hypothesis (H6), we suppose that there exists a sufficiently regular minimizer \(({\overline{\rho }},{\overline{m}})\), \({\bar{\rho }}>0\), of the continuum problem. Our proof of the existence of minimizers of the fully discrete problem and our proof that minimizers of the discrete problems converge to a minimizer of the continuum problem as \(h \rightarrow 0\) strongly rely on this assumption. In particular, the smoothness assumption allows us to use convergence of the finite difference operators, described in hypothesis (H3), to construct an element of \({\mathcal {C}}^h\) in Proposition 3. The positivity assumption allows us to conclude that \(\nabla _{\rho , m} \varPhi \) is uniformly bounded on the range of \({\bar{\rho }}\), which we use to prove the \(\limsup \) inequality for the recovery sequence in Theorem 2(b).
From the perspective of approximating gradient flows, which are solutions of diffusive partial differential equations (3), such regularity and positivity can be guaranteed as long as the initial data are smooth and positive and either the diffusion is sufficiently strong or the drift and interaction terms do not cause loss of regularity. On the other hand, developing conditions on the energy and initial data that ensure such regularity and positivity holds at the level of the JKO scheme, for minimizers of Problem 1, remains largely open: results on the propagation of \(L^p({{\mathbb {R}}^d})\) or BV bounds along the scheme have only recently emerged [17, 50, 63].
From the perspective of approximating Wasserstein geodesics, the now classical regularity theory developed by Caffarelli and Urbas ensures that if the source and target measures \(\rho _0\) and \(\rho _1\) are smooth and strictly positive, then the minimizer \(({\bar{\rho }},{\bar{m}})\) of Problem 1 is also smooth and strictly positive. (See, for example, [105, Section 4.3] and [2, Section 8.3].)
Along with this analytical justification for our smoothness and positivity assumptions, our numerical results also indicate that such assumptions are in general necessary. For example, in Fig. 4, we observe that if the source and target measure of a Wasserstein geodesic are not sufficiently smooth, the numerical solution introduces artificial regularity. Likewise, even in Fig. 6, we observe that the numerical simulation is strictly positive (though very close to zero in places), while the exact solution is identically zero outside of its support. Still, in spite of the fact that our theoretical convergence result requires smoothness and positivity assumptions, in practice our numerical method still performs well on nonsmooth or nonpositive problems, provided that the spatial and temporal discretization are taken to be sufficiently small; see Figs. 5–18.
Finally, these types of smoothness and positivity assumptions are typically needed in convergence proofs for numerical methods based on the JKO scheme. For example, in a method based on the Monge–Ampère approximation of the Wasserstein distance, the exact solution is required to be uniformly bounded above and below [10]. Likewise, while rigorous convergence results for fully discrete numerical methods based on entropic or Fisher information regularization remain open, since these methods correspond to introducing numerical diffusion at the level of the PDE, they automatically enforce smoothness and positivity [28, 55, 82].
Existence of Minimizers
We now show that, under the hypotheses described in the previous section, minimizers of the fully discrete JKO scheme, Problem \({1^h}\), exist for all \(h>0\) sufficiently small. We begin with the following proposition, which constructs a specific element in the constraint set \({\mathcal {C}}^h\), which we will use both in our proof of existence of minimizers and in our \(\varGamma \)-convergence results in the next section.
Proposition 3
(construction of element in \({\mathcal {C}}^h)\) Suppose that hypotheses (H1)–(H6) hold, and choose \((\rho , m) \in {\mathcal {C}}\) satisfying \( {\rho } \in C^2([0,1]; C^1({\overline{\varOmega }}))\), \( {\rho } >0\), and \( {m} \in C^1([0,1]; C^2({\overline{\varOmega }}))\). Then for \(h>0\) sufficiently small, there exists \(({\tilde{\rho }}^h, {\tilde{m}}^h) \in {\mathcal {C}}^h\) satisfying \(({\tilde{\rho }}^h, {\tilde{m}}^h) \xrightarrow {h \rightarrow 0} (\rho ,m)\) uniformly on \(\varOmega \times [0,1]\) and
If, in addition, the energy satisfies hypothesis (H2c) and \({\mathcal {E}}(\rho (\cdot , 1))<+\infty \), then we have
for all \(h>0\) sufficiently small.
Proof
We construct \((\rho ^h,m^h) \in {\mathcal {C}}^h\) as follows. Let \({\hat{m}}^h\) be a pointwise piecewise constant approximation of m; see Eq. (23). Recall that \(\nu ^h\) is the discrete outer unit normal vector. We define \({\tilde{m}}^h: \varOmega \times [0,1] \rightarrow {{\mathbb {R}}^d}\) componentwise to respect the no flux boundary conditions, letting \(({\tilde{m}}^h)_l\) denote the lth component of the vector for \(l = 1, \dots , d\). If \(x \in \partial \varOmega \), then we define
Otherwise, we take \({\tilde{m}}^h(x,t) = {\hat{m}}^h(x,t)\). Define \({\tilde{\rho }}^h: \varOmega \times [0,1] \rightarrow {\mathbb {R}}\) so that \({\tilde{\rho }}^h(x,0) = {\rho }_0^h\) and \(D^{h}_t {\tilde{\rho }}^h(x,t) + D^h_x {\tilde{m}}^h(x,t) \equiv 0\).
We begin by showing that \(({\tilde{\rho }}^h, {\tilde{m}}^h) \in {\mathcal {C}}^h\). By construction, for all \(h >0\),
Taking \(f^h \equiv 1\) in Hypothesis (H3) and applying the PDE constraint ensures that, for all \(s \in [0,1]\) and \(k \in {\mathbb {N}}\) so that \(k (\varDelta t) \le s < (k+1) \varDelta t\),
Thus, we also obtain
This concludes the proof that \(({\tilde{\rho }}^h, {\tilde{m}}^h) \in {\mathcal {C}}^h\).
We now show that \(({\tilde{\rho }}^h, {\tilde{m}}^h) \rightarrow (\rho ,m)\) uniformly on \(\varOmega \times [0,1]\) as \(h \rightarrow 0\). We begin by proving convergence of \({\tilde{m}}^h\) to m. Due to hypothesis (H1) on our domain \(\varOmega \), whenever \(e_i \cdot \nu ^h(x) \ne 0\), there exists \(y \in \partial \varOmega \) so that \(|y-x| \le 2\sqrt{d} (\varDelta x)\) and \( \nu (y) = e_i\). Thus, whenever \(e_i \cdot \nu ^h(x) \ne 0\), the continuum boundary condition \(m(y,t) \cdot \nu (y) = 0\) ensures that for all \(t \in [0,1]\),
We also have that, for all \((x,t) \in \varOmega \times [0,1]\),
Therefore, for all \((x,t) \in \varOmega \times [0,1]\), there exists \(C_m = C_m(d, \Vert Dm\Vert _\infty , \Vert \partial _t m \Vert _\infty )>0\) so that
We now prove the convergence of \({\tilde{\rho }}^h\) to \(\rho \). Since \((\rho ,m)\) is a classical solution of the PDE constraint and \({\tilde{\rho }}^h: \varOmega \times [0,1] \rightarrow {\mathbb {R}}\) is defined by the conditions that \({\tilde{\rho }}^h(x,0) = {\hat{\rho }}_0^h\) and \(D^{h}_t {\tilde{\rho }}^h(x,t) + D^h_x {\tilde{m}}^h(x,t) \equiv 0\), for \((x,t) \in \varOmega \times [0,1]\) and \(k \in {\mathbb {N}}\) so that \(k (\varDelta t) \le t < (k+1) (\varDelta t)\), we have
Since \({\tilde{\rho }}^h \rightarrow \rho \) uniformly and \(\rho >0\), we immediately obtain (39).
Finally, suppose the energy satisfies (H2c). Since \({\mathcal {E}}(\rho (\cdot , 1)) = {\mathcal {G}}_{\rho _1}(\rho (\cdot , 1))< +\infty \), we have \(\rho (\cdot , 1) = \rho _1\). By inequality (41) and the fact that \(\rho ^h_1\) is a pointwise piecewise approximation of \(\rho (\cdot , 1)\),
where \(C_{\rho ,m} = C_{\rho ,m}(\varOmega ,\Vert \nabla \rho \Vert _\infty ,\Vert \nabla \cdot m \Vert _\infty , \Vert D^2m\Vert _\infty )>0\). By hypothesis (H4), \(\lim _{h \rightarrow 0} \frac{(\varDelta x + \varDelta t)}{\delta _5} = 0\). Thus, for h sufficiently small,
which completes the proof.
\(\square \)
Theorem 1
(minimizers of discrete dynamic JKO exist) Suppose that hypotheses (H1)–(H6) hold. Then for all \(h>0\) sufficiently small, a minimizer of Problem \({1^h}\) exists.
Proof
First, we note that Proposition 3 ensures that, for \(h>0\) sufficiently small, the constraint set \({\mathcal {C}}^h\) is nonempty and contains some \((\rho ^h,m^h)\) satisfying \(\rho ^h > 0\). If the energy satisfies (H2a) or (H2b), then we immediately obtain \({\mathcal {E}}^h(\rho ^h( \cdot , 1))<+\infty \). Similarly, if the energy satisfies (H2c), then inequality (40) in Proposition 3 again ensures that \({\mathcal {E}}^h(\rho ^h(\cdot ,1)) < +\infty \).
Since \(\varPhi (\rho ^h,m^h)<+\infty \) whenever \(\rho ^h \ge 0\), this ensures that the value of the objective function in the discrete minimization problem \({1^h}\) is not identically \(+\infty \) on the constraint set. Therefore,
and we may choose a minimizing sequence \((\rho ^h_n, m^h_n) \in {\mathcal {C}}^h\) that converges to the infimum. We may assume, without loss of generality, that
To conclude the proof of the theorem, we will now show that there exists \((\rho ^h_*, m^h_*)\) so that a subsequence of \((\rho ^h_n, m^h_n)\) converges to \((\rho ^h_*, m^h_*)\) uniformly on \(\varOmega \times [0,1]\). Then, since the objective functional \({\mathcal {E}}^h\) is lower semicontinuous along uniformly convergent sequences [1, Example 2.36] and the constraint set \({\mathcal {C}}^h\) is closed under uniform convergence for fixed \(h>0\), \( (\rho ^h_*, m^h_*)\) must be a minimizer of the fully discrete problem.
In order to obtain compactness of \((\rho ^h_n, m^h_n)\), first note that (42) ensures \( \varPhi ( \rho ^h, m^h) < +\infty \) on \({\overline{\varOmega }} \times [0,1]\), so \(\rho ^h \ge 0\) on \({\overline{\varOmega }}\). Furthermore, the mass constraint (38) ensures that there exists \(R = R(h) >0\), depending on \(\varOmega \), \((\varDelta x)\), \((\varDelta t)\), and \(\delta _3\) so that \(\rho ^h_n(x,t) \le R\) for all \((x,t) \in \varOmega \times [0,1]\). Therefore, the vector of coefficients \((\rho ^h_n)_{j,k}\) for this piecewise constant function satisfies \((\rho ^h_n)_{j,k} \in B_R(0) \subseteq {\mathbb {R}}^{N_x^d N_t}\). Consequently, by the Heine–Borel theorem, there exists a vector \((\rho ^h_*)_{j,k} \in {\mathbb {R}}^{N_x^d N_t}\) so that, up to a subsequence, \((\rho ^h_n)_{j,k} \rightarrow (\rho ^h_*)_{j,k}\). Therefore, if \(\rho ^h_*\) denotes the corresponding piecewise constant function, we have that, up to taking a subsequence which we again denote by \(\rho ^h_n(x,t)\), \(\lim _{n \rightarrow +\infty } \rho ^h_n(x,t) = \rho ^h_*(x,t)\) uniformly on \(\varOmega \times [0,1]\).
Next, we show that
If the energy satisfies (H2c), then \({\mathcal {E}}^h(\rho ^h_n(\cdot , 1)) \ge 0\) for all n, and the above inequality is immediate. If the energy satisfies (H2a) or (H2b), then this follows from the fact that U is bounded below on \([0, +\infty ]\), V and W are bounded below on \({\overline{\varOmega }}\) and \(\rho ^h_n(x,t) \rightarrow \rho ^h_*(x,t)\) uniformly.
Combining (43) and (44), we obtain
Furthermore, since \(0 \le \rho ^h_n(x,t) \le R\) for all \((x,t) \in \varOmega \times [0,1]\), \(n \in {\mathbb {N}}\), we have
Therefore, combining (45) and (46), we obtain that there exists \(R' = R'(h)>0\), depending on \(\varOmega \), \((\varDelta x)\), \((\varDelta t)\), and \(\delta _3\), so that \(|m^h_n(x,t)| \le R'\) for all \((x,t) \in \varOmega \times [0,1]\). Arguing as before, the Heine–Borel theorem ensures that, up to a subsequence, \(\lim _{n \rightarrow +\infty } m^h_n(x,t) = m^h_*(x,t)\) uniformly on \(\varOmega \times [0,1]\), for some piecewise constant function \(m^h_*(x,t)\). This gives the result.
\(\square \)
Convergence of Minimizers
We now prove that minimizers of the discrete dynamic JKO scheme, Problem \({1^h}\), converge to minimizers of Problem 1 as \(h \rightarrow 0\). We begin with the following lemma, showing that any \((\rho ^h, m^h) \in {\mathcal {C}}^h\) satisfies a weak form of the PDE constraint in the limit as \(h \rightarrow 0\).
Lemma 1
(properties of \({\mathcal {C}}^h)\) Suppose that hypotheses (H1)–(H6) hold, and fix \((\rho ^h,m^h) \in {\mathcal {C}}^h\) so that \(\int _0^1 \int _\varOmega \varPhi (\rho ^h, m^h) <+\infty \) for each \(h>0\). Then \(\rho ^h(\cdot , 0) \rightarrow \rho _0\) in \(L^2(\varOmega )\), and there exist \(\rho \in {{\mathcal {P}}}(\varOmega \times [0,1])\) and \(\mu \in {{\mathcal {P}}}(\varOmega )\) so that, up to a subsequence, \(\rho ^h {\mathop {\rightharpoonup }\limits ^{*}}\rho \) and \(\rho ^h(\cdot , 1) {\mathop {\rightharpoonup }\limits ^{*}}\mu \). Furthermore, for any piecewise constant function \(f^h\) with \(\sup _{h>0} \Vert f^h \Vert _{L^2(\varOmega \times [0,1])} +\Vert f^h \Vert _{L^2(\partial \varOmega \times [0,1])} < +\infty \), we have
Proof
By hypothesis (H6), \( \rho ^h_0 \rightarrow \rho _0\) uniformly on \({\overline{\varOmega }}\). Likewise, the constraint on the initial data (38) and (H4) ensure \(\lim _{h \rightarrow 0} \Vert \rho ^h(\cdot , 0) - {\rho }_0^h \Vert _{L^2(\varOmega )} \le \lim _{h \rightarrow 0} \delta _4 = 0\). Thus, \(\rho ^h(\cdot , 0) \rightarrow \rho _0\) in \({L^2(\varOmega )}\).
We now turn to Eq. (47). By the PDE constraint and boundary conditions (37) and summation by parts, via hypotheses (H3),
where, in the last line, we use that (H4) ensures \(\delta _2, \delta _4 \rightarrow 0\) and the fact that \(f^h\) is bounded uniformly in h in \(L^2(\varOmega \times [0,1])\) and \(L^2(\partial \varOmega \times [0,1])\).
Next, we show that there exist \(\rho \in {{\mathcal {P}}}(\varOmega \times [0,1])\) and \(\mu \in {{\mathcal {P}}}(\varOmega )\) so that, up to a subsequence, \(\rho ^h {\mathop {\rightharpoonup }\limits ^{*}}\rho \) and \(\rho ^h(\cdot , 1) {\mathop {\rightharpoonup }\limits ^{*}}\mu \). By Hölder’s inequality and the mass constraint (38),
where, in the last line, we use that (H4) ensures \(\delta _3 \rightarrow 0\). Since hypothesis (H6) ensures \(\rho _0^h \rightarrow \rho _0\) uniformly and \(\int _\varOmega \rho _0 = 1\), we obtain,
Furthermore, since \(\int _0^1 \int _\varOmega \varPhi (\rho ^h,m^h) < +\infty \) for each \(h >0\), we must have \(\rho ^h \ge 0\) on \(\varOmega \times [0,1]\), and the above equation ensures \(\sup _{h >0} \Vert \rho ^h\Vert _{L^1(\varOmega \times [0,1])} < +\infty \). Thus, classical functional analysis results ensure there exists a subsequence that converges to some \(\rho \in {{\mathcal {P}}}(\varOmega \times [0,1])\) in the weak* topology (see, e.g., [20, Section 3]).
Finally, taking \(f^h \equiv 1\) in Eq. (47) gives,
Arguing as above, we obtain that, up to a further subsequence, \(\rho ^h(\cdot , 1) {\mathop {\rightharpoonup }\limits ^{*}}\mu (\cdot )\) for \(\mu \in {{\mathcal {P}}}(\varOmega )\). \(\square \)
We now prove that the discrete energies \({\mathcal {E}}^h\) are lower semicontinuous along weak* convergent sequences.
Proposition 4
(Lower semicontinuity of energies along weak* convergent sequences) Suppose that hypotheses (H1)–(H6) hold. Then, for any sequence of piecewise constant functions \(\rho ^h: \varOmega \rightarrow {\mathbb {R}}\) such that \(\rho ^h {\mathop {\rightharpoonup }\limits ^{*}}\rho \), we have \(\liminf _{h \rightarrow 0} {\mathcal {E}}^h(\rho ^h) \ge {\mathcal {E}}(\rho )\).
Proof
First, suppose the energy satisfies (H2a). Since the piecewise constant approximations \({\hat{V}}^h\) and \({\hat{W}}^h\) converge to V and W uniformly, for any sequence \(\rho ^h {\mathop {\rightharpoonup }\limits ^{*}}\rho \),
Furthermore, our assumptions on U guarantee that the internal energy term is lower semicontinuous with respect to weak* convergence [2, Remark 9.3.8], so \(\liminf _{h \rightarrow 0} \int _\varOmega U(\rho ^h(x)) \mathrm {d}x \ge \int _\varOmega U(\rho (x)) \mathrm {d}x\). Combining this with Eqs. (48)–(49) gives the result.
Next, suppose the energy satisfies (H2b). Since \(\rho _0>0\) on the compact set \({\overline{\varOmega }}\) and \(U'\) is uniformly continuous on \(\rho _0({\overline{\varOmega }}) \subset (0, +\infty )\), the fact that hypothesis (H6) ensures \({\hat{\rho }}_0^h \rightarrow \rho _0\) uniformly ensures \(U'({\hat{\rho }}_0^h) \rightarrow U'(\rho _0)\) uniformly. Therefore,
Likewise, since \({\hat{V}}^h\) and \({\hat{W}}^h\) converge to V and W uniformly, we also have
Combining these limits with the \(\liminf \) inequality for energies of the form (H2a) gives the result.
Finally, suppose the energy satisfies (H2c). Without loss of generality, we may assume that \(\liminf _{h \rightarrow 0} {\mathcal {G}}^h_{\rho _1}(\rho ^h) <+\infty \), so that up to a subsequence, \({\mathcal {G}}^h_{\rho _1}(\rho ^h) \equiv 0\) and \( \lim _{h \rightarrow 0} \Vert \rho ^h - \rho ^h_1 \Vert _{L^2(\varOmega )} = 0\). By uniqueness of limits, \(\rho = \rho _1\). Thus, since \({\mathcal {G}}^h_{\rho _1} \ge 0\), we have \(\liminf _{h \rightarrow 0} {\mathcal {G}}^h_{\rho _1}(\rho ^h) \ge 0 = {\mathcal {G}}_{\rho _1}(\rho ) \).
\(\square \)
We now apply Proposition 4 to prove the \(\varGamma \)-convergence of Problem \({1^h}\) to Problem 1.
Theorem 2
(\(\varGamma \)-convergence of discrete to continuum JKO) Suppose hypotheses (H1)–(H6) hold.

(a)
If \((\rho ^h, m^h) \in {\mathcal {C}}^h\) with \((\rho ^h, m^h) {\mathop {\rightharpoonup }\limits ^{*}}(\rho ,m)\), then \((\rho ,m) \in {\mathcal {C}}\) and
$$\begin{aligned}&\liminf _{h \rightarrow 0} \int _0^1 \int _\varOmega \varPhi ( \rho ^h, m^h) \mathrm {d}x \mathrm {d}t + 2 \tau {\mathcal {E}}^h(\rho ^h(\cdot , 1)) \\&\quad \ge \int _0^1 \int _\varOmega \varPhi ( \rho , m ) \mathrm {d}x \mathrm {d}t + 2 \tau {\mathcal {E}}(\rho (\cdot , 1)) . \end{aligned}$$ 
(b)
For any \((\rho , m) \in {\mathcal {C}}\) satisfying \(\rho \in C^2([0,1]; C^1({\overline{\varOmega }}))\), \(\rho >0\), and \(m \in C([0,1]; C^2({\overline{\varOmega }}))\), there exists a sequence \(({\tilde{\rho }}^h, {\tilde{m}}^h) \in {\mathcal {C}}^h\) so that \(({\tilde{\rho }}^h ,{\tilde{m}}^h) \rightarrow (\rho ,m)\) uniformly and
$$\begin{aligned}&\limsup _{h \rightarrow 0} \int _0^1 \int _\varOmega \varPhi ( {\tilde{\rho }}^h, {\tilde{m}}^h) \mathrm {d}x \mathrm {d}t + 2 \tau {\mathcal {E}}^h({\tilde{\rho }}^h(\cdot , 1)) \\&\quad \le \int _0^1 \int _\varOmega \varPhi ( \rho , m ) \mathrm {d}x \mathrm {d}t + 2 \tau {\mathcal {E}}(\rho (\cdot , 1)). \end{aligned}$$
Proof
We first prove part (a). Suppose \((\rho ^h, m^h) \in {\mathcal {C}}^h\), with \(\rho ^h {\mathop {\rightharpoonup }\limits ^{*}}\rho \) and \(m^h {\mathop {\rightharpoonup }\limits ^{*}}m\). We begin by showing that the limit \((\rho ,m)\) belongs to \({\mathcal {C}}\). Fix \(f \in C^\infty (\varOmega \times [0,1])\) and let \(f^h\) be a pointwise piecewise constant approximation of f. (See Eq. (23).) By Lemma 1 and hypothesis (H3),
We conclude that \((\rho ,m)\) satisfies the PDE constraint in the sense of distributions (17), which gives \( {\rho } \in AC([0,1], {{\mathcal {P}}}(\varOmega ))\) [2, Lemma 8.1.2]. In particular, since \(\rho \) is continuous in time, we have that the \(\mu \) defined in Lemma 1 satisfies \(\mu = \rho (\cdot , 1)\).
We now consider the inequality in part (a). Since the integral functional \((\rho ,m) \mapsto \int _0^1 \int _\varOmega \varPhi (\rho ,m)\) is lower semicontinuous with respect to weak* convergence of measures [1, Example 2.36], we immediately obtain
This ensures \(m \in L^1([0,1],L^2(\rho ^{-1}))\) and completes the proof that \((\rho ,m) \in {\mathcal {C}}\). Finally, since Lemma 1 ensures \(\rho ^h(\cdot , 1) {\mathop {\rightharpoonup }\limits ^{*}}\mu = \rho (\cdot , 1)\), applying Proposition 4 gives
which completes the proof of part (a).
We now turn to part (b). Let \(({\tilde{\rho }}^h, {\tilde{m}}^h) \in {\mathcal {C}}^h\) be the sequence constructed in Proposition 3, so \(({\tilde{\rho }}^h, {\tilde{m}}^h) \rightarrow (\rho ,m)\) uniformly. By inequality (39), there exists \(c>0\) so that \( {\rho }^h(x,t) \ge c \) for h sufficiently small. Therefore,
It remains to show that
First, suppose the energy satisfies either (H2a) or (H2b). By Eqs. (48)–(50), which hold for any weak* convergent sequence, and the fact that \(U'({\tilde{\rho }}^h(\cdot , 0)) \rightarrow U'(\rho (\cdot , 0) )\) uniformly, it suffices to show
Since \(U \in C([0, +\infty ])\), \(\rho (\cdot , 1) \in L^\infty ({{\mathbb {R}}^d})\), and \({\tilde{\rho }}^h(\cdot , 1) \rightarrow \rho (\cdot ,1)\) uniformly, \( U({\tilde{\rho }}^h(\cdot ,1)) \rightarrow U(\rho (\cdot ,1)) \) uniformly, which gives the result.
Finally, suppose the energy satisfies (H2c). Without loss of generality, suppose \({\mathcal {E}}(\rho (\cdot ,1)) = {\mathcal {G}}_{\rho _1}(\rho (\cdot , 1))< +\infty \). Inequality (40) ensures that, for h sufficiently small,
By definition of \({\mathcal {G}}_{\rho _1}^h\), this implies \({\mathcal {G}}^h_{\rho _1}({\tilde{\rho }}^h(\cdot , 1)) \equiv 0\). Therefore,
which gives the result. \(\square \)
We conclude this section by applying the \(\varGamma \)-convergence proof from Theorem 2 to prove that any sequence of minimizers of the discrete Problem \({1^h}\) converges, up to a subsequence, to a minimizer of the continuum Problem 1.
Theorem 3
(Convergence of minimizers) Suppose that hypotheses (H1)–(H6) hold. Then, for any sequence of minimizers \((\rho ^h, m^h)\) of Problem \({1^h}\), we have, up to a subsequence, \(\rho ^h {\mathop {\rightharpoonup }\limits ^{*}}\rho \) and \(m^h {\mathop {\rightharpoonup }\limits ^{*}}m \), where \((\rho ,m)\) is a minimizer of Problem 1.
Note that, if the minimizer of the continuum Problem 1 is unique, then this theorem ensures that any sequence of minimizers of the discrete Problem \({1^h}\) has a further subsequence that converges to this minimizer. Therefore, the sequence itself must converge to the unique minimizer of the continuum problem. (See Remark 2 for sufficient conditions that ensure the minimizer of the continuum problem is unique.)
Proof of Theorem 3
First, note that Lemma 1 ensures that there exist \(\rho \in {{\mathcal {P}}}(\varOmega \times [0,1])\) and \(\mu \in {{\mathcal {P}}}(\varOmega )\) so that, up to a subsequence, \(\rho ^h {\mathop {\rightharpoonup }\limits ^{*}}\rho \) and \(\rho ^h(\cdot , 1) {\mathop {\rightharpoonup }\limits ^{*}}\mu \). In order to prove an analogous weak* compactness result for \(m^h\) we first prove that, up to a subsequence,
By (H6), there exists a minimizer \(({\bar{\rho }}, {\bar{m}})\) of the continuum Problem 1 satisfying \( {\bar{\rho }} \in C^2([0,1]; C^1({\overline{\varOmega }}))\), \( {\bar{\rho }} >0\), and \( {\bar{m}} \in C^1([0,1]; C^2({\overline{\varOmega }}))\). Comparing the recovery sequence \(( {\tilde{\rho }}^h, {\tilde{m}}^h) \in {\mathcal {C}}^h\) from Theorem 2(b) for \(({\bar{\rho }}, \bar{ m})\) with the discrete minimizer \((\rho ^h,m^h) \in {\mathcal {C}}^h\), we obtain
Furthermore, Proposition 4 ensures that
which is bounded below by some constant, since hypothesis (H2c) ensures \({\mathcal {E}}\ge 0\) and hypotheses (H2a) or (H2b) ensure \({\mathcal {E}}(\mu ) > -\infty \), because U, V, and W are bounded below and \(U'\) is bounded below on the range of the strictly positive density \(\rho _0\). Therefore, up to a subsequence, we obtain (51).
We now deduce weak* convergence of \(m^h\). By Hölder’s inequality, the fact that \(\rho ^h {\mathop {\rightharpoonup }\limits ^{*}}\rho \), and the definition of \(\varPhi \), we have
Thus, up to another subsequence, \(m^h {\mathop {\rightharpoonup }\limits ^{*}}m\) on \(\varOmega \times [0,1]\).
It remains to show that the limit \((\rho ,m)\) of \((\rho ^h, m^h)\) is a minimizer of Problem 1. By Theorem 2, part (a), we have \((\rho , m) \in {\mathcal {C}}\) and
Combining this with inequality (52) above, we conclude that \((\rho ,m) \in {\mathcal {C}}\) is also a minimizer of Problem 1, which completes the proof. \(\square \)
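As a reading aid, the final step of this proof follows the usual \(\varGamma \)-convergence pattern, which can be sketched as a single chain of inequalities. The shorthand \(J^h\) and \(J\) for the discrete and continuum objective functionals is introduced here for illustration only and is not notation from the argument above; we write \(J^h(\rho ,m) = \int _0^1 \int _\varOmega \varPhi (\rho ,m) \mathrm {d}x \mathrm {d}t + 2 \tau {\mathcal {E}}^h(\rho (\cdot ,1))\), and \(J\) for the same expression with \({\mathcal {E}}\) in place of \({\mathcal {E}}^h\). With \((\rho ^h,m^h)\) the discrete minimizers, \(({\tilde{\rho }}^h,{\tilde{m}}^h)\) the recovery sequence for the smooth continuum minimizer \(({\bar{\rho }},{\bar{m}})\), and \((\rho ,m)\) the weak* limit, the argument reads

$$\begin{aligned} J(\rho ,m)&\le \liminf _{h \rightarrow 0} J^h(\rho ^h,m^h)&\text {(Theorem 2(a))} \\&\le \limsup _{h \rightarrow 0} J^h({\tilde{\rho }}^h,{\tilde{m}}^h)&\text {(minimality of } (\rho ^h,m^h) \text { in } {\mathcal {C}}^h) \\&\le J({\bar{\rho }},{\bar{m}}) = \min _{{\mathcal {C}}} J&\text {(Theorem 2(b)).} \end{aligned}$$

Since \((\rho ,m) \in {\mathcal {C}}\), every inequality in the chain must be an equality, so \((\rho ,m)\) attains the minimum.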
Numerical Results
In this section, we provide several examples demonstrating the efficiency and accuracy of our algorithms. We begin by using Algorithm 1 to compute Wasserstein geodesics between given source and target measures, and we then turn to Algorithm 3 to compute solutions of nonlinear gradient flows. In the following simulations, we take our computational domain \(\varOmega \) to be a square, imposing the no-flux boundary conditions on m dimension by dimension. Unless otherwise specified, we always impose the discrete PDE constraint via the Crank–Nicolson finite difference operators (28), and we choose \(\epsilon _1 = \epsilon _2 = \epsilon = 10^{-5}\) in the stopping criteria. For the relaxation of the constraints in (30) and (31), we choose \(\delta _1 = \delta _2 = \delta _4 = \delta _5 = \delta \), and \(\delta _3\) differently, as specified in each example.
Wasserstein Geodesics
As described in Remark 1, a particular case of our numerical scheme provides a method for computing the Wasserstein geodesic between two probability densities. We begin by computing the Wasserstein geodesic between rescaled Gaussians in one dimension:
The target measure is simply a translation and dilation of the initial measure, \(\rho _0(x)=(0.5) g_{\mu _0, \theta _0 }(x)\) and \(\rho _1(x) = (0.5) g_{\mu _1, \theta _1}(x)\). The optimal transport map T(x) from \(\rho _0(x)\) to \(\rho _1(x)\) is given explicitly by^{Footnote 1}
Rewriting Eq. (15) for the geodesic \(\rho (x,t)\) and velocity v(x, t) induced by this transport map, via the definition of the push forward, we obtain
In Fig. 2, we apply Algorithm 1 to compute the Wasserstein geodesic \(\rho (x,t)\) between the initial and target densities (53), with means and variances \(\mu _0 = 1.5, \theta _0 = 0.3, \mu _1 = 1.5,\) and \(\theta _1 = 0.6\). On the left, we plot the evolution of the geodesic at various times. On the right, we plot the \(\ell ^1\) error of the densities, momenta, and Wasserstein distance as a function of the number of iterations, l, observing a rate of convergence of order \({\mathcal {O}}(1/l)\) (dashed black line). Here, the error is defined as
In Fig. 3, we illustrate how choosing the optimal scaling relationship between the relaxation parameter \(\delta \) and the spatial and temporal discretizations \((\varDelta x), (\varDelta t)\) allows the method to converge in fewer iterations. We contrast the choices \(\delta = (\varDelta x)^2 \), \(\delta = (\varDelta x)^3\), and \(\delta = 10^{-8}\), for the example of the Wasserstein geodesic illustrated in Fig. 2, where the outer time step \(\tau = 1\), \((\varDelta x) \sim (\varDelta t)\), and \(\delta _3 = \delta \). Based on the order of accuracy of our Crank–Nicolson approximation of the PDE constraint, we expect that \(\delta = (\varDelta x)^2\) should give the optimal balance between accuracy and computational efficiency. (See Remark 3.)
In the plot on the left, we observe that for all choices of \(\delta \), the error between the numerical solution \(\rho ^{(l)}\) and the exact solution \(\rho ^*\) is identical, with the error saturating after \(10^5\) iterations. Thus, all three choices of \(\delta \) provide the same level of accuracy, and the best way to distinguish between them is to identify which choice of \(\delta \) causes the stopping criteria (33) and (34) to be satisfied with the fewest excess iterations after \(10^5\). The behavior of two key stopping criteria is shown in the plot on the right: the PDE constraint \(\Vert A u^{(l)} - b \Vert \) and the convergence monitor for the relative error of the dual variables \(\Vert \phi ^{(l)} - \phi ^{(l-1)} \Vert /\Vert \phi ^{(l)} \Vert \). Of the four stopping criteria we consider (PDE constraint and three convergence monitors), these two are the last to be satisfied in all of the numerical simulations contained in this manuscript, hence these determine when our method terminates its iterations.
For the case of \(\delta = (\varDelta x)^2\) (red lines), we indeed observe that the PDE constraint (solid line) satisfies its stopping criterion (dashed line) by \(10^4\) iterations and the dual variables (starred line) satisfy their stopping criterion of \(10^{-5}\) by \(10^5\) iterations. On the other hand, for the cases of \(\delta = (\varDelta x)^3\) (blue lines) and \(\delta = 10^{-8}\) (green lines), we see that while the dual variables (starred lines) have satisfied their stopping criterion of \(10^{-5}\) by \(10^4\) iterations, the PDE constraints (solid lines) do not satisfy their stopping criteria (dashed lines) until later: it takes more than \(10^5\) iterations for \(\delta = (\varDelta x)^3\) and more than \(10^7\) iterations for \(\delta = 10^{-8}\). This example shows that, by choosing \(\delta \) without respecting the order of accuracy of the finite difference approximation in the PDE constraint, one wastes computational effort without improving the accuracy of the numerical solution.
Next, we compute Wasserstein geodesics between initial and target measures that are neither smooth nor strictly positive. In Fig. 4, we compute the geodesic between a profile of the British Parliament and its translation. We do not observe convergence to the exact geodesic, which would be a constant speed translation, and instead observe degradation of the parliamentary building at intermediate times, due to numerical smoothing. Similarly, in Fig. 5, we compute the geodesic between PacMan and a ghost, visualized as characteristic functions on sets in two dimensions. Again, we observe numerical smoothing around the edges of discontinuity. Both of these examples offer a numerical justification for the smoothness assumption we impose in our main convergence result, Theorem 3. In the absence of such smoothness, it appears that the method does not converge. Similar smoothness assumptions are required in the other numerical methods for Wasserstein geodesics for which rigorous convergence has been analyzed, including Monge–Ampère-type methods [11, 68].
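For the smooth one-dimensional Gaussian example at the start of this subsection, the exact geodesic can be checked independently of the optimization. The following is a minimal sketch, not the paper's Algorithm 1. It assumes that the second Gaussian parameter is a standard deviation (the parametrization of Eq. (53) is not reproduced here), and the numerical values, including the sign of the first mean, are hypothetical choices for illustration. It pushes the initial density forward under \(T_t(x) = (1-t)x + tT(x)\) for the affine monotone map T between the two Gaussians and compares the result with the Gaussian whose mean and standard deviation are interpolated linearly in t.

```python
import numpy as np

def gaussian(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def geodesic_density(y, t, mean0, sigma0, mean1, sigma1, scale=0.5):
    """Push forward scale*N(mean0, sigma0^2) under T_t(x) = (1 - t) x + t T(x),
    where T(x) = mean1 + (sigma1 / sigma0) (x - mean0) is the monotone map
    between the two Gaussians. Uses the change of variables formula
    rho_t(T_t(x)) = rho_0(x) / T_t'(x)."""
    ratio = sigma1 / sigma0
    slope = (1.0 - t) + t * ratio             # T_t'(x), constant in x
    intercept = t * (mean1 - ratio * mean0)   # T_t(x) = slope * x + intercept
    x = (y - intercept) / slope               # x = T_t^{-1}(y)
    return scale * gaussian(x, mean0, sigma0) / slope

# Hypothetical parameters, chosen for illustration only.
mean0, sigma0, mean1, sigma1 = -1.5, 0.3, 1.5, 0.6
y = np.linspace(-6.0, 6.0, 4001)
t = 0.5

rho_t = geodesic_density(y, t, mean0, sigma0, mean1, sigma1)

# Closed form: mean and standard deviation interpolate linearly in t.
mean_t = (1.0 - t) * mean0 + t * mean1
sigma_t = (1.0 - t) * sigma0 + t * sigma1
rho_closed = 0.5 * gaussian(y, mean_t, sigma_t)

max_diff = np.max(np.abs(rho_t - rho_closed))
mass_t = np.sum(rho_t) * (y[1] - y[0])
```

A reference profile of this kind is what the error curves for smooth data are measured against; no analogous smooth closed form is used here for the discontinuous examples.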
Wasserstein Gradient Flows: One Dimension
In this and the next section, we consider several examples of Wasserstein gradient flows, including some which have appeared in previous numerical studies [3, 29, 49, 99], to demonstrate the performance of our method for simulating solutions of nonlinear partial differential equations.
Porous Medium Equation
The porous medium equation
is the Wasserstein gradient flow of the energy (4), with \({U}(\rho ) = \frac{1}{m-1} \rho ^m\) and \(V = W = 0\). A well-known family of exact solutions is given by Barenblatt profiles (cf. [104]), which are densities of the form
We now apply Algorithm 3 to simulate solutions of the \(m=2\) porous medium equation with Barenblatt initial data, \(t_0 = 10^{-3}\) and \(C = (3/16)^{1/3}\). Here, the Euler discretization (27) is used. In Fig. 6, we plot the evolution of the numerical solution over time, and we observe good agreement with the exact solution of the form (56), which is displayed as a dashed curve.
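To make the reference solution concrete, the sketch below evaluates the standard one-dimensional Barenblatt profile for \(\partial _t \rho = \partial _{xx} \rho ^m\) and checks that its mass is conserved in time. We assume this textbook form agrees with Eq. (56), which is not reproduced here; the function name `barenblatt_1d` and the grid are illustrative choices.

```python
import numpy as np

def barenblatt_1d(x, t, m=2.0, C=(3.0 / 16.0) ** (1.0 / 3.0)):
    """Standard 1D Barenblatt profile for rho_t = (rho^m)_xx at time t > 0
    (assumed to match the paper's Eq. (56))."""
    alpha = 1.0 / (m + 1.0)               # self-similar exponent in 1D
    k = alpha * (m - 1.0) / (2.0 * m)     # curvature constant
    core = C - k * x**2 * t ** (-2.0 * alpha)
    return t ** (-alpha) * np.maximum(core, 0.0) ** (1.0 / (m - 1.0))

x = np.linspace(-3.0, 3.0, 60001)
dx = x[1] - x[0]
t0 = 1e-3  # initial time of the Barenblatt data

# Total mass at a few elapsed times s, with the profile evaluated at t0 + s.
masses = [np.sum(barenblatt_1d(x, t0 + s)) * dx for s in (0.0, 0.1, 0.5)]
```

For \(m = 2\) and \(C = (3/16)^{1/3}\), integrating the parabolic cap by hand gives total mass 2 at every time, which the Riemann sums above reproduce up to discretization error.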
In Fig. 7, we analyze how the rate of convergence depends on the inner time step \(\varDelta t\), the spatial discretization \(\varDelta x\), and outer time step of the JKO scheme \(\tau \). We compute the error between the exact solution and the numerical solution in the \(\ell ^1\) norm, i.e.,
In the plot on the left of Fig. 7, we consider two fixed values of \(\tau \) and examine how the error depends on \(N_t\) and \(N_x = 10 N_t\). In both cases, the error quickly saturates, indicating that the outer time step \(\tau \) dominates the error. In the plot on the right, we fix \(N_t = 20\) and \(N_x = 200\) and consider how the error depends on \(\tau \). We observe slightly less than first-order convergence in \(\tau \) for the classical JKO scheme (\({\mathcal {E}}^h = {\mathcal {F}}^h\)) and higher-order convergence for the Crank–Nicolson inspired scheme (\({\mathcal {E}}^h = {{\mathcal {H}}}^h\)). We believe these slower rates of convergence are due to the lower regularity of solutions to the porous medium equation with compactly supported initial data, which are merely Hölder continuous.
In Fig. 8, we consider the case of smooth, strictly positive initial data, given by a Gaussian with mean \(\mu =0\) and variance \(\theta = 0.2\) (53), in which case solutions of the PDE remain smooth over time. On the left, we show the evolution of the solutions over time, and on the right, we illustrate that the classical JKO scheme indeed attains first-order accuracy, though the Crank–Nicolson inspired scheme is still less than second-order accurate.
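Convergence orders of the kind reported above are typically read off from the slope of the error against \(\tau \) on a log-log scale. The following sketch shows that computation for successive pairs of step sizes. The error values are synthetic, generated from exact power laws purely to illustrate the formula; they are not data from Figs. 7 or 8.

```python
import numpy as np

def observed_orders(steps, errors):
    """Observed order p between consecutive runs, from e ~ C * step^p:
    p = log(e_i / e_{i+1}) / log(step_i / step_{i+1})."""
    steps = np.asarray(steps, dtype=float)
    errors = np.asarray(errors, dtype=float)
    return np.log(errors[:-1] / errors[1:]) / np.log(steps[:-1] / steps[1:])

# Synthetic errors (illustration only): exact first- and second-order laws.
taus = np.array([0.1, 0.05, 0.025, 0.0125])
errors_first = 0.8 * taus
errors_second = 3.0 * taus**2

orders_first = observed_orders(taus, errors_first)
orders_second = observed_orders(taus, errors_second)
```

With measured errors, orders that drift below the nominal value as \(\tau \) decreases would be consistent with the saturation effect described above, where the inner discretization or the regularity of the solution limits the accuracy.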
Nonlinear Fokker–Planck Equation
We now consider a nonlinear variant of the Fokker–Planck equation,
inspired by the porous medium equation described in the previous section (55). When V is a confining drift potential, all solutions approach the unique steady state
where \(C>0\) depends on the mass of the initial data, so that \(\int \rho _\infty \mathrm {d}x = \int \rho _0 \mathrm {d}x \), see [44, 51].
In Fig. 9, we simulate the evolution of solutions to the nonlinear Fokker–Planck equation with \(V(x) = x^2 \), \(m=2\), and initial data given by a Gaussian with mean \(\mu =0\) and variance \(\theta = 0.2\) (53). On the left, we plot the evolution of the density \(\rho (x,t)\) toward the steady state \(\rho _\infty (x)\). On the right, we compute the rate of decay of the corresponding energy (4) as a function of time, observing exponential decay as the solution approaches equilibrium. In this way, our method recovers analytic results on convergence to equilibrium of Carrillo, DiFrancesco, and Toscani [35, 51].
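One simple way to quantify exponential decay of the energy is a least-squares line fit to the logarithm of the energy gap. The sketch below assumes the limiting energy is known or estimated from the final state. The time series is synthetic and the names `decay_rate` and `energy_limit` are hypothetical; this is not the procedure used to produce Fig. 9.

```python
import numpy as np

def decay_rate(times, energies, energy_limit):
    """Fit E(t) - E_inf ~ A * exp(-lam * t) by linear regression on the log gap.
    Returns (lam, A). Requires E(t) > E_inf at every sample."""
    gap = np.asarray(energies, dtype=float) - energy_limit
    slope, intercept = np.polyfit(times, np.log(gap), 1)
    return -slope, np.exp(intercept)

# Synthetic energy curve (illustration only): E_inf = 0.3, A = 1.7, lam = 4.
times = np.linspace(0.0, 2.0, 21)
energies = 0.3 + 1.7 * np.exp(-4.0 * times)

rate, prefactor = decay_rate(times, energies, energy_limit=0.3)
```

In practice the fit should be restricted to times at which the gap is well above the solver tolerance, since the logarithm amplifies errors once the energy has nearly reached its limit.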
In Fig. 10, we analyze how the rate of convergence depends on the outer time step \(\tau \) of the scheme, for sufficiently small inner time step \(\varDelta t = 0.1\) and spatial discretization \(\varDelta x = 0.04\). We compute the error
We observe slightly faster than first-order convergence for the traditional JKO scheme (\({\mathcal {E}}^h = {\mathcal {F}}^h\)) and higher-order convergence for the new Crank–Nicolson inspired scheme (\({\mathcal {E}}^h = {{\mathcal {H}}}^h\)). We believe this improvement in the rate of convergence as compared to our previous example for the porous medium equation, Fig. 7, is due to the rapid convergence to the steady state \(\rho _\infty \).
Aggregation Equation
In this section, we consider a nonlocal partial differential equation of Wasserstein gradient flow type, known as the aggregation equation
In recent years, there has been significant interest in interaction kernels W that are repulsive at short length scales and attractive at longer distances, such as the kernel with logarithmic repulsion and quadratic attraction
For this particular choice of W, there exists a unique equilibrium profile [38], given by
In Fig. 11, we simulate the solution to the aggregation equation with Gaussian initial data (53) with mean \(\mu =0\) and variance \(\theta =1\), analyzing convergence to equilibrium. On the left, we plot the evolution of the density \(\rho (x,t)\) at varying times, observing convergence to the equilibrium profile \(\rho _\infty (x)\). On the right, we compute the rate of the decay of the energy as a function of time, observing exponential decay as obtained by Carrillo, Ferreira, and Precioso [38] with a slightly slower numerical rate.
As the interaction potential W defined in Eq. (59) is not continuous, we make the following modifications to our discretization of the JKO scheme. To avoid evaluation of W(x) at \(x=0\), we set W(0) to equal the average value of W on the cell of width 2h centered at 0, i.e., \(W(0) = \frac{1}{2h} \int _{-h}^{h} W(x) \mathrm {d}x\), where we apply the Gauss–Legendre quadrature rule with four grid points to evaluate the integral. In addition to modifying the interaction kernel in this way, we also introduce an artificial diffusion term of the form \(\epsilon \partial _x( \rho \partial _x \rho )\) with \(\epsilon =1.6\times (\varDelta x)^2\) to the right-hand side of (58), to avoid possible overshoot at the boundary. (See also [29] for a similar treatment.)
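The cell average at the origin can be sketched as follows. We assume the kernel has the form \(W(x) = x^2/2 - \ln |x|\), consistent with the description of logarithmic repulsion and quadratic attraction (Eq. (59) itself is not reproduced here), and we use NumPy's `leggauss` for the four-point Gauss–Legendre nodes and weights. The helper name `cell_average_at_origin` and the value of h are illustrative.

```python
import numpy as np

def W(x):
    """Assumed kernel: quadratic attraction with logarithmic repulsion."""
    return 0.5 * x**2 - np.log(np.abs(x))

def cell_average_at_origin(kernel, h, n_nodes=4):
    """Approximate (1 / 2h) * integral of kernel over [-h, h] by Gauss-Legendre.
    An even number of nodes never places a node at the singular point x = 0."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    # Map [-1, 1] to [-h, h]: the Jacobian h and the factor 1 / (2h) combine to 1/2.
    return 0.5 * np.sum(weights * kernel(h * nodes))

h = 0.01  # hypothetical half-width of the cell
W0_quad = cell_average_at_origin(W, h)

# Exact cell average for this kernel: h^2 / 6 + 1 - ln(h).
W0_exact = h**2 / 6.0 + 1.0 - np.log(h)
```

Because the integrand has a logarithmic singularity inside the cell, the four-point rule returns a finite but only approximate value of the exact average; the quadratic part is integrated exactly.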
Wasserstein Gradient Flows: Two Dimensions
In the following, we consider a few gradient flows in two dimensions. Here, the constraint relaxation parameters are always chosen as \(\delta = \delta _3 = 10^{-6}\).
Aggregation Equation
We now continue our study of the aggregation equation (58) with repulsive–attractive interaction potentials in two dimensions, with interaction kernels of the form
using the convention that \(\frac{|x|^0}{0} = \ln (|x|)\). It is well known that the repulsion near the origin of the potential determines the dimension of the support of the steady state measure, see [4, 34]. In the following simulations, we take the initial data to be a Gaussian (53) with mean \(\mu =0\) and variance \(\theta =0.25\).
In Fig. 12, we simulate the evolution of solutions to the aggregation equation, with \(a=4\) and \(b=2\) in the interaction potential W, defined in Eq. (60). We observe that the solution concentrates on a Dirac ring with radius 0.5 centered at the origin, recovering analytical results on the existence of a stable Dirac ring equilibrium for these values of a and b [5, 13].
In Fig. 13, we simulate the evolution of solutions to the aggregation equation, with \(a = 2\) and \(b = 0\). We observe that the solution converges to a characteristic function on the disk of radius 1, centered at the origin, recovering analytic results on solutions of the aggregation equation with Newtonian repulsion [14, 34, 65]. We follow the same strategy described in Sect. 4.2.3 with \(\epsilon =1.6\times (\varDelta x^2+\varDelta y^2)\) to overcome the singularity of the interaction potential at \(x=0\) and potential overshooting.
Aggregation Drift Equation
Next, we compute solutions of aggregation-drift equations
where \(W(x) = |x|^2/2 - \ln |x|\) and \(V(x) = - \frac{\alpha }{\beta } \ln |x|\). As shown in several analytical and numerical results [29, 42, 53], the steady state is a characteristic function on a torus or “milling profile”, with inner and outer radius given by
In Fig. 14, we simulate the long-time behavior of a solution of the aggregation-drift equation with \(\alpha =1\) and \(\beta =4\) and Gaussian initial data (53), \(\mu = 0\), \(\theta = 0.25\), as well as the rate of decay of the entropy as the solution converges to equilibrium. In Fig. 15, we plot the evolution of the density from nonradially symmetric initial data, given by five Gaussians, to the same equilibrium profile. We follow the same strategy described in Sect. 4.2.3 to overcome the singularity of the interaction potential at \(x=0\) and potential overshooting (\(\epsilon = 2\times (\varDelta x^2+ \varDelta y^2)\) in Fig. 14 and \(\epsilon =2.6\times (\varDelta x^2+ \varDelta y^2)\) in Fig. 15).
Aggregation–Diffusion Equation
We close by simulating several examples of aggregation–diffusion equations
\[ \partial _t \rho = \nabla \cdot (\rho \nabla W*\rho ) + \nu \varDelta \rho ^m . \tag{61} \]
In recent years, there has been significant activity studying equations of this form, both analytically and numerically. When the interaction kernel W is attractive, the competition between the nonlocal aggregation \( \nabla \cdot (\rho \nabla W*\rho )\) and nonlinear diffusion \(\nu \varDelta \rho ^m\) causes solutions to blow up in certain regimes and exist globally in time in others, see for example [18, 19, 25, 26, 41] and the survey [32]. With fixed m, and in the presence of nonlocal interaction, the equation has a unique steady state which is radially decreasing up to a translation [15, 40].
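The competition between aggregation and diffusion can be made concrete through the free energy that such equations dissipate. The sketch below assumes the equation has the form \(\partial _t \rho = \nabla \cdot (\rho \nabla W*\rho ) + \nu \varDelta \rho ^m\), whose standard energy is \(\frac{\nu }{m-1}\int \rho ^m + \frac{1}{2}\iint W(x-y)\rho (x)\rho (y)\) for \(m>1\), with \(\nu \int \rho \ln \rho \) replacing the first term when \(m=1\). The grid, the midpoint-type quadrature, and the function name are hypothetical and are not the discretization used in the paper.

```python
import numpy as np

def discrete_energy(rho, xs, ys, W, nu, m):
    """Riemann-sum approximation of the aggregation-diffusion energy on a uniform grid.

    rho has shape (len(xs), len(ys)); W takes an array of distances |x - y|.
    """
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])   # area of one grid cell
    if m == 1:
        safe = np.where(rho > 0, rho, 1.0)     # treat 0 * ln(0) as 0
        internal = nu * np.sum(rho * np.log(safe)) * cell
    else:
        internal = nu / (m - 1) * np.sum(rho**m) * cell
    # All pairwise distances between grid points (fine for small grids only).
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    pts = np.column_stack([X.ravel(), Y.ravel()])
    r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    mass = rho.ravel() * cell                  # mass carried by each cell
    interaction = 0.5 * mass @ W(r) @ mass
    return internal + interaction

# Example with the Gaussian-type attractive kernel and parameters quoted for Fig. 16.
xs = ys = np.linspace(-3.0, 3.0, 16)
rho0 = np.full((16, 16), 0.25)                 # hypothetical constant density
E0 = discrete_energy(rho0, xs, ys, lambda r: -np.exp(-r**2) / np.pi, nu=0.1, m=3)
```

The dense pairwise distance matrix keeps the sketch short; for realistic grids one would replace it with an FFT-based convolution.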
In Fig. 16, we simulate a solution of the aggregation–diffusion equation with \(W(x) = - e^{-|x|^2}/\pi \), \(\nu =0.1\), and \(m =3\), and initial data given by a rescaled characteristic function on the square,
Diffusion dominates at both short and long ranges, and the medium-range aggregation leads to the formation of four bumps, which ultimately approach a single-bump equilibrium. (See also [29].)
In Fig. 17, we simulate solutions of the Keller–Segel equation, which is an aggregation–diffusion equation (61) with a Newtonian interaction kernel, i.e., \(W(x)=\frac{1}{2\pi }\ln |x|\) in two dimensions. We take \(\nu = 1\) and consider both \(m=1\) and \(m=2\), illustrating the role of the diffusion exponent in blowup or global existence of solutions. We choose the initial data to be a rescaled Gaussian, obtained by multiplying equation (53) by a mass \(M= 9 \pi \), with mean \(\mu = 0\) and variance \(\theta = 0.5\). On the left, we take \(m =2\) and simulate the steady state of the Keller–Segel equation, which is a single bump. On the right, we simulate the long-time behavior of solutions for \(m=1\); since the mass \(M = 9\pi \) exceeds the critical mass \(8\pi \) [19], we are in the blowup regime. Indeed, at time \(t=2\), we observe the formation of a blowup profile, with the solution becoming sharply peaked at the origin.
In Fig. 18, we again simulate solutions of the Keller–Segel equation with \(m=2\), but in this case we take the initial data to be given by three localized bumps (Gaussian rings, i.e., the radial cut of the ring is a Gaussian with a center on the circle). We observe a two-stage evolution in which each of the bumps first converges to a localized quasi-stationary state, and the bumps then interact and merge into a single bump in the long-time limit. This is a manifestation of the typical metastability phenomenon, which is likely present in the majority of diffusion-dominated Keller–Segel models [24, 29, 32].
Notes
 1.
One way to see that this is the unique optimal transport map from \(\rho _0\) to \(\rho _1\) is to note that \(T \# \rho _0 = \rho _1\) and T(x) is the gradient of a convex function; see, for example, [2, Section 6.2.3].
References
 1.
L. Ambrosio, N. Fusco, and D. Pallara, Functions of bounded variation and free discontinuity problems, vol. 254, Clarendon Press Oxford, 2000.
 2.
L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows: in metric spaces and in the space of probability measures, Springer Science & Business Media, 2008.
 3.
R. Bailo, J. A. Carrillo, and J. Hu, Fully discrete positivity-preserving and energy-dissipating schemes for aggregation–diffusion equations with a gradient-flow structure, Commun. Math. Sci., 18 (2020), pp. 1259–1303.
 4.
D. Balagué, J. A. Carrillo, T. Laurent, and G. Raoul, Dimensionality of local minimizers of the interaction energy, Arch. Ration. Mech. Anal., 209 (2013), pp. 1055–1088.
 5.
D. Balagué, J. A. Carrillo, T. Laurent, and G. Raoul, Nonlocal interactions by repulsive–attractive potentials: radial ins/stability, Phys. D, 260 (2013), pp. 5–25.
 6.
A. B. T. Barbaro, J. A. Cañizo, J. A. Carrillo, and P. Degond, Phase transitions in a kinetic flocking model of Cucker–Smale type, Multiscale Model. Simul., 14 (2016), pp. 1063–1088.
 7.
J.-D. Benamou and Y. Brenier, A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem, Numer. Math., 84 (2000), pp. 375–393.
 8.
J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, Iterative Bregman projections for regularized transportation problems, SIAM J. Sci. Comput., 37 (2015), pp. A111–A1138.
 9.
J.-D. Benamou, G. Carlier, and M. Laborde, An augmented Lagrangian approach to Wasserstein gradient flows and applications, ESAIM: Proceedings and Surveys, 54 (2016), pp. 1–17.
 10.
J.-D. Benamou, G. Carlier, Q. Mérigot, and E. Oudet, Discretization of functionals involving the Monge–Ampère operator, Numer. Math., 134 (2016), pp. 611–636.
 11.
J.-D. Benamou, B. Froese, and A. Oberman, Numerical solution of the optimal transportation problem using the Monge–Ampère equation, J. Comput. Phys., 260 (2014), pp. 107–126.
 12.
D. Benedetto, E. Caglioti, and M. Pulvirenti, A kinetic equation for granular media, RAIRO Modél. Math. Anal. Numér., 31 (1997), pp. 615–641.
 13.
A. L. Bertozzi, T. Kolokolnikov, H. Sun, D. Uminsky, and J. von Brecht, Ring patterns and their bifurcations in a nonlocal model of biological swarms, Commun. Math. Sci., 13 (2015), pp. 955–985.
 14.
A. L. Bertozzi, T. Laurent, and F. Léger, Aggregation and spreading via the Newtonian potential: the dynamics of patch solutions, Math. Models Methods Appl. Sci., 22 (2012), pp. 1140005, 39.
 15.
S. Bian and J.-G. Liu, Dynamic and steady states for multidimensional Keller–Segel model with diffusion exponent \(m>0\), Comm. Math. Phys., 323 (2013), pp. 1017–1070.
 16.
A. Blanchet, V. Calvez, and J. A. Carrillo, Convergence of the mass-transport steepest descent scheme for the subcritical Patlak–Keller–Segel model, SIAM J. Numer. Anal., 46 (2008), pp. 691–721.
 17.
A. Blanchet, E. A. Carlen, and J. A. Carrillo, Functional inequalities, thick tails and asymptotics for the critical mass Patlak–Keller–Segel model, Journal of Functional Analysis, 262 (2012), pp. 2142–2230.
 18.
A. Blanchet, J. A. Carrillo, and P. Laurençot, Critical mass for a Patlak–Keller–Segel model with degenerate diffusion in higher dimensions, Calc. Var. Partial Differential Equations, 35 (2009), pp. 133–168.
 19.
A. Blanchet, J. Dolbeault, and B. Perthame, Two dimensional Keller–Segel model in \({\mathbb {R}}^2\): optimal critical mass and qualitative properties of the solution, Electron. J. Differential Equations, 2006 (2006), pp. 1–33 (electronic).
 20.
H. Brezis, Functional analysis, Sobolev spaces and partial differential equations, Springer Science & Business Media, 2010.
 21.
L. Briceño-Arias, D. Kalise, Z. Kobeissi, M. Laurière, A. M. Gonzalez, and F. J. Silva, On the implementation of a primal-dual algorithm for second order time-dependent mean field games with local couplings, ESAIM: Proceedings and Surveys, 65 (2019), pp. 330–348.
 22.
L. M. Briceño-Arias, D. Kalise, and F. J. Silva, Proximal methods for stationary mean field games with local couplings, SIAM Journal on Control and Optimization, 56 (2018), pp. 801–836.
 23.
A. Burchard, R. Choksi, and I. Topaloglu, Nonlocal shape optimization via interactions of attractive and repulsive potentials, Indiana Univ. Math. J., 67 (2018), pp. 375–395.
 24.
M. Burger, R. Fetecau, and Y. Huang, Stationary states and asymptotic behavior of aggregation models with nonlinear local repulsion, SIAM J. Appl. Dyn. Syst., 13 (2014), pp. 397–424.
 25.
V. Calvez, J. A. Carrillo, and F. Hoffmann, Equilibria of homogeneous functionals in the fair-competition regime, Nonlinear Anal., 159 (2017), pp. 85–128.
 26.
V. Calvez, J. A. Carrillo, and F. Hoffmann, The geometry of diffusing and self-attracting particles in a one-dimensional fair-competition regime, 2186 (2017), pp. 1–71.
 27.
M. Campos Pinto, J. A. Carrillo, F. Charles, and Y.-P. Choi, Convergence of a linearly transformed particle method for aggregation equations, to appear in Numer. Math., (2018).
 28.
G. Carlier, V. Duval, G. Peyré, and B. Schmitzer, Convergence of entropic schemes for optimal transport and gradient flows, SIAM Journal on Mathematical Analysis, 49 (2017), pp. 1385–1418.
 29.
J. A. Carrillo, A. Chertock, and Y. Huang, A finite-volume method for nonlinear nonlocal equations with a gradient flow structure, Commun. Comput. Phys., 17 (2015), pp. 233–258.
 30.
J. A. Carrillo, Y.-P. Choi, and M. Hauray, The derivation of swarming models: mean-field limit and Wasserstein distances, in Collective Dynamics from Bacteria to Crowds: An Excursion Through Modeling, Analysis and Simulation, vol. 553 of CISM Courses and Lect., Springer Vienna, 2014, pp. 1–46.
 31.
J. A. Carrillo, K. Craig, and F. S. Patacchini, A blob method for diffusion, Calculus of Variations and Partial Differential Equations, 58 (2019), pp. 1–53.
 32.
J. A. Carrillo, K. Craig, and Y. Yao, Aggregation–diffusion equations: dynamics, asymptotics, and singular limits, (2019), pp. 65–108.
 33.
J. A. Carrillo, M. G. Delgadino, L. Desvillettes, and J. Wu, The Landau equation as a gradient flow, arXiv preprint arXiv:2007.08591, (2020).
 34.
J. A. Carrillo, M. G. Delgadino, and A. Mellet, Regularity of local minimizers of the interaction energy via obstacle problems, Comm. Math. Phys., 343 (2016), pp. 747–781.
 35.
J. A. Carrillo, M. Di Francesco, and G. Toscani, Strict contractivity of the 2-Wasserstein distance for the porous medium equation by mass-centering, Proc. Amer. Math. Soc., 135 (2007), pp. 353–363.
 36.
J. A. Carrillo, M. Di Francesco, A. Figalli, T. Laurent, and D. Slepčev, Global-in-time weak measure solutions and finite-time aggregation for nonlocal interaction equations, Duke Math. J., 156 (2011), pp. 229–271.
 37.
J. A. Carrillo, B. Duering, D. Matthes, and D. S. McCormick, A Lagrangian scheme for the solution of nonlinear diffusion equations using moving simplex meshes, to appear in J. Sci. Comp., (2018).
 38.
J. A. Carrillo, L. C. F. Ferreira, and J. C. Precioso, A mass-transportation approach to a one dimensional fluid mechanics model with nonlocal velocity, Adv. Math., 231 (2012), pp. 306–327.
 39.
J. A. Carrillo, M. Fornasier, G. Toscani, and F. Vecil, Particle, kinetic, and hydrodynamic models of swarming, Modeling and Simulation in Science, Engineering and Technology, (2010), pp. 297–336.
 40.
J. A. Carrillo, S. Hittmeir, B. Volzone, and Y. Yao, Nonlinear aggregation–diffusion equations: radial symmetry and long time asymptotics, Invent. Math., 218 (2019), pp. 889–977.
 41.
J. A. Carrillo, F. Hoffmann, E. Mainini, and B. Volzone, Ground states in the diffusion-dominated regime, Calc. Var. Partial Differ. Equ., 57 (2018), p. 127.
 42.
J. A. Carrillo, Y. Huang, and S. Martin, Explicit flock solutions for Quasi–Morse potentials, European J. Appl. Math., 25 (2014), pp. 553–578.
 43.
J. A. Carrillo, Y. Huang, F. S. Patacchini, and G. Wolansky, Numerical study of a particle method for gradient flows, Kinet. Relat. Models, 10 (2017), pp. 613–641.
 44.
J. A. Carrillo, A. Jüngel, P. A. Markowich, G. Toscani, and A. Unterreiter, Entropy dissipation methods for degenerate parabolic problems and generalized Sobolev inequalities, Monatsh. Math., 133 (2001), pp. 1–82.
 45.
J. A. Carrillo, R. McCann, and C. Villani, Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates, Revista Matematica Iberoamericana, 19 (2003), pp. 971–1018.
 46.
J. A. Carrillo, R. J. McCann, and C. Villani, Contractions in the 2-Wasserstein length space and thermalization of granular media, Arch. Ration. Mech. Anal., 179 (2006), pp. 217–263.
 47.
J. A. Carrillo and J. Moll, Numerical simulation of diffusive and aggregation phenomena in nonlinear continuity equations by evolving diffeomorphisms, SIAM J. Sci. Comput., 31 (2009), pp. 4305–4329.
 48.
J. A. Carrillo, F. S. Patacchini, P. Sternberg, and G. Wolansky, Convergence of a particle method for diffusive gradient flows in one dimension, SIAM J. Math. Anal., 48 (2016), pp. 3708–3741.
 49.
J. A. Carrillo, H. Ranetbauer, and M. Wolfram, Numerical simulation of nonlinear continuity equations by evolving diffeomorphisms, J. Comput. Phys., 326 (2016), pp. 186–202.
 50.
J. A. Carrillo and F. Santambrogio, \(L^\infty \)-estimates for the JKO scheme in parabolic–elliptic Keller–Segel systems, Quarterly of Applied Mathematics, 76 (2018), pp. 515–530.
 51.
J. A. Carrillo and G. Toscani, Asymptotic \(L^1\)-decay of solutions of the porous medium equation to self-similarity, Indiana Univ. Math. J., 49 (2000), pp. 113–142.
 52.
A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with application to imaging, J. Math. Imaging Vis., 40 (2011), pp. 120–145.
 53.
Y. Chen and T. Kolokolnikov, A minimal model of predator–swarm interactions, Journal of The Royal Society Interface, 11 (2014).
 54.
A. Chertock and A. Kurganov, A second-order positivity preserving central-upwind scheme for chemotaxis and haptotaxis models, Numer. Math., 111 (2008), pp. 169–205.
 55.
L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard, Scaling algorithms for unbalanced optimal transport problems, Math. Comp., 87 (2018), pp. 2563–2609.
 56.
K. Craig, Nonconvex gradient flow in the Wasserstein metric and applications to constrained nonlocal interactions, Proceedings of the London Mathematical Society, 114 (2017), pp. 60–102.
 57.
K. Craig and A. Bertozzi, A blob method for the aggregation equation, Math. Comp., 85 (2016), pp. 1681–1717.
 58.
K. Craig, I. Kim, and Y. Yao, Congested aggregation via Newtonian interaction, Arch. Ration. Mech. Anal., to appear (2017).
 59.
K. Craig and I. Topaloglu, Convergence of regularized nonlocal interaction energies, SIAM Journal on Mathematical Analysis, 48 (2016), pp. 34–60.
 60.
K. Craig and I. Topaloglu, Aggregation-diffusion to constrained interaction: minimizers & gradient flows in the slow diffusion limit, Ann. Inst. H. Poincaré Anal. Non Linéaire, 37 (2020), pp. 239–279.
 61.
M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, in Advances in Neural Information Processing Systems, 2013, pp. 2292–2300.
 62.
D. Davis and W. Yin, A three-operator splitting scheme and its optimization applications, Set-Valued and Variational Analysis, 25 (2017), pp. 829–858.
 63.
G. De Philippis, A. R. Mészáros, F. Santambrogio, and B. Velichkov, BV estimates in optimal transportation and applications, Archive for Rational Mechanics and Analysis, 219 (2016), pp. 829–860.
 64.
M. Erbar, A gradient flow approach to the Boltzmann equation, arXiv preprint arXiv:1603.00540, (2016).
 65.
R. C. Fetecau, Y. Huang, and T. Kolokolnikov, Swarm dynamics and equilibria for a nonlocal aggregation model, Nonlinearity, 24 (2011), pp. 2681–2716.
 66.
F. Filbet, A finite volume scheme for the Patlak–Keller–Segel chemotaxis model, Numer. Math., 104 (2006), pp. 457–488.
 67.
R. L. Frank and E. H. Lieb, A “liquid-solid” phase transition in a simple model for swarming, based on the “no flat-spots” theorem for subharmonic functions, Indiana University Mathematics Journal, (2018), pp. 1547–1569.
 68.
W. Gangbo and R. McCann, The geometry of optimal transportation, Acta. Math., 177 (1996), pp. 113–161.
 69.
L. Gosse and G. Toscani, Lagrangian numerical approximations to one-dimensional convolution–diffusion equations, SIAM J. Sci. Comput., 28 (2006), pp. 1203–1227.
 70.
B. F. Hamfeldt, Viscosity subsolutions of the second boundary value problem for the Monge–Ampère equation, arXiv preprint arXiv:1807.04216, (2018).
 71.
D. Holm and V. Putkaradze, Aggregation of finite-size particles with variable mobility, Phys. Rev. Lett., 95 (2005), p. 226106.
 72.
H. Huang and J.-G. Liu, Error estimate of a random particle blob method for the Keller–Segel equation, Math. Comp., 86 (2017), pp. 2719–2744.
 73.
P.-E. Jabin, A review of the mean field limits for Vlasov equations, Kinet. Relat. Models, 7 (2014), pp. 661–711.
 74.
P.-E. Jabin and Z. Wang, Mean field limit for stochastic particle systems, in Active Particles. Vol. 1. Advances in Theory, Models, and Applications, Model. Simul. Sci. Eng. Technol., Birkhäuser/Springer, Cham, 2017, pp. 379–402.
 75.
R. Jordan, D. Kinderlehrer, and F. Otto, The variational formulation of the Fokker–Planck equation, SIAM J. Math. Anal., 29 (1998), pp. 1–17.
 76.
E. Keller and L. Segel, Traveling bands of chemotactic bacteria: a theoretical analysis, J. Theoret. Biol., 30 (1971), pp. 6420–6437.
 77.
T. Kolokolnikov, J. A. Carrillo, A. Bertozzi, R. Fetecau, and M. Lewis, Emergent behavior in multiparticle systems with nonlocal interactions, Phys. D, 260 (2013), pp. 1–4.
 78.
L. Laguzet, High order variational numerical schemes with application to Nash-MFG vaccination games, Ric. Mat., 67 (2018), pp. 247–269.
 79.
G. Legendre and G. Turinici, Second-order in time schemes for gradient flows in Wasserstein and geodesic metric spaces, C. R. Math. Acad. Sci. Paris, 355 (2017), pp. 345–353.
 80.
W. Li, J. Lu, and L. Wang, Fisher information regularization schemes for Wasserstein gradient flows, Journal of Computational Physics, (2020), p. 109449.
 81.
W. Li, S. Osher, and W. Gangbo, A fast algorithm for earth mover’s distance based on optimal transport and \(L_1\)-type regularization, arXiv:1609.07092v3, (preprint).
 82.
W. Li, P. Yin, and S. Osher, Computations of optimal transport distance with Fisher information regularization, Journal of Scientific Computing, 75 (2018), pp. 1581–1595.
 83.
J.-G. Liu, M. Tang, L. Wang, and Z. Zhou, An accurate front capturing scheme for tumor growth models with a free boundary limit, Journal of Computational Physics, 364 (2018), pp. 73–94.
 84.
J.-G. Liu, M. Tang, L. Wang, and Z. Zhou, Analysis and computation of some tumor growth models with nutrient: from cell density models to free boundary dynamics, DCDS-B, accepted (2018).
 85.
J.-G. Liu, L. Wang, and Z. Zhou, Positivity-preserving and asymptotic preserving method for 2D Keller-Segal equations, Mathematics of Computation, 87 (2018), pp. 1165–1189.
 86.
J.-G. Liu and R. Yang, A random particle blob method for the Keller–Segel equation and convergence analysis, Math. Comp., 86 (2017), pp. 725–745.
 87.
J. Maas, Gradient flows of the entropy for finite Markov chains, Journal of Functional Analysis, 261 (2011), pp. 2250–2292.
 88.
D. Matthes and H. Osberger, A convergent Lagrangian discretization for a nonlinear fourth-order equation, Found. Comput. Math., 17 (2017), pp. 73–126.
 89.
D. Matthes and S. Plazotta, A variational formulation of the BDF2 method for metric gradient flows, ESAIM: Mathematical Modelling and Numerical Analysis, 53 (2019), pp. 145–172.
 90.
B. Maury, A. Roudneff-Chupin, and F. Santambrogio, A macroscopic crowd motion model of gradient flow type, Math. Models Methods Appl. Sci., 20 (2010), pp. 1787–1821.
 91.
B. Maury, A. Roudneff-Chupin, F. Santambrogio, and J. Venel, Handling congestion in crowd motion modeling, Netw. Heterog. Media, 6 (2011), pp. 485–519.
 92.
H. Osberger and D. Matthes, Convergence of a fully discrete variational scheme for a thin-film equation, in Topological optimization and optimal transport, vol. 17 of Radon Ser. Comput. Appl. Math., De Gruyter, Berlin, 2017, pp. 356–399.
 93.
F. Otto, Double degenerate diffusion equations as steepest descent, Sonderforschungsbereich, 256 (1996).
 94.
F. Otto, The geometry of dissipative evolution equations: the porous medium equation, Comm. Partial Differential Equations, 26 (2001), pp. 101–174.
 95.
N. Papadakis, G. Peyré, and E. Oudet, Optimal transport with proximal splitting, SIAM J. Imaging Sci., 7 (2014), pp. 212–238.
 96.
B. Perthame, F. Quiros, and J. Vazquez, The Hele–Shaw asymptotics for mechanical models of tumor growth, Arch. Ratio. Mech. Anal., 212 (2014), pp. 93–127.
 97.
G. Peyré and M. Cuturi, Computational Optimal Transport, book in preparation, personal communication, 2018.
 98.
F. Santambrogio, Optimal transport for applied mathematicians, Birkhäuser, NY, (2015), pp. 99–102.
 99.
Z. Sun, J. A. Carrillo, and C.-W. Shu, A discontinuous Galerkin method for nonlinear parabolic equations and gradient flow problems with interaction potentials, J. Comput. Phys., 352 (2018), pp. 76–104.
 100.
M. Tang, N. Vauchelet, I. Cheddadi, I. Vignon-Clementel, D. Drasdo, and B. Perthame, Composite waves for a cell population system modeling tumor growth and invasion, Chin. Ann. Math. Ser. B, 34 (2013), pp. 295–318.
 101.
C. Topaz, A. Bertozzi, and M. Lewis, A nonlocal continuum model for biological aggregation, Bull. Math. Bio., 68 (2006), pp. 1601–1623.
 102.
G. Toscani, One-dimensional kinetic models of granular flows, Math. Model. Numer. Anal., 34 (2000), pp. 1277–1291.
 103.
J. Vazquez, The Porous Medium Equation, Oxford Mathematical Monographs, Oxford University Press, 2007. Oxford, UK.
 104.
J. L. Vázquez, The porous medium equation, Oxford Mathematical Monographs, The Clarendon Press, Oxford University Press, Oxford, 2007. Mathematical theory.
 105.
C. Villani, Topics in Optimal Transport, 58 AMS, Grad. Stud. Math., 2003. Providence, RI.
 106.
M. Yan, A new primal–dual algorithm for minimizing the sum of three functions with a linear operator, Journal of Scientific Computing, (2018), pp. 1–20.
Acknowledgements
The authors would like to thank Ming Yan for fruitful discussions on primal dual methods and preliminary code for the three operator splitting algorithm. They would like to thank Wuchen Li for many helpful conversations. Finally, they would like to thank the anonymous reviewers for their useful observations and suggestions, which greatly improved this work. JAC was supported by the Advanced Grant Nonlocal-CPD (Nonlocal PDEs for Complex Particle Dynamics: Phase Transitions, Patterns and Synchronization) of the European Research Council Executive Agency (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 883363). JAC was also partially supported by the EPSRC Grant Number EP/P031587/1. KC was supported by NSF DMS-1811012 and a Hellman Faculty Fellowship. LW and CW are partially supported by NSF DMS-1620135, 1903420, 1846854.
Additional information
Communicated by Eitan Tadmor.
A Further Details of Numerical Implementation
In this section, we provide explicit formulas for the matrix \(\tilde{\mathsf {A}}\) and the vectors u and \({{\tilde{b}}}\) introduced in Problem 3(b) in Sect. 2.3, which play a key role in the implementation of Algorithms 1, 2, and 3. For simplicity, we consider the case of one space dimension, where the discretization takes the form (28). The constructions of \({{A}}\) and \({{b}}\) in Problem 3(a) are very similar, except for a slightly different treatment of \(\rho \) at the final time. From now on, to simplify notation, we drop the tildes on the matrix \(\tilde{\mathsf {A}}\) and the vector \({\tilde{b}}\).
Define \(N=(N_x+1) (N_t+1)\). Let \(\otimes \) denote the Kronecker tensor product, \({\mathsf {I}}_{N_x+1}\) the identity matrix of size \(N_x+1\), and \((\mathbf {x})_M\) the column vector in \({\mathbb {R}}^M\) with all components equal to x. Then we define
and the matrix \({\mathsf {A}}\in {\mathbb {R}}^{M \times 2N}\) takes the form
Here, \({\mathsf {A}}_\rho \in {\mathbb {R}}^{N\times N}\) reads
where \({\mathsf {D}}_t^{(1)}\), \({\mathsf {D}}_t^{(2)} \in {\mathbb {R}}^{(N_t+1) \times (N_t + 1)}\), and \({\mathsf {I}}_x^{(1)} \in {\mathbb {R}}^{(N_x + 1) \times (N_x + 1)} \) are
Here, \({\mathsf {D}}_t^{(1)}\) and \({\mathsf {D}}_t^{(2)}\) correspond to the temporal discretization and initial condition for \(\rho \). Likewise,
where \({\mathsf {D}}_x^{(1)}\), \({\mathsf {D}}_x^{(2)} \in {\mathbb {R}}^{(N_x + 1)}\), and \({\mathsf {B}}_t^{(1)} \in {\mathbb {R}}^{(N_t + 1) \times (N_t + 1)}\):
For mass conservation, let \({\mathsf {S}}_\rho = (\mathbf{\varDelta x})_{N_x+1} ^t\), then \({\mathsf {A}}_\text {mass} = {\mathsf {I}}_{N_t + 1} \otimes {\mathsf {S}}_\rho \). In sum, different \({\mathsf {A}}_i\) can be written as
Accordingly, \(b \in {\mathbb {R}}^{N + N_t + 1 }\) collects all the initial conditions for \(\rho \) and boundary conditions for m. More specifically, it reads
and \(b_1 = 0\).
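The Kronecker-product structure of the mass conservation block, \({\mathsf {A}}_\text {mass} = {\mathsf {I}}_{N_t + 1} \otimes {\mathsf {S}}_\rho \), can be checked numerically. The sketch below assumes that the unknowns \(\rho \) are stacked one time level after another, with the spatial index running fastest; this ordering, the function name, and the test values are assumptions for illustration, not a description of the authors' code.

```python
import numpy as np

def mass_constraint_matrix(n_x, n_t, dx):
    """Build A_mass = I_{N_t+1} (Kronecker) S_rho, where S_rho is the row vector (dx, ..., dx)."""
    S_rho = np.full((1, n_x + 1), dx)        # (Delta x)_{N_x+1} transposed
    return np.kron(np.eye(n_t + 1), S_rho)   # shape (N_t+1, (N_x+1)(N_t+1))

# Example: each row of A_mass integrates rho in space at one time level.
n_x, n_t, dx = 4, 2, 0.25                    # hypothetical grid sizes
rho = np.arange((n_t + 1) * (n_x + 1), dtype=float).reshape(n_t + 1, n_x + 1)
A_mass = mass_constraint_matrix(n_x, n_t, dx)
mass_per_level = A_mass @ rho.ravel()        # one mass value per time level
```

Applying `A_mass` to the stacked vector returns the discrete mass at every time level, so prescribing the corresponding right-hand side enforces conservation of mass across the whole time grid.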
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Carrillo, J.A., Craig, K., Wang, L. et al. Primal Dual Methods for Wasserstein Gradient Flows. Found Comput Math (2021). https://doi.org/10.1007/s10208-021-09503-1
Keywords
 Optimal transport
 Optimization schemes
 Steepest descent schemes
 Gradient flows
 Minimizing movements
 Primal dual methods
Mathematics Subject Classification
 35A15
 47J25
 47J35
 49M29
 65K10
 82B21