1 Introduction

Given a Riemannian manifold \((M,g)\), one may consider the optimal transport problem with cost given by the squared Riemannian distance. This induces the 2-Wasserstein distance \(W_2\) on \({\mathcal {P}}_2(M)\), the space of probability measures on M with finite second moments (i.e. probability measures \(\mu \) such that

$$\begin{aligned} \int _M d(x,x_0)^2 d\mu (x)<\infty \end{aligned}$$

for every \(x_0\in M\)). The metric space \(({\mathcal {P}}_2(M),W_2)\) is called the 2-Wasserstein space, and is known to be a geodesic space ([21] Chapter 7).

In [17], Otto proposed that \({\mathcal {P}}_2 (M)\) admits a formal Riemannian structure and developed a formal calculus on \({\mathcal {P}}_2 (M)\). This later became what is known as Otto calculus [21] and was made rigorous by Ambrosio–Gigli–Savaré [2]. In particular, Otto calculus allows one to compute displacement Hessians of functionals along geodesics in \({\mathcal {P}}_2 (M)\). This is useful for characterizing a displacement convex functional (i.e. one convex along every geodesic) by the non-negativity of its displacement Hessian. In a seminal work by Otto and Villani [18], it was shown that the displacement convexity of the entropy functional is related to the Ricci curvature of \((M,g)\). Since then, the notion of displacement convexity has proven useful in many other areas. For instance, it has inspired new heuristics and proofs of various functional inequalities [1, 8].

Further advances have been made towards understanding the relationships between the geometry of the underlying space and the induced geometry of \({\mathcal {P}}(M)\), the space of probability measures on M. In his Ph.D. thesis [19], Schachter studied the optimal transport problem on \({\mathbb {R}}^d\) with cost induced by a Tonelli Lagrangian. The case \(d=1\) was considered in [20], and this work was later used in [3] and [15].

In his work, Schachter developed an Eulerian calculus, extending the Otto calculus. Among the other contributions of his thesis, Schachter derived a canonical form for the displacement Hessians of functionals. Using this Eulerian calculus, he found a new class of displacement convex functionals on \(S^1\) [20], which includes those found by Carrillo and Slepčev in [7]. In the case when the cost is given by the squared Riemannian distance, Schachter proved that his displacement Hessian agrees with Villani’s displacement Hessian in [21], which is a quadratic form involving the Bakry–Émery tensor.

Summary of main results: In this manuscript, a generalized notion of curvature \({\mathcal {K}}_x\) (Definition 5.6) is proposed for the manifold \(M={\mathbb {R}}^d\) equipped with a general Tonelli Lagrangian L, and is given by

$$\begin{aligned} {\mathcal {K}}_x (\xi ) {:}{=}{{\,\textrm{tr}\,}}\bigg (\nabla \xi (x)^2 + A(x,\xi (x))\nabla \xi (x) + B(x,\xi (x))\bigg ) \end{aligned}$$

for vector fields \(\xi \in C^2({\mathbb {R}}^d;{\mathbb {R}}^d)\). The maps A and B are defined in Lemma 5.1. We prove that this generalized curvature is independent of the choice of coordinates (Theorem 5.7). In the case where \(\xi \) takes a special form (one that naturally arises from the optimal transport problem), we provide an explicit formula for \({\mathcal {K}}_x\) in Theorem 5.8. Lastly, we furnish an example of a Lagrangian cost with non-negative generalized curvature that is not given by squared Riemannian distance. This induces a geometry on the L-Wasserstein space in which the generalized entropy functional (4.1) is displacement convex along suitable curves.

This paper is organized as follows: In the first four sections, we will review the optimal transport problem induced by a Tonelli Lagrangian, up to and including the notion of displacement convexity. The thesis of Schachter [19] provides a good overview of key definitions and results needed. Section 2 covers some basic notation. Section 3 reviews some ideas from [19]; chief among them is the relationship between the various formulations of the optimal transport problem. Section 4 discusses functionals along curves in Wasserstein space, including a computation of the displacement Hessian. Section 5 introduces the definition and various properties of the generalized curvature \({\mathcal {K}}_x\). Lastly, Sect. 6 provides an example of a Lagrangian with everywhere non-negative generalized curvature.

2 Notation

We will take our underlying manifold to be \(M = {\mathbb {R}}^d\) and identify its tangent bundle \(T{\mathbb {R}}^d\cong {\mathbb {R}}^d \times {\mathbb {R}}^d\). Let \({\mathcal {P}}^{ac} = {\mathcal {P}}^{ac}({\mathbb {R}}^d)\) denote the set of probability measures on \({\mathbb {R}}^d\) that are absolutely continuous with respect to the \(d-\)dimensional Lebesgue measure (denoted \({\mathcal {L}}^d\)). An element of \({\mathcal {P}}^{ac}\) will often be identified with its density \(\rho \). Given \(\rho \in {\mathcal {P}}^{ac}\) and a measurable function \(T:{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\), \(T_{\#}\rho \) will denote the push-forward of the measure \(\rho \) under T.

Definition 2.1

(Tonelli Lagrangian) A function \(L:{\mathbb {R}}^d \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) is called a Tonelli Lagrangian if it satisfies the following conditions:

  1. (i)

    \(L\in C^2 ({\mathbb {R}}^d \times {\mathbb {R}}^d)\).

  2. (ii)

    For every \(x\in {\mathbb {R}}^d\), the function \(L(x,\cdot ):{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) is strictly convex.

  3. (iii)

    L has asymptotic superlinear growth in the variable v, in the sense that there exists a constant \(c_0\in {\mathbb {R}}\) and a function \(\theta :{\mathbb {R}}^d \rightarrow [0,+\infty )\) with

    $$\begin{aligned} \lim _{|v|\rightarrow +\infty }\frac{\theta (v)}{|v|}=+\infty \end{aligned}$$

    such that

    $$\begin{aligned} L(x,v)\ge c_0 + \theta (v) \end{aligned}$$
    (2.1)

    for all \((x,v)\in {\mathbb {R}}^d \times {\mathbb {R}}^d\).

Throughout this manuscript, \(L \in C^k ({\mathbb {R}}^d \times {\mathbb {R}}^d)\), \(k\ge 3\) will be assumed to be a Tonelli Lagrangian and we will work with the underlying space \(({\mathbb {R}}^d, L)\). We denote the gradient with respect to the x (position) and v (velocity) variables by \(\nabla _x L, \nabla _v L\in {\mathbb {R}}^d\) respectively. Similarly, the second-order derivatives will be denoted by \(\nabla _{xx}^2\,L\), \(\nabla _{vv}^2\,L\), \(\nabla _{xv}^2\,L\), \(\nabla _{vx}^2L = \nabla _{xv}^2 L^{\top } \in {\mathbb {R}}^{d\times d}\). We will assume that the Hessian \(\nabla _{vv}^2 L(x,v)\) is positive-definite for every \((x,v)\in {\mathbb {R}}^d \times {\mathbb {R}}^d\). The time derivative of a function f(t) will be denoted by \({\dot{f}} = \frac{df}{dt}\).
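The conditions of Definition 2.1 and the positive-definiteness of \(\nabla _{vv}^2 L\) can be spot-checked numerically for a candidate Lagrangian. Below is a minimal sketch; the Lagrangian with a bounded cosine potential, the step size, and the sample points are all illustrative assumptions, not objects used elsewhere in the paper.

```python
import numpy as np

# Hypothetical Tonelli Lagrangian on R^2: kinetic energy plus a bounded potential.
def L(x, v):
    return 0.5 * np.dot(v, v) + np.cos(np.dot(x, x))

def hess_vv(x, v, h=1e-4):
    """Central finite-difference Hessian of L in the velocity variable."""
    d = len(v)
    H = np.zeros((d, d))
    I = np.eye(d)
    for i in range(d):
        for j in range(d):
            H[i, j] = (L(x, v + h*I[i] + h*I[j]) - L(x, v + h*I[i] - h*I[j])
                       - L(x, v - h*I[i] + h*I[j]) + L(x, v - h*I[i] - h*I[j])) / (4 * h * h)
    return H

x, v = np.array([0.3, -1.0]), np.array([2.0, 0.5])
eigs = np.linalg.eigvalsh(hess_vv(x, v))
assert np.all(eigs > 0)  # strict convexity of L(x, .) at this sample point

# Superlinear growth (2.1): L(x, v)/|v| grows without bound along a ray.
ratios = [L(x, r * v / np.linalg.norm(v)) / r for r in (1e1, 1e2, 1e3)]
assert ratios[0] < ratios[1] < ratios[2]
```

For this Lagrangian the velocity Hessian is the identity, so the computed eigenvalues are all close to 1.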

3 Optimal transport problem induced by a Tonelli Lagrangian

3.1 Lagrangian optimal transport problem

The goal of this section is to establish the different formulations of the optimal transport problem with cost induced by a Tonelli Lagrangian L. In this first subsection, the Lagrangian optimal transport problem will be presented. We will also briefly recall the classical Monge–Kantorovich theory. Most of the material in this subsection can be found in [5, 11, 19, 21]. In subsection 3.2 we will present an Eulerian perspective and its connection to viscosity solutions of the Hamilton–Jacobi equation.

Definition 3.1

(Action functional) Let \(T>0\) and \(\gamma \in W^{1,1}([0,T];{\mathbb {R}}^d)\) be a curve. The action of L on \(\gamma \) is

$$\begin{aligned} {\mathcal {A}}_{L,T}(\gamma ) = \int _{0}^{T}L(\gamma (t),{\dot{\gamma }}(t))\;dt. \end{aligned}$$
(3.1)

This induces a cost function \(c_{L,T}: {\mathbb {R}}^d \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} c_{L,T}(x,y) = \inf \{ {\mathcal {A}}_{L,T}(\gamma )\; :\; \gamma \in W^{1,1}([0,T];{\mathbb {R}}^d), \gamma (0) = x, \gamma (T) = y\}. \end{aligned}$$
(3.2)

A curve \(\gamma \) with \(\gamma (0) = x, \gamma (T) = y\) is called an action-minimizing curve from x to y if \({\mathcal {A}}_{L,T}(\gamma ) = c_{L,T}(x,y)\).
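For the kinetic Lagrangian \(L(x,v) = \frac{1}{2}|v|^2\), the cost (3.2) has the closed form \(c_{L,T}(x,y) = |x-y|^2/(2T)\), attained by the straight line from x to y. The following numerical sketch checks this on a discretized curve; the endpoints, grid size, and perturbation are arbitrary illustrative choices.

```python
import numpy as np

# Riemann-sum approximation of the action (3.1) for L(x,v) = |v|^2/2.
def action(path, T):
    dt = T / (len(path) - 1)
    return sum(0.5 * np.dot(dv, dv) / dt for dv in np.diff(path, axis=0))

x, y, T, n = np.array([0.0, 0.0]), np.array([3.0, 4.0]), 2.0, 200
s = np.linspace(0.0, 1.0, n + 1)[:, None]
straight = (1 - s) * x + s * y
wiggle = straight + np.sin(np.pi * s) * np.array([0.5, -0.3])  # same endpoints

cost = np.dot(y - x, y - x) / (2 * T)            # closed form: |x-y|^2/(2T)
assert abs(action(straight, T) - cost) < 1e-9    # straight line attains the cost
assert action(wiggle, T) > action(straight, T)   # a perturbed curve costs more
```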

Theorem 3.2

([11] Appendix B) For any \(x,y\in {\mathbb {R}}^d\), there exists an action-minimizing curve \(\gamma \) from x to y such that

  1. (i)

    \({\mathcal {A}}_{L,T}(\gamma ) = c_{L,T}(x,y)\)

  2. (ii)

    \(\gamma \in C^k ([0,T];{\mathbb {R}}^d)\)

  3. (iii)

    \(\gamma \) satisfies the Euler–Lagrange equation

    $$\begin{aligned} \frac{d}{dt}((\nabla _v L)(\gamma ,{\dot{\gamma }})) = (\nabla _x L)(\gamma ,{\dot{\gamma }}) \end{aligned}$$
    (3.3)

Definition 3.3

(Lagrangian flow) The Lagrangian flow \(\Phi :[0,+\infty )\times {\mathbb {R}}^d\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\times {\mathbb {R}}^d\) is defined by

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{d}{dt}((\nabla _v L)(\Phi )) = (\nabla _x L)(\Phi )\\ \Phi (0,x,v)=(x,v) \end{array}\right. } \end{aligned}$$

We refer the reader to [11] and [19] for further properties of the cost function \(c_{L,T}\). In particular, it is locally Lipschitz and thus differentiable almost everywhere by Rademacher’s theorem. Moreover, if either \(\frac{\partial }{\partial x}c_{L,T}(x_0,y_0)\) or \(\frac{\partial }{\partial y}c_{L,T}(x_0,y_0)\) exists at \((x_0,y_0)\), then the action-minimizing curve from \(x_0\) to \(y_0\) is unique. With the cost \(c_{L,T}\), we may state the Monge problem and the Kantorovich problem.
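The Lagrangian flow of Definition 3.3 can be integrated numerically. The sketch below treats a mechanical Lagrangian \(L(x,v)=\frac{1}{2}|v|^2+U(x)\), for which the Euler–Lagrange equation (3.3) reduces to \(\ddot{\gamma } = \nabla U(\gamma )\); the quadratic potential, the integrator, and the tolerances are illustrative assumptions.

```python
import numpy as np

# For L(x,v) = |v|^2/2 + U(x), the Euler-Lagrange equation (3.3) is x'' = grad U(x).
def grad_U(x):
    return -x  # U(x) = -|x|^2/2 (hypothetical), giving the harmonic oscillator

def lagrangian_flow(x0, v0, T, n):
    """Velocity-Verlet integration of the Euler-Lagrange equation."""
    dt = T / n
    x, v = x0.astype(float), v0.astype(float)
    for _ in range(n):
        v_half = v + 0.5 * dt * grad_U(x)
        x = x + dt * v_half
        v = v_half + 0.5 * dt * grad_U(x)
    return x, v

def energy(x, v):
    """Hamiltonian H = |v|^2/2 - U(x), conserved along the flow."""
    return 0.5 * np.dot(v, v) + 0.5 * np.dot(x, x)

x0, v0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
xT, vT = lagrangian_flow(x0, v0, T=2 * np.pi, n=20000)
assert abs(energy(xT, vT) - energy(x0, v0)) < 1e-6  # near-conservation of energy
assert np.linalg.norm(xT - x0) < 1e-3               # the 2*pi-periodic orbit closes
```

The symplectic (Verlet) scheme is chosen so that the conserved Hamiltonian drifts only at order \(dt^2\).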

Definition 3.4

(Monge problem) Let \(\rho _0,\rho _T\in {\mathcal {P}}^{ac}\). The Monge optimal transport problem from \(\rho _0\) to \(\rho _T\) for the cost \(c_{L,T}\) is the minimization problem

$$\begin{aligned} \inf _M \bigg \{\int _{{\mathbb {R}}^d}c_{L,T}(x,M(x))\rho _0(x) \;dx\;:\;M_{\#}\rho _0=\rho _T\;,\; M\text { Borel measurable} \bigg \}. \end{aligned}$$
(3.4)

Definition 3.5

(Kantorovich problem) Let \(\Pi (\rho _0,\rho _T)\) denote the set of all probability measures on \({\mathbb {R}}^d \times {\mathbb {R}}^d\) with marginals \(\rho _0\) and \(\rho _T\). Then the Kantorovich optimal transport problem from \(\rho _0\) to \(\rho _T\) for the cost \(c_{L,T}\) is the minimization problem

$$\begin{aligned} \inf _\pi \bigg \{ \int _{{\mathbb {R}}^d \times {\mathbb {R}}^d}c_{L,T}(x,y)\;d\pi (x,y) \;:\;\pi \in \Pi (\rho _0,\rho _T) \bigg \}. \end{aligned}$$
(3.5)

A minimizer \(\pi \) is called an optimal transport plan. The infimum in (3.5) is denoted \(W_{c_{L,T}}(\rho _0, \rho _T)\) and is called the Kantorovich cost from \(\rho _0\) to \(\rho _T\).

If \(W_{c_{L,T}}(\rho _0, \rho _T)\) is finite, then the Monge problem with cost \(c_{L,T}\) admits an optimizer M (called the Monge map) that is unique \(\rho _0-\)almost everywhere [11]. Note that the Monge problem is only concerned with the initial and final states (i.e. \(\rho _0,\rho _T\)). To interpolate between \(\rho _0\) and \(\rho _T\) in a way that respects the cost \(c_{L,T}\), we consider the Lagrangian formulation of the optimal transport problem induced by L.

Definition 3.6

(Lagrangian optimal transport problem) Let \(\rho _0,\rho _T\in {\mathcal {P}}^{ac}\). The Lagrangian optimal transport problem from \(\rho _0\) to \(\rho _T\) induced by the Tonelli Lagrangian L is the minimization problem

$$\begin{aligned} \inf _\sigma \bigg \{ \int _{0}^T \int _{{\mathbb {R}}^d}L(\sigma (t,x),{\dot{\sigma }}(t,x))\rho _0(x)\;dx\;dt \bigg \} \end{aligned}$$
(3.6)

where the infimum is taken over all \(\sigma :[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) such that

  1. (i)

    \(\sigma (\cdot ,x)\in W^{1,1}([0,T];{\mathbb {R}}^d)\) for every \(x\in {\mathbb {R}}^d\)

  2. (ii)

    \(\sigma (t,\cdot )\) is Borel measurable for every \(t\in [0,T]\)

  3. (iii)

    \(\sigma (0,x) = x\) for every \(x\in {\mathbb {R}}^d\)

  4. (iv)

    \(\sigma (T,\cdot )_{\#}\rho _0 = \rho _T\)

In [19], it is shown that if \(W_{c_{L,T}}(\rho _0, \rho _T)\) is finite, then the Lagrangian optimal transport problem admits an optimizer \(\sigma \) such that \(\sigma (\cdot ,x)\) is an action-minimizing curve from \(\sigma (0,x)=x\) to \(\sigma (T,x)\) for every \(x\in {\mathbb {R}}^d\). Moreover, the map \(\sigma (T,\cdot )\) coincides with the Monge map M and so is unique \(\rho _0-\)almost everywhere. With an optimizer \(\sigma \), we can define the notion of displacement interpolation, which is the analogue of a geodesic in \({\mathcal {P}}^{ac}\).

Definition 3.7

(Displacement interpolant) Let \(\rho _0,\rho _T\in {\mathcal {P}}^{ac}\) be such that the Kantorovich cost \(W_{c_{L,T}}(\rho _0, \rho _T)\) is finite. Let \(\sigma \) be an optimizer of the Lagrangian optimal transport problem. Then the displacement interpolant between \(\rho _0\) and \(\rho _T\) for the cost \(c_{L,T}\) is the measure-valued map

$$\begin{aligned}{}[0,T]\ni t \mapsto \mu _t = \sigma (t,\cdot )_{\#}\rho _0. \end{aligned}$$

Since \(\mu _t\) is absolutely continuous with respect to \({\mathcal {L}}^d\) for every \(t\in [0,T]\) ([11] Theorem 5.1), we will also identify \(\mu _t\) with its density \(\rho _t\). Subsequently, we will always denote a displacement interpolant by a function \(\rho : [0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and use the notation \(\rho _t = \rho (t,\cdot )\) whenever the intention is clear. Since the maps \(\sigma (t,\cdot )\) are uniquely defined (\(\rho _0-\)almost everywhere) on the support of \(\rho _0\), the displacement interpolant is well-defined. Moreover, the map \(\sigma \big |_{[0,t]\times {\mathbb {R}}^d}\) for an intermediary time \(t\in [0,T]\) optimizes the Lagrangian optimal transport problem from \(\rho _0\) to \(\rho _t\), i.e.

$$\begin{aligned} W_{c_{L,t}}(\rho _0,\rho _t) = \int _{0}^t \int _{{\mathbb {R}}^d}L(\sigma (s,x),{\dot{\sigma }}(s,x))\rho _0(x)\;dx\;ds. \end{aligned}$$

In order to discuss the Eulerian formulation of the optimal transport problem, we need to introduce the Kantorovich duality. We do so in accordance with the convention of [21].

Theorem 3.8

(Kantorovich duality) The Kantorovich optimal transport problem from \(\rho _0\) to \(\rho _T\) for the cost \(c_{L,T}\) has a dual formulation

$$\begin{aligned}&\inf _\pi \bigg \{ \int _{{\mathbb {R}}^d \times {\mathbb {R}}^d}c_{L,T}(x,y)\;d\pi (x,y) \;:\;\pi \in \Pi (\rho _0,\rho _T) \bigg \}\\&\quad = \sup _{(u_0,u_T)}\bigg \{ \int _{{\mathbb {R}}^d}u_T(y)\rho _T(y)\;dy - \int _{{\mathbb {R}}^d}u_0(x)\rho _0(x)\;dx \;:\;(u_0,u_T)\in L^1 (\rho _0)\times L^1(\rho _T)\;,\\&\qquad \qquad u_T(y)-u_0(x)\le c_{L,T}(x,y)\quad \forall (x,y)\in {\mathbb {R}}^d \times {\mathbb {R}}^d\bigg \} \end{aligned}$$

Moreover, we may assume that

$$\begin{aligned} u_T(y)&= \inf _{x\in {\mathbb {R}}^d} \big \{ u_0(x) + c_{L,T}(x,y) \big \}\\ u_0(x)&= \sup _{y\in {\mathbb {R}}^d} \big \{ u_T(y) - c_{L,T}(x,y) \big \} \end{aligned}$$

If \((u_0, u_T)\) is an optimizer of the dual problem, then \(u_0\) and \(u_T\) are called Kantorovich potentials.

Remark 3.9

If the Monge optimal transport problem from \(\rho _0\) to \(\rho _T\) for the cost \(c_{L,T}\) admits a minimizer M (unique \(\rho _0-\)almost everywhere), then any optimal transport plan \(\pi \in \Pi (\rho _0,\rho _T)\) is concentrated on the graph of M [11]. Moreover, if \(u_0\) and \(u_T\) are Kantorovich potentials, then

$$\begin{aligned} u_T(y) - u_0(x) \le c_{L,T}(x,y) \end{aligned}$$

for every \((x,y)\in {\mathbb {R}}^d \times {\mathbb {R}}^d\) and we have equality

$$\begin{aligned} u_T(M(x)) - u_0(x) = c_{L,T}(x,M(x)) \end{aligned}$$

for x \(\rho _0-\)almost everywhere (see [21] Theorem 5.10).

3.2 Eulerian formulation

The paper by Benamou and Brenier [4] is one of the earliest works establishing the Eulerian formulation and its connection to Hamilton–Jacobi equations. Subsequently, the relationships between the different formulations of the optimal transport problem were further studied (for instance, [5]).

In particular, the Eulerian view establishes the displacement interpolant as a solution to the continuity equation. First, we state some basic facts about the Hamiltonian.

The Hamiltonian associated with the Tonelli Lagrangian L is defined as the Legendre transform of L with respect to the variable v, i.e.

$$\begin{aligned} H(x,p) = \sup _{v\in {\mathbb {R}}^d}\{\langle p,v \rangle - L(x,v)\}. \end{aligned}$$
(3.7)

Thus, the Hamiltonian H satisfies the Fenchel–Young inequality

$$\begin{aligned} \langle v,p \rangle \le H(x,p) + L(x,v) \end{aligned}$$
(3.8)

for all \(x,v,p\in {\mathbb {R}}^d\), with equality if and only if

$$\begin{aligned} p = (\nabla _v L)(x,v). \end{aligned}$$
(3.9)

Moreover, \(H\in C^k ({\mathbb {R}}^d \times {\mathbb {R}}^d)\) and

$$\begin{aligned} (\nabla _v L)(x,(\nabla _p H)(x,r))= (\nabla _p H)(x, (\nabla _v L)(x,r) ) = r. \end{aligned}$$
(3.10)
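The Legendre-duality identities (3.7)–(3.10) can be verified numerically in a simple case. In the sketch below, the one-dimensional Lagrangian with quadratic velocity dependence (for which \(H(x,p)=p^2/(2a)-U(x)\) in closed form), the cosine potential, and the grid for the supremum are all illustrative assumptions.

```python
import numpy as np

# Numerical Legendre transform (3.7) for L(x,v) = a v^2/2 + U(x), a > 0.
a = 2.0
U = lambda x: np.cos(x)                 # hypothetical potential
L = lambda x, v: 0.5 * a * v**2 + U(x)
H_exact = lambda x, p: p**2 / (2 * a) - U(x)

vs = np.linspace(-50.0, 50.0, 2_000_001)   # grid over which the sup is taken
def H_grid(x, p):
    return np.max(p * vs - L(x, vs))

x, p = 0.7, 3.0
assert abs(H_grid(x, p) - H_exact(x, p)) < 1e-6

# Equality in Fenchel-Young (3.8) holds exactly at p = dL/dv = a*v, as in (3.9).
v = 1.3
p_star = a * v
assert abs(p_star * v - (H_exact(x, p_star) + L(x, v))) < 1e-12
```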

Let \(u_0:{\mathbb {R}}^d \rightarrow [-\infty ,+\infty ]\) be a function and \(T>0\). We define the Lax–Oleinik evolution \(u:[0,T]\times {\mathbb {R}}^d\rightarrow [-\infty ,+\infty ]\) of \(u_0\) by

$$\begin{aligned} u(t,x)&{:}{=}\inf _{\gamma } \bigg \{ u_0(\gamma (0)) + \int _{0}^{t}L(\gamma (\tau ),{\dot{\gamma }}(\tau ))\;d\tau \;:\;\gamma \in W^{1,1}([0,t];{\mathbb {R}}^d)\;,\;\gamma (t) = x \bigg \} \nonumber \\&= \inf _{\gamma } \bigg \{ u_0(\gamma (0)) + {\mathcal {A}}_{L,t}(\gamma )\;:\;\gamma \in W^{1,1}([0,t];{\mathbb {R}}^d)\;,\;\gamma (t) = x \bigg \} \nonumber \\&=\inf _{y\in {\mathbb {R}}^d} \big \{ u_0(y) + c_{L,t}(y,x) \big \} \end{aligned}$$
(3.11)

so that \(u(0,x) = u_0(x)\).
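For the kinetic Lagrangian \(L(x,v)=\frac{1}{2}|v|^2\), (3.11) becomes the Hopf–Lax formula \(u(t,x)=\inf _y \{u_0(y)+|x-y|^2/(2t)\}\), and for a quadratic initial datum the evolution is explicit: with \(u_0(y)=y^2/2\) one gets \(u(t,x)=x^2/(2(1+t))\). A numerical sketch (the datum, grid, and evaluation point are illustrative choices):

```python
import numpy as np

# Hopf-Lax evaluation of (3.11) for L(x,v) = |v|^2/2, so c_{L,t}(y,x) = |x-y|^2/(2t).
u0 = lambda y: 0.5 * y**2

ys = np.linspace(-20.0, 20.0, 400_001)     # grid over which the inf is taken
def u(t, x):
    return np.min(u0(ys) + (x - ys)**2 / (2 * t))

t, x = 1.5, 2.0
assert abs(u(t, x) - x**2 / (2 * (1 + t))) < 1e-6   # matches the closed form
```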

Remark 3.10

Since L is bounded below, if there exists some \((t^*,x^*)\in (0,T]\times {\mathbb {R}}^d\) such that \(u(t^*,x^*)\) is finite, then u is finite on all of \([0,T]\times {\mathbb {R}}^d\).

It is known that if u is finite, then it is a viscosity solution of the Hamilton–Jacobi equation

$$\begin{aligned} \frac{\partial u}{\partial t} + H(x,\nabla u) = 0 \end{aligned}$$
(3.12)

(see [9] Section 7.2 and [10] Theorem 1.1).

Definition 3.11

(Calibrated curve) Let \(f:[t_0,t_1]\times {\mathbb {R}}^d \rightarrow [-\infty ,+\infty ]\) be a function. A curve \(\gamma \in W^{1,1}([t_0,t_1];{\mathbb {R}}^d)\) is called an \((f,L)-\)calibrated curve if \(f(t_0,\gamma (t_0))\), \(f(t_1,\gamma (t_1))\) and \(\int _{t_0}^{t_1}L(\gamma (t),{\dot{\gamma }}(t))\;dt\) are all finite and

$$\begin{aligned} f(t_1,\gamma (t_1)) - f(t_0,\gamma (t_0)) = \int _{t_0}^{t_1}L(\gamma (t),{\dot{\gamma }}(t))\;dt. \end{aligned}$$
(3.13)

In the following proposition, we mention some properties of u that are of interest to us. The proofs can be found in [6, 9, 10].

Proposition 3.12

Let u be defined as in (3.11). If u is finite, then the following hold:

  1. (i)

    u is continuous and locally semi-concave on \((0,T)\times {\mathbb {R}}^d\).

  2. (ii)

    u is a viscosity solution of the Hamilton–Jacobi equation

    $$\begin{aligned} \frac{\partial u}{\partial t} + H(x,\nabla u) = 0. \end{aligned}$$
  3. (iii)

    If \([a,b]\subset [0,T]\) and \(\gamma :[a,b]\rightarrow {\mathbb {R}}^d\) is a \((u,L)-\)calibrated curve, then u is differentiable at \((t,\gamma (t))\) for every \(t\in [a,b]\) and we have

    $$\begin{aligned} \nabla u(t,\gamma (t)) = (\nabla _v L)(\gamma (t),{\dot{\gamma }}(t)). \end{aligned}$$
    (3.14)
  4. (iv)

    If u is differentiable at \((t^*, x^*)\), then there is at most one \((u,L)-\)calibrated curve \(\gamma :[a,b]\rightarrow {\mathbb {R}}^d\) with \(t^* \in [a,b]\) and \(\gamma (t^*) = x^*\).

We now return to the optimal transport problem from \(\rho _0\in {\mathcal {P}}^{ac}\) to \(\rho _T\in {\mathcal {P}}^{ac}\) induced by L. Suppose that \(W_{c_{L,T}}(\rho _0,\rho _T)\) is finite and let \(u_0\in L^1(\rho _0)\) be a Kantorovich potential.

Proposition 3.13

Let \(\sigma : [0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be an optimizer of the Lagrangian optimal transport problem from \(\rho _0\) to \(\rho _T\) induced by L. Let \((u_0,u_T)\) be the corresponding Kantorovich potentials and \(u:[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) be the Lax–Oleinik evolution of \(u_0\). Then \((\nabla u)(t,\sigma (t,x))\) exists for all \(t\in [0,T]\) and x \(\rho _0-\)almost everywhere. In addition, \(\sigma \) satisfies the relation

$$\begin{aligned} {\dot{\sigma }}(t,x) = (\nabla _p H)(\sigma (t,x), (\nabla u)(t,\sigma (t,x))). \end{aligned}$$
(3.15)

Proof

By Remark 3.10, u is finite since \(u(T,\cdot ) = u_T \in L^1(\rho _T)\). By Remark 3.9, the Kantorovich potentials \((u_0,u_T)\) satisfy

$$\begin{aligned} u_T(\sigma (T,x)) - u_0(x)&= c_{L,T}(x,\sigma (T,x))\\ \iff u(T,\sigma (T,x)) - u(0,\sigma (0,x))&= c_{L,T}(x,\sigma (T,x)) \end{aligned}$$

for x \(\rho _0-\)almost everywhere (recall that \(\sigma (T,\cdot )\) coincides with the Monge map). Thus, for \(\rho _0-\)almost every x, the curve \(t\mapsto \sigma (t,x)\) is a \((u,L)-\)calibrated curve and so \((\nabla u)(t,\sigma (t,x)) = (\nabla _v L)(\sigma (t,x),{\dot{\sigma }}(t,x))\) exists by Proposition 3.12. Using identity (3.10), we get

$$\begin{aligned} {\dot{\sigma }}(t,x) = (\nabla _p H)(\sigma (t,x), (\nabla u)(t,\sigma (t,x))). \end{aligned}$$

\(\square \)

Remark 3.14

Let \(V:[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be a time-dependent vector field that agrees with \((\nabla _p H)(x,(\nabla u)(t,x))\) on the set

$$\begin{aligned} S_t {:}{=}\{ \sigma (t,y)\in {\mathbb {R}}^d\;:\; y\in \text {supp}(\rho _0)\;,\;(\nabla u)(t,\sigma (t,y))\;\text {exists} \} \end{aligned}$$

for each \(t\in [0,T]\). Using the definition of the displacement interpolant \(\rho _t = \sigma (t,\cdot )_{\#}\rho _0\), and the fact that \((\nabla u)(t,\sigma (t,x))\) exists for all \(t\in [0,T]\) and \(\rho _0-\)almost every \(x\in {\mathbb {R}}^d\), we have that the set

$$\begin{aligned} \{ \sigma (t,y)\in {\mathbb {R}}^d\;:\; y\in \text {supp}(\rho _0)\;,\; u\;\text {not differentiable at }(t,\sigma (t,y))\} \end{aligned}$$

is a set of zero \(\rho _t-\)measure. Thus, \(S_t\) has full \(\rho _t-\)measure and so \(V(t,x) = (\nabla _p H)(x,(\nabla u)(t,x))\) \(\rho _t-\)almost everywhere. By (3.15), \({\dot{\sigma }}(t,x) = V(t,\sigma (t,x))\) for all \(t\in [0,T]\) and \(\rho _0-\)almost every \(x\in {\mathbb {R}}^d\). This means that \(\rho _t\) and V solve the continuity equation

$$\begin{aligned} \frac{\partial \rho _t}{\partial t} + \nabla \cdot (\rho _t V) = 0 \end{aligned}$$
(3.16)

in the sense of distributions ([19] Proposition 3.4.3).
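The continuity equation (3.16) can be checked directly in a simple one-dimensional case: for a constant velocity field the pushforward density is an explicit translate. In the sketch below, the constant field and the Gaussian datum are illustrative assumptions.

```python
import numpy as np

# For V(t,x) = c constant, sigma(t,x) = x + c t pushes rho0 to the translate
# rho(t,x) = rho0(x - c t), and d/dt rho + d/dx (rho V) = 0 pointwise.
c = 0.7
rho0 = lambda x: np.exp(-x**2) / np.sqrt(np.pi)   # smooth unit-mass density
rho = lambda t, x: rho0(x - c * t)

# Central finite differences of the two terms of the continuity equation.
t, x, h = 0.4, 0.3, 1e-5
d_t = (rho(t + h, x) - rho(t - h, x)) / (2 * h)
d_x = c * (rho(t, x + h) - rho(t, x - h)) / (2 * h)
assert abs(d_t + d_x) < 1e-8
```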

4 Generalized entropy functional and displacement Hessian

Otto calculus and Schachter’s Eulerian calculus both allow for explicit computations, assuming that all relevant quantities possess sufficient regularity. However, the regularity of a displacement interpolant \(\rho \) depends on the Lagrangian L, the initial and final densities \((\rho _0, \rho _T)\), and the optimal trajectories \(\sigma \) (or the velocity field V in the Eulerian framework). In general, the Kantorovich potential \(u_0\) arising from an optimal transport problem induced by a Tonelli Lagrangian L is only known to be semiconcave, differentiable \({\mathcal {L}}^d -\)almost everywhere, and its gradient \(\nabla u_0\) is only locally bounded (see [13] and [14] Appendix C). This implies that the initial velocity \(V(0,x) = (\nabla _p H)(x,\nabla u_0 (x))\) is only locally bounded. As such, even if the initial density \(\rho _0\) is smooth, its regularity may fail to propagate along the displacement interpolant.

For our purpose of computing displacement Hessians, we require displacement interpolants to be of class \(C^2\). Fortunately, such displacement interpolants do exist and we can construct them if we impose two additional criteria on L.

4.1 \(C^2\) displacement interpolants

Let \(L\in C^{k+1} ({\mathbb {R}}^d \times {\mathbb {R}}^d)\), \(k\ge 3\) be a Tonelli Lagrangian satisfying two additional criteria (see [6] Chapters 6.3, 6.4).

  1. (L1)

    There exists \({\tilde{c}}_0\ge 0\) and \({\tilde{\theta }}:[0,\infty )\rightarrow [0,\infty )\) with

    $$\begin{aligned} \lim _{r\rightarrow +\infty }\frac{{\tilde{\theta }}(r)}{r} = +\infty \end{aligned}$$

    such that

    $$\begin{aligned} L(x,v)\ge {\tilde{\theta }}(|v|)-{\tilde{c}}_0. \end{aligned}$$

    In addition, \({\tilde{\theta }}\) is such that for any \(M>0\) there exists \(K_M >0\) with

    $$\begin{aligned} {\tilde{\theta }}(r+m)\le K_M[1+{\tilde{\theta }}(r)] \end{aligned}$$

    for all \(m\in [0,M]\) and all \(r\ge 0\).

  2. (L2)

    For any \(r>0\), there exists \(C_r > 0\) such that

    $$\begin{aligned} |(\nabla _x L)(x,v)| + |(\nabla _v L)(x,v)| < C_r {\tilde{\theta }}(|v|) \end{aligned}$$

    for all \(|x|\le r\), \(v\in {\mathbb {R}}^d\).

Some common examples of Tonelli Lagrangians satisfying these criteria include the Riemannian kinetic energy

$$\begin{aligned} L(x,v) = \frac{1}{2}g_x(v,v) \end{aligned}$$

where \(g_x\) denotes the underlying Riemannian metric tensor, and Lagrangians that arise from mechanics

$$\begin{aligned} L(x,v) = \frac{1}{2}g_x(v,v) + U(x) \end{aligned}$$

for some appropriate potential \(U:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\).

Let H be the corresponding Hamiltonian.

Lemma 4.1

Let \(u_0\in C^{k+1}({\mathbb {R}}^d)\) with \(u_0(x)\ge -{\tilde{c}}_0\) for all \(x\in {\mathbb {R}}^d\). Let \(u:[0,+\infty )\times {\mathbb {R}}^d \rightarrow [-\infty ,+\infty ]\) be the Lax–Oleinik evolution of \(u_0\), as defined in (3.11). For \(x\in {\mathbb {R}}^d\), consider the Lagrangian flow (introduced in Definition 3.3)

$$\begin{aligned} \Phi (t,x,V(0,x)) = (\Phi _1(t,x,V(0,x)), \Phi _2(t,x,V(0,x)))\;,\quad t\in [0,+\infty ) \end{aligned}$$

where \(V:[0,+\infty )\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) is a time-dependent vector field defined by

$$\begin{aligned} V(t,x) = (\nabla _p H)(x,(\nabla u)(t,x)). \end{aligned}$$

(Here, \(\Phi _1\) and \(\Phi _2\) are the x and v components of \(\Phi \) respectively.) If we let \(\sigma (t,x) = \Phi _1(t,x,V(0,x))\), then \({\dot{\sigma }}(t,x) = V(t,\sigma (t,x))\) for all \(t\in [0,+\infty )\), \(x\in {\mathbb {R}}^d\). Moreover, \(\sigma (t,\cdot ):{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) is a \(C^{k}-\)diffeomorphism for every \(t\in [0,+\infty )\).

Proof

Since L and \(u_0\) are both bounded below, we have

$$\begin{aligned} u(t,x)&= \inf _{\gamma } \bigg \{ u_0(\gamma (0)) + \int _{0}^{t}L(\gamma (\tau ),{\dot{\gamma }}(\tau ))\;d\tau \;,\;\gamma (t)=x\bigg \}\\&\ge -{\tilde{c}}_0 - {\tilde{c}}_0 t\\&> -\infty \end{aligned}$$

and so u is finite. From [10], u is a continuous viscosity solution of the Hamilton–Jacobi equation (3.12) and we know that for each \((t,x)\in (0,+\infty )\times {\mathbb {R}}^d\), there exists a unique \((u,L)-\)calibrated curve \(\gamma _x: [0,t]\rightarrow {\mathbb {R}}^d\) such that \(\gamma _x(t) = x\). Moreover, \((\nabla u)(s,\gamma _x (s))\) exists for all \(s\in [0,t]\) and is given by

$$\begin{aligned} (\nabla u)(s,\gamma _x (s))&= (\nabla _v L)(\gamma _x (s),{\dot{\gamma }}_x (s)) \\ \iff {\dot{\gamma }}_x (s)&= (\nabla _p H)(\gamma _x (s) , (\nabla u) (s,\gamma _x (s)))\\ \iff {\dot{\gamma }}_x (s)&= V(s,\gamma _x (s)) \end{aligned}$$

Since each \(\gamma _x\) is necessarily an action-minimizing curve from \(\gamma _x (0)\) to \(\gamma _x (t) = x\), it is the unique solution curve to the Euler–Lagrange system

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{d}{dt}((\nabla _v L)(\gamma ,{\dot{\gamma }})) = (\nabla _x L)(\gamma ,{\dot{\gamma }})\\ \gamma (0) = \gamma _x (0)\\ {\dot{\gamma }}(0) = {\dot{\gamma }}_x (0) \end{array}\right. } \end{aligned}$$

Therefore, \(\sigma (t,\cdot ):{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) is a bijection for all \(t\in [0,+\infty )\) and \({\dot{\sigma }}(t,x) = V(t,\sigma (t,x))\) for all \((t,x)\in [0,+\infty )\times {\mathbb {R}}^d\).

Lastly, since \(L\in C^{k+1} ({\mathbb {R}}^d \times {\mathbb {R}}^d)\) and \(u_0\in C^{k+1} ({\mathbb {R}}^d)\), we have that \(u\in C^{k+1} ([0,+\infty )\times {\mathbb {R}}^d)\) [6]. As \(\nabla _p H \in C^{k}({\mathbb {R}}^d \times {\mathbb {R}}^d; {\mathbb {R}}^d)\), we have \(V(t,\cdot )\in C^{k}({\mathbb {R}}^d; {\mathbb {R}}^d)\) and so \(\sigma (t,\cdot ):{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) is a \(C^{k}-\)diffeomorphism for every \(t\in [0,+\infty )\). \(\square \)

Proposition 4.2

Let \(\rho _0\in {\mathcal {P}}^{ac} \cap C_{c}^2 ({\mathbb {R}}^d)\) be a compactly supported density. Then for any \(T>0\), there exists a \(C^2\) displacement interpolant \(\rho : [0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) with \(\rho (0,\cdot ) = \rho _0\).

Proof

Let \(u_0, u, V, \sigma \) be defined as in Lemma 4.1 and fix \(T>0\). For \(t\in [0,T]\), define

$$\begin{aligned} \rho (t,\cdot ) = \sigma (t,\cdot )_{\#}\rho _0. \end{aligned}$$

We claim that \(\sigma \) is an optimizer of the Lagrangian optimal transport problem from \(\rho _0\) to \(\rho _T = \rho (T,\cdot )\), which would imply that \(\rho \) is indeed a displacement interpolant. Let \(\phi :[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) satisfy the four conditions in Definition 3.6. By Lemma 4.1, \(t\mapsto \sigma (t,x)\) is a \((u,L)-\)calibrated curve for each \(x\in {\mathbb {R}}^d\). Thus, for every \(x\in {\mathbb {R}}^d\),

$$\begin{aligned} u(T,\sigma (T,x)) - u(0,\sigma (0,x))&= \int _{0}^{T}L(\sigma (t,x),{\dot{\sigma }}(t,x))\;dt \\ \iff u(T,\sigma (T,x)) - u_0(x)&= \int _{0}^{T}L(\sigma (t,x),{\dot{\sigma }}(t,x))\;dt \\ \implies \int _{{\mathbb {R}}^d} [u(T,\sigma (T,x)) - u_0(x)]\rho _0(x)\;dx&= \int _{{\mathbb {R}}^d}\int _{0}^{T}L(\sigma (t,x),{\dot{\sigma }}(t,x))\rho _0(x)\;dt\;dx \end{aligned}$$

By the definition of pushforward measure, the LHS of the last equality is

$$\begin{aligned} \int _{{\mathbb {R}}^d} [u(T,\sigma (T,x)) - u_0(x)]\rho _0(x)\;dx&= \int _{{\mathbb {R}}^d} u(T,y)\rho _T(y)\;dy - \int _{{\mathbb {R}}^d} u_0(x)\rho _0(x)\;dx\\&= \int _{{\mathbb {R}}^d} [u(T,\phi (T,x)) - u(0,\phi (0,x))]\rho _0(x)\;dx \end{aligned}$$

where the last equality is due to the assumption that \(\phi (T,\cdot )_{\#}\rho _0 = \rho _T\) and \(\phi (0,x) = x\). By the definition of u (i.e. (3.11)), we have that

$$\begin{aligned} u(T,\phi (T,x)) - u(0,\phi (0,x))&\le \int _{0}^{T}L(\phi (t,x),{\dot{\phi }}(t,x))\;dt\\ \implies \int _{{\mathbb {R}}^d} [u(T,\phi (T,x)) - u(0,\phi (0,x))]\rho _0(x)\;dx&\le \int _{{\mathbb {R}}^d} \int _{0}^{T}L(\phi (t,x),{\dot{\phi }}(t,x))\rho _0(x)\;dt\;dx \end{aligned}$$

where the pointwise inequality in the first line holds for every \(x\in {\mathbb {R}}^d\). Thus,

$$\begin{aligned} \int _{{\mathbb {R}}^d}\int _{0}^{T}L(\sigma (t,x),{\dot{\sigma }}(t,x))\rho _0(x)\;dt\;dx \le \int _{{\mathbb {R}}^d} \int _{0}^{T}L(\phi (t,x),{\dot{\phi }}(t,x))\rho _0(x)\;dt\;dx. \end{aligned}$$

Since \(\phi \) was arbitrary, \(\sigma \) is indeed an optimizer of the Lagrangian optimal transport problem from \(\rho _0\) to \(\rho _T\).

By Lemma 4.1, \(\sigma (t,\cdot ):{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) is a \(C^{k}-\)diffeomorphism for every \(t\in [0,T]\) and \(\sigma (\cdot ,x)\in C^{k+1}([0,T];{\mathbb {R}}^d)\) for every \(x\in {\mathbb {R}}^d\). Using the change-of-variables formula,

$$\begin{aligned} \rho (t,y)&= \frac{\rho _0(\cdot )}{|\text {det} \nabla \sigma (t,\cdot )|}\bigg |_{[\sigma (t,\cdot )]^{-1}(y)}\\&= \frac{\rho _0(\cdot )}{\text {det} \nabla \sigma (t,\cdot )}\bigg |_{[\sigma (t,\cdot )]^{-1}(y)} \end{aligned}$$

where \(\text {det} \nabla \sigma (t,\cdot ) >0\) for every \(t\): the determinant never vanishes because \(\sigma (t,\cdot )\) is a diffeomorphism, it depends continuously on t, and \(\sigma (0,x) = x\) gives \(\text {det} \nabla \sigma (0,\cdot ) = 1\). Since \(k\ge 3\), \(\rho \in C^2([0,T]\times {\mathbb {R}}^d)\). \(\square \)
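The change-of-variables formula used above can be checked numerically: pushing a density forward through a diffeomorphism rescales it by the Jacobian determinant while preserving unit mass. A sketch for the one-dimensional dilation \(\sigma (t,x)=e^t x\) (an illustrative diffeomorphism, not a displacement interpolation in general):

```python
import numpy as np

# Pushforward of rho0 under sigma_t(x) = e^t x has density
# rho_t(y) = rho0(e^{-t} y) * e^{-t}, which still integrates to 1.
rho0 = lambda x: np.exp(-x**2) / np.sqrt(np.pi)
t = 0.8
rho_t = lambda y: rho0(np.exp(-t) * y) * np.exp(-t)

ys = np.linspace(-30.0, 30.0, 600_001)
vals = rho_t(ys)
mass = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ys))  # trapezoid rule
assert abs(mass - 1.0) < 1e-9
```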

4.2 Displacement Hessian

Let \(F\in C^2 ((0,+\infty ))\cap C([0,+\infty ))\) be a function satisfying

  1. (F1)

    \(F(0) = 0\),

  2. (F2)

    \(s^2 F''(s)\ge sF'(s) - F(s) \ge 0,\) \(\quad \forall s\in [0,+\infty )\).

If \(\rho _0 \in {\mathcal {P}}^{ac}\) is such that \(F(\rho _0)\in L^1 ({\mathbb {R}}^d)\), we define the generalized entropy functional

$$\begin{aligned} {\mathcal {F}}(\rho _0) = \int _{{\mathbb {R}}^d}F(\rho _0 (x))\;dx. \end{aligned}$$
(4.1)

This is well-defined at least on \({\mathcal {P}}^{ac}\cap C_{c}^0 ({\mathbb {R}}^d)\) since \(F(0)= 0\) implies

$$\begin{aligned} \int _{{\mathbb {R}}^d}F(\rho _0 (x))\;dx = \int _{\text {supp}(\rho _0)}F(\rho _0(x))\;dx \end{aligned}$$

which is finite.
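For instance (our illustration, not from the paper), the Boltzmann entropy \(F(s) = s\log s\), extended by \(F(0) = 0\), satisfies (F1)–(F2) with equality in the first inequality of (F2), since \(sF'(s) - F(s) = s\) and \(s^2 F''(s) = s\); the power \(F(s) = s^2\) satisfies them as well. A quick numerical check:

```python
import math

# Check (ours) that two classical entropies satisfy (F2) on a grid:
# F(s) = s*log(s) (with F(0) = 0 as the limit value) and F(s) = s**2.
# Each case is (F, F', F'').
cases = [
    (lambda s: s * math.log(s), lambda s: math.log(s) + 1, lambda s: 1 / s),
    (lambda s: s * s,           lambda s: 2 * s,           lambda s: 2.0),
]

violations = 0
for F, dF, d2F in cases:
    for i in range(1, 2001):
        s = i * 1e-2
        G = s * dF(s) - F(s)        # the "pressure" G(s) = s F'(s) - F(s)
        # condition (F2): s^2 F''(s) >= s F'(s) - F(s) >= 0
        if not (s * s * d2F(s) >= G - 1e-10 and G >= -1e-10):
            violations += 1
assert violations == 0
```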

Remark 4.3

If \(\rho _0\) is the density of a fluid and \({\mathcal {F}}(\rho _0)\) is the internal energy, then \(\rho _0 F'(\rho _0) - F(\rho _0)\) can be interpreted as a pressure [12, 21].

Definition 4.4

(Displacement convexity) The generalized entropy functional \({\mathcal {F}}\) is said to be convex along a displacement interpolant \(\rho _t\), \(t\in [0,T]\), if \({\mathcal {F}}(\rho _t)\) is finite and

$$\begin{aligned} {\mathcal {F}}(\rho _t) \le \frac{T-t}{T}{\mathcal {F}}(\rho _0) + \frac{t}{T}{\mathcal {F}}(\rho _T) \end{aligned}$$
(4.2)

for every \(t\in [0,T]\). \({\mathcal {F}}\) is said to be displacement convex if it is convex along every displacement interpolant (on which \({\mathcal {F}}\) is real-valued).

Remark 4.5

When the displacement interpolant is a “straight line”, McCann proved that \({\mathcal {F}}\) is displacement convex if \(s\mapsto s^d F(s^{-d})\) is convex and non-increasing on \((0,+\infty )\) [16]. In this context, a “straight line” displacement interpolant refers to one of the form

$$\begin{aligned} \rho _t = \bigg ( \frac{T-t}{T}\text {id} + \frac{t}{T}M \bigg )_{\#}\rho _0 \end{aligned}$$

where M is the Monge map between \(\rho _0\) and \(\rho _T\).
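For instance (our illustration), with the power entropy \(F(s) = s^m\) one has \(s^d F(s^{-d}) = s^{d(1-m)}\), which is convex and non-increasing whenever \(m \ge 1\). A quick numerical check for \(d = m = 2\):

```python
# Illustrative check (ours) of McCann's condition for F(s) = s**m with
# m = 2 in dimension d = 2: phi(s) = s**d * F(s**(-d)) = s**(-2) should
# be convex and non-increasing on (0, +infinity).
d, m = 2, 2

def phi(s):
    return s ** d * (s ** (-d)) ** m

h = 1e-3
monotone = convex = True
for i in range(1, 5000):
    s = 0.5 + i * h
    monotone &= phi(s + h) <= phi(s) + 1e-12                   # non-increasing
    convex &= phi(s - h) + phi(s + h) - 2 * phi(s) >= -1e-12   # midpoint convexity
assert monotone and convex
```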

Along a suitable displacement interpolant \(\rho _t\), if the map \(t\mapsto {\mathcal {F}}(\rho _t)\) is \(C^2\), then the condition that \(\frac{d^2}{dt^2}{\mathcal {F}}(\rho _t)\ge 0\) ensures convexity of \({\mathcal {F}}\) along \(\rho _t\). The following displacement Hessian formula is a special case of Theorem 4.3.2 of [19].

Theorem 4.6

(Displacement Hessian formula) Let \(\rho \in C^2 ([0,T]\times {\mathbb {R}}^d)\) be a displacement interpolant, with \(\rho _0 = \rho (0,\cdot )\) compactly supported. Let \(\sigma :[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be an optimizer of the Lagrangian optimal transport problem from \(\rho _0\) to \(\rho _T\). Let \(V:[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be defined as in Remark 3.14 so that \(\rho ,V\) satisfy the continuity equation \({\dot{\rho }} = -\nabla \cdot (\rho V)\). Assume that \(\sigma \) and V are \(C^2\) at least on the set

$$\begin{aligned} \bigcup _{t\in [0,T]} \{t\}\times \text {supp}(\rho _t). \end{aligned}$$

Then \(\frac{d^2}{dt^2}{\mathcal {F}}(\rho )\) exists for every \(t\in [0,T]\) and is given by

$$\begin{aligned} \frac{d^2}{dt^2}{\mathcal {F}}(\rho ) = \int _{{\mathbb {R}}^d}(\rho G'(\rho )-G(\rho ))(\nabla \cdot V)^2 + G(\rho )({{\,\textrm{tr}\,}}((\nabla V)^2)-\nabla \cdot W)\;dx \end{aligned}$$
(4.3)

where \(G:[0,+\infty )\rightarrow {\mathbb {R}}\) is defined by

$$\begin{aligned} G(s)&= sF'(s) - F(s)\end{aligned}$$
(4.4)
$$\begin{aligned} G'(s)&= sF''(s) \end{aligned}$$
(4.5)

and

$$\begin{aligned} W = {\dot{V}} + \nabla VV. \end{aligned}$$
(4.6)
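As an illustration (ours, not from the paper), for the Boltzmann entropy \(F(s) = s\log s\) the relations (4.4)–(4.5) give \(G(s) = s\) and \(G'(s) = sF''(s) = 1\). A quick numerical check:

```python
import math

# Check (ours) of (4.4)-(4.5) for F(s) = s*log(s):
# G(s) = s F'(s) - F(s) = s and G'(s) = s F''(s) = 1.
def F(s):
    return s * math.log(s)

def G(s):
    return s * (math.log(s) + 1) - F(s)   # s F'(s) - F(s)

max_err = 0.0
h = 1e-5
for k in range(1, 50):
    s = 0.1 * k
    Gp = (G(s + h) - G(s - h)) / (2 * h)             # numerical G'(s)
    max_err = max(max_err, abs(G(s) - s),            # G(s) = s
                  abs(Gp - s * (1 / s)))             # G'(s) = s F''(s) = 1
assert max_err < 1e-8
```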

Remark 4.7

The requirement that \(\rho _0\) is compactly supported serves to ensure that \({\mathcal {F}}\) is finite along \(\rho \). In addition, the compactness of supp\((\rho _0)\) and the continuity of \(\sigma \) together ensure that the set \(\{\sigma (t,x)\;:\; t\in [0,T]\;,\; x\in \text {supp}(\rho _0)\}\) is compact. Thus,

$$\begin{aligned} \Sigma {:}{=}\bigcup _{t\in [0,T]}\text {supp}(\rho _t) \end{aligned}$$

is bounded, up to a set of zero \({\mathcal {L}}^d -\)measure. This means that \(\frac{d^2}{dt^2}{\mathcal {F}}(\rho )\) exists for every \(t\in [0,T]\) and satisfies

$$\begin{aligned} \frac{d^2}{dt^2}{\mathcal {F}}(\rho )&= \frac{d^2}{dt^2} \int _{\Sigma }F(\rho (t,x))\;dx\\&= \int _{\Sigma }\frac{d^2}{dt^2}F(\rho (t,x))\;dx. \end{aligned}$$

Remark 4.8

By Remark 3.14, for every \(t\in [0,T]\), \(V(t,\cdot )\) is uniquely defined on supp\((\rho _t)\) \(\rho _t-\)almost everywhere. Thus, (4.3) is well-defined.

Proof

The displacement Hessian is

$$\begin{aligned} \frac{d^2}{dt^2}{\mathcal {F}}(\rho )&= \int F''(\rho ){\dot{\rho }}^2 + F'(\rho )\ddot{\rho }\;dx\\&= \int F''(\rho ){\dot{\rho }}^2 - F'(\rho )\nabla \cdot ({\dot{\rho }}V + \rho {\dot{V}})\;dx \end{aligned}$$

Integrating by parts, the above expression becomes

$$\begin{aligned}&\int F''(\rho ){\dot{\rho }}^2 + \langle \nabla (F'(\rho )) , {\dot{\rho }}V + \rho {\dot{V}}\rangle \;dx\\&\quad = \int F''(\rho )\bigg ({\dot{\rho }}^2 + \langle \nabla \rho , {\dot{\rho }}V+\rho {\dot{V}}\rangle \bigg )\;dx. \end{aligned}$$

Using the continuity equation \({\dot{\rho }} = -\nabla \cdot (\rho V)\), the definitions of W and G, and integration by parts, this integral can be written as

$$\begin{aligned} \int \rho G'(\rho )(\nabla \cdot V)^2 - G(\rho )\nabla \cdot \bigg ((\nabla \cdot V)V-\nabla VV + W\bigg )\;dx. \end{aligned}$$

A straightforward computation then yields the desired formula. \(\square \)
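The “straightforward computation” uses the pointwise identity \(\nabla \cdot (\nabla V V) = {{\,\textrm{tr}\,}}((\nabla V)^2) + \langle V, \nabla (\nabla \cdot V)\rangle \). The following finite-difference sketch (our illustration, with a hypothetical polynomial field) checks it numerically in \(d=2\):

```python
# Finite-difference check (ours) of the identity
#   div((grad V) V) = tr((grad V)^2) + <V, grad(div V)>
# for the polynomial field V(x,y) = (x^2 y, x + y^3) in d = 2.
H = 1e-4

def V(x, y):
    return (x * x * y, x + y ** 3)

def jac(x, y):
    # Jacobian (dV_i/dx_j) by central differences
    vxp, vxm = V(x + H, y), V(x - H, y)
    vyp, vym = V(x, y + H), V(x, y - H)
    return [[(vxp[i] - vxm[i]) / (2 * H), (vyp[i] - vym[i]) / (2 * H)]
            for i in range(2)]

def div(x, y):
    J = jac(x, y)
    return J[0][0] + J[1][1]

def gVV(x, y):
    # the vector field (grad V) V
    J, v = jac(x, y), V(x, y)
    return tuple(J[i][0] * v[0] + J[i][1] * v[1] for i in range(2))

x0, y0 = 0.7, -0.3
# left side: divergence of (grad V) V
lhs = ((gVV(x0 + H, y0)[0] - gVV(x0 - H, y0)[0])
       + (gVV(x0, y0 + H)[1] - gVV(x0, y0 - H)[1])) / (2 * H)
# right side: tr((grad V)^2) + <V, grad(div V)>
J, v = jac(x0, y0), V(x0, y0)
tr_sq = J[0][0] ** 2 + 2 * J[0][1] * J[1][0] + J[1][1] ** 2
grad_div = ((div(x0 + H, y0) - div(x0 - H, y0)) / (2 * H),
            (div(x0, y0 + H) - div(x0, y0 - H)) / (2 * H))
rhs = tr_sq + v[0] * grad_div[0] + v[1] * grad_div[1]
assert abs(lhs - rhs) < 1e-5
```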

Remark 4.9

Recall that \(\rho G'(\rho ) - G(\rho ) = \rho ^2 F''(\rho ) - \rho F'(\rho ) + F(\rho )\ge 0\) and \(G(\rho ) = \rho F'(\rho ) - F(\rho ) \ge 0\) by assumption (F2). Thus, the condition that \({{\,\textrm{tr}\,}}((\nabla V)^2)-\nabla \cdot W \ge 0\) would ensure that \(\frac{d^2}{dt^2}{\mathcal {F}}(\rho )\ge 0\). In the case where the cost is given by squared Riemannian distance, the term \({{\,\textrm{tr}\,}}((\nabla V)^2)-\nabla \cdot W\) is a quadratic form involving the Bakry–Emery tensor [19, 21]. In the following section, we will generalize this quadratic form to an arbitrary Tonelli Lagrangian.

5 Generalized curvature for Tonelli Lagrangians

The goal of this section is to define a generalized curvature for the space \(({\mathbb {R}}^d, L)\). In spirit, this generalized curvature is analogous to the Ricci curvature in the sense that it measures the deformation of a shape flowing along action-minimizing curves. The generalized curvature, however, will not be a tensor because it depends on both the tangent vector and its gradient. Throughout this section, we will assume that L is a \(C^3\) Tonelli Lagrangian.

Let \(T>0\) and \(\sigma :[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be such that

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{d}{dt}\big ( (\nabla _v L)(\sigma (t,x),{\dot{\sigma }}(t,x)) \big ) = (\nabla _x L)(\sigma (t,x),{\dot{\sigma }}(t,x))\;, \quad \forall (t,x)\in [0,T]\times {\mathbb {R}}^d\\ \sigma (0,x) = x\;, \quad \forall x\in {\mathbb {R}}^d\\ \sigma (t,\cdot ) : {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d \text { is a } C^3-\text {diffeomorphism for every } t\in [0,T] \end{array}\right. } \end{aligned}$$

Let \(V:[0,T]\times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) be a time-dependent vector field defined by \({\dot{\sigma }}(t,x) = V(t,\sigma (t,x))\) so that \(V(t,\cdot )\in C^2({\mathbb {R}}^d; {\mathbb {R}}^d)\) for every \(t\in [0,T]\). Following the method outlined in [21] Chapter 14, we first derive Lemma 5.1, an ODE for the Jacobian matrix \(\nabla \sigma \).

Lemma 5.1

Define \(A,B:{\mathbb {R}}^d \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d\times d}\) by

$$\begin{aligned} A(x,v)&= (\nabla _{vv}^2 L)(x,v)^{-1}\bigg [ \frac{d}{dt}\big ((\nabla _{vv}^2 L)(\gamma _{x,v}(t),{\dot{\gamma }}_{x,v}(t))\big )\bigg |_{t=0} + (\nabla _{vx}^2 L)(x,v) - (\nabla _{xv}^2 L)(x,v) \bigg ] \end{aligned}$$
(5.1)
$$\begin{aligned} B(x,v)&= (\nabla _{vv}^2 L)(x,v)^{-1}\bigg [ \frac{d}{dt}\big ((\nabla _{vx}^2 L)(\gamma _{x,v}(t),{\dot{\gamma }}_{x,v}(t))\big )\bigg |_{t=0} - (\nabla _{xx}^2 L)(x,v) \bigg ] \end{aligned}$$
(5.2)

where \(\gamma _{x,v}:[0,\epsilon ) \rightarrow {\mathbb {R}}^d\) is the unique curve satisfying the Euler–Lagrange equation with initial conditions \(\gamma _{x,v}(0) = x\), \({\dot{\gamma }}_{x,v}(0) = v\). Then the Jacobian \(\nabla \sigma \) satisfies a second-order matrix equation

$$\begin{aligned} \nabla \ddot{\sigma } + A(\sigma ,{\dot{\sigma }})\nabla {\dot{\sigma }} + B(\sigma ,{\dot{\sigma }})\nabla \sigma = 0. \end{aligned}$$
(5.3)

Proof

Taking the spatial gradient of the Euler–Lagrange equation,

$$\begin{aligned} 0&= \nabla _x \bigg ( \frac{d}{dt}\big ( (\nabla _v L)(\sigma ,{\dot{\sigma }}) \big ) - (\nabla _x L)(\sigma ,{\dot{\sigma }}) \bigg ) \\&= \frac{d}{dt}\bigg ((\nabla _{vx}^2 L)(\sigma ,{\dot{\sigma }})\nabla \sigma + (\nabla _{vv}^2 L)(\sigma ,{\dot{\sigma }})\nabla {\dot{\sigma }}\bigg ) - (\nabla _{xx}^2 L)(\sigma ,{\dot{\sigma }})\nabla \sigma - (\nabla _{xv}^2 L)(\sigma ,{\dot{\sigma }})\nabla {\dot{\sigma }}\\&= \frac{d}{dt}\big ((\nabla _{vx}^2 L)(\sigma ,{\dot{\sigma }})\big )\nabla \sigma + (\nabla _{vx}^2 L)(\sigma ,{\dot{\sigma }})\nabla {\dot{\sigma }} + \frac{d}{dt}\big ((\nabla _{vv}^2 L)(\sigma ,{\dot{\sigma }})\big )\nabla {\dot{\sigma }} + (\nabla _{vv}^2 L)(\sigma ,{\dot{\sigma }})\nabla \ddot{\sigma }\\&\quad - (\nabla _{xx}^2 L)(\sigma ,{\dot{\sigma }})\nabla \sigma - (\nabla _{xv}^2 L)(\sigma ,{\dot{\sigma }})\nabla {\dot{\sigma }} \end{aligned}$$

To conclude, we group by the terms \(\nabla \sigma , \nabla {\dot{\sigma }}, \nabla \ddot{\sigma }\) and multiply by \((\nabla _{vv}^2 L)(\sigma ,{\dot{\sigma }})^{-1}\). \(\square \)
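As a concrete illustration (ours, not from the paper), take \(d=1\) and \(L(x,v) = \frac{1}{2}v^2 - \frac{1}{2}x^2\). Then \(\nabla _{vv}^2 L = 1\), \(\nabla _{vx}^2 L = \nabla _{xv}^2 L = 0\) and \(\nabla _{xx}^2 L = -1\), so (5.1)–(5.2) give \(A = 0\), \(B = 1\), and (5.3) reduces to \(\nabla \ddot{\sigma } + \nabla \sigma = 0\). A finite-difference check for the Euler–Lagrange flow \(\sigma (t,x) = x\cos t + V_0(x)\sin t\), with the hypothetical choice \(V_0(x) = \sin x\):

```python
import math

# d = 1, L(x,v) = v^2/2 - x^2/2: the Euler-Lagrange equation is
# sigma'' = -sigma, solved by sigma(t,x) = x*cos(t) + sin(x)*sin(t)
# with sigma(0,x) = x, sigma'(0,x) = sin(x). Its spatial gradient:
def grad_sigma(t, x):
    return math.cos(t) + math.cos(x) * math.sin(t)

h = 1e-4
def d2_dt2(f, t, x):
    # second time derivative by central differences
    return (f(t + h, x) - 2 * f(t, x) + f(t - h, x)) / (h * h)

# check (5.3) with A = 0, B = 1: grad(sigma)'' + grad(sigma) = 0
residual = max(abs(d2_dt2(grad_sigma, 0.1 * k, 0.3) + grad_sigma(0.1 * k, 0.3))
               for k in range(1, 11))
assert residual < 1e-6
```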

Lemma 5.2

Define

$$\begin{aligned} {\mathcal {U}}(t,x) = (\nabla V)(t,\sigma (t,x)). \end{aligned}$$
(5.4)

Then

$$\begin{aligned} \dot{{\mathcal {U}}} + {\mathcal {U}}^2 + A(\sigma ,{\dot{\sigma }}){\mathcal {U}} + B(\sigma ,{\dot{\sigma }}) = 0. \end{aligned}$$
(5.5)

Proof

First, we note that since \({\dot{\sigma }}(t,x) = V(t,\sigma (t,x))\), we get

$$\begin{aligned} (\nabla {\dot{\sigma }})(t,x)&= (\nabla V)(t,\sigma (t,x))(\nabla \sigma )(t,x)\\ \implies (\nabla V)(t,\sigma (t,x))&= (\nabla {\dot{\sigma }})(t,x) ((\nabla \sigma )(t,x))^{-1} \end{aligned}$$

and so

$$\begin{aligned} {\mathcal {U}}(t,x) = (\nabla {\dot{\sigma }})(t,x) ((\nabla \sigma )(t,x))^{-1}. \end{aligned}$$

Using the matrix identity \(\frac{d}{dt}M^{-1} = -M^{-1}{\dot{M}}M^{-1}\),

$$\begin{aligned} \dot{{\mathcal {U}}}&= (\nabla \ddot{\sigma })(\nabla \sigma )^{-1} - (\nabla {\dot{\sigma }})(\nabla \sigma )^{-1}(\nabla {\dot{\sigma }})(\nabla \sigma )^{-1}\\&= (\nabla \ddot{\sigma })(\nabla \sigma )^{-1} - {\mathcal {U}}^2. \end{aligned}$$

By Lemma 5.1, \(\nabla \ddot{\sigma } = -A(\sigma ,{\dot{\sigma }})(\nabla {\dot{\sigma }}) - B(\sigma ,{\dot{\sigma }})(\nabla \sigma )\) and so

$$\begin{aligned} 0&= \dot{{\mathcal {U}}} + {\mathcal {U}}^2 + \bigg (A(\sigma ,{\dot{\sigma }})(\nabla {\dot{\sigma }}) + B(\sigma ,{\dot{\sigma }})(\nabla \sigma )\bigg )(\nabla \sigma )^{-1}\\&= \dot{{\mathcal {U}}} + {\mathcal {U}}^2 + A(\sigma ,{\dot{\sigma }}){\mathcal {U}} + B(\sigma ,{\dot{\sigma }}). \end{aligned}$$

\(\square \)
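To illustrate (ours, not from the paper): for \(d=1\) and \(L(x,v) = \frac{1}{2}v^2 - \frac{1}{2}x^2\) one has \(A = 0\) and \(B = 1\), so (5.5) reduces to the scalar Riccati equation \(\dot{{\mathcal {U}}} + {\mathcal {U}}^2 + 1 = 0\). For the Euler–Lagrange flow \(\sigma (t,x) = x\cos t + \sin (x)\sin t\), \({\mathcal {U}} = (\nabla {\dot{\sigma }})(\nabla \sigma )^{-1}\) can be checked numerically at a fixed \(x\):

```python
import math

# Riccati check (ours) for the harmonic-oscillator flow at x = 0.3:
# U(t) = (grad sigma.)(grad sigma)^{-1} with
# grad sigma = cos(t) + cos(x) sin(t), grad sigma. = -sin(t) + cos(x) cos(t).
cx = math.cos(0.3)

def U(t):
    return (-math.sin(t) + cx * math.cos(t)) / (math.cos(t) + cx * math.sin(t))

h = 1e-5
worst = 0.0
for k in range(1, 10):
    t = 0.1 * k
    Udot = (U(t + h) - U(t - h)) / (2 * h)       # numerical dU/dt
    worst = max(worst, abs(Udot + U(t) ** 2 + 1.0))   # (5.5) with A=0, B=1
assert worst < 1e-6
```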

We want to show that the term \({{\,\textrm{tr}\,}}((\nabla V)^2)-\nabla \cdot W\) appearing in the displacement Hessian formula (4.3) arises from Eq. (5.5). Taking the trace of (5.5), we have

$$\begin{aligned} \frac{d}{dt}\bigg ((\nabla \cdot V)(t,\sigma )\bigg ) + {{\,\textrm{tr}\,}}\bigg ((\nabla V)(t,\sigma )^2 + A(\sigma ,{\dot{\sigma }})(\nabla V)(t,\sigma ) + B(\sigma ,{\dot{\sigma }})\bigg ) = 0. \end{aligned}$$
(5.6)

On the other hand, direct computation yields

$$\begin{aligned} \frac{d}{dt}\bigg ((\nabla \cdot V)(t,\sigma )\bigg )&= (\nabla \cdot {\dot{V}})(t,\sigma ) + \langle V(t,\sigma ), (\nabla (\nabla \cdot V))(t,\sigma ) \rangle . \end{aligned}$$

Since \(V(t,\sigma (t,x)) = {\dot{\sigma }}(t,x)\) and \(\sigma (t,\cdot ):{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) is a bijection for each \(t\), we may replace \(\sigma (t,x)\) by an arbitrary point \(x\in {\mathbb {R}}^d\) and restate the above equation as

$$\begin{aligned}&(\nabla \cdot {\dot{V}})(t,x) + \langle V(t,x), (\nabla (\nabla \cdot V))(t,x) \rangle \nonumber \\&\quad + {{\,\textrm{tr}\,}}\bigg ((\nabla V)(t,x)^2 + A(x,V(t,x))(\nabla V)(t,x) + B(x,V(t,x))\bigg ) = 0. \end{aligned}$$
(5.7)

Using the identities

$$\begin{aligned} \nabla \cdot ((\nabla \cdot V)V) = (\nabla \cdot V)^2 + \langle V,\nabla (\nabla \cdot V) \rangle \end{aligned}$$

and

$$\begin{aligned} {\dot{V}} = -\nabla VV + W , \end{aligned}$$

we see that

$$\begin{aligned} \nabla \cdot {\dot{V}} + \langle V,\nabla (\nabla \cdot V) \rangle&= \nabla \cdot (-\nabla VV + W) + \nabla \cdot ((\nabla \cdot V)V) - (\nabla \cdot V)^2. \end{aligned}$$

By the computation of the displacement Hessian from the previous section, the right-hand side equals \(-\big ({{\,\textrm{tr}\,}}((\nabla V)^2)-\nabla \cdot W\big )\). Comparing with (5.7), the trace term \({{\,\textrm{tr}\,}}\big ((\nabla V)^2 + A\nabla V + B\big )\) is therefore precisely \({{\,\textrm{tr}\,}}((\nabla V)^2)-\nabla \cdot W\).

At this point, (5.7) holds for all time-dependent \(C^2\) vector fields whose integral curves satisfy the Euler–Lagrange equation  (3.3). To show that (5.7) holds for an arbitrary fixed vector field, we first need to make sense of the term \({\dot{V}}\) by introducing Definition 5.4.

Proposition 5.3

Given any fixed vector field \(V_0 \in C^2({\mathbb {R}}^d; {\mathbb {R}}^d)\), we may extend it for a short time to a unique time-dependent vector field V(t, x), \(t\in [0,\epsilon )\), with the following properties:

  1. (i)

    \(V(0,\cdot ) = V_0\)

  2. (ii)

    The integral curves of V satisfy the Euler–Lagrange equation, i.e.

    $$\begin{aligned} {\dot{\sigma }}(t,x)&= V(t,\sigma (t,x))\\ \sigma (0,x)&= x\\ \frac{d}{dt}((\nabla _v L)(\sigma ,{\dot{\sigma }}))&= (\nabla _x L)(\sigma ,{\dot{\sigma }}) \end{aligned}$$

Proof

We recall Definition 3.3 and the existence of a Lagrangian flow \(\Phi = (\Phi _1,\Phi _2)\) satisfying \(\frac{d}{dt}((\nabla _v L)(\Phi )) = (\nabla _x L)(\Phi )\). Set \(\sigma (t,x) = \Phi _1(t,x,V_0(x))\). The maps \(\sigma (t,\cdot ):{\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) are defined for all \(t\in [0,+\infty )\) and there exists \(\epsilon >0\) such that \(\sigma (t,\cdot )\) is invertible for \(t\in [0,\epsilon )\). Thus, for \(t\in [0,\epsilon )\), we may define the desired vector field by

$$\begin{aligned} V(t,y) = {\dot{\sigma }}(t,\sigma ^{-1}(t,y)). \end{aligned}$$
(5.8)

\(\square \)

Definition 5.4

Given a Tonelli Lagrangian L, we define the operation

$$\begin{aligned} \Gamma _L : C^2({\mathbb {R}}^d ; {\mathbb {R}}^d)&\rightarrow C^2({\mathbb {R}}^d ; {\mathbb {R}}^d)\\ V_0&\mapsto {\dot{V}}(0,\cdot ) \end{aligned}$$

as in Proposition 5.3.

Remark 5.5

By the Euler–Lagrange equation (3.3), we can give an explicit formula for \(\Gamma _L(V_0)\). Suppose \(\sigma (t,x)\) and V(t, x) satisfy the two properties in Proposition 5.3; then

$$\begin{aligned} \ddot{\sigma }(t,x) = {\dot{V}}(t,\sigma (t,x))+(\nabla V)(t,\sigma (t,x))V(t,\sigma (t,x)). \end{aligned}$$

Since, by the Euler–Lagrange equation,

$$\begin{aligned} \ddot{\sigma }(t,x) = (\nabla _{vv}^2 L)(\sigma ,{\dot{\sigma }})^{-1}\bigg ( (\nabla _x L)(\sigma ,{\dot{\sigma }}) - (\nabla _{vx}^2 L)(\sigma ,{\dot{\sigma }}){\dot{\sigma }} \bigg ), \end{aligned}$$

evaluating at \(t=0\), where \(\sigma (0,x) = x\) and \({\dot{\sigma }}(0,x) = V_0(x)\), we have

$$\begin{aligned} (\Gamma _L(V_0))(x)&= {\dot{V}}(0,x)\\&= (\nabla _{vv}^2 L)(x,V_0(x))^{-1}\bigg ( (\nabla _x L)(x,V_0(x)) - (\nabla _{vx}^2 L)(x,V_0(x))V_0(x) \bigg ) \\&\quad - (\nabla V_0 (x))V_0(x). \end{aligned}$$
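To illustrate the formula (ours, not from the paper), take the free-particle Lagrangian \(L(x,v) = \frac{1}{2}|v|^2\) in \(d=1\): then \(\nabla _x L = 0\) and \(\nabla _{vx}^2 L = 0\), so \(\Gamma _L(V_0) = -(\nabla V_0)V_0\). The sketch below builds the short-time extension V of Proposition 5.3 from the straight-line flow \(\sigma (t,x) = x + tV_0(x)\), with the hypothetical choice \(V_0(x) = \sin x\), and compares a finite-difference \({\dot{V}}(0,\cdot )\) against the formula:

```python
import math

def V0(x):
    return math.sin(x)

def sigma(t, x):
    # Euler-Lagrange flow of L = |v|^2/2: straight lines
    return x + t * V0(x)

def sigma_inv(t, y):
    # invert x -> x + t*sin(x) by bisection (strictly increasing for |t| small)
    lo, hi = y - 1.0, y + 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sigma(t, mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def V(t, y):
    # the short-time extension of V0 from Proposition 5.3
    return V0(sigma_inv(t, y))

h = 1e-5
worst = 0.0
for k in range(10):
    y = -1.5 + 0.4 * k
    Vdot0 = (V(h, y) - V(-h, y)) / (2 * h)            # finite-difference Vdot(0,y)
    worst = max(worst, abs(Vdot0 - (-math.cos(y) * math.sin(y))))  # -(V0' V0)(y)
assert worst < 1e-6
```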

Definition 5.6

(Generalized curvature) Let \(\xi \in C^2({\mathbb {R}}^d; {\mathbb {R}}^d)\). For each \(x\in {\mathbb {R}}^d\), we define the generalized curvature \({\mathcal {K}}_x\) by

$$\begin{aligned} {\mathcal {K}}_x (\xi ) {:}{=}{{\,\textrm{tr}\,}}\bigg (\nabla \xi (x)^2 + A(x,\xi (x))\nabla \xi (x) + B(x,\xi (x))\bigg ) \end{aligned}$$
(5.9)

where \(A,B:{\mathbb {R}}^d \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d\times d}\) are defined as in Lemma 5.1.

Theorem 5.7

Let \(\xi \in C^2({\mathbb {R}}^d; {\mathbb {R}}^d)\). Then

$$\begin{aligned} -\big (\nabla \cdot \big (\Gamma _L (\xi )\big )\big )(x) - \langle \xi (x),(\nabla (\nabla \cdot \xi ))(x) \rangle = {\mathcal {K}}_x(\xi ). \end{aligned}$$
(5.10)

In particular, the generalized curvature \({\mathcal {K}}_x\) is intrinsic, i.e. does not depend on the choice of coordinates.

Proof

By Proposition 5.3, we may extend \(\xi \) for a short time to a time-dependent vector field V(tx), with \(V(0,\cdot ) = \xi \), whose integral curves satisfy the Euler–Lagrange equation. Thus, (5.7) holds for V and we have

$$\begin{aligned} {\mathcal {K}}_x(\xi )&= {\mathcal {K}}_x(V(0,\cdot ))\\&\overset{(5.7)}{=} -(\nabla \cdot {\dot{V}})(0,x) - \langle V(0,x),(\nabla (\nabla \cdot V))(0,x) \rangle \\&= -\big (\nabla \cdot \big (\Gamma _L (\xi )\big )\big )(x) - \langle \xi (x),(\nabla (\nabla \cdot \xi ))(x) \rangle \end{aligned}$$

To show that \({\mathcal {K}}_x\) is intrinsic, we will show that the operator

$$\begin{aligned} \xi \mapsto -\nabla \cdot \big (\Gamma _L (\xi )\big ) - \langle \xi ,\nabla (\nabla \cdot \xi ) \rangle \end{aligned}$$
(5.11)

is invariant under a change of coordinates. By Definition 5.4 and the definition of divergence, \(-\nabla \cdot \big (\Gamma _L (\xi )\big )\) is coordinate-free. Next, observe that \(\langle \xi ,\nabla (\nabla \cdot \xi ) \rangle \) is the directional derivative of \(\nabla \cdot \xi \) (which is coordinate-free) with respect to \(\xi \). Thus,

$$\begin{aligned} \langle \xi (x),\nabla (\nabla \cdot \xi )(x) \rangle&= \lim _{h\rightarrow 0}\frac{(\nabla \cdot \xi )(x + h\xi (x)) - (\nabla \cdot \xi )(x)}{h} \end{aligned}$$

is also coordinate-free. \(\square \)

In the case where \(\xi (x) = (\nabla _p H)(x,\nabla u(x)) \iff \nabla u(x) = (\nabla _v L)(x,\xi (x))\) for some potential \(u:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) (cf. Proposition 3.12, Lemma 4.1), we can derive an explicit formula for \({\mathcal {K}}_x(\xi )\).

Theorem 5.8

(Formula for \({\mathcal {K}}_x(\xi )\)) Let \(\xi \in C^2({\mathbb {R}}^d; {\mathbb {R}}^d)\) be such that there exists \(u\in C^2({\mathbb {R}}^d)\), with

$$\begin{aligned} \nabla u(x) = (\nabla _v L)(x, \xi (x))\;,\quad \forall x\in {\mathbb {R}}^d. \end{aligned}$$

Then,

$$\begin{aligned} {\mathcal {K}}_x(\xi )&= L^{ik}\frac{\partial \xi _j}{\partial x_k}\frac{\partial ^2 L}{\partial v_j \partial v_l}\frac{\partial \xi _l}{\partial x_i} - L^{im}\frac{\partial ^3 L}{\partial v_m \partial v_j \partial v_k}\xi _l \frac{\partial \xi _j}{\partial x_i}\frac{\partial \xi _k}{\partial x_l}\\&\quad + L^{im}\frac{\partial ^3 L}{\partial v_m \partial v_j \partial v_k}L^{kl}\frac{\partial L}{\partial x_l}\frac{\partial \xi _j}{\partial x_i} - L^{ir}\frac{\partial ^3 L}{\partial v_r \partial v_j \partial v_k}L^{kl}\frac{\partial ^2 L}{\partial x_l \partial v_m}\frac{\partial \xi _j}{\partial x_i}\xi _m\\&\quad - L^{kl}\frac{\partial ^3 L}{\partial x_k \partial v_j \partial v_l}\frac{\partial \xi _j}{\partial x_i}\xi _i + L^{ij}\frac{\partial ^3 L}{\partial x_i \partial v_j \partial v_k}L^{kl}\frac{\partial L}{\partial x_l}\\&\quad - L^{ij}\frac{\partial ^3 L}{\partial x_i \partial v_j \partial v_k}L^{kl}\frac{\partial ^2 L}{\partial x_l \partial v_m}\xi _m - L^{ij}\frac{\partial ^2 L}{\partial x_j \partial x_i} \end{aligned}$$

where \(L^{ij}\) denotes the \(ij\)-th entry of the inverse matrix \(\big ((\nabla _{vv}^2 L)(x,\xi (x))\big )^{-1}\) and all terms involving L are evaluated at \((x,\xi (x))\).

Proof

See Appendix. \(\square \)

In conclusion, the displacement Hessian formula (4.3) can be written as

$$\begin{aligned} \frac{d^2}{dt^2}{\mathcal {F}}(\rho ) = \int _{{\mathbb {R}}^d}(\rho G'(\rho )-G(\rho ))(\nabla \cdot V)^2 + G(\rho ){\mathcal {K}}_x (V)\;dx. \end{aligned}$$
(5.12)

6 Displacement convexity for a non-Riemannian Lagrangian cost

In this section, we provide an example of a Lagrangian cost that is not a squared Riemannian distance. We prove using a perturbation argument that the corresponding generalized curvature is non-negative and thus the generalized entropy functional is convex along \(C^2\) displacement interpolants.

Let g(x) be a symmetric positive definite matrix for every \(x\in {\mathbb {R}}^d\), depending smoothly on x, so that \(\frac{1}{2}\langle v, g(x)v\rangle \), \(v\in {\mathbb {R}}^d\) defines a Riemannian metric. Let \(g_{ij} = g_{ij}(x)\) denote the ij-th entry of g(x) and \(g^{ij} = g^{ij}(x)\) denote the ij-th entry of the inverse matrix \(g(x)^{-1}\). Further assume that the \(g_{ij}\) are bounded with bounded derivatives, and that the corresponding Bakry–Emery tensor (denoted BE\(_g\)) is bounded from below by a positive constant \(k_g\). That is,

$$\begin{aligned} \text {BE}_g = \text {Ric}+\nabla ^2 \bigg (\frac{1}{2}\log (\det g)\bigg )\ge k_g >0. \end{aligned}$$

Define the Lagrangian

$$\begin{aligned} L(x,v) = \frac{1}{2}\langle v, g(x)v\rangle \end{aligned}$$

and the perturbed Lagrangian

$$\begin{aligned} {\tilde{L}}(x,v) = \frac{1}{2}\langle v, g(x)v\rangle + \varphi (v) \end{aligned}$$

where \(\varphi :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) is a smooth perturbation, small enough that \({\tilde{L}}\) remains a Tonelli Lagrangian (for instance, take \(\varphi \) to be a small multiple of a Schwartz-class function). Using Theorem 5.8, the respective generalized curvatures are given by

$$\begin{aligned} {\mathcal {K}}_x(\xi )&= \underbrace{g^{ik}\frac{\partial \xi _j}{\partial x_k}g_{jl}\frac{\partial \xi _l}{\partial x_i}}_{\MakeUppercase {i}} - \underbrace{g^{kl}\frac{\partial g_{jl}}{\partial x_k}\frac{\partial \xi _j}{\partial x_i}\xi _i}_{\MakeUppercase {v}} \\&\quad + \underbrace{g^{ij}\frac{\partial g_{jk}}{\partial x_i}g^{kl}\frac{\partial g_{mn}}{\partial x_l}\xi _m \xi _n}_{\MakeUppercase {vi}}\\&\quad - \underbrace{g^{ij}\frac{\partial g_{jk}}{\partial x_i}g^{kl}\frac{\partial g_{nm}}{\partial x_l}\xi _n\xi _m}_{\MakeUppercase {vii}} - \underbrace{g^{ij}\frac{\partial ^2 g_{kl}}{\partial x_j \partial x_i}\xi _k \xi _l}_{\MakeUppercase {viii}} \end{aligned}$$

and

$$\begin{aligned} \tilde{{\mathcal {K}}}_x(\xi )&= \underbrace{{\tilde{L}}^{ik}\frac{\partial \xi _j}{\partial x_k}\frac{\partial ^2 {\tilde{L}}}{\partial v_j \partial v_l}\frac{\partial \xi _l}{\partial x_i}}_{\tilde{\MakeUppercase {i}}} - \underbrace{{\tilde{L}}^{im}\frac{\partial ^3 \varphi }{\partial v_m \partial v_j \partial v_k}\xi _l \frac{\partial \xi _j}{\partial x_i}\frac{\partial \xi _k}{\partial x_l}}_{\tilde{\MakeUppercase {ii}}}\\&\quad + \underbrace{{\tilde{L}}^{im}\frac{\partial ^3 \varphi }{\partial v_m \partial v_j \partial v_k}{\tilde{L}}^{kl}\frac{\partial g_{nr}}{\partial x_l}\xi _n \xi _r \frac{\partial \xi _j}{\partial x_i}}_{\tilde{\MakeUppercase {iii}}} - \underbrace{{\tilde{L}}^{ir}\frac{\partial ^3 \varphi }{\partial v_r \partial v_j \partial v_k}{\tilde{L}}^{kl}\frac{\partial g_{mn}}{\partial x_l}\xi _n \frac{\partial \xi _j}{\partial x_i}\xi _m}_{\tilde{\MakeUppercase {iv}}}\\&\quad - \underbrace{{\tilde{L}}^{kl}\frac{\partial g_{jl}}{\partial x_k} \frac{\partial \xi _j}{\partial x_i}\xi _i}_{\tilde{\MakeUppercase {v}}} + \underbrace{{\tilde{L}}^{ij}\frac{\partial g_{jk}}{\partial x_i} {\tilde{L}}^{kl}\frac{\partial g_{mn}}{\partial x_l}\xi _m \xi _n}_{\tilde{\MakeUppercase {vi}}}\\&\quad - \underbrace{{\tilde{L}}^{ij}\frac{\partial g_{jk}}{\partial x_i}{\tilde{L}}^{kl}\frac{\partial g_{mn}}{\partial x_l}\xi _n \xi _m}_{\tilde{\MakeUppercase {vii}}} - \underbrace{{\tilde{L}}^{ij}\frac{\partial ^2 g_{kl}}{\partial x_j \partial x_i}\xi _k \xi _l}_{\tilde{\MakeUppercase {viii}}} \end{aligned}$$

By Theorem A.3.1 of [19] and (5.7), \({\mathcal {K}}_x(\xi ) = ||g^{-1}\nabla \xi ^\top ||_{\text {HS}}^2 + \text {BE}_g(\xi )\), where \(||\cdot ||_{\text {HS}}\) denotes the Hilbert–Schmidt norm. Thus, we have a lower bound

$$\begin{aligned} {\mathcal {K}}_x(\xi ) = ||g^{-1}\nabla \xi ^\top ||_{\text {HS}}^2 + \text {BE}_g(\xi ) \ge c_g||\nabla \xi ||^2 + k_g ||\xi ||^2 \end{aligned}$$

where \(c_g>0\) is a constant depending on g. Fix \(\epsilon >0\) such that \(\epsilon \le \min \{\frac{c_g}{10},\frac{k_g}{12}\}\). Our goal is to choose \(\varphi \) so that

  1. 1.

    \(|{\tilde{L}}^{ij} - L^{ij}| = |{\tilde{L}}^{ij} - g^{ij}|\) is sufficiently small, i.e. \(||\nabla ^2 \varphi ||\) is close to zero, and

  2. 2.

    \(|\frac{\partial ^3 \varphi }{\partial v_i \partial v_j \partial v_k}|\) is sufficiently small.

To this end, we choose \(\varphi \) such that

$$\begin{aligned} |\tilde{{\mathcal {K}}}_x(\xi ) - {\mathcal {K}}_x(\xi )|&\le |\tilde{\MakeUppercase {i}}-\MakeUppercase {i}|+ |\tilde{\MakeUppercase {v}}-\MakeUppercase {v}|+ |\tilde{\MakeUppercase {vi}}-\MakeUppercase {vi}|+ |\tilde{\MakeUppercase {vii}}-\MakeUppercase {vii}|+ |\tilde{\MakeUppercase {viii}}-\MakeUppercase {viii}|\\&\quad + |\tilde{\MakeUppercase {ii}}|+|\tilde{\MakeUppercase {iii}}|+|\tilde{\MakeUppercase {iv}}|\\&\le \epsilon ||\nabla \xi ||^2 + 2\epsilon ||\nabla \xi ||||\xi || + \epsilon ||\xi ||^2 + \epsilon ||\xi ||^2 + \epsilon ||\xi ||^2 \\&\quad + \epsilon ||\nabla \xi ||^2 + 2\epsilon ||\nabla \xi ||||\xi || + 2\epsilon ||\nabla \xi ||||\xi ||\\&\le 5\epsilon ||\nabla \xi ||^2 + 6\epsilon ||\xi ||^2\\&\le \frac{c_g}{2}||\nabla \xi ||^2 + \frac{k_g}{2}||\xi ||^2 \end{aligned}$$

Since \({\mathcal {K}}_x(\xi ) \ge c_g||\nabla \xi ||^2 + k_g ||\xi ||^2\), we conclude that \(\tilde{{\mathcal {K}}}_x(\xi ) \ge {\mathcal {K}}_x(\xi ) - |\tilde{{\mathcal {K}}}_x(\xi ) - {\mathcal {K}}_x(\xi )| \ge \frac{c_g}{2}||\nabla \xi ||^2 + \frac{k_g}{2}||\xi ||^2 \ge 0\).
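The final chain of inequalities is elementary arithmetic; as a sanity check (ours, with arbitrary illustrative values of \(c_g\), \(k_g\)):

```python
# With eps <= min(c_g/10, k_g/12), the bound 5*eps*a^2 + 6*eps*b^2
# (a = ||grad xi||, b = ||xi||) is at most (c_g/2)*a^2 + (k_g/2)*b^2,
# so K~ >= K - |K~ - K| >= (c_g/2)*a^2 + (k_g/2)*b^2 >= 0.
# The values of c_g, k_g below are arbitrary illustrative choices.
ok = True
for c_g in (0.1, 1.0, 7.5):
    for k_g in (0.2, 2.0, 30.0):
        eps = min(c_g / 10, k_g / 12)
        ok = ok and 5 * eps <= c_g / 2 and 6 * eps <= k_g / 2
        for a in (0.0, 0.3, 2.0):
            for b in (0.0, 0.5, 4.0):
                bound = 5 * eps * a * a + 6 * eps * b * b
                ok = ok and bound <= (c_g / 2) * a * a + (k_g / 2) * b * b + 1e-12
assert ok
```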