A second order equation for Schrödinger bridges with applications to the hot gas experiment and entropic transportation cost

Abstract

The Schrödinger problem is obtained by replacing the mean square distance with the relative entropy in the Monge–Kantorovich problem. It was first addressed by Schrödinger as the problem of describing the most likely evolution of a large number of Brownian particles conditioned to reach an “unexpected configuration”. Its optimal value, the entropic transportation cost, and its optimal solution, the Schrödinger bridge, stand as the natural probabilistic counterparts to the transportation cost and displacement interpolation. Moreover, they provide a natural way of lifting the concept of Brownian bridge from the point setting to the measure setting. In this article, we prove that the Schrödinger bridge solves a second order equation in the Riemannian structure of optimal transport. Roughly speaking, the equation says that its acceleration is the gradient of the Fisher information. Using this result, we obtain a fine quantitative description of the dynamics and a new functional inequality for the entropic transportation cost that generalizes Talagrand’s transportation inequality. Finally, we study the convexity of the Fisher information along Schrödinger bridges, under the hypothesis that the associated reciprocal characteristic is convex. The techniques developed in this article are also well suited to study the Feynman–Kac penalisations of Brownian motion.

Notes

  1. \(\mathbf {P}\) may not be a probability measure, but an infinite measure. This won’t be a problem as long as \(\mathbf {Q}\) is a probability measure. The note [34] takes care of this issue in detail.

  2. Schrödinger writes in [49] “un écart spontané et considérable par rapport à cette uniformité” (a spontaneous and considerable deviation from this uniformity).

  3. We recall that \((v_t)\) is the velocity field of \((\mu _t)\), \(\langle \cdot , \cdot \rangle _{\mathbf {T}_{\mu _t}}\) is the inner product in \(L^2_{\mu _t}\) and \(\frac{\mathbf {D}}{dt}\) the covariant derivative. We also denote \(Dv_t\) the Jacobian matrix of \(v_t\). Finally, we abbreviate \(\partial _{x_i}\) with \(\partial _i\), and adopt the same convention for higher-order derivatives.

  4. In the original result of Benamou and Brenier \(v_t\) is not the velocity field of \((\mu _t)\), but just an arbitrary weak solution to the continuity equation. However, it is easy to see that the representation formula for the Wasserstein distance remains true if we restrict the minimization to the couples \((\mu _t,v_t)\) such that \((v_t)\) is the velocity field of \((\mu _t)\).

  5. The constant \(\frac{\lambda }{2}\) instead of \(\lambda \) is because the generator \(\mathscr {L}\) has a \(\frac{1}{2}\Delta \) as second order part instead of \(\Delta \).

References

  1. Ambrosio, L., Gangbo, W.: Hamiltonian ODEs in the Wasserstein space of probability measures. Commun. Pure Appl. Math. 61(1), 18–53 (2008)

  2. Ambrosio, L., Gigli, N.: A user’s guide to optimal transport. In: Piccoli, B., Rascle, M. (eds.) Modelling and Optimisation of Flows on Networks: Cetraro, Italy 2009, pp. 1–155. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32160-3_1

  3. Bakry, D., Gentil, I., Ledoux, M.: Analysis and Geometry of Markov Diffusion Operators, vol. 348. Springer, Berlin (2013)

  4. Benamou, J.-D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)

  5. Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)

  6. Cattiaux, P., Léonard, C.: Minimization of the Kullback information of diffusion processes. Annal. de l’IHP Probab. et Stat. 30(1), 83–132 (1994)

  7. Chen, Y., Georgiou, T.T., Pavon, M.: On the relation between optimal transport and Schrödinger bridges: a stochastic control viewpoint. J. Optim. Theory Appl. 169(2), 671–691 (2016)

  8. Chen, Y., Georgiou, T.T.: Optimal steering of a linear stochastic system to a final probability distribution, part I. IEEE Trans. Autom. Control 61(5), 1158–1169 (2016)

  9. Chen, Y., Georgiou, T.T.: Optimal steering of a linear stochastic system to a final probability distribution, part II. IEEE Trans. Autom. Control 61(5), 1170–1180 (2016)

  10. Chow, S.-N., Li, W., Zhou, H.: A discrete Schrödinger equation via optimal transport on graphs. arXiv preprint arXiv:1705.07583 (2017)

  11. Clark, J.M.C.: A local characterization of reciprocal diffusions. Appl. Stoch. Anal. 5, 45–59 (1991)

  12. Conforti, G.: Fluctuations of bridges, reciprocal characteristics, and concentration of measure. preprint arXiv:1602.07231 to appear in Annales de l’Institut Henri Poincaré (2016)

  13. Conforti, G., Léonard, C.: Reciprocal classes of random walks on graphs. Stoch. Process. Appl. 127(6), 1870–1896 (2017)

  14. Conforti, G., Von Renesse, M.: Couplings, gradient estimates and logarithmic Sobolev inequality for Langevin bridges. Probab. Theory Related Fields (2017). available online

  15. Cruzeiro, A.B., Zambrini, J.C.: Malliavin calculus and Euclidean quantum mechanics. I. Functional calculus. J. Funct. Anal. 96(1), 62–95 (1991)

  16. Dai Pra, P.: A stochastic control approach to reciprocal diffusion processes. Appl. Math. Optim. 23(1), 313–329 (1991)

  17. Dawson, D., Gorostiza, L., Wakolbinger, A.: Schrödinger processes and large deviations. J. Math. Phys. 31(10), 2385–2388 (1990)

  18. Dawson, D.A., Gärtner, J.: Multilevel large deviations and interacting diffusions. Probab. Theory Relat. Fields 98(4), 423–487 (1994)

  19. Do Carmo, M.P., Flaherty, J.F.: Riemannian Geometry, vol. 115. Birkhäuser, Boston (1992)

  20. Föllmer, H.: Random fields and diffusion processes. In École d’Été de Probabilités de Saint-Flour XV–XVII, 1985–87, pp. 101–203. Springer (1988)

  21. Föllmer, H., Gantert, N., et al.: Entropy minimization and Schrödinger processes in infinite dimensions. Ann. Probab. 25(2), 901–926 (1997)

  22. Galichon, A., Kominers, S.D., Weber, S.: The nonlinear Bernstein-Schrödinger equation in economics. In: Nielsen, F., Barbaresco, F. (eds.) Geometric Science of Information, pp. 51–59. Springer International Publishing, Cham (2015)

  23. Gentil, I., Léonard, C., Ripani, L.: About the analogy between optimal transport and minimal entropy. Annales de la faculté des sciences de Toulouse Sér. 6 26(3), 569–700 (2017)

  24. Gianazza, U., Savaré, G., Toscani, G.: The Wasserstein gradient flow of the Fisher information and the quantum drift-diffusion equation. Arch. Ration. Mech. Anal. 194(1), 133–220 (2009)

  25. Gigli, N.: Second Order Analysis on (P2(M),W2). Memoirs of the American Mathematical Society, Providence (2012)

  26. Gigli, N., Tamanini, L.: Second order differentiation formula on compact RCD*(K, N) spaces. arXiv preprint arXiv:1701.03932 (2017)

  27. Gozlan, N., Roberto, C., Samson, P.-M., Tetali, P.: Kantorovich duality for general transport costs and applications. J. Funct. Anal. 273(11), 3327–3405 (2017)

  28. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (2012)

  29. Krener, A.J.: Reciprocal diffusions and stochastic differential equations of second order. Stochastics 107(4), 393–422 (1988)

  30. Krener, A.J.: Reciprocal diffusions in flat space. Probab. Theory Relat. Fields 107(2), 243–281 (1997)

  31. Léonard, C.: From the Schrödinger problem to the Monge–Kantorovich problem. J. Funct. Anal. 262(4), 1879–1920 (2012)

  32. Léonard, C.: A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Contin. Dyn. Syst. 34(4), 1533–1574 (2014)

  33. Léonard, C., Rœlly, S., Zambrini, J.C.: Reciprocal processes. A measure-theoretical point of view. Probab. Surv. 11, 237–269 (2014)

  34. Léonard, C.: Some properties of path measures. In Séminaire de Probabilités XLVI, pp. 207–230. Springer (2014)

  35. Léonard, C.: On the convexity of the entropy along entropic interpolations. In: Gigli, N. (ed.) Measure Theory in Non-Smooth Spaces, Partial Differential Equations and Measure Theory. De Gruyter Open, Berlin (2017)

  36. Léonard, C., et al.: Lazy random walks and optimal transport on graphs. Ann. Probab. 44(3), 1864–1915 (2016)

  37. Levy, B.C., Krener, A.J.: Dynamics and kinematics of reciprocal diffusions. J. Math. Phys. 34(5), 1846–1875 (1993)

  38. Li, W., Yin, P., Osher, S.: Computations of optimal transport distance with Fisher information regularization. J. Sci. Comput. 19, 1–15 (2017)

  39. Lott, J., Villani, C.: Ricci curvature for metric-measure spaces via optimal transport. Ann. Math. 169, 903–991 (2009)

  40. Mikami, T.: Monge’s problem with a quadratic cost by the zero-noise limit of h-path processes. Probab. Theory Relat. Fields 129(2), 245–260 (2004)

  41. Nelson, E.: Dynamical Theories of Brownian Motion, vol. 2. Princeton University Press, Princeton (1967)

  42. Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Commun. Partial Differ. Equ. 26(1–2), 101–174 (2001)

  43. Otto, F., Villani, C.: Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173(2), 361–400 (2000)

  44. Rœlly, S., Thieullen, M.: A characterization of reciprocal processes via an integration by parts formula on the path space. Probab. Theory Relat. Fields 123(1), 97–120 (2002)

  45. Rœlly, S., Thieullen, M.: Duality formula for the bridges of a Brownian diffusion: application to gradient drifts. Stoch. Process. Appl. 115(10), 1677–1700 (2005)

  46. Roynette, B., Yor, M.: Penalising Brownian Paths, vol. 1969. Springer, Berlin (2009)

  47. Rüschendorf, L., Thomsen, W.: Note on the Schrödinger equation and I-projections. Stat. Probab. Lett. 17(5), 369–375 (1993)

  48. Schrödinger, E.: Über die Umkehrung der Naturgesetze. Sitzungsberichte Preuss. Akad. Wiss. Berlin. Phys. Math. 144, 144–153 (1931)

  49. Schrödinger, E.: La théorie relativiste de l’électron et l’interprétation de la mécanique quantique. Ann. Inst. Henri Poincaré 2, 269–310 (1932)

  50. Solomon, J., De Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Tao, D., Guibas, L.: Convolutional Wasserstein distances: efficient optimal transportation on geometric domains. ACM Trans. Graph. (TOG) 34(4), 66 (2015)

  51. Sturm, K.-T.: On the geometry of metric measure spaces. Acta Math. 196(1), 65–131 (2006)

  52. Talagrand, M.: Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. GAFA 6(3), 587–600 (1996)

  53. Thieullen, M.: Second order stochastic differential equations and non-Gaussian reciprocal diffusions. Probab. Theory Relat. Fields 97(1–2), 231–257 (1993)

  54. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin (2008)

  55. von Renesse, M.-K.: An optimal transport view of Schrödinger’s equation. Can. Math. Bull. 55(4), 858–869 (2012)

  56. von Renesse, M.-K., Sturm, K.-T.: Transport inequalities, gradient estimates, entropy and Ricci curvature. Commun. Pure Appl. Math. 58(7), 923–940 (2005)

  57. Wakolbinger, A.: A simplified variational characterization of Schrödinger processes. J. Math. Phys. 30(12), 2943–2946 (1989)

  58. Zambrini, J.C.: Variational processes and stochastic versions of mechanics. J. Math. Phys. 27(9), 2307–2330 (1986)

Acknowledgements

The author acknowledges support from CEMPI Lille and the University of Lille 1. He also wishes to thank Christian Léonard for having introduced him to the Schrödinger problem, and for many fruitful discussions.

Author information

Correspondence to Giovanni Conforti.

Appendix

The following Lemma has been used in the proof of Theorem 1.4. Here, we denote \(\dot{f},\ddot{f}\) the first and second derivatives of a function on the real line.

Lemma 4.1

Let \(\phi :[0,1]\rightarrow \mathbb {R}\) be twice differentiable on \((0,1)\) and continuous on \([0,1]\).

  (i) If \(\ddot{\phi }_t \ge \lambda \dot{\phi }_t\) for all \(t \in (0,1)\), then

    $$\begin{aligned} \forall t \in [0,1], \quad \phi _t \le \phi _1 + (\phi _0 - \phi _1 ) \frac{1-\exp (-\lambda (1-t) )}{1-\exp (-\lambda )}.\end{aligned}$$
    (70)

  (ii) If \(\ddot{\phi }_t \ge - \lambda \dot{\phi }_t\) for all \(t \in (0,1)\), then

    $$\begin{aligned}\forall t \in [0,1], \quad \phi _t \le \phi _0 + (\phi _1 - \phi _0 ) \frac{1-\exp (-\lambda t )}{1-\exp (-\lambda )}. \end{aligned}$$

Note that the right-hand side of (70) can be rewritten as the convex combination \( \frac{\exp (\lambda ) - \exp (\lambda t)}{\exp (\lambda ) -1} \phi _0 + \frac{\exp (\lambda t) - 1}{\exp (\lambda ) -1} \phi _1\).

Proof

We prove only (i), as (ii) follows from (i) with a simple time-reversal argument. Let g be the unique solution of the differential equation

$$\begin{aligned} \ddot{g}_t= \lambda \dot{g}_t,\ 0<t<1,\quad g_0=\phi _0,\ g_1=\phi _1. \end{aligned}$$
(71)

All we have to show is: \(h:= \phi -g\le 0,\) because a direct calculation shows that the solution of (71) is \( \displaystyle { g_t= \phi _1 + (\phi _0 - \phi _1 ) \frac{1-\exp (-\lambda (1-t) )}{1-\exp (-\lambda )}. }\)

We see that \(\ddot{h}_t\ge \lambda \dot{h}_t, \ 0<t<1\) with \(h_0=h_1=0.\) Considering the function \(u_t:=e ^{ -\lambda t}\dot{h}_t,\) \(0\le t\le 1,\) we have \(\dot{u}_t=e^{ -\lambda t}[\ddot{h}_t-\lambda \dot{h}_t]\ge 0,\) which implies that u is nondecreasing, that is:

$$\begin{aligned} \dot{h}_t\ge \dot{h}_{t_*} e ^{ \lambda (t-t_*)},\qquad 0\le t_*\le t\le 1. \end{aligned}$$
(72)

Suppose ad absurdum that \(h_{t_o}>0\) for some \(0<t_o<1.\) As \(h_0=0,\) there exists some \(0< t_*\le t_o\) such that \(h_{t_*}>0\) and \(\dot{h}_{t_*}>0.\) In view of (72), this implies that h is increasing on \([t_*,1].\) In particular, \(h_{1}\ge h_{t_*}>0,\) contradicting \(h_1=0.\) Hence \(h\le 0.\)\(\square \)
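As a numerical sanity check of Lemma 4.1 (i), the following Python sketch evaluates the right-hand side of (70), compares it with the convex-combination form given after the lemma, and tests the bound on one example. The choice \(\phi _t = t^2\), \(\lambda = 1\) is purely illustrative and is not taken from the article; the function names are hypothetical.

```python
import math

def g(t, lam, phi0, phi1):
    """Right-hand side of (70): solves g'' = lam * g' with g(0)=phi0, g(1)=phi1."""
    w = (1.0 - math.exp(-lam * (1.0 - t))) / (1.0 - math.exp(-lam))
    return phi1 + (phi0 - phi1) * w

def g_alt(t, lam, phi0, phi1):
    """Equivalent convex-combination form stated after the lemma."""
    d = math.exp(lam) - 1.0
    w0 = (math.exp(lam) - math.exp(lam * t)) / d
    w1 = (math.exp(lam * t) - 1.0) / d
    return w0 * phi0 + w1 * phi1

# Illustrative assumption: phi_t = t^2 and lam = 1, so that
# phi'' = 2 >= 2 t = lam * phi' on (0, 1) and hypothesis (i) holds.
lam = 1.0
phi = lambda t: t * t
grid = [k / 100.0 for k in range(101)]

max_form_gap = max(
    abs(g(t, lam, phi(0.0), phi(1.0)) - g_alt(t, lam, phi(0.0), phi(1.0)))
    for t in grid
)
bound_holds = all(phi(t) <= g(t, lam, phi(0.0), phi(1.0)) + 1e-12 for t in grid)

# Finite-difference residual of g'' - lam * g' at interior grid points.
h = 1e-4
def ode_residual(t):
    g0, gp, gm = (g(t + s, lam, 0.0, 1.0) for s in (0.0, h, -h))
    return (gp - 2.0 * g0 + gm) / h**2 - lam * (gp - gm) / (2.0 * h)

max_residual = max(abs(ode_residual(t)) for t in grid[1:-1])
```

Such a check only probes one admissible \(\phi \); it illustrates the statement and does not replace the proof.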

1.1 Hessian of the entropy and gradient of the Fisher information

1.1.1 Hessian of the entropy

In this paragraph we make some formal computations, whose aim is to give an explanation for Eq. (40). We assume \(U=0\) for simplicity. Let \(\mu \in \mathcal {C}^{b,+}_{\infty }\) and \(\nabla \varphi \in \mathcal {C}_{\infty }^c \) be fixed. We consider the constant speed geodesic \((\mu _t)\) such that \(\mu _0 = \mu \) and \(v_0 = \nabla \varphi \). Then, by definition

$$\begin{aligned} \mathbf {Hess}^{\mathcal {W}}\mathscr {H}_U(\nabla \varphi ) = \frac{\mathbf {D}}{dt} \nabla ^{\mathcal {W}}\mathscr {H}_U(\mu _t)\big |_{t=0}. \end{aligned}$$

Using the identification of the covariant derivative in Lemma 3.3 and (37), we have that

$$\begin{aligned}\mathbf {Hess}^{\mathcal {W}}\mathscr {H}_U(\nabla \varphi ) = \partial _t \nabla \log \mu _t + \overline{\nabla }_{v_t} \nabla \log \mu _t \big |_{t=0} \, . \end{aligned}$$

Using the continuity equation

$$\begin{aligned} \partial _t \nabla \log \mu _t= & {} -\nabla \left( \frac{1}{\mu _t} \nabla \cdot (\mu _t v_t) \right) \\= & {} -\nabla (\mathbf {div}(v_t) ) -\nabla \langle \nabla \log \mu _t,v_t\rangle . \end{aligned}$$

Evaluating at \(t=0\) and using \(v_0=\nabla \varphi \), we can rewrite the latter as

$$\begin{aligned} -\nabla \Delta \varphi - \mathbf {Hess} \log \mu (\nabla \varphi ) - \mathbf {Hess} \, \varphi ( \nabla \log \mu ). \end{aligned}$$

Therefore, observing that \(\overline{\nabla }_{v_t} \nabla \log \mu _t \big |_{t=0} = \mathbf {Hess} \log \mu (\nabla \varphi )\), we arrive at

$$\begin{aligned} \mathbf {Hess}^{\mathcal {W}}\mathscr {H}_U(\nabla \varphi ) = -\nabla \Delta \varphi - \mathbf {Hess} \, \varphi ( \nabla \log \mu ) .\end{aligned}$$

Hence, using an integration by parts:

$$\begin{aligned} \langle \mathbf {Hess}^{\mathcal {W}}\mathscr {H}_U(\nabla \varphi ), \nabla \varphi \rangle _{\mathbf {T}_{\mu }}= & {} - \int _M \langle \nabla \varphi , \nabla \Delta \varphi \rangle d\mu - \int _M \langle \nabla \varphi , \mathbf {Hess} \varphi (\nabla \log \mu )\rangle d \mu \\= & {} - \int _M \langle \nabla \varphi , \nabla \Delta \varphi \rangle d\mu - \int _M \langle \nabla \mu , \mathbf {Hess} \varphi (\nabla \varphi ) \rangle d \mathbf {vol}\\= & {} -\int _M \langle \nabla \varphi , \nabla \Delta \varphi \rangle d\mu - \frac{1}{2}\int _M \langle \nabla \mu , \nabla |\nabla \varphi |^2 \rangle d \mathbf {vol}\\= & {} \int _M \Big ( \frac{1}{2} \Delta |\nabla \varphi |^2- \langle \nabla \varphi , \nabla \Delta \varphi \rangle \Big ) d \mu . \end{aligned}$$

At this point one can use the Bochner–Weitzenböck formula

$$\begin{aligned}\frac{1}{2} \Delta |\nabla \varphi |^2 = \langle \nabla \varphi , \nabla \Delta \varphi \rangle + |\mathbf {Hess} \varphi |^2_{\text {HS}} + \mathbf {\mathfrak {Ric} } (\nabla \varphi , \nabla \varphi ) \end{aligned}$$

and the hypothesis (13) to obtain the conclusion.

1.1.2 Gradient of the Fisher information

In this section, we shall make some formal computations to justify (5). As we did before, we assume \(U=0\) for simplicity. Differentiating the relation (38) and using the definition of Hessian we get

$$\begin{aligned}\nabla ^{\mathcal {W}}\mathscr {I}(\mu ) = 2\, \mathbf {Hess}^{\mathcal {W}}_{\mu } \mathscr {H}(\nabla ^{\mathcal {W}}\mathscr {H}(\mu )) \end{aligned}$$

By the definition of Hessian

$$\begin{aligned}\mathbf {Hess}^{\mathcal {W}}_{\mu } \mathscr {H}(\nabla ^{\mathcal {W}}\mathscr {H}(\mu ) ) = \frac{\mathbf {D}}{dt} \nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) \Big |_{t=0} ,\end{aligned}$$

where \((\mu _t)\) is any regular enough curve such that \(\mu _0=\mu \), \(v_0=\nabla ^{\mathcal {W}}\mathscr {H}(\mu )\). By Lemma 3.3, this covariant derivative is the projection onto the space of gradient vector fields of

$$\begin{aligned} \partial _t \nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) + \overline{\nabla }_{v_t} \nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) \end{aligned}$$

Using the continuity equation in the form \(\partial _t \log \mu _t = - \nabla \cdot v_t - v_t \cdot \nabla \log \mu _t\), and recalling that \(\nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) = \nabla \log (\mu _t)\) we arrive at

$$\begin{aligned} \partial _t \nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) \Big |_{t=0}= & {} - \nabla (\nabla \cdot v_t + v_t \cdot \nabla \log \mu _t ) \Big |_{t=0} \\= & {} - \nabla \big (\nabla \cdot (\nabla \log \mu )\big ) - \nabla ( |\nabla \log \mu |^2 ) \\= & {} - \nabla \Delta \log \mu - \nabla |\nabla \log \mu |^2. \end{aligned}$$

On the other hand

$$\begin{aligned} \overline{\nabla }_{v_t} \nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) \Big |_{t=0} = \mathbf {Hess} \log (\mu _t)(v_t) \Big |_{t=0} = \mathbf {Hess} \log (\mu )(\nabla \log \mu ) = \frac{1}{2} \nabla |\nabla \log \mu |^2. \end{aligned}$$

Therefore

$$\begin{aligned}\partial _t \nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) + \overline{\nabla }_{v_t} \nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) \Big |_{t=0} =- \nabla \Delta \log \mu - \frac{1}{2}\nabla |\nabla \log \mu |^2,\end{aligned}$$

and since the right-hand side is a vector field of gradient type,

$$\begin{aligned} \nabla ^{\mathcal {W}}\mathscr {I}(\mu ) = 2\frac{\mathbf {D}}{dt} \nabla ^{\mathcal {W}}\mathscr {H}(\mu _t) \Big |_{t=0} =- 2\nabla \Delta \log \mu - \nabla |\nabla \log \mu |^2 ,\end{aligned}$$

which is (5).
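Formula (5) can be tested on a simple case. The sketch below takes, as an illustrative assumption not made in the article, \(U=0\), \(M=\mathbb {R}\) and \(\mu = N(0,s^2)\), so that \(\nabla \log \mu (x) = -x/s^2\) and (5) reduces to \(\nabla ^{\mathcal {W}}\mathscr {I}(\mu )(x) = -2x/s^4\). It compares the derivative of \(\mathscr {I}\) along the dilation \(x \mapsto (1+ta)x\), whose velocity at \(t=0\) is \(v(x)=ax\), with the inner product \(\langle \nabla ^{\mathcal {W}}\mathscr {I}(\mu ), v\rangle _{L^2_{\mu }}\). Both should equal \(-2a/s^2\). All names and numerical values are hypothetical.

```python
import math

def gaussian_density(x, s):
    return math.exp(-x * x / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

def integrate(func, lo=-12.0, hi=12.0, n=4000):
    """Composite trapezoidal rule; adequate for rapidly decaying integrands."""
    step = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, n):
        total += func(lo + k * step)
    return total * step

def fisher_information(s):
    """I(mu) = int |d/dx log mu|^2 dmu for mu = N(0, s^2); the score is -x / s^2."""
    return integrate(lambda x: (x / (s * s)) ** 2 * gaussian_density(x, s))

def fisher_gradient(x, s):
    """Formula (5) with U = 0 in one dimension, specialised to mu = N(0, s^2)."""
    third_derivative = 0.0                 # (log mu)''' = 0 since log mu is quadratic
    score_sq_derivative = 2.0 * x / s**4   # d/dx of (x / s^2)^2
    return -2.0 * third_derivative - score_sq_derivative

s, a = 1.3, 0.7   # illustrative values
eps = 1e-5

# Derivative of I along the dilation: N(0, s^2) is pushed to N(0, s^2 (1 + t a)^2).
lhs = (fisher_information(s * (1 + eps * a))
       - fisher_information(s * (1 - eps * a))) / (2 * eps)

# Inner product of the gradient (5) with v(x) = a x in L^2(mu).
rhs = integrate(lambda x: fisher_gradient(x, s) * (a * x) * gaussian_density(x, s))

exact = -2.0 * a / s**2
```

Agreement of `lhs`, `rhs` and `exact` is consistent with (5) in this Gaussian case only; the general statement relies on the formal computation above.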

1.2 Lemmas 4.2 and 4.3

These lemmas are needed in the proofs of Theorems 1.2 and 1.3.

Lemma 4.2

Let Assumption 1.1 (B) hold. Then \( \mathcal {T}^{\sigma }_{U} (\mu ,\nu )<+\infty \) and the dual representation (2) holds. Moreover

  (i) \(f\) and \(g\) are compactly supported, and \(f_t,g_t\) are globally bounded on \([0,1] \times \mathbb {R}^d\).

  (ii) For any \(1 \le l \le 3 \) and \(\varepsilon \in (0,1)\), there exist constants \(A_{l,\varepsilon },B_{l,\varepsilon }\) such that

    $$\begin{aligned} \forall x \in M, \quad \sup _{t \in [\varepsilon ,1-\varepsilon ]} \ \sup _{ \begin{array}{c} v_1,\ldots ,v_l \in \mathbb {R}^d\\ |v_1|,\ldots ,|v_l| \le 1 \end{array}} |\partial _{v_l} \ldots \partial _{v_1} \log f_t(x)| \le A_{l,\varepsilon }+B_{l,\varepsilon }|x|^l , \end{aligned}$$

    and the same conclusion holds replacing \(f_t\) by \(g_t\).

  (iii) For any \(\varepsilon \in (0,1)\) there exists a constant \(A_{2,\varepsilon }\) such that

    $$\begin{aligned}\sup _{x \in \mathbb {R}^d, t \in [\varepsilon , 1-\varepsilon ]} \sup _{\begin{array}{c} v_2,v_1 \in \mathbb {R}^d\\ |v_2|,|v_1|\le 1 \end{array}} | \partial _{v_2}\partial _{v_1} \log f_t(x) | \le A_{2,\varepsilon }, \end{aligned}$$

    and the same conclusion holds replacing \(f_t\) by \(g_t\).

Proof

Since all statements concerning g are proven in the same way as those for f, we limit ourselves to proving the latter. In the proof, we assume that \(\sigma =1\), the proof for the general case being almost identical. The fact that \( \mathcal {T}^{\sigma }_{U} (\mu ,\nu )<+\infty \) can be easily settled using point (b) in [32, Prop. 2.5], whereas the dual representation is obtained from [31, Th 2.8]. Let us show that f is compactly supported. Observe that

$$\begin{aligned} \frac{d \mu }{d \mathbf {m}} = f(x)g_0(x). \end{aligned}$$

Since \(g_0 \in \mathcal {C}^{+}_{\infty }\), and \(\frac{d\mu }{d\mathbf {m}}\) is compactly supported, f must have the same support as \( \frac{d\mu }{d\mathbf {m}}\). Moreover, since \(g_0\) is bounded from below on the support of f, the fact that \(\frac{d\mu }{d\mathbf {m}}\) is bounded from above, implies that f is bounded from above. It follows from the very definition of \(f_t\) that they must be bounded as well. The proof of (i) is complete. We only do the proof of (ii) and (iii) in the case \(d=1\). This proof can be extended with no difficulty to the general case. We first make some preliminary observations. For \(\alpha \) fixed, the transition density of the Ornstein-Uhlenbeck semigroup is

$$\begin{aligned}p_t(x,z)= \Big ( \frac{\gamma (\alpha ,t)}{2\pi }\Big )^{-1/2} \phi ( \gamma (\alpha ,t) (z- \exp (-\alpha t) x )) \end{aligned}$$

where

$$\begin{aligned}\phi (z) = \exp \left( -\frac{z^2}{2}\right) , \quad \gamma (\alpha ,t) = 2\frac{\alpha }{(1-\exp (- 2 \alpha t)) }. \end{aligned}$$

The derivatives of \(\phi \) can be computed using the Hermite polynomials \((H_m)_{m \ge 0}\). We have

$$\begin{aligned}\forall m\in \mathbb {N}, \quad \phi ^{(m)} (z) =(-1)^m H_m(z) \phi (z). \end{aligned}$$

Thus, we obtain the following formula for the m-th derivative of the transition density w.r.t. x:

$$\begin{aligned} \partial ^m_x p_t(x,z) = (-1)^m \gamma (\alpha ,t)^{m} \exp (-m \alpha t) H_m(z-\exp (-\alpha t) x) p_t(x,z) . \end{aligned}$$
(73)

Finally, observe that we can rewrite \(f_t\) equivalently in the form

$$\begin{aligned} f_t(x) = \int _{\mathbb {R}} p_t(x,z)f(z)dz. \end{aligned}$$
(74)

Let us now prove (ii). Fix \(1 \le l \le 3\). Using (74), we can write \(\partial ^l_x \log f_t(x)\) as a sum of finitely many terms of the form

$$\begin{aligned}f_t(x)^{-k} \prod _{j=1}^k \int f(z) \partial ^{i_j}_x p_t(x,z) dz \end{aligned}$$

where \(k \le l\) and \(i_1,..,i_k\) are integers summing up to l. Plugging (73) in this expression, the desired conclusion follows using the fact that \(H_m\) is a polynomial of degree m, f is compactly supported, and \(\gamma (\alpha ,t)\) is uniformly bounded from above and below for \(t \in [\varepsilon ,1-\varepsilon ]\). To prove (iii), we compute explicitly \(\partial ^2_x \log f_t(x)\), using (73):

$$\begin{aligned}&\exp (-2 \alpha t) \gamma (\alpha ,t)^{2}f^{-2}_t(x) \\&\quad \times \left( \int f(z) H_2(z-\exp (-\alpha t) x) p_t(x,z)dz\, \int f(z) p_t(x,z)dz\right. \\&\left. \quad -\left[ \int f(z) H_1(z-\exp (-\alpha t) x) p_t(x,z)dz \right] ^2 \right) . \end{aligned}$$

Using the explicit form of the first two Hermite polynomials and some standard calculations, the latter expression is seen to be equal to

$$\begin{aligned}&\exp (-2\alpha t) \gamma (\alpha ,t) f_t(x)^{-2} \\&\quad \times \left( \int f(z) z^2 p_t(x,z)dz\, \int f(z) p_t(x,z)dz -\left[ \int f(z) z p_t(x,z)dz \right] ^2 \right) . \end{aligned}$$

The conclusion then follows using the fact that f is compactly supported and that \(\gamma (\alpha ,t)\) is uniformly bounded from above and below for \(t \in [\varepsilon ,1-\varepsilon ]\). \(\square \)
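The Hermite identity \(\phi ^{(m)} = (-1)^m H_m \phi \) used in the proof above can be checked numerically for the orders \(m \le 3\) that appear there. The sketch below assumes the probabilists' normalisation (\(H_1(z)=z\), \(H_2(z)=z^2-1\), \(H_3(z)=z^3-3z\)), which is the one compatible with \(\phi (z)=\exp (-z^2/2)\). The helper names are hypothetical.

```python
import math

def phi(z):
    return math.exp(-z * z / 2.0)

def hermite(m, z):
    """Probabilists' Hermite polynomial via H_{k+1}(z) = z H_k(z) - k H_{k-1}(z)."""
    h_prev, h_curr = 1.0, z
    if m == 0:
        return h_prev
    for k in range(1, m):
        h_prev, h_curr = h_curr, z * h_curr - k * h_prev
    return h_curr

def phi_derivative(m, z):
    """m-th derivative of phi according to phi^(m) = (-1)^m H_m phi."""
    return (-1) ** m * hermite(m, z) * phi(z)

def central_difference(func, z, h=1e-5):
    return (func(z + h) - func(z - h)) / (2.0 * h)

points = [-2.0, -0.5, 0.0, 0.7, 1.9]

# The claimed derivative of order m + 1 should match the numerical
# derivative of the claimed derivative of order m.
max_gap = max(
    abs(central_difference(lambda y: phi_derivative(m, y), z) - phi_derivative(m + 1, z))
    for m in range(3)
    for z in points
)
```

A small value of `max_gap` supports the identity at the sampled points; it says nothing about the constants \(A_{l,\varepsilon },B_{l,\varepsilon }\) of the lemma.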

Lemma 4.3

Let Assumption 1.1 (A) hold. Then the dual representation (2) holds and

  (i) \( \mathcal {T}^{\sigma }_{U} (\mu ,\nu ) < \infty \).

  (ii) For any \(\varepsilon \in (0,1)\), \(f_t,g_t,\tilde{f}_t,\tilde{g}_t\) are \(\mathcal {C}^{b,+}_{\infty }\) over \(D_{\varepsilon }\) and \(\log f_t, \log g_t, \log \tilde{f}_t,\log \tilde{g}_t\) are \(\mathcal {C}^{b}_{\infty }\) over \(D_{\varepsilon }\).

Proof

Let \(\varphi _{\mu } = \frac{d \mu }{d \mathbf {m}}\), \(\varphi _{\nu } = \frac{d \nu }{d \mathbf {m}}\) and define \(\pi \in \Pi (\mu ,\nu )\) by \(\pi (dx dy) = \varphi _{\mu }(x) \varphi _{\nu }(y) \mathbf {m}\otimes \mathbf {m}(dx dy)\). The theory of Malliavin calculus ensures that \((X_0,X_1)_{\#} \mathbf {P}\) is an absolutely continuous measure on \(M \times M\) with positive smooth density. Since M is compact, \((X_0,X_1)_{\#} \mathbf {P}\) is equivalent to \( \mathbf {m}\otimes \mathbf {m}\). Therefore, for some constant \(C<+\infty \),

$$\begin{aligned}\mathscr {H}(\pi | (X_0,X_1)_{\#} \mathbf {P}) \le C \, \mathscr {H}(\pi | \mathbf {m}\otimes \mathbf {m}) = C ( \mathscr {H}(\mu | \mathbf {m})+\mathscr {H}(\nu | \mathbf {m})) <+\infty . \end{aligned}$$

Thus \( \mathcal {T}^{\sigma }_{U} (\mu ,\nu )<+\infty \). It is also a result of Malliavin calculus that \(f_t,g_t\) are of class \(\mathcal {C}^{+}_{\infty }\) on any \(D_{\varepsilon }\). But then, since M is compact, they are also in \(\mathcal {C}^{b,+}_{\infty }\) and uniformly bounded from below, which gives that \(\log f_t,\log g_t \) are in \(\mathcal {C}^{b}_{\infty }\). The statement about \(\tilde{f}_t,\tilde{g}_t\) and their logarithms follows from the one for \(f_t,g_t\) and the compactness of M. \(\square \)

1.3 Proof of Lemma 3.2

Proof

We can rewrite (2) as

$$\begin{aligned} \frac{d \hat{\mathbf {Q}}}{d \mathbf {P}} = f(X_0) g(X_1). \end{aligned}$$

Moreover, since \(\mathbf {P}\) is stationary, we have for any t that \({X_t}_{\#} \mathbf {P}= \mathbf {m}\). Therefore

$$\begin{aligned} \frac{d\mu _t}{d\mathbf {m}}(x) = \frac{d({X_t}_{\#} \hat{\mathbf {Q}})}{d({X_t}_{\#} \mathbf {P})}(x)= & {} E_{\mathbf {P}}[ f(X_0)g(X_1) | X_t =x ] \\&{\mathop {=}\limits ^{\begin{array}{c} \text {Markov}\\ \text {property} \end{array}}}&E_{\mathbf {P}}[f(X_0) |X_t=x] E_{\mathbf {P}}[g(X_1) |X_t=x] \\= & {} f_t(x)g_t(x). \end{aligned}$$

Observing that \(d\mathbf {m}=\exp (-2U) d \mathbf {vol}\), (42) follows from the definition of \(\tilde{f}_t\) and \(\tilde{g}_t\). To prove the second statement, fix \(\varepsilon \in (0,1)\). We observe that, since U is taken to be smooth, well-known results of Malliavin calculus grant that the functions \(f_t\) and \(g_t\) are of class \(\mathcal {C}^{+}_{\infty }\), and thus classical solutions on \(D_{\varepsilon }\) of the forward and backward Kolmogorov equations

$$\begin{aligned} \partial _t f_t = \mathscr {L}f_t, \quad \partial _t g_t = - \mathscr {L}g_t. \end{aligned}$$
(75)

Using some standard algebraic manipulations and the positivity of \(f_t,g_t\) one finds that \( \tilde{f}_t\) and \( \tilde{g}_t\) are classical solutions on \(D_{\varepsilon }\) of

$$\begin{aligned} \partial _t \tilde{f}_t = \frac{1}{2} \Delta \tilde{f}_t - \mathscr {U}\tilde{f}_t , \quad \partial _t \tilde{g}_t =- \frac{1}{2} \Delta \tilde{g}_t + \mathscr {U}\tilde{g}_t , \end{aligned}$$
(76)

where \(\mathscr {U}\) was defined at (5). Using this, we prove that \(\frac{1}{2}\nabla (\log g_t-\log f_t) = \frac{1}{2}\nabla (\log \tilde{g}_t-\log \tilde{f}_t)\) is a classical solution to the continuity equation on \(D_{\varepsilon }\). Indeed

$$\begin{aligned}&\partial _t \mu _t{\mathop {=}\limits ^{(42)}} (\partial _t \tilde{f}_t) \tilde{g}_t + \tilde{f}_t (\partial _t \tilde{g}_t) \\&\quad {\mathop {=}\limits ^{(76)}} \frac{1}{2} \tilde{g}_t \Delta \tilde{f}_t - \frac{1}{2} \tilde{f}_t \Delta \tilde{g}_t \\&\quad = \frac{1}{2} \nabla \cdot (\tilde{g}_t \nabla \tilde{f}_t) - \frac{1}{2} \nabla \cdot (\tilde{f}_t \nabla \tilde{g}_t) \\&\quad = \frac{1}{2} \nabla \cdot \big (\tilde{f}_t\tilde{g}_t( \nabla \log \tilde{f}_t -\nabla \log \tilde{g}_t)\big ) \\&\quad {\mathop {=}\limits ^{(42)}} \frac{1}{2} \nabla \cdot \big (\mu _t( \nabla \log \tilde{f}_t -\nabla \log \tilde{g}_t)\big ). \end{aligned}$$

Thus, \((t,x)\mapsto \frac{1}{2}\nabla (\log g_t-\log f_t)\) solves the continuity equation, it is of gradient type and, thanks to Lemma 3.1 and (42), \( \sup _{t \in [\varepsilon ,1-\varepsilon ]}\frac{1}{2}| \nabla (\log g_t-\log f_t)|_{\mathbf {T}_{\mu _t}} < + \infty \) also holds. The conclusion then follows. \(\square \)
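The continuity equation derived in the proof above can be checked by finite differences on an explicit example. The sketch below assumes, for illustration only, \(U=0\) and \(M=\mathbb {R}\), and takes \(\tilde{f}_t\) and \(\tilde{g}_t\) to be Gaussian heat kernels with variances \(1+t\) and \(2-t\), which solve the forward and backward equations (76) with \(\mathscr {U}=0\). These choices and all function names are hypothetical and do not come from the article.

```python
import math

def f_tilde(t, x):
    """Solves d/dt f = (1/2) f'' (heat kernel with variance 1 + t)."""
    var = 1.0 + t
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def g_tilde(t, x):
    """Solves d/dt g = -(1/2) g'' (heat kernel with variance 2 - t)."""
    var = 2.0 - t
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mu(t, x):
    """Density of the interpolation, as in (42): mu_t = f_tilde_t * g_tilde_t."""
    return f_tilde(t, x) * g_tilde(t, x)

def velocity(t, x, h=1e-5):
    """v_t = (1/2) d/dx (log g_t - log f_t), by central differences."""
    diff = lambda y: math.log(g_tilde(t, y)) - math.log(f_tilde(t, y))
    return 0.5 * (diff(x + h) - diff(x - h)) / (2.0 * h)

def continuity_residual(t, x, h=1e-4):
    """Finite-difference value of d/dt mu + d/dx (mu v); should be close to zero."""
    dt_mu = (mu(t + h, x) - mu(t - h, x)) / (2.0 * h)
    flux = lambda y: mu(t, y) * velocity(t, y)
    dx_flux = (flux(x + h) - flux(x - h)) / (2.0 * h)
    return dt_mu + dx_flux

max_residual = max(
    abs(continuity_residual(t, x))
    for t in (0.2, 0.5, 0.8)
    for x in (-1.5, -0.3, 0.0, 0.4, 2.0)
)
```

In this example \(\mu _t\) is not normalised to total mass one, but its mass is constant in time, which is all the continuity equation requires.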

1.4 Proof of Lemma 3.3

Proof

Fix \(\varepsilon \in (0,1)\). As a preliminary step, we compute \(\partial _t \tau ^{\varepsilon }_t(\xi _t)\). Using the group property we get

$$\begin{aligned} \tau ^{\varepsilon }_{t+h}(\xi _{t+h})-\tau ^{\varepsilon }_t(\xi _t)= & {} \tau ^{\varepsilon }_t \big ( \tau ^t_{t+h}(\xi _{t+h}) - \xi _t \big ) \\= & {} h \tau ^{\varepsilon }_t \big ( \partial _t \xi _t \big )+ \tau ^{\varepsilon }_t \big (\tau ^t_{t+h}(\xi _{t})- \xi _t \big ) + o(h), \end{aligned}$$

where \(o(h)/h \rightarrow 0\) as \(h \rightarrow 0\). Recalling Definition 2.3 and the definition of flow map we get

$$\begin{aligned} \tau ^t_{t+h}(\xi _{t})(x)- \xi _t (x)= & {} (\tau _x)^t_{t+h}\big (\xi _t \circ \mathbf {T}(t,t+h,x)-\xi _t(x)\big )\\= & {} h \overline{\nabla }_{\partial _t \mathbf {T}(t,t,x)} \xi _t(x) +o(h) \\= & {} h \overline{\nabla }_{v_t(x)} \xi _t(x) +o(h) . \end{aligned}$$

Therefore, we have shown that, as a pointwise limit

$$\begin{aligned} \lim _{h \rightarrow 0} \frac{ \tau ^t_{t+h} \xi _{t+h} - \xi _t}{h} = \partial _t \xi _t + \overline{\nabla }_{v_t} \xi _t, \end{aligned}$$
(77)

which implies that

$$\begin{aligned} \partial _t \tau ^{\varepsilon }_t(\xi _t) = \tau ^{\varepsilon }_t(\partial _t \xi _t + \overline{\nabla }_{v_t} \xi _t). \end{aligned}$$

Let us now prove the absolute continuity of \((\xi _t)\) along \((\mu _t)\) using what we have just shown. We have

$$\begin{aligned}&| \tau ^{\varepsilon }_s (\xi _s) - \tau ^{\varepsilon }_t (\xi _t) |_{L^2_{\mu _{\varepsilon }}} \\&\quad =\Big ( \int _{M} \Big | \int _t^s \tau ^{\varepsilon }_r \big ( \partial _r \xi _r + \overline{\nabla }_{v_r} \xi _r \big ) dr\Big |^2 d \mu _{\varepsilon }\Big )^{\frac{1}{2}}\\&\quad {\mathop {\le }\limits ^{\text {Jensen}}} (s-t)^{1/2} \Big ( \int _t^s \int _{M} \big | \tau ^{\varepsilon }_r \big ( \partial _r \xi _r + \overline{\nabla }_{v_r} \xi _r \big ) \big |^2 d \mu _{\varepsilon } dr\Big )^{\frac{1}{2}} \\&\quad {\mathop {=}\limits ^{(33)}} (s-t)^{1/2} \Big ( \int _t^s \int _{M} \big | \partial _r \xi _r + \overline{\nabla }_{v_r} \xi _r \big |^2 d \mu _r dr\Big )^{\frac{1}{2}} \\&\quad \le (s-t) \sup _{r \in [\varepsilon ,1-\varepsilon ]} |\partial _r \xi _r + \overline{\nabla }_{v_r} \xi _r |_{\mathbf {T}_{\mu _r}} . \end{aligned}$$

Using (44), the desired absolute continuity follows. Let us now turn to the proof of (45). By definition,

$$\begin{aligned} \frac{\mathbf {d}}{dt} \xi _t = \lim _{h \downarrow 0} \frac{\tau ^t_{t+h}\xi _{t+h} (\cdot )- \xi _t(\cdot )}{h} , \end{aligned}$$

where the limit is in \(L^2_{\mu _t}\). But then, it is also the pointwise limit along a subsequence. This pointwise limit was computed in (77), which yields the desired result. The identity (46) is a direct consequence of (45) and the fact that \(v_t\) is a gradient vector field. \(\square \)

Cite this article

Conforti, G. A second order equation for Schrödinger bridges with applications to the hot gas experiment and entropic transportation cost. Probab. Theory Relat. Fields 174, 1–47 (2019). https://doi.org/10.1007/s00440-018-0856-7

  • DOI: https://doi.org/10.1007/s00440-018-0856-7

Mathematics Subject Classification

  • 60J60
  • 39B62
  • 60F10
  • 46N10
  • 47D07