Traversing the Schrödinger Bridge Strait: Robert Fortet’s Marvelous Proof Redux

Journal of Optimization Theory and Applications

Abstract

In the early 1930s, Erwin Schrödinger, motivated by his quest for a more classical formulation of quantum mechanics, posed a large deviation problem for a cloud of independent Brownian particles. He showed that the solution to the problem could be obtained through a system of two linear equations with nonlinear coupling at the boundary (Schrödinger system). Existence and uniqueness for such a system, which represents a sort of bottleneck for the problem, was first established by Fortet in 1938/1940 under rather general assumptions by proving convergence of an ingenious but complex approximation method. It is the first proof of what are nowadays called Sinkhorn-type algorithms in the much more challenging continuous case. Schrödinger bridges are also an early example of the maximum entropy approach and have been more recently recognized as a regularization of the important optimal mass transport problem. Unfortunately, Fortet’s contribution is by and large ignored in contemporary literature. This is likely due to the complexity of his approach coupled with an idiosyncratic exposition style and due to missing details and steps in the proofs. Nevertheless, Fortet’s approach maintains its importance to this day as it provides the only existing algorithmic proof, in the continuous setting, under rather mild assumptions. It can be adapted, in principle, to other relevant optimal transport problems. It is the purpose of this paper to remedy this situation by rewriting the bulk of his paper with all the missing passages and in a transparent fashion so as to make it fully available to the scientific community. We consider the problem in \({\mathbb {R}}^d\) rather than in \({\mathbb {R}}\) and use as much as possible his notation to facilitate comparison.
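As a point of reference for the Sinkhorn-type algorithms mentioned above, the following is a minimal sketch of the discrete iteration, assuming a strictly positive kernel matrix. It is not Fortet's continuous scheme, which the paper treats under much weaker assumptions. The names `sinkhorn`, `K`, `mu`, `nu` and the Gaussian toy kernel are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def sinkhorn(K, mu, nu, n_iter=200):
    """Find positive vectors (phi, psi) with
    phi * (K @ psi) = mu  and  psi * (K.T @ phi) = nu,
    a discrete analogue of the Schrödinger system."""
    phi = np.ones_like(mu)
    psi = np.ones_like(nu)
    for _ in range(n_iter):
        phi = mu / (K @ psi)      # enforce the first marginal
        psi = nu / (K.T @ phi)    # enforce the second marginal
    return phi, psi

# Toy data (hypothetical): Gaussian kernel on a 1-D grid, two marginals.
x = np.linspace(0.0, 1.0, 20)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.5)
mu = np.full(20, 1.0 / 20)
nu = np.exp(-(x - 0.7) ** 2 / 0.02)
nu /= nu.sum()

phi, psi = sinkhorn(K, mu, nu)
# Joint distribution whose marginals should match mu and nu.
coupling = phi[:, None] * K * psi[None, :]
row_err = np.abs(coupling.sum(axis=1) - mu).max()
col_err = np.abs(coupling.sum(axis=0) - nu).max()
print(row_err, col_err)
```

The two updates alternate between the two coupled equations, each of which is linear once the other unknown is frozen; the nonlinearity enters only through the division by the marginals.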

Notes

  1. Let \(\mathcal{V}\) be a metric space and \({{{\mathcal {D}}}}({{{\mathcal {V}}}})\) be the set of probability measures defined on \({{{\mathcal {B}}}}({{{\mathcal {V}}}})\), the Borel \(\sigma \)-field of \({{{\mathcal {V}}}}\). We say that a sequence \(\{P_N\}\) of elements of \({{{\mathcal {D}}}}({{{\mathcal {V}}}})\) converges weakly to \(P\in \mathcal{D}({{{\mathcal {V}}}})\), and write \(P_N\Rightarrow P\) if \(\int _\mathcal{V}f\mathrm{d}P_N\rightarrow \int _{{{\mathcal {V}}}}f \mathrm{d}P\) for every bounded, continuous function f on \({{{\mathcal {V}}}}\).

  2. The initial marginal of the prior measure, as long as \(\rho _0(x){\mathrm{d}}x\) is at finite relative entropy from it, does not play any role in the optimization problem. Instead of \(\rho _0(x){\mathrm{d}}x\), which is the standard case in control problems, another popular choice is Lebesgue measure so that the prior is an unbounded measure called stationary Wiener measure, see, e.g., [12].

  3. Probability densities on \({\mathbb {R}}^n\times {\mathbb {R}}^n\) with marginals \(\rho _0\) and \(\rho _1\).

  4. Remarkable analogies to quantum mechanics, which appear to me very worthy of reflection.

  5. In this paper, the maximum or minimum of two functions will always be taken pointwise.

  6. In Fortet’s paper, \(H'_{|n}\) is denoted \(H'_n\) [21, p. 88]. Unfortunately, the same notation is later used for another quantity [21, p. 90].

  7. Fortet seems to imply by this proposition that H and \(H'\) cannot vanish at a point without vanishing everywhere. Although this is true for \(H'\), see Proposition 5.8 below, it does not imply the same property for H.

  8. The statement can be found on [21, p. 92]. The proof provided there, however, appears to be incorrect, as it does not make use of hypothesis (\(\star \)), confusing \(H_n'\) of the iteration (AS) with \(H'_{|n}\) (also denoted by \(H'_n\) by Fortet) defined in (18).
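
The definition of weak convergence in Note 1 can be checked numerically on a simple case. The sketch below assumes that \(P_N\) is the uniform probability measure on the grid points \(1/N, 2/N, \ldots , 1\), which converges weakly to Lebesgue measure on \([0,1]\); the test function `math.cos` and the helper name `integral_against_P_N` are illustrative choices.

```python
import math

def integral_against_P_N(f, N):
    """Integral of f with respect to P_N, the uniform probability
    measure on the N grid points 1/N, 2/N, ..., 1."""
    return sum(f(i / N) for i in range(1, N + 1)) / N

f = math.cos                 # a bounded, continuous test function
limit = math.sin(1.0)        # integral of cos over [0, 1] w.r.t. Lebesgue measure

errors = [abs(integral_against_P_N(f, N) - limit) for N in (10, 100, 1000)]
print(errors)                # the gaps shrink as N grows
```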

References

  1. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620 (1957)

  2. Jaynes, E.T.: On the rationale of maximum-entropy methods. Proc. IEEE 70(9), 939–952 (1982)

  3. Burg, J.P.: Maximum entropy spectral analysis. In: 37th Annual International Meeting, Society of Exploration Geophysicists Oklahoma City, Okla, 31 Oct 1967 (1967)

  4. Burg, J.P., Luenberger, D.G., Wenger, D.L.: Estimation of structured covariance matrices. Proc. IEEE 70(9), 963–974 (1982)

  5. Dempster, A.P.: Covariance selection. Biometrics 28, 157–175 (1972)

  6. Csiszár, I.: I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3, 146–158 (1975)

  7. Csiszár, I.: Sanov property, generalized I-projection and a conditional limit theorem. Ann. Probab. 12, 768–793 (1984)

  8. Csiszár, I., et al.: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Stat. 19(4), 2032–2066 (1991)

  9. Mikami, T.: Monge’s problem with a quadratic cost by the zero-noise limit of h-path processes. Probab. Theory Relat. Fields 129(2), 245–260 (2004)

  10. Mikami, T., Thieullen, M.: Duality theorem for the stochastic optimal control problem. Stoch. Process. Appl. 116(12), 1815–1835 (2006). https://doi.org/10.1016/j.spa.2006.04.014

  11. Mikami, T., Thieullen, M.: Optimal transportation problem by stochastic optimal control. SIAM J. Control Optim. 47(3), 1127–1139 (2008)

  12. Léonard, C.: A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Contin. Dyn. Syst. A 34(4), 1533–1574 (2014)

  13. Léonard, C.: From the Schrödinger Problem to the Monge–Kantorovich Problem. arXiv preprint arXiv:1011.2564 (2010)

  14. Chen, Y., Georgiou, T.T., Pavon, M.: On the relation between optimal transport and Schrödinger bridges: a stochastic control viewpoint. J. Optim. Theory Appl. 169(2), 671–691 (2016)

  15. Peyré, G., Cuturi, M.: Computational Optimal Transport. arXiv preprint arXiv:1803.00567 (2018)

  16. Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, pp. 2292–2300 (2013)

  17. Benamou, J.D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)

  18. Chen, Y., Georgiou, T.T., Pavon, M.: Optimal transport over a linear dynamical system. IEEE Trans. Autom. Control 62(5), 2137–2152 (2017)

  19. Chen, Y., Georgiou, T., Pavon, M.: Entropic and displacement interpolation: a computational approach using the Hilbert metric. SIAM J. Appl. Math. 76(6), 2375–2396 (2016)

  20. Fortet, R.: Résolution d’un système d’équations de M. Schrödinger. Comptes Rendus 206, 721–723 (1938)

  21. Fortet, R.: Résolution d’un système d’équations de M. Schrödinger. J. Math. Pures Appl. IX, 83–105 (1940)

  22. Beurling, A.: An automorphism of product measures. Ann. Math. 72, 189–200 (1960)

  23. Jamison, B.: The Markov processes of Schrödinger. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 32(4), 323–331 (1975)

  24. Zambrini, J.C.: Variational processes and stochastic versions of mechanics. J. Math. Phys. 27, 2307–2330 (1986)

  25. Föllmer, H.: Random fields and diffusion processes. In: Hennequin, P.-L. (ed.) École d’Été de Probabilités de Saint-Flour XV-XVII, 1985–87, pp. 101–203. Springer, Berlin (1988)

  26. Deming, W.E., Stephan, F.F.: On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Stat. 11(4), 427–444 (1940)

  27. Sinkhorn, R.: A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Stat. 35(2), 876–879 (1964)

  28. Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.X.: Scaling Algorithms for Unbalanced Transport Problems. arXiv preprint arXiv:1607.05816 (2016)

  29. Schrödinger, E.: Über die Umkehrung der Naturgesetze. Sitzungsberichte der Preuss Akad. Wissen. Berlin. Phys. Math. Klasse 1, 144–153 (1931)

  30. Schrödinger, E.: Sur la théorie relativiste de l’électron et l’interprétation de la mécanique quantique. Ann. Inst. H. Poincaré 2(4), 269–310 (1932)

  31. Sanov, I.N.: On the Probability of Large Deviations of Random Variables, Technical report. Department of Statistics, North Carolina State University (1958)

  32. Dudley, R.M.: Real Analysis and Probability. Cambridge University Press, Cambridge (2002)

  33. Ellis, R.S.: Entropy, Large Deviations, and Statistical Mechanics. Springer, Berlin (2007)

  34. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Corrected reprint of the second (1998) edition. Stochastic Modelling and Applied Probability, vol. 38. Springer, Berlin (2010)

  35. Villani, C.: Topics in Optimal Transportation, vol. 58. American Mathematical Society, Providence (2003)

  36. Wakolbinger, A.: Schrödinger bridges from 1931 to 1991. In: Proceedings of the 4th Latin American Congress in Probability and Mathematical Statistics, Mexico City, pp. 61–79 (1990)

  37. Dai Pra, P.: A stochastic control approach to reciprocal diffusion processes. Appl. Math. Optim. 23(1), 313–329 (1991)

  38. Dai Pra, P., Pavon, M.: On the Markov processes of Schrödinger, the Feynman–Kac formula and stochastic control. In: Realization and Modelling in System Theory, pp. 497–504. Springer, Berlin (1990)

  39. Pavon, M., Wakolbinger, A.: On free energy, stochastic control, and Schrödinger processes. In: Modeling, Estimation and Control of Systems with Uncertainty, pp. 334–348. Springer, Berlin (1991)

  40. Mikami, T.: Optimal transportation problem as stochastic mechanics. Sel. Pap. Probab. Stat. 227, 75–94 (2008)

  41. Chen, Y., Georgiou, T.T., Pavon, M.: Optimal steering of a linear stochastic system to a final probability distribution, part I. IEEE Trans. Autom. Control 61(5), 1158–1169 (2016)

  42. Chen, Y., Georgiou, T.T., Pavon, M.: Optimal steering of a linear stochastic system to a final probability distribution, part II. IEEE Trans. Autom. Control 61(5), 1170–1180 (2016)

  43. Chen, Y., Georgiou, T.T., Pavon, M.: Fast cooling for a system of stochastic oscillators. J. Math. Phys. 56(11), 113302 (2015)

  44. Benamou, J.D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)

  45. Birkhoff, G.: Extensions of Jentzsch’s theorem. Trans. Am. Math. Soc. 85(1), 219–227 (1957)

  46. Bushell, P.: On the projective contraction ratio for positive linear mappings. J. Lond. Math. Soc. 2(2), 256–258 (1973)

  47. Bushell, P.J.: Hilbert’s metric and positive contraction mappings in a Banach space. Arch. Ration. Mech. Anal. 52(4), 330–338 (1973)

  48. Birkhoff, G.: Uniformly semi-primitive multiplicative processes. Trans. Am. Math. Soc. 104(1), 37–51 (1962)

  49. Lemmens, B., Nussbaum, R.: Birkhoff’s Version of Hilbert’s Metric and Its Applications in Analysis. arXiv preprint arXiv:1304.7921 (2013)

  50. Tsitsiklis, J., Bertsekas, D., Athans, M.: Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control 31(9), 803–812 (1986)

  51. Sepulchre, R., Sarlette, A., Rouchon, P.: Consensus in Non-Commutative Spaces. arXiv preprint arXiv:1003.5653 (2010)

  52. Bonnabel, S., Astolfi, A., Sepulchre, R.: Contraction and observer design on cones. In: Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, pp. 7147–7151. IEEE (2011)

  53. Reeb, D., Kastoryano, M.J., Wolf, M.M.: Hilbert’s projective metric in quantum information theory. J. Math. Phys. 52(8), 082201 (2011)

  54. Lemmens, B., Nussbaum, R.: Nonlinear Perron–Frobenius Theory, vol. 189. Cambridge University Press, Cambridge (2012)

  55. Georgiou, T.T., Pavon, M.: Positive contraction mappings for classical and quantum Schrödinger systems. J. Math. Phys. 56(3), 033301 (2015)

  56. Franklin, J., Lorenz, J.: On the scaling of multidimensional matrices. Linear Algebra Appl. 114, 717–735 (1989)

  57. Schmitzer, B.: Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519 (2016)

  58. Galichon, A., Kominers, S.D., Weber, S.: The nonlinear Bernstein–Schrödinger equation in economics. In: International Conference on Networked Geometric Science of Information, pp. 51–59. Springer, Berlin (2015)

Acknowledgements

The authors thank Robert V. Kohn for useful suggestions. The second named author would also like to thank the Courant Institute of Mathematical Sciences of the New York University for the hospitality during the time this paper was written. The authors finally wish to thank two anonymous reviewers for very careful reading and providing plenty of general and specific comments/suggestions on how to improve the paper. The second named author was partly supported by the University of Padova Research Project CPDA 140897.

Author information

Corresponding author

Correspondence to Michele Pavon.

Additional information

Communicated by Gabriel Peyré

Appendix

1.1 Proof of (33) from Theorem I

Let \(Z = \{ y \in {\mathfrak {I}}^2{:}\,g(x_0,y) = 0 \} \subset {\mathfrak {I}}^2\) be the zero set of \(g(x_0,\cdot )\).

Define \(Z_k = \{ y \in {\mathfrak {I}}^2{:}\,g(x_0,y) < \frac{1}{k} \}\) for \(k \in {\mathbb {N}}^*\). We have \(Z_{k+1} \subset Z_k\), and \(Z_k \downarrow Z\) as \(k \rightarrow +\infty \).

By assumption (H.vi) we know that Z has Lebesgue measure 0. From the continuity of g (H.iv), we also know that Z is closed.

Hence, by continuity from above of the measure m (applicable as soon as some \(Z_k\) has finite measure), \(m(Z_k) \rightarrow 0\) as \(k \rightarrow +\infty \).

Denote \({\mathfrak {I}}^2_k = {\mathfrak {I}}^2 \backslash Z_k\). Then we have \({\mathfrak {I}}^2_k \subset {\mathfrak {I}}^2_{k+1}\) and \({\mathfrak {I}}^2_k \uparrow {\mathfrak {I}}^2 \backslash Z\) as \(k \rightarrow +\infty \).

Since

$$\begin{aligned} H_n'(x_0) = \int _{{\mathfrak {I}}^2} g(x_0,y) \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y \rightarrow 0, \quad \text { as } n \rightarrow +\infty \end{aligned}$$

for every \(\epsilon >0\) we have, for n large enough:

$$\begin{aligned} \int _{{\mathfrak {I}}^2} g(x_0,y) \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y < \epsilon \end{aligned}$$

Fix \(\epsilon >0, k \in {\mathbb {N}}^*\). We then have for n large enough:

$$\begin{aligned} 0 \le \int _{{\mathfrak {I}}^2_{k}} g(x_0,y) \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y + \int _{{\mathfrak {I}}^2\backslash {\mathfrak {I}}^2_{k}} g(x_0,y) \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y < \epsilon \end{aligned}$$

and in particular, since both integrals are nonnegative and \(g(x_0,y) \ge \frac{1}{k}\) on \({\mathfrak {I}}^2_{k}\), the first integral yields:

$$\begin{aligned} 0 \le \int _{{\mathfrak {I}}^2_{k}} \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y < k \epsilon \end{aligned}$$

Since \(\epsilon >0\) is arbitrary, this implies that, for each fixed k, the measure \(\frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y\) converges weakly to 0 on \({\mathfrak {I}}^2_{k}\) as \(n \rightarrow +\infty \). Indeed, its total mass on \({\mathfrak {I}}^2_{k}\) tends to 0, so its integral against any step function with support included in \({\mathfrak {I}}^2_{k}\), and likewise against any bounded continuous function, tends to 0.

We would like the measure \(\frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y\) to converge to 0 for any step function whose support I is included in \({\mathfrak {I}}^2\), and not merely on \({\mathfrak {I}}^2_k\).

Pick a measurable subset \(I \subset {{\mathfrak {I}}^2}\), and consider:

$$\begin{aligned} \int _{{\mathfrak {I}}^2} \mathbb 1_{I}(y) \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y = \int _{{\mathfrak {I}}^2_{k} \cap I} \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y + \int _{I \backslash {\mathfrak {I}}^2_{k}} \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y \end{aligned}$$

The first integral converges to 0 as \(n \rightarrow +\infty \), since the measure \(\frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y\) converges weakly to 0 on \({\mathfrak {I}}^2_{k}\).

As for the second integral, we have that \(H_n \le H_1\), so \(\frac{\omega _2(y)}{G(H_n,y)} \le \frac{\omega _2(y)}{G(H_1,y)}\) which implies:

$$\begin{aligned} \int _{I \backslash {\mathfrak {I}}^2_{k}} \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y \le \int _{I \backslash {\mathfrak {I}}^2_{k}} \frac{\omega _2(y)}{G(H_1,y)} {\mathrm{d}}y \le \int _{Z_{k}} \frac{\omega _2(y)}{G(H_1,y)} {\mathrm{d}}y \end{aligned}$$

where the last inequality comes from \(I \backslash {\mathfrak {I}}^2_{k} = I \cap Z_{k} \subset Z_{k}\).

Condition (\(\star \)) states that \(\int _{{\mathfrak {I}}^2} \frac{\omega _2(y)}{G(H_1,y)} {\mathrm{d}}y < +\infty \); thus, the measure \(\frac{\omega _2(y)}{G(H_1,y)} {\mathrm{d}}y\) is finite and absolutely continuous with respect to the Lebesgue measure m on \({\mathfrak {I}}^2\). This implies that the second integral converges to 0 as \(k \rightarrow +\infty \), uniformly in n, since \(m(Z_{k}) \rightarrow 0\).

Hence, choosing first k large enough to make the second integral small for all n, and then n large enough to make the first integral small, we obtain that, for any measurable \(I \subset {\mathfrak {I}}^2\), \(\int _{{\mathfrak {I}}^2} \mathbb 1_{I}(y) \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y \rightarrow 0\) as \(n \rightarrow +\infty \).
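
The two-step estimate used in this proof can be illustrated numerically on a toy example. The sketch below is not taken from the paper: it assumes, purely for illustration, the one-dimensional domain \([-1,1]\), the function \(g(x_0,y) = |y|\) (so that \(Z = \{0\}\)), and densities \(f_n(y) = \min (1, 1/(n|y|))\) standing in for \(\frac{\omega _2(y)}{G(H_n,y)}\). These satisfy \(f_n \le f_1\) and \(\int g f_n \rightarrow 0\). All names in the code are hypothetical.

```python
import numpy as np

# Hypothetical toy setting (not from the paper):
#   domain [-1, 1], g(x0, y) = |y|, so Z = {0} and Z_k = {|y| < 1/k};
#   f(n) stands in for omega_2(y) / G(H_n, y), with f(n) <= f(1) = 1.
M = 400_000
dy = 2.0 / M
y = -1.0 + (np.arange(M) + 0.5) * dy     # cell midpoints; none is exactly 0
g = np.abs(y)

def f(n):
    return np.minimum(1.0, 1.0 / (n * np.abs(y)))

def integral(values, mask=None):
    """Midpoint-rule integral of `values` over the grid cells selected by `mask`."""
    if mask is None:
        mask = np.ones(M, dtype=bool)
    return float(np.sum(values[mask]) * dy)

k = 50
Z_k = g < 1.0 / k            # neighbourhood of the zero set of g
I_k = ~Z_k                   # complement, where g >= 1/k

for n in (10, 1_000, 100_000):
    eps = integral(g * f(n))             # plays the role of H_n'(x0)
    on_I_k = integral(f(n), I_k)         # controlled by k * eps
    on_Z_k = integral(f(n), Z_k)         # controlled by the f(1)-mass of Z_k
    assert on_I_k <= k * eps + 1e-12
    assert on_Z_k <= integral(f(1), Z_k) + 1e-12
    print(n, eps, on_I_k, on_Z_k)
```

The part away from the zero set is controlled through the lower bound on g, while the part near the zero set is controlled by the dominating density alone, independently of n.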

Cite this article

Essid, M., Pavon, M. Traversing the Schrödinger Bridge Strait: Robert Fortet’s Marvelous Proof Redux. J Optim Theory Appl 181, 23–60 (2019). https://doi.org/10.1007/s10957-018-1436-9

  • DOI: https://doi.org/10.1007/s10957-018-1436-9
