Abstract
In the early 1930s, Erwin Schrödinger, motivated by his quest for a more classical formulation of quantum mechanics, posed a large deviation problem for a cloud of independent Brownian particles. He showed that the solution to the problem could be obtained through a system of two linear equations with nonlinear coupling at the boundary (Schrödinger system). Existence and uniqueness for such a system, which represents a sort of bottleneck for the problem, was first established by Fortet in 1938/1940 under rather general assumptions by proving convergence of an ingenious but complex approximation method. It is the first proof of what are nowadays called Sinkhorn-type algorithms in the much more challenging continuous case. Schrödinger bridges are also an early example of the maximum entropy approach and have been more recently recognized as a regularization of the important optimal mass transport problem. Unfortunately, Fortet’s contribution is by and large ignored in contemporary literature. This is likely due to the complexity of his approach coupled with an idiosyncratic exposition style and due to missing details and steps in the proofs. Nevertheless, Fortet’s approach maintains its importance to this day as it provides the only existing algorithmic proof, in the continuous setting, under rather mild assumptions. It can be adapted, in principle, to other relevant optimal transport problems. It is the purpose of this paper to remedy this situation by rewriting the bulk of his paper with all the missing passages and in a transparent fashion so as to make it fully available to the scientific community. We consider the problem in \({\mathbb {R}}^d\) rather than in \({\mathbb {R}}\) and use as much as possible his notation to facilitate comparison.
Notes
Let \(\mathcal{V}\) be a metric space and \({{{\mathcal {D}}}}({{{\mathcal {V}}}})\) be the set of probability measures defined on \({{{\mathcal {B}}}}({{{\mathcal {V}}}})\), the Borel \(\sigma \)-field of \({{{\mathcal {V}}}}\). We say that a sequence \(\{P_N\}\) of elements of \({{{\mathcal {D}}}}({{{\mathcal {V}}}})\) converges weakly to \(P\in \mathcal{D}({{{\mathcal {V}}}})\), and write \(P_N\Rightarrow P\) if \(\int _\mathcal{V}f\mathrm{d}P_N\rightarrow \int _{{{\mathcal {V}}}}f \mathrm{d}P\) for every bounded, continuous function f on \({{{\mathcal {V}}}}\).
The initial marginal of the prior measure, as long as \(\rho _0(x){\mathrm{d}}x\) is at finite relative entropy from it, does not play any role in the optimization problem. Instead of \(\rho _0(x){\mathrm{d}}x\), which is the standard case in control problems, another popular choice is Lebesgue measure so that the prior is an unbounded measure called stationary Wiener measure, see, e.g., [12].
Probability densities on \({\mathbb {R}}^n\times {\mathbb {R}}^n\) with marginals \(\rho _0\) and \(\rho _1\).
Remarkable analogies to quantum mechanics, which appear to me well worth reflecting upon.
In this paper, the maximum or minimum of two functions will always be taken pointwise.
Fortet seems to imply by this proposition that H and \(H'\) cannot vanish at a point without vanishing everywhere. Although this is true for \(H'\), see Proposition 5.8 below, it does not imply the same property for H.
References
Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620 (1957)
Jaynes, E.T.: On the rationale of maximum-entropy methods. Proc. IEEE 70(9), 939–952 (1982)
Burg, J.P.: Maximum entropy spectral analysis. In: 37th Annual International Meeting, Society of Exploration Geophysicists, Oklahoma City, Okla., 31 Oct 1967 (1967)
Burg, J.P., Luenberger, D.G., Wenger, D.L.: Estimation of structured covariance matrices. Proc. IEEE 70(9), 963–974 (1982)
Dempster, A.P.: Covariance selection. Biometrics 28, 157–175 (1972)
Csiszár, I.: I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3, 146–158 (1975)
Csiszár, I.: Sanov property, generalized I-projection and a conditional limit theorem. Ann. Probab. 12, 768–793 (1984)
Csiszár, I., et al.: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Stat. 19(4), 2032–2066 (1991)
Mikami, T.: Monge’s problem with a quadratic cost by the zero-noise limit of h-path processes. Probab. Theory Relat. Fields 129(2), 245–260 (2004)
Mikami, T., Thieullen, M.: Duality theorem for the stochastic optimal control problem. Stoch. Process. Appl. 116(12), 1815–1835 (2006). https://doi.org/10.1016/j.spa.2006.04.014
Mikami, T., Thieullen, M.: Optimal transportation problem by stochastic optimal control. SIAM J. Control Optim. 47(3), 1127–1139 (2008)
Léonard, C.: A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Contin. Dyn. Syst. A 34(4), 1533–1574 (2014)
Léonard, C.: From the Schrödinger Problem to the Monge–Kantorovich Problem. arXiv preprint arXiv:1011.2564 (2010)
Chen, Y., Georgiou, T.T., Pavon, M.: On the relation between optimal transport and Schrödinger bridges: a stochastic control viewpoint. J. Optim. Theory Appl. 169(2), 671–691 (2016)
Peyré, G., Cuturi, M.: Computational Optimal Transport. arXiv preprint arXiv:1803.00567 (2018)
Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, pp. 2292–2300 (2013)
Benamou, J.D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)
Chen, Y., Georgiou, T.T., Pavon, M.: Optimal transport over a linear dynamical system. IEEE Trans. Autom. Control 62(5), 2137–2152 (2017)
Chen, Y., Georgiou, T., Pavon, M.: Entropic and displacement interpolation: a computational approach using the Hilbert metric. SIAM J. Appl. Math. 76(6), 2375–2396 (2016)
Fortet, R.: Résolution d’un système d’équations de M. Schrödinger. Comptes Rendus 206, 721–723 (1938)
Fortet, R.: Résolution d’un système d’équations de M. Schrödinger. J. Math. Pures Appl. IX, 83–105 (1940)
Beurling, A.: An automorphism of product measures. Ann. Math. 72, 189–200 (1960)
Jamison, B.: The Markov processes of Schrödinger. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 32(4), 323–331 (1975)
Zambrini, J.C.: Variational processes and stochastic versions of mechanics. J. Math. Phys. 27, 2307–2330 (1986)
Föllmer, H.: Random fields and diffusion processes. In: Hennequin, P.-L. (ed.) École d’Été de Probabilités de Saint-Flour XV-XVII, 1985–87, pp. 101–203. Springer, Berlin (1988)
Deming, W.E., Stephan, F.F.: On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Stat. 11(4), 427–444 (1940)
Sinkhorn, R.: A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Stat. 35(2), 876–879 (1964)
Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.X.: Scaling Algorithms for Unbalanced Transport Problems. arXiv preprint arXiv:1607.05816 (2016)
Schrödinger, E.: Über die Umkehrung der Naturgesetze. Sitzungsberichte der Preuss Akad. Wissen. Berlin. Phys. Math. Klasse 1, 144–153 (1931)
Schrödinger, E.: Sur la théorie relativiste de l’électron et l’interprétation de la mécanique quantique. Ann. Inst. H. Poincaré 2(4), 269–310 (1932)
Sanov, I.N.: On the Probability of Large Deviations of Random Variables, Technical report. Department of Statistics, North Carolina State University (1958)
Dudley, R.M.: Real Analysis and Probability. Cambridge University Press, Cambridge (2002)
Ellis, R.S.: Entropy, Large Deviations, and Statistical Mechanics. Springer, Berlin (2007)
Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Corrected reprint of the second (1998) edition. Stochastic Modelling and Applied Probability, vol. 38 (2010)
Villani, C.: Topics in Optimal Transportation, vol. 58. American Mathematical Society, Providence (2003)
Wakolbinger, A.: Schrödinger bridges from 1931 to 1991. In: Proceedings of the 4th Latin American Congress in Probability and Mathematical Statistics, Mexico City, pp. 61–79 (1990)
Dai Pra, P.: A stochastic control approach to reciprocal diffusion processes. Appl. Math. Optim. 23(1), 313–329 (1991)
Dai Pra, P., Pavon, M.: On the Markov processes of Schrödinger, the Feynman–Kac formula and stochastic control. In: Realization and Modelling in System Theory, pp. 497–504. Springer, Berlin (1990)
Pavon, M., Wakolbinger, A.: On free energy, stochastic control, and Schrödinger processes. In: Modeling, Estimation and Control of Systems with Uncertainty, pp. 334–348. Springer, Berlin (1991)
Mikami, T.: Optimal transportation problem as stochastic mechanics. Sel. Pap. Probab. Stat. 227, 75–94 (2008)
Chen, Y., Georgiou, T.T., Pavon, M.: Optimal steering of a linear stochastic system to a final probability distribution, part I. IEEE Trans. Autom. Control 61(5), 1158–1169 (2016)
Chen, Y., Georgiou, T.T., Pavon, M.: Optimal steering of a linear stochastic system to a final probability distribution, part II. IEEE Trans. Autom. Control 61(5), 1170–1180 (2016)
Chen, Y., Georgiou, T.T., Pavon, M.: Fast cooling for a system of stochastic oscillators. J. Math. Phys. 56(11), 113302 (2015)
Benamou, J.D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)
Birkhoff, G.: Extensions of Jentzsch’s theorem. Trans. Am. Math. Soc. 85(1), 219–227 (1957)
Bushell, P.: On the projective contraction ratio for positive linear mappings. J. Lond. Math. Soc. 2(2), 256–258 (1973)
Bushell, P.J.: Hilbert’s metric and positive contraction mappings in a Banach space. Arch. Ration. Mech. Anal. 52(4), 330–338 (1973)
Birkhoff, G.: Uniformly semi-primitive multiplicative processes. Trans. Am. Math. Soc. 104(1), 37–51 (1962)
Lemmens, B., Nussbaum, R.: Birkhoff’s Version of Hilbert’s Metric and Its Applications in Analysis. arXiv preprint arXiv:1304.7921 (2013)
Tsitsiklis, J., Bertsekas, D., Athans, M.: Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control 31(9), 803–812 (1986)
Sepulchre, R., Sarlette, A., Rouchon, P.: Consensus in Non-Commutative Spaces. arXiv preprint arXiv:1003.5653 (2010)
Bonnabel, S., Astolfi, A., Sepulchre, R.: Contraction and observer design on cones. In: Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, pp. 7147–7151. IEEE (2011)
Reeb, D., Kastoryano, M.J., Wolf, M.M.: Hilbert’s projective metric in quantum information theory. J. Math. Phys. 52(8), 082201 (2011)
Lemmens, B., Nussbaum, R.: Nonlinear Perron–Frobenius Theory, vol. 189. Cambridge University Press, Cambridge (2012)
Georgiou, T.T., Pavon, M.: Positive contraction mappings for classical and quantum Schrödinger systems. J. Math. Phys. 56(3), 033301 (2015)
Franklin, J., Lorenz, J.: On the scaling of multidimensional matrices. Linear Algebra Appl. 114, 717–735 (1989)
Schmitzer, B.: Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519 (2016)
Galichon, A., Kominers, S.D., Weber, S.: The nonlinear Bernstein–Schrödinger equation in economics. In: International Conference on Networked Geometric Science of Information, pp. 51–59. Springer, Berlin (2015)
Acknowledgements
The authors thank Robert V. Kohn for useful suggestions. The second named author would also like to thank the Courant Institute of Mathematical Sciences of New York University for its hospitality during the time this paper was written. Finally, the authors wish to thank two anonymous reviewers for their very careful reading and for providing many general and specific comments and suggestions on how to improve the paper. The second named author was partly supported by the University of Padova Research Project CPDA 140897.
Communicated by Gabriel Peyré
Appendix
1.1 Proof of (33) from Theorem I
Let \(Z = \{ y \in {\mathfrak {I}}^2{:}\,g(x_0,y) = 0 \} \subset {\mathfrak {I}}^2\).
Define \(Z_k = \{ y \in {\mathfrak {I}}^2{:}\,g(x_0,y) < \frac{1}{k} \}\) for \(k \in {\mathbb {N}}^*\). We have \(Z_{k+1} \subset Z_k\), and \(Z_k \downarrow Z\) as \(k \rightarrow +\infty \).
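By assumption (H.vi), \(Z\) has Lebesgue measure 0, and by the continuity of \(g\) (H.iv), \(Z\) is closed.
Since \(Z_k \downarrow Z\), continuity from above of the Lebesgue measure \(m\) (which applies as soon as some \(Z_k\) has finite measure) gives \(m(Z_k) \rightarrow m(Z) = 0\) as \(k \rightarrow +\infty \).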
Denote \({\mathfrak {I}}^2_k = {\mathfrak {I}}^2 \backslash Z_k\). Then we have \({\mathfrak {I}}^2_k \subset {\mathfrak {I}}^2_{k+1}\) and \({\mathfrak {I}}^2_k \uparrow {\mathfrak {I}}^2 \backslash Z\) as \(k \rightarrow +\infty \).
Since
\(\forall \epsilon >0\), we have for n large enough:
Fix \(\epsilon >0, k \in {\mathbb {N}}^*\). We then have for n large enough:
and in particular, by nonnegativity, the first integral yields:
This implies that the measure \(\frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y\) converges weakly to 0 on \({\mathfrak {I}}^2_{k}\). Indeed, it is the case when evaluated on any step function with support included in \({\mathfrak {I}}^2_{k}\), and step functions are dense in the family of bounded continuous functions.
We would like the measure \(\frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y\) to converge to 0 for any step function whose support I is included in \({\mathfrak {I}}^2\), and not merely on \({\mathfrak {I}}^2_k\).
Pick a subset \(I \subset {{\mathfrak {I}}^2}\), and consider:
The first integral converges to 0 as \(n \rightarrow +\infty \), since the measure \(\frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y\) converges weakly to 0 on \({\mathfrak {I}}^2_{k}\).
As for the second integral, we have that \(H_n \le H_1\), so \(\frac{\omega _2(y)}{G(H_n,y)} \le \frac{\omega _2(y)}{G(H_1,y)}\) which implies:
where the last inequality comes from \(({\mathfrak {I}}^2_{k} \cap I)^C \subset Z_{k}\), the complement being taken within \(I\).
Condition (\(\star \)) states that \(\int _{{\mathfrak {I}}^2} \frac{\omega _2(y)}{G(H_1,y)} {\mathrm{d}}y < +\infty \); thus, the finite measure \(\frac{\omega _2(y)}{G(H_1,y)} {\mathrm{d}}y\) is absolutely continuous with respect to the Lebesgue measure m on \({\mathfrak {I}}^2\). Since \(m(Z_{k}) \rightarrow 0\), this implies that the second integral converges to 0 as \(k \rightarrow +\infty \), uniformly in n.
Hence, for any measurable \(I \subset {\mathfrak {I}}^2\), \(\int _{{\mathfrak {I}}^2} \mathbb 1_{I}(y) \frac{\omega _2(y)}{G(H_n,y)} {\mathrm{d}}y \rightarrow 0\) as \(n \rightarrow +\infty \).
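The two-step limit behind this conclusion (first in \(k\), then in \(n\)) can be written out as follows. This is only a sketch reconstructed from the surrounding text, not the authors' displayed formulas. It assumes that the integrand \(\frac{\omega _2(y)}{G(H_1,y)}\) is nonnegative and uses the decomposition \(I = (I \cap {\mathfrak {I}}^2_k) \cup (I \cap Z_k)\), which follows from \({\mathfrak {I}}^2_k = {\mathfrak {I}}^2 \backslash Z_k\).

```latex
% Sketch only: splitting of the integral over a measurable I \subset \mathfrak{I}^2.
% Assumption: \omega_2 / G(H_1, \cdot) \ge 0, so enlarging I \cap Z_k to Z_k
% can only increase the second integral.
\begin{align*}
\int_{I} \frac{\omega_2(y)}{G(H_n,y)}\,\mathrm{d}y
  &= \int_{I \cap \mathfrak{I}^2_k} \frac{\omega_2(y)}{G(H_n,y)}\,\mathrm{d}y
   + \int_{I \cap Z_k} \frac{\omega_2(y)}{G(H_n,y)}\,\mathrm{d}y \\
  &\le \int_{I \cap \mathfrak{I}^2_k} \frac{\omega_2(y)}{G(H_n,y)}\,\mathrm{d}y
   + \int_{Z_k} \frac{\omega_2(y)}{G(H_1,y)}\,\mathrm{d}y .
\end{align*}
% Given \epsilon > 0: first fix k so that the last integral is below \epsilon/2
% (absolute continuity and m(Z_k) \to 0; this bound does not depend on n).
% With k fixed, the first integral is below \epsilon/2 for all n large enough.
\[
\limsup_{n \to +\infty} \int_{I} \frac{\omega_2(y)}{G(H_n,y)}\,\mathrm{d}y \le \epsilon
\quad \text{for every } \epsilon > 0 .
\]
```

The order of the choices matters: \(k\) is fixed before \(n\) is sent to infinity, which is possible because the bound on the second integral involves \(H_1\) only.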
Essid, M., Pavon, M. Traversing the Schrödinger Bridge Strait: Robert Fortet’s Marvelous Proof Redux. J Optim Theory Appl 181, 23–60 (2019). https://doi.org/10.1007/s10957-018-1436-9
DOI: https://doi.org/10.1007/s10957-018-1436-9