Abstract
We consider an optimal transport problem on the unit simplex whose solutions are given by gradients of exponentially concave functions and prove two main results. First, we show that the optimal transport is the large deviation limit of a particle system of Dirichlet processes transporting one probability measure on the unit simplex to another by coordinatewise multiplication and normalizing. The structure of our Lagrangian and the appearance of the Dirichlet process relate our problem closely to the entropic measure on the Wasserstein space as defined by von Renesse and Sturm in the context of Wasserstein diffusion. The limiting procedure is a triangular limit in which we simultaneously let the number of particles grow to infinity while the ‘noise’ tends to zero. The method, which generalizes easily to many other cost functions, including the squared Euclidean distance, provides a novel combination of the Schrödinger problem approach due to C. Léonard and the related Brownian particle systems by Adams et al., and does not require Gamma-convergence. Second, we analyze the behavior of entropy along the paths of transport. The reference measure on the simplex is taken to be the Dirichlet measure with all zero parameters, which relates to the finite-dimensional distributions of the entropic measure. The interpolating curves are not the usual McCann lines. Nevertheless, we show that entropy plus a multiple of the transport cost remains convex, which is reminiscent of the semiconvexity of entropy along lines of McCann interpolations in negative curvature spaces. We also obtain, under suitable conditions, dimension-free bounds of the optimal transport cost in terms of entropy.
Notes
As suggested by an anonymous referee, it would be nice to obtain sufficient conditions directly in terms of the distributions \(P_0\) and \(P_1\). This is an interesting problem (possibly related to analysis of the corresponding Monge–Ampère equation studied in Sect. 4.2) on its own and is left for future research. On the other hand, once the function \(\varphi \) is fixed, the transport map T is optimal for any \(P_0\) if we set \(P_1 = T_{\#} P_0\).
We thank an anonymous referee for pointing this out.
References
Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.: From a large-deviations principle to the Wasserstein gradient flow: a new micro–macro passage. Commun. Math. Phys. 307(3), 791 (2011)
Amari, S.: Information Geometry and Its Applications. Springer, Berlin (2016)
Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Springer, Berlin (2008)
Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417 (1991)
Chang, J.T., Pollard, D.: Conditioning as disintegration. Stat. Neerl. 51(3), 287–317 (1997)
Conforti, G.: A second order equation for Schrödinger bridges with applications to the hot gas experiment and entropic transportation cost. Probab. Theory Relat. Fields 174(1–2), 1–47 (2019)
Cordero-Erausquin, D., McCann, R.J., Schmuckenschläger, M.: Prékopa–Leindler type inequalities on Riemannian manifolds, Jacobi fields, and optimal transport. Annales de la faculté des sciences de Toulouse 15(4), 613–635 (2006)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (2006)
Ding, J., Zhou, A.: Eigenvalues of rank-one updated matrices with some applications. Appl. Math. Lett. 20(12), 1223–1226 (2007)
Duong, M.H., Laschos, V., Renger, M.: Wasserstein gradient flows from large deviations of many-particle limits. ESAIM Control Optim. Calc. Var. 19(4), 1166–1188 (2013). Erratum at www.wias-berlin.de/people/renger/Erratum/DLR2015ErratumFinal.pdf
Egozcue, J.J., Pawlowsky-Glahn, V.: Simplicial Geometry for Compositional Data, vol. 264, no. 1, pp. 145–159. Geological Society, Special Publications, London (2006)
Émery, M., Yor, M.: A parallel between Brownian bridges and gamma bridges. Publ. Res. Inst. Math. Sci. 40(3), 669–688 (2004)
Erbar, M., Kuwada, K., Sturm, K.T.: On the equivalence of the entropic curvature-dimension condition and Bochner’s inequality on metric measure spaces. Invent. Math. 201(3), 993–1071 (2015)
Erbar, M., Maas, J., Renger, D.R.M.: From large deviations to Wasserstein gradient flows in multiple dimensions. Electron. Commun. Probab. 20(89), 1–12 (2015)
Feng, S.: Large deviations for Dirichlet processes and Poisson–Dirichlet distribution with two parameters. Electron. J. Probab. 12, 787–807 (2007)
Fernholz, E.R.: Stochastic Portfolio Theory. Applications of Mathematics. Springer, Berlin (2002)
Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162(3–4), 707–738 (2015)
Gangbo, W., McCann, R.J.: The geometry of optimal transportation. Acta Math. 177(2), 113–161 (1996)
Horn, R., Johnson, C.: Matrix Analysis. Cambridge University Press, Cambridge (1990)
Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1–17 (1998)
Khan, G., Zhang, J.: The Kähler geometry of certain optimal transport problems. Pure Appl. Anal. 2(2), 397–426 (2020)
Léonard, C.: From the Schrödinger problem to the Monge–Kantorovich problem. J. Funct. Anal. 262(4), 1879–1920 (2012)
Léonard, C.: A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Contin. Dyn. Syst. 34(4), 1533–1574 (2014)
Lynch, J., Sethuraman, J.: Large deviations for processes with independent increments. Ann. Probab. 15(2), 610–627 (1987)
McCann, R.J.: A convexity principle for interacting gases. Adv. Math. 128(1), 153–179 (1997)
Mikami, T.: Monge’s problem with a quadratic cost by the zero-noise limit of \(h\)-path processes. Probab. Theory Relat. Fields 129(2), 245–260 (2004)
Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Commun. Partial Differ. Equ. 26, 101–174 (2001)
Pal, S.: Embedding optimal transports in statistical manifolds. Indian J. Pure Appl. Math. 48(4), 541–550 (2017)
Pal, S.: Exponentially concave functions and high dimensional stochastic portfolio theory. Stoch. Process. Their Appl. 129(9), 3116–3128 (2019)
Pal, S.: On the difference between entropic cost and the optimal transport cost. arXiv preprint arXiv:1905.12206 (2019)
Pal, S., Wong, T.K.L.: The geometry of relative arbitrage. Math. Financ. Econ. 10, 263–293 (2016)
Pal, S., Wong, T.K.L.: Exponentially concave functions and a new information geometry. Ann. Probab. 46(2), 1070–1113 (2018)
Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton (1997)
Santambrogio, F.: Optimal Transport for Applied Mathematicians. Springer, Berlin (2015)
Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics. American Mathematical Society, Providence (2003)
Villani, C.: Optimal Transport: Old and New. Springer, Berlin (2008)
von Renesse, M.K., Sturm, K.T.: Entropic measure and Wasserstein diffusion. Ann. Probab. 37(3), 1114–1191 (2009)
Wong, T.K.L.: Optimization of relative arbitrage. Ann. Finance 11(3–4), 345–382 (2015)
Wong, T.K.L.: Logarithmic divergences from optimal transport and Rényi geometry. Inf. Geom. 1(1), 39–78 (2018)
Wong, T.K.L.: Information geometry in portfolio theory. In: Nielsen, F. (ed.) Geometric Structures of Information, pp. 105–136. Springer, Cham (2019)
Wong, T.K.L., Yang, J.: Optimal transport and information geometry. arXiv preprint arXiv:1906.00030 (2019)
Acknowledgements
S. P. thanks Martin Huesmann for very useful discussions.
S. Pal’s research is supported by NSF Grant DMS-1612483. T.-K. L. Wong’s research is supported by NSERC Grant RGPIN-2019-04419.
Appendix
Proof of Lemma 3
Let \(\theta _i = - \log p_i\) and \(\phi _i = - \log q_i\) for \(1 \le i \le n\). Then the cost function (2) takes the form
By the Cauchy–Schwarz inequality, we have
Since
we have the estimate
Integrating against any coupling \(R \in \varPi (P, Q)\) and replacing the constant (which is irrelevant) by 2 shows that the transport cost is finite whenever \(P, Q \in {\mathcal {L}}\). \(\square \)
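The change of variables used in this proof can be checked numerically. The sketch below assumes that the cost function (2), which is not reproduced in this appendix, has the form \(c(p, q) = \log \left( \frac{1}{n} \sum _i \frac{q_i}{p_i} \right) - \frac{1}{n} \sum _i \log \frac{q_i}{p_i}\); the function names `cost` and `cost_log_coords` and the sample points are hypothetical and chosen only for illustration.

```python
import numpy as np

def cost(p, q):
    """Assumed form of the cost (2): log of the mean ratio minus the mean log ratio."""
    ratio = np.asarray(q, dtype=float) / np.asarray(p, dtype=float)
    return float(np.log(ratio.mean()) - np.log(ratio).mean())

def cost_log_coords(theta, phi):
    """Same quantity in the coordinates theta_i = -log p_i, phi_i = -log q_i."""
    d = np.asarray(theta, dtype=float) - np.asarray(phi, dtype=float)  # equals log(q_i / p_i)
    m = d.max()  # shift before exponentiating, for numerical stability
    return float(m + np.log(np.exp(d - m).mean()) - d.mean())

# Hypothetical sample points in the open unit simplex.
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.5, 0.25, 0.25])

c_direct = cost(p, q)
c_log = cost_log_coords(-np.log(p), -np.log(q))
```

Under this assumed form, the cost is non-negative by Jensen's inequality and vanishes when \(p = q\), and the two parametrizations agree up to rounding.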
Proof of Theorem 1
Since \(P, Q \in {\mathcal {L}}\), by Proposition 3 we have \({\mathbf {C}}(P, Q) < \infty \). Since the cost function is continuous and bounded below, by general results of optimal transport (see for example [35, 36]), there exists an optimal coupling \(R^* \in \varPi (P, Q)\) solving the transport problem, and its support is c-cyclical monotone.
Let \(m \ge 1\) and let \(\{(p(s), q(s))\}_{s = 0}^{m - 1}\) be a sequence in the support of \(R^*\). By the c-cyclical monotonicity of \(R^*\), we have
where by convention \((p(m), q(m)) := (p(0), q(0))\). For each s let \(\pi (s) = q(s) \odot p(s)^{-1}\) and \(r(s) = p(s)^{-1}\). Rearranging, we have
Thus the (multi-valued) portfolio map
induced by the optimal coupling is multiplicatively cyclical monotone in the sense of (14). (In [31, Proposition 12] we performed this argument using another coordinate system.)
By [31, Proposition 4, Proposition 6], there exists an exponentially concave function \(\varphi \) on \(\varDelta _n\) such that if \(\varvec{\pi }\) is the portfolio map generated by \(\varphi \), (p, q) is any pair in the support of \(R^*\) and \(\varphi \) is differentiable at \(r = p^{-1}\), then
Rearranging, we have \(q = p \odot \varvec{\pi }(p^{-1})\) which is the image of p under the mapping (15). Since \(P \in {\mathcal {L}}_a\) is absolutely continuous and \(\varphi \) is differentiable almost everywhere, for P-a.e. values of p there is a unique element \(q \in \varDelta _n\) such that \((p, q) \in \mathrm {supp}(R^*)\) and (78) holds. This proves both (i) and (ii). \(\square \)
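The relation \(q = p \odot \varvec{\pi }(p^{-1})\) and its inverse \(\varvec{\pi } = q \odot p^{-1}\) can be illustrated with a minimal sketch. It assumes that \(\odot \) is the coordinatewise product followed by normalization and that \(p^{-1}\) is the normalized coordinatewise reciprocal (conventions consistent with the abstract but not restated in this appendix). The constant portfolio map used here is a hypothetical example, not the map of the theorem, and all function names are illustrative.

```python
import numpy as np

def normalize(x):
    """Rescale a positive vector so that it sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def odot(p, q):
    """Assumed meaning of p ⊙ q: coordinatewise product, then normalization."""
    return normalize(np.asarray(p, dtype=float) * np.asarray(q, dtype=float))

def inverse(p):
    """Assumed meaning of p^{-1}: normalized coordinatewise reciprocal."""
    return normalize(1.0 / np.asarray(p, dtype=float))

def transport(p, portfolio_map):
    """Apply q = p ⊙ pi(p^{-1}) for a given portfolio map pi."""
    return odot(p, portfolio_map(inverse(p)))

# Hypothetical example: a portfolio map with constant weights.
const_pi = np.array([0.5, 0.3, 0.2])
p = np.array([0.2, 0.3, 0.5])

q = transport(p, lambda r: const_pi)
recovered_pi = odot(q, inverse(p))  # should return the portfolio weights
```

In this example the portfolio weights are recovered from the pair \((p, q)\), mirroring the substitution \(\pi (s) = q(s) \odot p(s)^{-1}\) used in the proof.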
Proof of Proposition 2
First we show that \(P_t \in {\mathcal {L}}\) for all t. By Remark 3, for each p the trace of \(\{T_t(p)\}_{0 \le t \le 1}\) is a straight line in \(\varDelta _n\). It follows that for each i we have
Since both \(P_0, P_1 \in {\mathcal {L}}\) by assumption, we have \(P_t \in {\mathcal {L}}\) as well.
Next we prove that \(P_t\) is absolutely continuous. For vectors a and b we let \(\frac{a}{b} = (\frac{a_i}{b_i})\) be the vector of component-wise ratios, and we use \(a \cdot b\) and \(\langle a, b \rangle \) interchangeably to denote the Euclidean dot product.
Let \(0< t < 1\) be given. Let \(\mathbf{w}_t(r) = \frac{\varvec{\pi }_t(r)}{r}\) be the vector of unnormalized weight ratios. Recall that \(q = T_t(p) = p \odot \varvec{\pi }_t(p^{-1}) = r^{-1} \odot \varvec{\pi }_t(r)\) and similarly for \(q'\). Then, by (16), we have
Thus, if we can prove that the distribution \({\tilde{P}}_t\) of \(\mathbf{w}_t(r)\) (where \(r = p^{-1}\) and \(p \sim P_0\)) is absolutely continuous, then \(P_t\) is absolutely continuous and we are done.
To this end, consider the quantity
In the last line we used the estimate \(\log (1 + x) \le x\).
By the multiplicative cyclical monotonicity of the portfolio maps (see (14)), we have
for all \(r, r' \in \varDelta _n\). It follows from (79) and the Cauchy–Schwarz inequality that
By (21), the right hand side of (80) equals
which is positive for \(r \ne r'\). By the Taylor approximation (24), \(c(r, r') + c(r', r)\) is of order \(\Vert r - r'\Vert ^2\) when \(r \approx r'\); thus (81) is of order \((1 - t) \Vert r - r'\Vert \) when \(r \approx r'\).
From (80) and the previous observation, the mapping \(p \mapsto r = p^{-1} \mapsto \mathbf{w}_t(r)\) is one-to-one and its inverse is locally Lipschitz. Since \(P_0\) is absolutely continuous by assumption, we have that \({\tilde{P}}_t\), and hence \(P_t\), is absolutely continuous.
To prove the second claim, let \(\varvec{\pi }_t\) be the portfolio map at time t. By Lemma 2, we have
By properties of the relative entropy (see for example [8, Theorem 2.7.2]) the quantity \(H\left( {\overline{e}} \mid (1 - t) {\overline{e}} + t \varvec{\pi }_1(p^{-1})\right) \) is smooth and convex in t, and is increasing and strictly convex whenever \(\varvec{\pi }_1(p^{-1}) \ne {\overline{e}}\). Since \(P_0 \ne P_1\) by assumption, the last condition holds on a set of positive probability under \(P_0\). This completes the proof of the proposition. \(\square \)
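The monotonicity and convexity in t used in the last step can be checked numerically. The sketch below assumes that \({\overline{e}}\) denotes the barycenter \((1/n, \ldots , 1/n)\) of the simplex and that \(H(a \mid b) = \sum _i a_i \log (a_i / b_i)\); the weight vector `pi_1` is a hypothetical stand-in for \(\varvec{\pi }_1(p^{-1})\).

```python
import numpy as np

def relative_entropy(a, b):
    """H(a | b) = sum_i a_i log(a_i / b_i) for positive probability vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(a * np.log(a / b)))

n = 3
e_bar = np.full(n, 1.0 / n)        # assumed meaning of \overline{e}: the barycenter
pi_1 = np.array([0.6, 0.3, 0.1])   # hypothetical portfolio weights, not equal to e_bar

# Evaluate t -> H(e_bar | (1 - t) e_bar + t pi_1) on a uniform grid in [0, 1].
ts = np.linspace(0.0, 1.0, 101)
values = np.array(
    [relative_entropy(e_bar, (1.0 - t) * e_bar + t * pi_1) for t in ts]
)

first_diff = np.diff(values)        # non-negative if the function is increasing
second_diff = np.diff(values, n=2)  # non-negative if the function is convex
```

On this grid the first and second differences are non-negative, in line with the claim that the quantity is increasing and convex in t when \(\varvec{\pi }_1(p^{-1}) \ne {\overline{e}}\).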
Proof of Lemma 7
Recall that
Since \(\log (1 + x) \le x\), we have the upper bound
which is the Bregman divergence of \(\varphi \) (see [2, Chapter 1]). Let \(q \ne q'\). Applying Taylor’s theorem along the line segment \([q, q']\) from q to \(q'\), we have
for some \(q''\) on \([q, q']\) and \(v = \frac{q' - q}{\Vert q' - q \Vert }\). From the hypotheses we have
so the upper bound in (41) holds with \(\alpha ' = C_1\).
To derive a lower bound, let \(\Phi = e^{\varphi }\) and express (82) in the form
Again \(q''\) is some point on \([q, q']\) and v is as above. Using \(- \log (1 + x) \ge -x\), we have the bound
Since \(\Phi \) is non-negative and concave on \(\varDelta _n\), it is bounded above by some \(M > 0\). Since \(\Vert q' - q\Vert \le 1\) for \(q, q' \in \varDelta _n\), we have
Plugging this into (83) gives the lower bound with \(\alpha = \frac{C_2}{M + C_3}\). \(\square \)
Cite this article
Pal, S., Wong, TK.L. Multiplicative Schrödinger problem and the Dirichlet transport. Probab. Theory Relat. Fields 178, 613–654 (2020). https://doi.org/10.1007/s00440-020-00987-6
DOI: https://doi.org/10.1007/s00440-020-00987-6
Keywords
- Optimal transport
- Exponentially concave function
- Displacement interpolation
- Schrödinger problem
- Entropic measure
- L-divergence
- Large deviations
- Dirichlet process