Skip to main content
Log in

Optimal measure transportation with respect to non-traditional costs

  • Published:
Calculus of Variations and Partial Differential Equations Aims and scope Submit manuscript

Abstract

We study optimal mass transport problems between two measures with respect to a non-traditional cost function, i.e. a cost c which can attain the value \(+\infty \). We define the notion of c-compatibility and strong c-compatibility of two measures, and prove that if there is a finite-cost plan between the measures then the measures must be c-compatible, and if in addition the two measures are strongly c-compatible, then there is an optimal plan concentrated on a c-subgradient of a c-class function. This function is the so-called potential of the plan. We give two proofs of this theorem, under slightly different assumptions. In the first we utilize the notion of c-path-boundedness, showing that strong c-compatibility implies a strong connectivity result for a directed graph associated with an optimal map. Strong connectivity of the graph implies that the c-cyclic monotonicity of the support set (which follows from classical reasoning) guarantees its c-path-boundedness, implying, in turn, the existence of a potential. We also give a constructive proof, in the case when one of the measures is discrete. This approach adopts a new notion of ‘Hall polytopes’, which we introduce and study in depth, to which we apply a version of Brouwer’s fixed point theorem to prove the existence of a potential in this case.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2

Similar content being viewed by others

Notes

  1. All considered measures are Borel measures on Polish spaces, which are complete, separable metric spaces equipped with their Borel \(\sigma \)-algebra (see also the beginning of Sect. 2).

  2. When referring to a function on \(X \times Y\) as “measurable” we assume it is both measurable with respect to the product \(\sigma \)-algebra and its fibers \(f(\cdot ,y)\) and \(f(x,\cdot )\) are measurable functions on X and Y respectively, for any \(x\in X\) and \(y\in Y\).

References

  1. Ambrosio, L., Pratelli, A.: Existence and stability results in the \({L}^1\) theory of optimal transportation. Optimal transportation and applications, pp. 123–160. Springer, Berlin, Heidelberg (2003)

    MATH  Google Scholar 

  2. Artstein-Avidan, S., Barel, H., Rubinstein, Y., Sadovsky, S., Wyczesany, K.: Transportation induced by the polarity transform, in preparation

  3. Artstein-Avidan, S., Milman, V.: Hidden structures in the class of convex functions and a new duality transform. J. Eur. Math. Soc. 13(4), 975–1004 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  4. Artstein-Avidan, S., Rubinstein, Y.A.: Differential analysis of polarity: Polar Hamilton-Jacobi, conservation laws, and Monge Ampère equations. J. d’Anal. Math. 132(1), 133–156 (2017)

    Article  MATH  Google Scholar 

  5. Artstein-Avidan, S., Sadovsky, S., Wyczesany, K.: A Rockafellar-type theorem for non-traditional costs. Adv. Math. 395, 108157 (2022)

    Article  MathSciNet  MATH  Google Scholar 

  6. Ball, K.: An elementary introduction to monotone transportation. In: Geometric aspects of functional analysis, Lecture Notes in Mathematics, pp. 41–52. Springer, Berlin, Heidelberg (2004)

    Google Scholar 

  7. Barel, H.: Optimal transportation problem for polar cost, Master’s thesis, Tel Aviv University (2019)

  8. Beiglböck, M., Goldstern, M., Maresch, G., Schachermayer, W.: Optimal and better transport plans. J. Funct. Anal. 256(6), 1907–1927 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  9. Bianchini, S., Caravenna, L.: On optimality of c-cyclically monotone transference plans. Comptes Rendus Math. 348(11–12), 613–618 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  10. Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417 (1991)

    Article  MathSciNet  MATH  Google Scholar 

  11. Caffarelli, L.A.: The regularity of mappings with a convex potential. J. Am. Math. Soc. 5(1), 99–104 (1992)

    Article  MathSciNet  MATH  Google Scholar 

  12. Gangbo, W., McCann, R.J.: The geometry of optimal transportation. Acta Math. 177(2), 113–161 (1996)

    Article  MathSciNet  MATH  Google Scholar 

  13. Kantorovich, L.V.: On the transfer of masses. Dokl. Acad. Sci. SSSR 37, 227–229 (1942)

    Google Scholar 

  14. Kantorovich, L.V.: On a problem of Monge. Uspekhi Mat. Nauk. 3, 225–226 (1948)

    Google Scholar 

  15. McCann, R.J.: A convexity theory for interacting gases and equilibrium crystals, Ph.D. thesis, Princeton University (1994)

  16. Rochet, J.C.: A necessary and sufficient condition for rationalizability in a quasi-linear context. J. Math. Econ. 16(2), 191–200 (1987)

    Article  MATH  Google Scholar 

  17. Rockafellar, R.T.: Characterization of the subdifferentials of convex functions. Pac. J. Math. 17(3), 497–510 (1966)

    Article  MathSciNet  MATH  Google Scholar 

  18. Rüschendorf, L.: On c-optimal random variables. Stat. Probab. Lett. 27(3), 267–270 (1996)

    Article  MathSciNet  MATH  Google Scholar 

  19. Smith, C., Knott, M.: On Hoeffding-Fréchet bounds and cyclic monotone relations. J. Multivar. Anal. 40(2), 328–334 (1992)

    Article  MATH  Google Scholar 

  20. Strassen, V.: The existence of probability measures with given marginals. Ann. Math. Stat. 36(2), 423–439 (1965)

    Article  MathSciNet  MATH  Google Scholar 

  21. Trudinger, N.S., Wang, X.: On strict convexity and continuous differentiability of potential functions in optimal transportation. Arch. Ration. Mech. Anal. 192(3), 403–418 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  22. Villani, C.: Topics in optimal transportation, vol. 58. Graduate Studies in Mathematics American Mathematical Society, Ann Arbor (2003)

    MATH  Google Scholar 

  23. Villani, C.: Optimal transport: old and new, vol. 338. Springer Science & Business Media, Springer, New York, (2008)

    MATH  Google Scholar 

  24. Wyczesany, K.: Topics in high-dimensional geometry and optimal transport, Ph.D. thesis, University of Cambridge (2020)

Download references

Acknowledgements

The authors were partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 770127). The first named author was partially supported by ISF Grant No. 784/20. The second named author is grateful to the Azrieli foundation for the award of an Azrieli fellowship. We thank the anonymous referee whose comments helped improve and clarify this manuscript.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to S. Sadovsky.

Additional information

Communicated by A. Mondino.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: c-subgradients and polar subgradients

Appendix A: c-subgradients and polar subgradients

Since c-subgradients play such an important role in this theory, we gather here some relevant information regarding them but which we did not include in the main text so as not to disturb its flow.

Let us recall that given a function \(\varphi \) in the c-class, its c-subgradient is defined by

$$\begin{aligned} \partial ^c\varphi = \left\{ (x,y):\, \varphi (x)+\varphi ^c(y)=c(x,y) \,\text { and } \, c(x,y)<\infty \right\} . \end{aligned}$$

Denoting by \(\partial ^c\varphi (x)\) the set of points \(y\in Y\) for which \((x,y)\in \partial ^c \varphi \), we have by the definition that

$$\begin{aligned} x\in \partial ^c\varphi ^c(y)\;\;\Leftrightarrow \;\;y\in \partial ^c\varphi (x). \end{aligned}$$

Notice that \(y\in \partial ^c\varphi (x)\) if and only if the function \(c(\cdot ,y)-\varphi ^c(y)\) is above \(\varphi \) and coincides with it at x. This provides the first simple but useful way to think about c-subgradients, summarized in Lemma A.1. Given a function \(\varphi \) in the c-class, it is the image, under the c-transform, of another c-class function \(\psi = \varphi ^c\) and therefore, it can be written as an infimum over basic functions as follows:

$$\begin{aligned} \varphi (x) = \inf _y \left( c(x,y) - \varphi ^c(y)\right) . \end{aligned}$$

All the functions on the right hand side lie above \(\varphi \). If any one of the basic functions (indexed by y) on the right hand side is equal to \(\varphi \) at the point x, then the pair (xy) belongs to \(\partial ^c \varphi \), and \(y \in \partial ^c \varphi (x)\).

Lemma A.1

Let \(\varphi \) be a c-class function, and \(x\in X\) and assume that \(\varphi (x)<\infty \). Then \(y_0\in \partial ^c \varphi (x)\) if and only if \(c(x,y_0)<\infty \) and the function \(\ell (z)=c(z,y_0)-c(x,y_0)+\varphi (x)\) satisfies

$$\begin{aligned} \ell (z)\ge \varphi (z) \ \text { for all } \ z\in X. \end{aligned}$$

Proof

By the definition we have that \(y_0\in \partial ^c \varphi (x)\) if and only if

$$\begin{aligned} \varphi (x)+ \varphi ^c(y_0)=c(x,y_0)<\infty . \end{aligned}$$

Using the definition of the c-transform we see that

$$\begin{aligned} \varphi (x) = c(x,y_0)-\varphi ^c (y_0)= \sup _z(c(x,y_0)-c(z,y_0)+\varphi (z)), \end{aligned}$$

which holds if and only if for all z we have \(c(z,y_0)-c(x,y_0)+\varphi (x) \ge \varphi (z)\). \(\square \)

It is useful to understand the structure of the c-subgradient of the basic functions. In parallel to the classical case, where the linear functions have constant subgradient, we show that under mild assumptions the same is true for c-subgradients of basic functions. This was, of course, our motivation for using the specific candidates for the potential functions in Sect. 5.

Lemma A.2

Let XY be Polish spaces and let \(c:X\times Y \rightarrow (-\infty , \infty ]\) be a measurable cost function. Consider a basic function \(\varphi (x) = c(x,y_0)+ t\) for some \(y_0 \in Y\). If \(c(x,y_0)<\infty \), then \(y_0\in \partial ^c\varphi (x)\). If, in addition, for any \(y_1\ne y_0\) we have that \(\inf _z\left( c(z,y_1) - c(z,y_0)\right) \) is not attained at x (for example, if the infimum is \(-\infty \), or bounded but not attained at all) then \(\{y_0 \}= \partial ^c\varphi (x)\).

Proof

Indeed, let \(\varphi \) be as in the statement. From the definition it follows that \(y\in \partial ^c\varphi (x)\) if and only if \(c(x,y)<\infty \) and

$$\begin{aligned} c(x,y)-\varphi (x)=\varphi ^c(y) =\inf _{z} ( c(z,y)-\varphi (z)), \end{aligned}$$

which can be reformulated as

$$\begin{aligned} c(x,y)-\varphi (x) \le c(z,y_1)-\varphi (z) \ \ \ \text { for all } z\in X. \end{aligned}$$

Plugging in the definition of \(\varphi \) we get

$$\begin{aligned} c(x,y)-c(x,y_0)\le c(z,y)-c(z,y_0) \ \ \ \text { for all } z\in X. \end{aligned}$$

We see that \(y=y_0\) always satisfies the equality, so that \(y_0\in \partial ^c \varphi (x)\). Clearly for \(y_1\ne y_0\), such an inequality means precisely that the infimum is attained at x. \(\square \)

An important and motivating first example is the one coming from the clasical cost function \(c(x,y) = -\langle x,y \rangle \).

Example A.3

For the cost function \(c(x,y)=-\langle x,y \rangle \), whose transport plans and maps coincide with those associated to the quadratic cost, the c-subgradient coincides, up to a minus sign, with the well known subgradient. More formally, a function \(\varphi \) is in the c-class if and only if \(-\varphi \in \textrm{Cvx}({\mathbb {R}}^n)\), namely is convex and lower semi-continuous. Denoting \(\psi = -\varphi \) and using the definition of the c-transform we see that \((x,y)\in \partial ^c \varphi \) if and only if for all \(z\in X\) we have

$$\begin{aligned} \psi (z) - \psi (x) = \varphi (x)-\varphi (z)\ge c(x,y)-c(z,y). \end{aligned}$$

Plugging in the cost we indeed get that \(y\in \partial ^c \varphi (x)\) if for all z it holds that \( \psi (x) + \langle z-x,y \rangle \le \psi (z)\), namely \(y\in \partial \psi (x)\).

The second motivating example, which is our main point of interest, is that of the polar cost \(p:{\mathbb {R}}^n \times {\mathbb {R}}^n \rightarrow (-\infty , \infty ]\), which we once again recall

$$\begin{aligned} p(x,y) =- \ln (\langle x,y \rangle -1) = {\left\{ \begin{array}{ll} - \ln (\langle x,y \rangle -1), &{} \ \ \text {if } \langle x,y \rangle >1 \\ +\infty , &{} \ \ \text {otherwise.} \end{array}\right. } \end{aligned}$$

It was shown in [7] that for the polar cost the p-class consists of all functions of the form \(-\ln (\varphi )\), where \(\varphi \) is a geometric convex function, that is, a lower semi-continuous  non-negative convex function with \(\varphi (0)=0\). We denote this class by \(\textrm{Cvx}_0({\mathbb {R}}^n)\). The associated cost transform is linked with the \({{\mathcal {A}}}\)-transform defined in [3] and given by

$$\begin{aligned} {{\mathcal {A}}}\varphi (y)= \sup _{\{x:\, \langle x,y\rangle >1\}} \frac{\langle x,y\rangle -1}{\varphi (x)}. \end{aligned}$$
(16)

More precisely, one may easily verify that \(- \ln ({{\mathcal {A}}}\varphi )=(-\ln (\varphi ))^p\). Further, the p-subgradient of the function \(-\ln (\varphi )\) can be rewritten as the polar subgradient \(\partial ^\circ \), introduced in [4], of the function \(\varphi \in \textrm{Cvx}_0({\mathbb {R}}^n)\). Indeed, we have that

$$\begin{aligned} \partial ^p(-\ln (\varphi ))= \partial ^\circ \varphi = \left\{ (x,y): \varphi (x) {{\mathcal {A}}}\varphi (y) = \langle x,y\rangle -1>0 \right\} . \end{aligned}$$
(17)

This convenient form is a reason for us to sometimes consider a “multiplicative” setting, where the basic functions are of the form

$$\begin{aligned} \varphi _{u,t}(x) = t (\langle x,u \rangle -1)_+ = t \max (\langle x,u \rangle -1,0). \end{aligned}$$

The next lemma, which is a version of [4, Lemma 3.3], describes the connection between the polar subgradient and the classical subgradient. We will also use the following notation for the zero set \(Z_\varphi = \{ x: \varphi (x) = 0\}\) and \(\textrm{dom}( \varphi ) = \{x: \varphi (x) <\infty \}\) for the domain where \(\varphi \) is finite.

Lemma A.4

Let \(\varphi \in \textrm{Cvx}_0 ({\mathbb {R}}^n)\) and let \(x\in { \mathrm dom}(\varphi ){\setminus } Z_\varphi \). Then there is a bijection between the set \(\{ z\in \partial \varphi (x): \, \langle x,z\rangle \ne \varphi (x)\}\) and the set \(\partial ^\circ \varphi (x)\) given by \(z \mapsto y=\frac{z}{ \langle x,z \rangle - \varphi (x)}\). (Note that these sets might be empty.)

Proof

We first show that the mapping has the desired range. Let \(z\in \partial \varphi (x)\) with \(\langle x,z\rangle \ne \varphi (x)\), which means that for every w we have \(\langle w,z \rangle -\varphi (w)\le \langle x,z \rangle -\varphi (x)\). In particular, \(\langle x,z\rangle -\varphi (x)> 0\). Hence, letting \(y=\frac{z}{ \langle x,z \rangle - \varphi (x)}\) we have that \(\langle x,y\rangle >1\).

To show that \(y \in \partial ^\circ \varphi (x)\), it remains to show that \( \varphi (x){{\mathcal {A}}}\varphi (y) = \langle x,y \rangle -1\). According to the definition of \({{\mathcal {A}}}\), this holds if for every w with \(\langle w,y \rangle >1\) and \(\varphi (w)>0\), we have

$$\begin{aligned} \frac{\langle w,y \rangle -1}{\varphi (w)}\le \frac{\langle x,y \rangle -1}{\varphi (x)}. \end{aligned}$$

Plugging in the formula for y and rearranging gives

$$\begin{aligned} \frac{\langle w,\frac{z}{ \langle x,z \rangle - \varphi (x)} \rangle -1}{\varphi (w)}\le \frac{\langle x,\frac{z}{ \langle x,z \rangle - \varphi (x)} \rangle -1}{\varphi (x)} = {\frac{1}{ \langle x,z \rangle - \varphi (x)}}. \end{aligned}$$

Using that \(\langle x,z \rangle -\varphi (x)> 0\), the above inequality is equivalent to our initial assumption \(\langle w,z \rangle -\varphi (w)\le \langle x,z \rangle -\varphi (x)\).

Having established that the mapping is indeed into \(\partial ^{\circ }\varphi (x)\), we proceed to show that it is surjective. Given \(y \in \partial ^\circ \varphi (x)\) it follows from the definition that \(\langle x,y \rangle >1\). Consider

$$\begin{aligned} z = \frac{ y \varphi (x)}{\langle y,x \rangle -1}, \end{aligned}$$
(18)

which is well defined. Taking scalar product with x we see that \(\langle z,x \rangle = \frac{\langle y,x \rangle \varphi (x)}{\langle y,x \rangle -1}>0\), which can be rearranged to \(\langle y,x \rangle (\langle z,x \rangle -\varphi (x)) = \langle z,x \rangle >0\) so, in particular, \(\langle z,x \rangle \ne \varphi (x)\). Solving for \(\langle x,y \rangle \) and plugging back into the Eq. (18) we get that \(y=\frac{z}{ \langle x,z \rangle - \varphi (x)}\). To prove surjectivity, it remain to show that \(z\in \partial \varphi (x)\). To this end, as before, we use that if \(y\in \partial ^\circ \varphi (x)\) then for any w with \(\langle w,y \rangle >1\) and \(\varphi (w) >0\) we have \( \frac{\langle w,y \rangle -1}{\varphi (w)}\le \frac{\langle x,y \rangle -1}{\varphi (x)}\). Plugging in the formula for y and rearranging, we get that

$$\begin{aligned} \varphi (x) + \langle w-x,z \rangle \le \varphi (w) \end{aligned}$$

holds for any w such that \(\langle w,z \rangle >\langle x,z \rangle -\varphi (x)\) and \(\varphi (w)>0\). In the case when w is such that \(\langle w,z \rangle \le \langle x,z \rangle -\varphi (x)\), this actually means that \( \varphi (x) + \langle w-x,z \rangle \le 0\) and since the geometric convex functions are non-negative the desired inequality trivially follows. Finally, we consider the case when \(w\in Z_\varphi \), i.e. when \(\varphi (w)=0\). Then, plugging in z as defined in (18), we have that the inequality defining the subgradient of \(\varphi \) at x becomes simply

$$\begin{aligned} \langle w,y\rangle \le 1. \end{aligned}$$

That is, we need to show that \(\partial ^\circ \varphi (x)\) is contained in the polar set of \(Z_\varphi \). Indeed, \(y\in \partial ^\circ \varphi (x)\) implies, in particular, that \(y \in \textrm{dom}\,({{\mathcal {A}}}\varphi )\) (since the value of \({{\mathcal {A}}}\varphi (y) = \frac{\langle x,y \rangle -1}{\varphi (x)} <\infty \)) and it follows from the definition of \({{\mathcal {A}}}\) that \(\textrm{dom}\,({{\mathcal {A}}}\varphi )\subset Z_\varphi ^\circ \), which completes the proof of surjectivity.

To show injectivity, assume that for \(z,w\in \partial \varphi (x)\) we have that \( y=\frac{z}{ \langle x,z \rangle - \varphi (x)} = \frac{w}{ \langle x,w \rangle - \varphi (x)}\). This means that these vectors are parallel. Taking scalar product of each of them with x and simplifying we get that \(\langle x,z \rangle = \langle x,w \rangle \), and plugging this back into the equality we get \(z = w\), as claimed. \(\square \)

We end this appendix with one explicit example of a function and its p-subgradient. More examples and applications can be found in [7, 24] and in the forthcoming [2].

Example A.5

Let \(\varphi (x) = \Vert x\Vert ^2/2\), in which case \({{{\mathcal {A}}}}\varphi (y)= \Vert y\Vert ^2/2\) and the supremum in the definition of \({{\mathcal {A}}}\varphi \) is satisfied for \(x = 2y/\Vert y\Vert ^2\). Hence, \(\partial ^\circ \varphi (x) = \frac{2x}{\Vert x\Vert ^2}\). Note that the mapping \(x\mapsto \partial ^\circ \varphi (x)\) in this case is a (rescaled) spherical inversion.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Artstein-Avidan, S., Sadovsky, S. & Wyczesany, K. Optimal measure transportation with respect to non-traditional costs. Calc. Var. 62, 35 (2023). https://doi.org/10.1007/s00526-022-02362-w

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00526-022-02362-w

Mathematics Subject Classification

Navigation