From Mahalanobis to Bregman via Monge and Kantorovich

Towards a “General Generalized Distance”

Sankhya B

Abstract

In his celebrated 1936 paper on “the generalized distance in statistics,” P.C. Mahalanobis pioneered the idea that, when defined over a space equipped with some probability measure P, a meaningful distance should be P-specific, with data-driven empirical counterpart. The so-called Mahalanobis distance and the related Mahalanobis outlyingness achieve this objective in the case of a Gaussian P by mapping P to the spherical standard Gaussian, via a transformation based on second-order moments which appears to be an optimal transport in the Monge-Kantorovich sense. In a non-Gaussian context, though, one may feel that second-order moments are not informative enough, or inappropriate; moreover, they might not exist. We therefore propose a distance that fully takes the underlying P into account—not just its second-order features—by considering the potential that characterizes the optimal transport mapping P to the uniform over the unit ball, along with a symmetrized version of the corresponding Bregman divergence.

References

  • Boissonnat, J.D., Nielsen, F. and Nock, R. (2010). Bregman Voronoi diagrams. Discrete Comput. Geometry 44, 281–307.

  • Bregman, L. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217.

  • Brenier, Y. (1987). Décomposition polaire et réarrangement monotone des champs de vecteurs. Comptes Rendus de l’Académie des Sciences de Paris Série I Mathématique 305 (19), 805–808.

  • Brenier, Y. (1991). Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44, 375–417.

  • Burger, M. (2016). Bregman distances in inverse problems and partial differential equations. In Hiriart-Urruty, J.B., Korytowski, A., Maurer, H. and Szymkat, M. (eds.), Advances in Mathematical Modeling, Optimization and Optimal Control, pp. 3–33. Springer.

  • Carlier, C.V. and Galichon, A. (2016). Vector quantile regression. Annal. Stat. 44, 1165–1192.

  • Chernozhukov, V., Galichon, A., Hallin, M. and Henry, M. (2017). Monge-Kantorovich depth, quantiles, ranks, and signs. Annal. Stat. 45, 223–256.

  • Cuesta-Albertos, J., Rüschendorf, L. and Tuero-Diaz, A. (1993). Optimal coupling of multivariate distributions and stochastic processes. J. Multivariate Anal. 46, 335–361.

  • Del Barrio, E., Cuesta-Albertos, J., Hallin, M. and Matran, C. (2018). Smooth cyclically monotone interpolation and empirical center-outward distribution functions. Available at arXiv:1806.01238.

  • Figalli, A. (2018). On the continuity of center-outward distribution and quantile functions. Nonlinear Analysis, to appear.

  • Galichon, A. (2016). Optimal Transport Methods in Economics. Princeton University Press, Princeton.

  • Hallin, M. (2018). On distribution and quantile functions, ranks and signs in \(\mathbb {R}^{d}\): a measure transportation approach. Available at https://ideas.repec.org/p/eca/wpaper/2013-258262.html.

  • Knott, M. and Smith, C.S. (1984). On the optimal mapping of distributions. J. Optim. Theory Appl. 43, 39–49.

  • Mahalanobis, P.C. (1936). On the generalised distance in statistics. Proc. National Inst. Sci. India 2, 49–55.

  • McCann, R.J. (1995). Existence and uniqueness of monotone measure-preserving maps. Duke Math. J. 80, 309–324.

  • Olkin, I. and Pukelsheim, F. (1982). The distance between two random vectors with given dispersion matrices. Linear Algebra Appl. 48, 257–263.

  • Panaretos, V. and Zemel, Y. (2016). Amplitude and phase variation of point processes. Annal. Stat. 44, 771–812.

  • Panaretos, V. and Zemel, Y. (2018). Fréchet means and Procrustes analysis in Wasserstein space. Bernoulli, to appear.

  • Rachev, S.T. and Rüschendorf, L. (1998). Mass Transportation Problems, I and II. Springer, New York.

  • Villani, C. (2003). Topics in Optimal Transportation. American Mathematical Society, Providence, RI.

  • Villani, C. (2009). Optimal Transport: Old and New, Grundlehren der Mathematischen Wissenschaften. Springer, Heidelberg.

Acknowledgments

The author gratefully acknowledges inspiring conversations with Eustasio del Barrio, Gérard Biau, Michel Cahen, Juan Cuesta Albertos, Christine De Mol, Alessio Figalli, Aurélie Fischer, Carlos Matran, Victor Panaretos, and Davy Paindaveine, as well as the comments by two referees and the Editor, which helped improve the original version of the manuscript.

Author information

Correspondence to Marc Hallin.

Additional information

Most of this note was prepared while the author was visiting the ISI (Indian Statistical Institute) in Kolkata, Chennai, Bengaluru and Delhi as the recipient of the 2017-18 Mahalanobis Memorial Lecture. The warm hospitality and stimulating environment of the ISI are gratefully acknowledged.

Appendix. Measure Transportation in a Nutshell

This appendix provides a very elementary presentation of the measure transportation results used in this paper. It can be skipped by readers familiar with the subject.

Starting from very practical road construction problems, Gaspard Monge (1746–1818), in his 1781 Mémoire sur la Théorie des Déblais et des Remblais, probably did not realize that he was initiating a profound mathematical theory anticipating different areas of differential geometry, linear programming, nonlinear partial differential equations, and probability. Monge’s Mémoire indeed was motivated by a very terre à terre question: how do you best move a given pile of sand to fill up a given hole of the same total volume?

The simplest and most intuitive abstract formulation of Monge’s problem is as follows. Let P1 and P2 denote two probability measures over (for simplicity) \((\mathbb {R}^{d}, \mathcal {B}^{d})\). Let \(L : \mathbb {R}^{2d} \to [0, \infty ]\) be a Borel-measurable loss function: L(x1,x2) represents the cost of transporting x1 to x2. Monge’s problem is an optimization problem: find a measurable transport map \(T_{\mathrm {P}_{1} ; \mathrm {P}_{2}} : \mathbb {R}^{d}\to \mathbb {R}^{d}\) that achieves the infimum

$$\inf_{T}{\int}_{\mathbb{R}^{d}} L(\textbf{x}, T(\mathbf{x}))\text{dP}_{1} \qquad \text{subject to} \quad T\# \mathrm{P}_{1} = \mathrm{P}_{2}$$

where T# P1 denotes the “push forward of P1 by T”; a more classical statistical notation for this would be \(\mathrm {P}_{1}^{T\textbf {X}} = \mathrm {P}_{2}\). A map \(T_{\mathrm {P}_{1} ; \mathrm {P}_{2}}\) that attains this infimum is called an “optimal transport map”, in short, an “optimal transport”, of P1 into P2. In the sequel, we restrict to the L2 loss function \(L(\textbf {x}_{1},\textbf {x}_{2})=\Vert \textbf {x}_{1}- \textbf {x}_{2} {\Vert ^{2}_{2}}\); Monge was considering the more difficult Euclidean distance loss \(L(\textbf {x}_{1},\textbf {x}_{2})=\Vert \textbf {x}_{1}- \textbf {x}_{2} \Vert _{2}\).
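To make the definition concrete, here is a minimal sketch in a discrete setting that the appendix itself does not discuss: when P1 and P2 are each uniform over n distinct points of the real line, a map T with T# P1 = P2 is a bijection between the two supports, so Monge's problem with the squared loss can be solved by brute force over permutations. The function names and the sample values are illustrative assumptions, and the enumeration is only feasible for very small n.

```python
from itertools import permutations


def squared_loss(x1, x2):
    """L2 loss L(x1, x2) = (x1 - x2)^2 for points on the real line."""
    return (x1 - x2) ** 2


def brute_force_monge(source, target):
    """Return (cost, assignment) minimizing the average squared loss.

    source and target are equal-length lists of distinct reals, each point
    carrying mass 1/n. A map T with T#P1 = P2 is then a bijection from
    source to target, so every permutation of target is tried.
    """
    n = len(source)
    best_cost, best_perm = None, None
    for perm in permutations(target):
        cost = sum(squared_loss(x, y) for x, y in zip(source, perm)) / n
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, dict(zip(source, best_perm))


source = [0.3, -1.2, 2.5, 0.9]   # support of P1 (hypothetical values)
target = [4.0, 1.0, -0.5, 2.2]   # support of P2 (hypothetical values)

cost, T = brute_force_monge(source, target)

# In dimension one, the optimum pairs the k-th smallest source point with
# the k-th smallest target point (monotone rearrangement).
monotone = dict(zip(sorted(source), sorted(target)))
```

In this example the brute-force optimum coincides with the sorted pairing, a one-dimensional instance of the monotonicity property that the optimal transport enjoys for the L2 loss.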

The problem looks simple, but it is not (leading to nonlinear partial differential equations of the Monge-Ampère type). Monge himself could not solve it, and relatively little progress was made until the 1940s, when renewed interest in the topic was triggered by the contributions of Leonid Vitalievitch Kantorovich (1912-1986; Nobel Prize in Economics in 1975). The fundamental idea behind Kantorovich’s approach consists in relaxing Monge’s problem into the more general one of constructing a distribution \(\gamma _{\mathrm {P}_{1}\mathrm {P}_{2}}\) on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\) minimizing

$${\int}_{(\textbf{x},\textbf{y})\in\mathbb{R}^{d}\times \mathbb{R}^{d}} \Vert \textbf{x}-\textbf{y}\Vert^{2} d\gamma $$

(equivalently, maximizing \(\int \langle \textbf {x},\textbf {y}\rangle d\gamma \)) among all distributions γ with marginals P1 and P2. This formulation allows for a groundbreaking duality approach, with solutions of the form

$$(\text{identity}\times T_{\mathrm{P}_{1} ; \mathrm{P}_{2}})\#\mathrm{P}_{1} $$

where \((\text {identity}\times T_{\mathrm {P}_{1} ; \mathrm {P}_{2}})\) maps x to \((\textbf {x}, T_{\mathrm {P}_{1} ; \mathrm {P}_{2}}(\textbf {x}))\), so that \(T_{\mathrm {P}_{1} ; \mathrm {P}_{2}}\) is indeed a solution of Monge’s problem.
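The equivalence between minimizing the quadratic cost and maximizing \(\int \langle \textbf {x},\textbf {y}\rangle d\gamma \) can be checked in one line, assuming finite second-order moments so that every term below is finite. For any γ with marginals P1 and P2,

$$\int \Vert \textbf{x}-\textbf{y}\Vert^{2} d\gamma = \int \Vert \textbf{x}\Vert^{2} \text{dP}_{1} + \int \Vert \textbf{y}\Vert^{2} \text{dP}_{2} - 2\int \langle \textbf{x},\textbf{y}\rangle d\gamma .$$

The first two terms on the right-hand side depend on the marginals only, not on the coupling γ, so the left-hand side is smallest exactly when the last integral is largest.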

Among the most powerful ensuing results is the Polar Factorization Theorem by Brenier (1987, 1991; see Chapter 3 in Villani (2003)) which implies, among other things, that for L2 loss, if P1 and P2 are absolutely continuous with finite second-order moments, the solution \(T_{\mathrm {P}_{1} ; \mathrm {P}_{2}}\) of Monge’s problem exists, is (a.e.) unique, and is the gradient ∇ψ of some convex (potential) function ψ, a form of multivariate monotonicity.
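A standard worked case, sketched here as an illustration rather than taken from the appendix, links this result to the Mahalanobis distance mentioned in the abstract. Let P1 be Gaussian with mean \(\boldsymbol {\mu }\) and positive definite covariance \(\boldsymbol {\Sigma }\), let P2 be the spherical standard Gaussian, and write \(\boldsymbol {\Sigma }^{-1/2}\) for the symmetric positive definite root of \(\boldsymbol {\Sigma }^{-1}\). The quadratic function

$$\psi(\textbf{x}) = \frac{1}{2}(\textbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1/2}(\textbf{x}-\boldsymbol{\mu}), \qquad \nabla\psi(\textbf{x}) = \boldsymbol{\Sigma}^{-1/2}(\textbf{x}-\boldsymbol{\mu})$$

is convex, and its gradient pushes P1 forward to P2, since \(\boldsymbol {\Sigma }^{-1/2}(\textbf {X}-\boldsymbol {\mu })\) is standard Gaussian whenever X has distribution P1. By the a.e. uniqueness stated above, ∇ψ is therefore the L2-optimal transport of P1 to P2, and

$$\Vert \nabla\psi(\textbf{x})\Vert^{2} = (\textbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\textbf{x}-\boldsymbol{\mu})$$

is the squared Mahalanobis distance between x and \(\boldsymbol {\mu }\).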

All this, including Monge’s problem, only makes sense under finite moments of order two, though. Brenier’s results were further enhanced by a most remarkable theorem by Robert J. McCann, who had the intuition that the problem was of a geometric rather than analytical nature. His main result (McCann, 1995) implies that, for any given (absolutely continuous—no second-order moments needed) P1 and P2, there exists a P1-essentially unique element in the class of gradients of convex functions mapping P1 to P2. Under the existence of finite moments of order two, that mapping moreover coincides with the L2-optimal (in the Monge-Kantorovich-Brenier sense) transport of P1 to P2.
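A one-dimensional sketch may help see why moments are not needed. It assumes d = 1, takes P1 to be the standard Cauchy distribution (which has no finite mean), and takes P2 to be the uniform over (−1, 1), the one-dimensional unit ball; these choices and all function names are illustrative assumptions. On the real line, gradients of convex functions are nondecreasing functions, and the monotone map from P1 to P2 is T = 2F − 1 with F the distribution function of P1.

```python
import math


def cauchy_cdf(x):
    """Distribution function F of the standard Cauchy law (no finite mean)."""
    return 0.5 + math.atan(x) / math.pi


def cauchy_quantile(u):
    """Inverse of cauchy_cdf on (0, 1)."""
    return math.tan(math.pi * (u - 0.5))


def transport_to_uniform(x):
    """Monotone map T = 2F - 1 sending the Cauchy law to U(-1, 1)."""
    return 2.0 * cauchy_cdf(x) - 1.0


def potential(x):
    """A convex potential psi with derivative psi'(x) = (2/pi) * atan(x) = T(x)."""
    return (2.0 / math.pi) * (x * math.atan(x) - 0.5 * math.log1p(x * x))


# If U is uniform on (0, 1), cauchy_quantile(U) is Cauchy distributed, and
# applying T to it should give 2U - 1, which is uniform on (-1, 1).
grid = [k / 100 for k in range(1, 100)]
max_gap = max(
    abs(transport_to_uniform(cauchy_quantile(u)) - (2 * u - 1)) for u in grid
)
```

Here `max_gap` measures, on a grid of probability levels, how far T applied to Cauchy quantiles deviates from the corresponding uniform quantiles; it should vanish up to rounding error, even though the quadratic transport cost itself is infinite in this example.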

The subject has ever since been a very active domain of mathematical analysis, with applications in various fields, from fluid mechanics to economics (see Galichon (2016)), learning, and statistics (Carlier and Galichon (2016); Panaretos and Zemel (2016, 2018)). It was popularized recently by the French Fields medalist Cédric Villani, with two monographs (Villani 2003, 2009), to which we refer for background reading, along with the two volumes by Rachev and Rüschendorf (1998), where the scope is somewhat closer to probabilistic and statistical concerns.

Cite this article

Hallin, M. From Mahalanobis to Bregman via Monge and Kantorovich. Sankhya B 80 (Suppl 1), 135–146 (2018). https://doi.org/10.1007/s13571-018-0163-4
