Abstract
We investigate compact ultrametric measure spaces, which form a subset \(\mathcal {U}^{\textrm{w}}\) of the collection of all metric measure spaces \(\mathcal {M}^{\textrm{w}}\). In analogy with the notion of the ultrametric Gromov–Hausdorff distance on the collection of ultrametric spaces \(\mathcal {U}\), we define ultrametric versions of two metrics on \(\mathcal {U}^{\textrm{w}}\), namely of Sturm’s Gromov–Wasserstein distance of order p and of the Gromov–Wasserstein distance of order p. We study the basic topological and geometric properties of these distances as well as their relation, and derive for \(p=\infty \) a polynomial-time algorithm for their computation. Further, several lower bounds for both distances are derived and some of our results are generalized to the case of finite ultra-dissimilarity spaces. Finally, we study the relation between the Gromov–Wasserstein distance and its ultrametric version (as well as the relation between the corresponding lower bounds) in simulations and apply our findings to phylogenetic tree shape comparisons.
Data Availability
The code and datasets generated during and/or analyzed during the current study are available in https://github.com/ndag/uGW and from http://dx.doi.org/10.5061/dryad.3r8v1.
Notes
Here “approximation” is meant in the sense that one can write code which will locally minimize the functional. There are in general no theoretical guarantees that these algorithms will converge to a global minimum.
A cluster point x in a topological space X is a point such that every neighborhood of x contains infinitely many points of X.
The algorithm can be sped up via a binary search process which we do not include for simplicity of presentation.
References
Adelson-Welsky, G.M., Kronrode, A.S.: Sur les lignes de niveau des fonctions continues possédant des dérivées partielles. C. R. (Doklady) Acad. Sci. URSS (N.S.) 49, 235–237 (1945)
Agarwal, P.K., Fox, K., Nath, A., Sidiropoulos, A., Wang, Y.: Computing the Gromov-Hausdorff distance for metric trees. ACM Trans. Algorithms 14(2), 24 (2018)
Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Pearson Education, London (1974)
Alvarez-Melis, D., Jaakkola, T.: Gromov–Wasserstein alignment of word embedding spaces. In: 2018 Conference on Empirical Methods in Natural Language Processing (Brussels 2018), pp. 1881–1890. Association for Computational Linguistics (2018)
Bartal, Y.: Probabilistic approximation of metric spaces and its algorithmic applications. In: 37th Annual Symposium on Foundations of Computer Science (Burlington 1996), pp. 184–193. IEEE, Los Alamitos (1996)
Billera, L.J., Holmes, S.P., Vogtmann, K.: Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27(4), 733–767 (2001)
Billingsley, P.: Convergence of Probability Measures. Probability and Statistics. Wiley, New York (2013)
Bonneel, N., Rabin, J., Peyré, G., Pfister, H.: Sliced and Radon Wasserstein barycenters of measures. J. Math. Imaging Vision 51(1), 22–45 (2015)
Bottou, L., Arjovsky, M., Lopez-Paz, D., Oquab, M.: Geometrical insights for implicit generative modeling. In: Rozonoer, L., et al. (eds.) Braverman Readings in Machine Learning. Lecture Notes in Computer Science, vol. 11100, pp. 229–268. Springer, Cham (2018)
Brinkman, D., Olver, P.J.: Invariant histograms. Am. Math. Mon. 119(1), 4–24 (2012)
Bronstein, A.M., Bronstein, M.M., Bruckstein, A.M., Kimmel, R.: Partial similarity of objects, or how to compare a centaur to a horse. Int. J. Comput. Vis. 84(2), 163–183 (2009)
Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Efficient computation of isometry-invariant distances between surfaces. SIAM J. Sci. Comput. 28(5), 1812–1836 (2006)
Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching. Proc. Natl. Acad. Sci. USA 103(5), 1168–1172 (2006)
Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Topology-invariant similarity of nonrigid shapes. Int. J. Comput. Vis. 81(3), 281–301 (2009)
Bronstein, A.M., Bronstein, M.M., Kimmel, R., Mahmoudi, M., Sapiro, G.: A Gromov–Hausdorff framework with diffusion geometry for topologically-robust non-rigid shape matching. Int. J. Comput. Vis. 89(2–3), 266–286 (2010)
Brown, P., Pullan, W., Yang, Y., Zhou, Y.: Fast and accurate non-sequential protein structure alignment using a new asymmetric linear sum assignment heuristic. Bioinformatics 32(3), 370–377 (2016)
Bunne, C., Alvarez-Melis, D., Krause, A., Jegelka, S.: Learning generative models across incomparable spaces. In: 36th International Conference on Machine Learning (Long Beach 2019), pp. 851–861. PMLR (2019)
Carlsson, G., Mémoli, F.: Characterization, stability and convergence of hierarchical clustering methods. J. Mach. Learn. Res. 11, 1425–1470 (2010)
Chazal, F., Cohen-Steiner, D., Guibas, L.J., Mémoli, F., Oudot, S.Y.: Gromov–Hausdorff stable signatures for shapes using persistence. In: 7th Symposium on Geometry Processing (Berlin 2009), pp. 1393–1403. ACM, New York (2009)
Chen, J., Safro, I.: Algebraic distance on graphs. SIAM J. Sci. Comput. 33(6), 3468–3490 (2011)
Chowdhury, S., Mémoli, F.: The Gromov-Wasserstein distance between networks and stable network invariants. Inf. Inference 8(4), 757–787 (2019)
Chowdhury, S., Needham, T.: Generalized spectral clustering via Gromov–Wasserstein learning. In: 24th International Conference on Artificial Intelligence and Statistics (San Diego 2021), pp. 712–720. PMLR (2021)
Colijn, C., Plazzotta, G.: A metric on phylogenetic tree shapes. Syst. Biol. 67(1), 113–126 (2018)
David, G., Semmes, S.W.: Fractured Fractals and Broken Dreams: Self-Similar Geometry Through Metric and Measure. Oxford Lecture Series in Mathematics and its Applications, vol. 7. Oxford University Press, New York (1997)
Do Ba, K., Nguyen, H.L., Nguyen, H.N., Rubinfeld, R.: Sublinear time algorithms for Earth mover’s distance. Theory Comput. Syst. 48(2), 428–442 (2011)
Dong, Y., Sawin, W.: COPT: Coordinated optimal transport on graphs. In: Advances in Neural Information Processing Systems, vol. 33, pp. 19327–19338. Curran Associates, Red Hook (2020)
Dordovskyi, D., Dovgoshey, O., Petrov, E.: Diameter and diametrical pairs of points in ultrametric spaces. \(p\)-Adic Numbers Ultrametric Anal. Appl. 3(4), 253–262 (2011)
Dudley, R.M.: Real Analysis and Probability. CRC Press, Boca Raton (2017)
Edwards, D.A.: The structure of superspace. In: Studies in Topology (Charlotte 1974), pp. 121–133. Academic Press, New York (1975)
Evans, S.N.: Probability and Real Trees. Lectures from the 35th Summer School on Probability Theory (Saint-Flour 2005). Springer, Berlin (2008)
Evans, S.N., Matsen, F.A.: The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples. J. R. Stat. Soc. Ser. B. Stat. Methodol. 74(3), 569–592 (2012)
Fakcharoenphol, J., Rao, S., Talwar, K.: A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci. 69(3), 485–497 (2004)
Folland, G.B.: Real Analysis: Modern Techniques and their Applications. 2nd edn. Pure and Applied Mathematics (New York). Wiley, New York (1999)
Gellert, M., Hossain, M.F., Berens, F.J.F., Bruhn, L.W., Urbainsky, C., Liebscher, V., Lillig, C.H.: Substrate specificity of thioredoxins and glutaredoxins—towards a functional classification. Heliyon 5(12), e02943 (2019)
Givens, C.R., Shortt, R.M.: A class of Wasserstein metrics for probability distributions. Mich. Math. J. 31(2), 231–240 (1984)
Greven, A., Pfaffelhuber, P., Winter, A.: Convergence in distribution of random metric measure spaces (\(\Lambda \)-coalescent measure trees). Probab. Theory Relat. Fields 145(1–2), 285–322 (2009)
Grindstaff, G., Owen, M.: Representations of partial leaf sets in phylogenetic tree space. SIAM J. Appl. Algebra Geom. 3(4), 691–720 (2019)
Gromov, M.: Groups of polynomial growth and expanding maps (with an appendix by Jacques Tits). Inst. Hautes Études Sci. Publ. Math. 53, 53–78 (1981)
Hein, J.: Reconstructing evolution of sequences subject to recombination using parsimony. Math. Biosci. 98(2), 185–200 (1990)
Holm, L., Sander, C.: Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233(1), 123–138 (1993)
Howes, N.R.: Modern Analysis and Topology. Springer, Berlin (2012)
Jain, A.K., Dorai, C.: 3D object recognition: representation and matching. Stat. Comput. 10(2), 167–182 (2000)
Jardine, N., Sibson, R.: Mathematical Taxonomy. Wiley Series in Probability and Mathematical Statistics, Wiley, London (1971)
Kantorovich, L.: On the translocation of masses. C. R. (Doklady) Acad. Sci. URSS (N.S.) 37, 199–201 (1942)
Kantorovich, L.V., Rubinstein, G.S.: On a space of completely additive functions. Vestnik Leningrad. Univ. 13(7), 52–59 (1958) (in Russian)
Kloeckner, B.R.: A geometric study of Wasserstein spaces: ultrametrics. Mathematika 61(1), 162–178 (2015)
Kolmogorov, A.N., Fomin, S.V.: Elements of the Theory of Functions and Functional Analysis, vol. 1. Graylock Press, Rochester (1957)
Kolouri, S., Nadjahi, K., Simsekli, U., Badeau, R., Rohde, G.: Generalized sliced Wasserstein distances. In: Advances in Neural Information Processing Systems, vol. 32, pp. 261–272. Curran Associates, Red Hook (2019)
Kufareva, I., Abagyan, R.: Methods of protein structure comparison. Methods Mol. Biol. 857, 231–257 (2012)
Kuo, H.-Y., Su, H.-R., Lai, S.-H., Wu, C.-C.: 3D object detection and pose estimation from depth image for robotic bin picking. In: 2014 IEEE International Conference on Automation Science and Engineering (New Taipei 2014), pp. 1264–1269. IEEE (2014)
Lafond, M., El-Mabrouk, N., Huber, K.T., Moulton, V.: The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics. Theoret. Comput. Sci. 760, 15–34 (2019)
Lambert, A., Uribe Bravo, G.: The comb representation of compact ultrametric spaces. \(p\)-Adic Numbers Ultrametric Anal. Appl. 9(1), 22–38 (2017)
Le, T., Ho, N., Yamada, M.: Computationally Efficient Tree Variants of Gromov–Wasserstein (2019). arXiv:1910.04462
Le, T., Yamada, M., Fukumizu, K., Cuturi, M.: Tree-sliced variants of Wasserstein distances. In: 33rd Conference on Neural Information Processing Systems (Vancouver 2019), pp. 12304–12315. Curran Associates, Red Hook (2019)
Liebscher, V.: New Gromov-inspired metrics on phylogenetic tree space. Bull. Math. Biol. 80(3), 493–518 (2018)
Lowe, D.G.: Local feature view clustering for 3D object recognition. In: 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Kauai 2001), pp. I–I. IEEE (2001)
Mallows, C.L.: A note on asymptotic joint normality. Ann. Math. Stat. 43, 508–515 (1972)
McGregor, A., Stubbs, D.: Sketching Earth-Mover distance on graph metrics. In: Approximation, Randomization, and Combinatorial Optimization (Berkeley 2013). Lecture Notes in Computer Science, vol. 8096, pp. 274–286. Springer, Heidelberg (2013)
Mémoli, F.: On the use of Gromov–Hausdorff distances for shape comparison. In: Eurographics Symposium on Point-Based Graphics (Prague 2007). The Eurographics Association (2007). https://doi.org/10.2312/SPBG/SPBG07/081-090
Mémoli, F.: Gromov–Wasserstein distances and the metric approach to object matching. Found. Comput. Math. 11(4), 417–487 (2011)
Mémoli, F., Needham, T.: Distance distributions and inverse problems for metric measure spaces. Stud. Appl. Math. 149(4), 943–1001 (2022)
Mémoli, F., Sapiro, G.: Comparing point clouds. In: 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing (Nice 2004), pp. 32–40. ACM, New York (2004)
Mémoli, F., Smith, Z., Wan, Z.: The Gromov–Hausdorff distance between ultrametric spaces: its structure and computation. J. Comput. Geom. (to appear). arXiv:2110.03136
Mémoli, F., Wan, Z.: On \(p\)-metric spaces and the \(p\)-Gromov–Hausdorff distance. \(p\)-Adic Numbers Ultrametric Anal. Appl. 14(3), 173–223 (2022)
Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond. World Scientific Lecture Notes in Physics, vol. 9. World Scientific, Teaneck (1987)
Morozov, D., Beketayev, K., Weber, G.H.: Interleaving distance between merge trees. TopoInVis’13. https://www.mrzv.org/publications/interleaving-distance-merge-trees/manuscript/
Nies, T.G., Staudt, T., Munk, A.: Transport dependency: Optimal transport based dependency measures (2021). arXiv:2105.02073
Osada, R., Funkhouser, T., Chazelle, B., Dobkin, D.: Shape distributions. ACM Trans. Graph. 21(4), 807–832 (2002)
Owen, M., Provan, J.S.: A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(1), 2–13 (2011)
Papazov, C., Haddadin, S., Parusel, S., Krieger, K., Burschka, D.: Rigid 3D geometry matching for grasping of known objects in cluttered scenes. Intern. J. Robotics Res. 31(4), 538–553 (2012)
Pardalos, P.M., Vavasis, S.A.: Quadratic programming with one negative eigenvalue is NP-hard. J. Global Optim. 1(1), 15–22 (1991)
Peyré, G., Cuturi, M., Solomon, J.: Gromov–Wasserstein averaging of kernel and distance matrices. In: 33rd International Conference on Machine Learning (New York 2016), pp. 2664–2672. JMLR (2016)
Qiu, D.: Geometry of non-Archimedean Gromov–Hausdorff distance. \(p\)-Adic Numbers Ultrametric Anal. Appl. 1(4), 317–337 (2009)
Rammal, R., Toulouse, G., Virasoro, M.A.: Ultrametricity for physicists. Rev. Mod. Phys. 58(3), 765–788 (1986)
Reeb, G.: Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique. C. R. Acad. Sci. Paris 222, 847–849 (1946)
Robinson, D.F.: Comparison of labeled trees with valency three. J. Comb. Theory Ser. B 11(2), 105–119 (1971)
Robinson, D.F., Foulds, L.R.: Comparison of phylogenetic trees. Math. Biosci. 53(1–2), 131–147 (1981)
Rubner, Y., Tomasi, C., Guibas, L.J.: The Earth Mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40(2), 99–121 (2000)
Scetbon, M., Peyré, G., Cuturi, M.: Linear-time Gromov–Wasserstein distances using low rank couplings and costs. In: 39th International Conference on Machine Learning (Baltimore 2022), pp. 19347–19365. PMLR (2022)
Schmiedl, F.: Computational aspects of the Gromov–Hausdorff distance and its application in non-rigid shape matching. Discrete Comput. Geom. 57(4), 854–880 (2017)
Semmes, S.: An introduction to the geometry of ultrametric spaces (2007). arXiv:0711.0709
Semple, C., Steel, M.: Phylogenetics. Oxford Lecture Series in Mathematics and Its Applications, vol. 24. Oxford University Press, New York (2003)
Sturm, K.-T.: On the geometry of metric measure spaces. I. Acta Math. 196(1), 65–131 (2006)
Sturm, K.T.: The space of spaces: Curvature bounds and gradient flows on the space of metric measure spaces (2012). arXiv:1208.0434
Thorsley, D., Klavins, E.: Model reduction of stochastic processes using Wasserstein pseudometrics. In: 2008 American Control Conference (Seattle 2008), pp. 1374–1381. IEEE (2008)
Titouan, V., Courty, N., Tavenard, R., Flamary, R.: Optimal transport for structured data with application on graphs. In: 36th International Conference on Machine Learning (Long Beach 2019), pp. 6275–6284. PMLR (2019)
Touli, E.F., Wang, Y.: FPT-algorithms for computing Gromov–Hausdorff and interleaving distances between trees. In: 27th Annual European Symposium on Algorithms (Munich 2019). Leibniz Int. Proc. Inform., vol. 144, # 83. Leibniz-Zent. Inform., Wadern (2019)
Vallender, S.S.: Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18(4), 784–786 (1974)
Vayer, T., Flamary, R., Tavenard, R., Chapel, L., Courty, N.: Sliced Gromov–Wasserstein (2019). arXiv:1905.10124
Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58. American Mathematical Society, Providence (2003)
Villani, C.: Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften, vol. 338. Springer, Berlin (2008)
Wan, Z.: A novel construction of Urysohn universal ultrametric space via the Gromov–Hausdorff ultrametric. Topology Appl. 300, # 107759 (2021)
Zarichnyi, I.: Gromov–Hausdorff ultrametric (2005). arXiv:math/0511437
Acknowledgements
F.M. and A.M. thank the Mathematisches Forschungsinstitut Oberwolfach. Conversations which eventually led to this project were initiated during the 2019 workshop “Statistical and Computational Aspects of Learning with Complex Structure”.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Editor in Charge: Kenneth Clarkson
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
F.M. and Z.W. acknowledge funding from the National Science Foundation under grants CCF 1740761, DMS 1723003, and RI 1901360. A.M. and C.W. gratefully acknowledge support by the DFG Research Training Group 2088, CRC 1456 project A04 and Cluster of Excellence MBExC 2067.
Appendices
Technical Details from Sect. 2
1.1 Proofs from Sect. 2
In this section, we give the proofs of various results from Sect. 2.
1.1.1 Proof of Theorem 2.2
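Recall that for a given \(\theta \in {\mathcal {D}}(X)\), we define \(u_\theta :X\hspace{1.111pt}{\times }\hspace{1.111pt}X\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:
\[u_\theta (x,x'):= \inf \hspace{0.88882pt}\{t\geqslant 0\,{|}\,x \text{ and } x' \text{ belong to the same block of } \theta (t)\}.\]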
It is easy to verify that \(u_\theta \) is an ultrametric. For any Cauchy sequence \(\{x_n\}_{n\in {\mathbb {N}}}\) in \((X,u_\theta )\), let \(D_i:= \sup _{\,m,n\geqslant i}u_\theta (x_m,x_n)\) for each \(i\in {\mathbb {N}}\). Then, each \(D_i<\infty \) and \(\lim _{\,i\rightarrow \infty }D_i=0\). By definition of \(u_\theta \), for each \(i\in {\mathbb {N}} \) the set \(\{x_n\}_{n=i}^\infty \) is contained in the block \([x_i]_{D_i}\in \theta (D_i)\). Let \(X_i:= [x_i]_{D_i}\) for each \(i\in {\mathbb {N}} \). Then, obviously we have that \(X_j\subseteq X_i\) for any \(1\leqslant i<j\). By condition (vii) in Definition 2.1, we have that \(\bigcap _{\,i\in {\mathbb {N}}}X_i\ne \text{\O }\). Choose \(x_*\!\in \bigcap _{\,i\in {\mathbb {N}}}X_i\), then it is easy to verify that \(x_*\!=\lim _{\,n\rightarrow \infty }x_n\) and thus \((X,u_\theta )\) is a complete space. To prove that \((X,u_\theta )\) is a compact space, we need to verify that for each \(t>0\), \(X_t\) is a finite space (cf. Lemma A.7). Since \(\theta (t)\) is finite by condition (vi) in Definition 2.1, we have that \(X_t=\{[x]_t\,{|}\,x\in X\}=\theta (t)\) is finite and thus X is compact. Therefore, we have proved that \(u_\theta \!\in {\mathcal {U}}(X)\). Based on this, the map \(\Upsilon _X:{\mathcal {D}}(X)\rightarrow {\mathcal {U}}(X)\) defined by \(\theta \mapsto u_\theta \) is well defined.
Now given \(u\in {\mathcal {U}}(X)\), we define a map \(\theta _u:[0,\infty )\rightarrow \textbf{Part}\hspace{0.55542pt}(X)\) as follows: for each \(t\geqslant 0\), consider the equivalence relation \(\sim _t\) with respect to u, i.e., \(x\sim _t x'\) iff \(u(x,x')\leqslant t\). This is the same equivalence relation used in Sect. 2.2 to introduce quotient ultrametric spaces. We then let \(\theta _u(t)\) be the partition induced by \(\sim _t\), i.e., \(\theta _u(t)=X_t\). It is not hard to show that \(\theta _u\) satisfies conditions (i)–(v) in Definition 2.1. Since X is compact, \(\theta _u(t)=X_t\) is finite for each \(t>0\) and thus \(\theta _u\) satisfies condition (vi) in Definition 2.1. Now, let \(\{t_n\}_{n\in {\mathbb {N}}}\) be a decreasing sequence such that \(\lim _{\,n\rightarrow \infty }t_n=0\) and let \(X_n\in \theta _u(t_n)\) be such that for any \(1\leqslant n<m\), \(X_m\subseteq X_n\). Since each \(X_n=[x_n]_{t_n}\) for some \(x_n\in X\), \(X_n\) is a compact subset of X. Since X is also complete, we have that \(\bigcap _{\,n\in {\mathbb {N}}}X_n\ne \text{\O }\). Therefore, \(\theta _u\) satisfies condition (vii) in Definition 2.1 and thus \(\theta _u\in {\mathcal {D}}(X)\). Then, we define the map \(\Delta _X:{\mathcal {U}}(X)\rightarrow {\mathcal {D}}(X)\) by \(u\mapsto \theta _u\).
It is easy to check that \(\Delta _X\) is the inverse of \(\Upsilon _X\) and thus we have established that \(\Upsilon _X:{\mathcal {D}}(X)\rightarrow {\mathcal {U}}(X)\) is bijective.
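For a finite space, the two maps in this proof can be written out directly. The following Python sketch is only an illustration under stated assumptions: the function names `partition_at` and `ultrametric_from_dendrogram` are hypothetical, the ultrametric is assumed to be given as a symmetric matrix, and \(u_\theta (x,x')\) is assumed to be the smallest scale at which x and \(x'\) share a block.

```python
from itertools import combinations

def partition_at(u, t):
    """Blocks of theta_u(t): classes of x ~_t x' iff u[x][x'] <= t.

    Assumes u is an ultrametric, so that ~_t is an equivalence relation.
    """
    n = len(u)
    blocks, seen = [], set()
    for x in range(n):
        if x in seen:
            continue
        block = frozenset(y for y in range(n) if u[x][y] <= t)
        blocks.append(block)
        seen |= block
    return blocks

def ultrametric_from_dendrogram(theta, thresholds, n):
    """u_theta(x, x'): smallest threshold at which x and x' share a block.

    For a finite space the partition only changes at finitely many scales,
    so the infimum over t >= 0 reduces to a minimum over `thresholds`.
    """
    u = [[0.0] * n for _ in range(n)]
    for x, y in combinations(range(n), 2):
        u[x][y] = u[y][x] = min(
            t for t in thresholds
            if any(x in b and y in b for b in theta(t))
        )
    return u

# Three-point ultrametric space: points 0 and 1 merge at scale 1, all at scale 3.
u = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
thresholds = sorted({v for row in u for v in row})
theta = lambda t: partition_at(u, t)          # Delta_X(u)
recovered = ultrametric_from_dendrogram(theta, thresholds, 3)  # Upsilon_X(theta)
```

In this example, applying the second map to the dendrogram of `u` returns `u` again, which is the finite analogue of \(\Upsilon _X\circ \Delta _X\) being the identity.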
1.1.2 Proof of Lemma 2.8
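First of all, we prove that the following supremum is attained, which verifies that the right-hand side of (12) is well defined:
\[\sup \hspace{0.88882pt}\{{\textrm{diam}}\hspace{0.55542pt}(B^*)\,{|}\,B\in V(X)\backslash \{X\},\ \alpha (B)\ne \beta (B)\}.\]
Fix any \(B_0\in V(X)\backslash \{X\}\) such that \(\alpha (B_0)\ne \beta (B_0)\). Then, it is obvious that \({\textrm{diam}}\hspace{0.55542pt}(B^*_0) >0\). By Lemma A.7, \(X_{{\textrm{diam}}\hspace{0.55542pt}(B^*_0) }\) is finite. So there are only finitely many \(B\in V(X)\backslash \{X\}\) such that \({\textrm{diam}}\hspace{0.55542pt}(B) \geqslant {\textrm{diam}}\hspace{0.55542pt}(B^*_0) \) and thus \({\textrm{diam}}\hspace{0.55542pt}(B^*) \geqslant {\textrm{diam}}\hspace{0.55542pt}(B^*_0) \). This implies that the supremum above is attained and thus
\[\sup \hspace{0.88882pt}\{{\textrm{diam}}\hspace{0.55542pt}(B^*)\,{|}\,B\in V(X)\backslash \{X\},\ \alpha (B)\ne \beta (B)\}=\max \hspace{0.88882pt}\{{\textrm{diam}}\hspace{0.55542pt}(B^*)\,{|}\,B\in V(X)\backslash \{X\},\ \alpha (B)\ne \beta (B)\}.\tag{22}\]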
Let \(B_1\) denote the maximizer in (22) and let \(\delta := {\textrm{diam}}\hspace{0.55542pt}(B_1^*) \). It is easy to see that for any \(x\in X\), \(\alpha ([x]_\delta )=\beta ([x]_\delta )\).
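By Strassen’s theorem (see for example [28, Thm. 11.6.2]),
\[d_{\textrm{W},\infty }(\alpha ,\beta )=\inf \hspace{0.88882pt}\{r\geqslant 0\,{|}\,\alpha (A)\leqslant \beta (A^r) \text{ for every closed subset } A\subseteq X\},\tag{23}\]
where \(A^r:= \{x\in X\,{|}\,u_X(x,A)\leqslant r\}\).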
Since \(\alpha (B_1)\ne \beta (B_1)\), we assume without loss of generality that \(\alpha (B_1)>\beta (B_1)\). By definition of \(B_1^*\), it is obvious that \((B_1)^\delta =B_1^*\) (recall: \(\delta := {\textrm{diam}}\hspace{0.55542pt}(B_1^*) \)) and \((B_1)^r=B_1\) for all \(0\leqslant r<\delta \). Therefore, \(\alpha (B_1)\leqslant \beta ((B_1)^r)\) only when \(r\geqslant \delta \). By (23), this implies that \(d_{\textrm{W},\infty }(\alpha ,\beta )\geqslant \delta \). Conversely, for any closed set A, we have that \(A^\delta =\bigcup _{x\in A}[x]_\delta \). For two closed balls in ultrametric spaces, either one includes the other or they have no intersection. Therefore, there exists a subset \(S\subseteq A\) such that \([x]_\delta \cap [x']_\delta =\text{\O }\) for all \(x,x'\!\in S\) and \(x\ne x'\), and that \(A^\delta =\bigsqcup _{\,x\in S}[x]_\delta \). Then, \(\alpha (A)\leqslant \alpha (A^\delta )=\sum _{x\in S}\alpha ([x]_\delta )=\sum _{x\in S}\beta ([x]_\delta )=\beta (A^\delta )\). Hence, \(d_{\textrm{W},\infty }(\alpha ,\beta )\leqslant \delta \) and thus we conclude the proof.
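On a finite ultrametric space, the characterization proved above can be evaluated by enumerating closed balls. The sketch below is a hedged illustration and not the authors' implementation: it assumes that Lemma 2.8 reads \(d_{\textrm{W},\infty }(\alpha ,\beta )=\max \hspace{0.88882pt}\{{\textrm{diam}}\hspace{0.55542pt}(B^*)\}\) over all \(B\in V(X)\backslash \{X\}\) with \(\alpha (B)\ne \beta (B)\), that the value is 0 when no such B exists, and the names `closed_balls`, `diam` and `w_infinity` are hypothetical.

```python
def closed_balls(u):
    """All closed balls of a finite ultrametric space given as a matrix u."""
    n = len(u)
    radii = sorted({u[x][y] for x in range(n) for y in range(n)})
    return {
        frozenset(y for y in range(n) if u[x][y] <= t)
        for x in range(n) for t in radii
    }

def diam(u, B):
    """Diameter of a subset B of the space."""
    return max(u[x][y] for x in B for y in B)

def w_infinity(u, alpha, beta, tol=1e-12):
    """Largest diam(B*) over balls B != X on which alpha and beta disagree."""
    X = frozenset(range(len(u)))
    V = closed_balls(u)
    best = 0.0
    for B in V - {X}:
        mass_gap = sum(alpha[x] for x in B) - sum(beta[x] for x in B)
        if abs(mass_gap) > tol:
            # Balls containing B form a chain, so the smallest strict
            # superset of B in V is well defined; this is B*.
            parent = min((C for C in V if B < C), key=len)
            best = max(best, diam(u, parent))
    return best

# Points 0 and 1 are at distance 1; point 2 is at distance 3 from both.
u = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
d_far = w_infinity(u, [0.5, 0.5, 0.0], [0.5, 0.0, 0.5])   # mass must cross to point 2
d_near = w_infinity(u, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # mass moves inside {0, 1}
```

In the first example half of the mass has to travel from point 1 to point 2, so the value is 3; in the second the mass stays inside the ball \(\{0,1\}\) and the value is 1.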
1.2 Technical Details from Sect. 2
In this section, we address various technical issues from Sect. 2.
1.2.1 Synchronized Rooted Trees
A synchronized rooted tree is a combinatorial tree \(T=(V,E)\) with a root \(o\in V\) and a height function \(h:V\rightarrow [0,\infty )\) such that \(h^{-1}(0)\) coincides with the leaf set and \(h(v)< h(v^*)\) for each \(v\in V\backslash \{o\}\), where \(v^*\) is the parent of v. Just as Theorem 2.2 establishes a correspondence between ultrametric spaces and dendrograms, an ultrametric space X uniquely determines a synchronized rooted tree \(T_X\) [46].
Given \((X,{u_{X}})\in {\mathcal {U}}\), recall from Sect. 2.3 that \(V(X):= \bigcup _{t>0}\theta _X(t)\) and that for each \(B\in V(X)\backslash \{X\}\), \(B^*\) denotes the smallest element in V(X) strictly containing B. The existence of \(B^*\) is guaranteed by the following lemma:
Lemma A.1
Let \(X\in {\mathcal {U}}\). For each \(B\in V(X)\) such that \(B\ne X\), there exists \(B^*\!\in V(X)\) such that \(B^*\!\ne B\) and \(B^*\!\subseteq B'\) for all \(B'\!\in V(X)\) with \(B\subsetneqq B'\).
Proof
Let \(\delta := {\textrm{diam}}\hspace{0.55542pt}(B) \). Let \(x\in B\), then \(B=[x]_\delta \). By Lemma A.7, \(X_\delta \) is a finite set. Consider \(\delta ^*\!:= \min \hspace{0.88882pt}\{u_{X_\delta }([x]_\delta ,[x']_\delta )\,{|}\,[x']_\delta \ne [x]_\delta \}\). Let \(B^*\!:= [x]_{\delta ^*}\), then \(B^*\) is the smallest element in V(X) containing B under inclusion. Indeed, \(B^*\!\ne B\) and if \(B\subseteq B'\) for some \(B'\!\in V(X)\), then \(B'\!=[x]_r\) for some \(r> \delta \). It is easy to see that for all \(\delta<r<\delta ^*\), \([x]_r=[x]_\delta \). Therefore, if \(B'\!\ne B\), we must have that \(r\geqslant \delta ^*\) and thus \(B^*\!=[x]_{\delta ^*}\subseteq [x]_r=B'\).\(\square \)
Now, we construct the synchronized rooted tree \(T_X\) corresponding to X via the proper dendrogram \(\theta _X\) associated with \({u_{X}}\). We first define a combinatorial tree \(T_X=(V_X,E_X)\) as follows: we let \(V_X:= V(X)\); for any distinct \(B,B'\in V_X\), we let \((B,B')\in E_X\) iff either \(B=(B')^*\) or \(B'\!=B^*\). We choose \(X\in V_X\) to be the root of \(T_X\), then any \(B\ne X\) in \(V_X\) has a unique parent \(B^*\). We define \(h_X:V_X\rightarrow [0,\infty )\) such that \(h_X(B):= {{\textrm{diam}}\hspace{0.55542pt}(B) }/{2}\) for any \(B\in V_X\). Now, \(T_X\) endowed with the root X and the height function \(h_X\) is a synchronized rooted tree. It is easy to see that X can be isometrically identified with \(h_X^{-1}(0)\) of the so-called metric completion of \(T_X\) (see [46, Sect. 2.3] for details). With this construction Lemma 2.7 follows directly from [46, Lem. 3.1].
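The construction of \(T_X\) can be sketched for a finite ultrametric space as follows. This is a minimal illustration with hypothetical names (`synchronized_rooted_tree` and its return values), assuming the ultrametric is given as a symmetric matrix; vertices are the closed balls, the parent of B is \(B^*\), and the height is half the diameter.

```python
def synchronized_rooted_tree(u):
    """Return (root, parent, height) of T_X for a finite ultrametric matrix u."""
    n = len(u)
    radii = sorted({u[x][y] for x in range(n) for y in range(n)})
    # V_X: every closed ball, including the singletons (radius 0).
    nodes = {
        frozenset(y for y in range(n) if u[x][y] <= t)
        for x in range(n) for t in radii
    }
    root = frozenset(range(n))
    # B* is the smallest ball strictly containing B; it is the parent of B.
    parent = {
        B: min((C for C in nodes if B < C), key=len)
        for B in nodes if B != root
    }
    # h_X(B) = diam(B) / 2, so singletons (the leaves) have height 0.
    height = {B: max(u[x][y] for x in B for y in B) / 2 for B in nodes}
    return root, parent, height

u = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
root, parent, height = synchronized_rooted_tree(u)
```

For this three-point space the tree has five vertices: the three leaves, the ball \(\{0,1\}\) at height 1/2, and the root at height 3/2, and the height strictly increases from each vertex to its parent.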
1.3 \(d^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}_{\textrm{W},p}\) Between Compactly Supported Measures
Next, we demonstrate that Theorem 2.9 extends naturally to the case of compactly supported probability measures in \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\). For this purpose, it is important to note that compact subsets of \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\) have a very particular structure as shown by the next lemma.
Lemma A.2
Let \(X\subseteq ({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\). X is a compact subset iff X is either a finite set or a countable set containing 0 and with 0 being the unique cluster point (w.r.t. the usual Euclidean distance \(\Lambda _1\)).
Proof
If X is finite, then obviously X is compact. Assume that X is a countable set containing 0 with 0 being the unique cluster point (w.r.t. the usual Euclidean distance \(\Lambda _1\)). If \(\{x_n\}_{n\in {\mathbb {N}}}\subseteq X\) is a Cauchy sequence with respect to \(\Lambda _\infty \), then either \(x_n\) is eventually constant or \(\lim _{n\rightarrow \infty }x_n=0\). In either case, the limit of \(\{x_n\}_{n\in {\mathbb {N}}}\) belongs to X and thus X is complete. Now for any \(\varepsilon >0\), by Lemma A.7, \(X_\varepsilon \) is a finite set. Denote \(X_\varepsilon =\{[x_1]_\varepsilon ,\ldots ,[x_n]_\varepsilon \}\). Then, \(\{x_1,\ldots ,x_n\}\) is a finite \(\varepsilon \)-net of X. Therefore, X is totally bounded and thus X is compact.
Now, assume that X is compact. Then, for any \(\varepsilon >0\), \(X_\varepsilon \) is a finite set. Suppose \(X_\varepsilon =\{[x_1]_\varepsilon ,\ldots ,[x_n]_\varepsilon \}\) where \(0\leqslant x_1<x_2<\cdots <x_n\). Further, we have that \(\Lambda _\infty (x_i,x_j)=x_j\) whenever \(1\leqslant i<j\leqslant n\). This implies that
(i) \(x_i>\varepsilon \) for all \(2\leqslant i\leqslant n\);
(ii) \([x_i]_\varepsilon =\{x_i\}\) for all \(2\leqslant i\leqslant n\).
Therefore, \(X\cap (\varepsilon ,\infty )=\{x_2,\ldots ,x_n\}\) is a finite set. Since \(\varepsilon >0\) is arbitrary, X is at most countable and has no cluster point (w.r.t. the Euclidean distance \(\Lambda _1\)) other than 0. If X is countable, then 0 must be a cluster point and by compactness of X, we have that \(0\in X\). \(\square \)
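The structure used in this proof is easy to observe numerically. The following sketch is illustrative only and rests on an assumption about notation that is defined outside this appendix: we take \(\Lambda _\infty (a,b)=\max \hspace{0.88882pt}(a,b)\) for \(a\ne b\) and \(\Lambda _\infty (a,a)=0\), which agrees with the identity \(\Lambda _\infty (x_i,x_j)=x_j\) for \(x_i<x_j\) used above. The names `lambda_inf` and `quotient_classes` are hypothetical.

```python
def lambda_inf(a, b):
    """Assumed form of Lambda_infinity on the nonnegative reals."""
    return 0.0 if a == b else max(a, b)

def quotient_classes(points, eps):
    """Classes of x ~ y iff lambda_inf(x, y) <= eps, i.e. the elements of X_eps.

    Comparing with one representative per class suffices because the
    relation is an equivalence relation for an ultrametric.
    """
    classes = []
    for x in sorted(points):
        for c in classes:
            if lambda_inf(c[0], x) <= eps:
                c.append(x)
                break
        else:
            classes.append([x])
    return classes

# Finite truncation of the compact set {0} ∪ {1/k : k >= 1}.
points = [0.0] + [1.0 / k for k in range(1, 11)]
classes = quotient_classes(points, eps=0.3)
```

With \(\varepsilon =0.3\), all points of size at most \(\varepsilon \) collapse into the class of 0, while each of the three points above \(\varepsilon \) forms a singleton class, matching items (i) and (ii).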
Based on the special structure of compact subsets of \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\), we derive the following extension of Theorem 2.9.
Theorem A.3
(\(d^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}_{\textrm{W},p}\) between compactly supported measures) Let \(X:= \{0\}\cup \{x_i\hspace{0.55542pt}{|}\,i\in {\mathbb {N}}\}\subseteq {\mathbb {R}}_{\geqslant 0}\) such that \(0<\ldots< x_n<x_{n-1}<\ldots <x_1\) and 0 is the only cluster point w.r.t. the usual Euclidean distance. Let \(\alpha ,\beta \in {\mathcal {P}}(X)\). Let \(\alpha _i:= \alpha (\{x_i\})\) for \(i\in {\mathbb {N}}\) and \(\alpha _0:= \alpha (\{0\})\). Similarly, let \(\beta _i:= \beta (\{x_i\})\) and \(\beta _0:= \beta (\{0\})\). Then for \(p\in [1,\infty )\),
Let \(F_\alpha \) and \(F_\beta \) be the cumulative distribution functions of \(\alpha \) and \(\beta \), respectively. Then,
Proof
Note that \(V(X)=\{\{0\}\cup \{x_j\,{|}\,j\geqslant i\}\,{|}\,i\in {\mathbb {N}}\}\cup \{\{x_i\}\,{|}\,i\in {\mathbb {N}}\}\) (recall that each set corresponds to a closed ball). Thus, we conclude by applying Lemmas 2.7 and 2.8. \(\square \)
1.3.1 Closed-Form Solution for \(d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}\)
In this section, we prove the following theorem.
Theorem A.4
Given \(1\leqslant p,q <\infty \) and two compactly supported probability measures \(\alpha \) and \(\beta \) on \({\mathbb {R}}_{\geqslant 0}\), we have that
When \(q\leqslant p\), the equality holds whereas when \(q>p\), the equality does not hold in general.
One important ingredient for the proof is the following direct adaptation of [67, Lem. 1].
Lemma A.5
Let X, Y be two Polish metric spaces and let \(f:X\rightarrow {\mathbb {R}}\) and \(g:Y\rightarrow {\mathbb {R}}\) be measurable maps. Denote by \(f\hspace{1.111pt}{\times }\hspace{1.111pt}g:X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\rightarrow {\mathbb {R}}^2\) the map \((x,y)\mapsto (f(x),g(y))\). Then, for any \({\mu _{X}}\in {\mathcal {P}}(X)\) and \({\mu _{Y}}\in {\mathcal {P}}(Y)\)
Based on Lemma A.5, we show the following auxiliary result.
Lemma A.6
Let \(1\leqslant q\leqslant p<\infty \). Assume that \(\alpha \) and \(\beta \) are compactly supported probability measures on \({\mathbb {R}}_{\geqslant 0}\). Then,
where \(S_q:{\mathbb {R}}_{\geqslant 0}\rightarrow {\mathbb {R}}_{\geqslant 0}\) taking x to \(x^q\) is the q-snowflake transform defined in Sect. 3.3.
Proof
Since \({p}/{q}\geqslant 1\) and by Lemma A.5 we have that
\(\square \)
With Lemma A.6 at our disposal, we can demonstrate Theorem A.4.
Proof of Theorem A.4
We first note that
where \(\xi \) and \(\eta \) are two random variables with marginal distributions \(\alpha \) and \(\beta \), respectively. Moreover, let \(\zeta \) be a random variable uniformly distributed on [0, 1], then \(F_\alpha ^{-1}(\zeta )\) has distribution function \(F_\alpha \) and \(F_\beta ^{-1}(\zeta )\) has distribution function \(F_\beta \) (see for example [88]). Let \(\xi =F_\alpha ^{-1}(\zeta )\) and \(\eta =F_\beta ^{-1}(\zeta )\), then we have
Next, we assume that \(q\leqslant p\). By Lemma A.6, we have that
Then,
where \(F_{\alpha ,q}\) and \(F_{\beta ,q}\) are distribution functions of \((S_q)_\#\,\alpha \) and \((S_q)_\#\,\beta \), respectively. It is easy to verify that \(F_{\alpha ,q}^{-1}(t)=(F_\alpha ^{-1}(t))^q\) and \(F_{\beta ,q}^{-1}(t)=(F_\beta ^{-1}(t))^q\). Therefore,
Finally, we demonstrate that for \(q>p\) the equality does not hold in general. We first consider the extreme case \(p=1\) and \(q=\infty \) (though we require \(q<\infty \) in the assumptions of the theorem, we relax this for now). Let \(\alpha _0= \delta _1/2+ \delta _2/2\) and \(\beta _0 = \delta _2/2+\delta _3/2\) where \(\delta _x\) means the Dirac measure at point \(x\in {\mathbb {R}}_{\geqslant 0}\). Then, we have that
It is not hard to see that both \(d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha _0,\beta _0)\) and
are continuous with respect to \(p\in [1,\infty )\) and \(q\in [1,\infty ]\). Then, for p close to 1 and \(q<\infty \) large enough, and in particular, \(p<q\), we have that
\(\square \)
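The extreme case \(p=1\), \(q=\infty \) with \(\alpha _0= \delta _1/2+ \delta _2/2\) and \(\beta _0 = \delta _2/2+\delta _3/2\) can be checked numerically. The sketch below is a hedged illustration: it assumes \(\Lambda _\infty (a,b)=\max \hspace{0.88882pt}(a,b)\) for \(a\ne b\) and 0 otherwise, and it assumes that the upper bound in Theorem A.4 is the cost of the quantile coupling \((F_{\alpha }^{-1}(\zeta ),F_{\beta }^{-1}(\zeta ))\); the function names are hypothetical. Every coupling of \(\alpha _0\) and \(\beta _0\) is determined by the mass s placed on the pair (1, 2), and the cost is affine in s, so it suffices to evaluate the two endpoints \(s=0\) and \(s=1/2\).

```python
def lambda_inf(a, b):
    """Assumed form of Lambda_infinity on the nonnegative reals."""
    return 0.0 if a == b else max(a, b)

def coupling_cost(s):
    """Expected Lambda_infinity cost of the coupling with mass s on (1, 2).

    The marginal constraints force mass 1/2 - s on (1, 3) and on (2, 2),
    and mass s on (2, 3), for any s in [0, 1/2].
    """
    return (
        s * lambda_inf(1, 2)
        + (0.5 - s) * lambda_inf(1, 3)
        + (0.5 - s) * lambda_inf(2, 2)
        + s * lambda_inf(2, 3)
    )

# The cost is affine in s, so its minimum over [0, 1/2] is at an endpoint.
optimal_cost = min(coupling_cost(0.0), coupling_cost(0.5))

# The quantile coupling pairs 1 with 2 and 2 with 3, i.e. s = 1/2.
quantile_cost = coupling_cost(0.5)
```

Under these assumptions the optimal transport cost is 3/2 (leave the mass at 2 in place and move the mass at 1 to 3), whereas the quantile coupling costs 5/2, so the inequality is strict in this example.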
1.3.2 Miscellaneous
In the remainder of this section, we collect several technical results that find implicit or explicit usage throughout Sect. 2.
Lemma A.7
A complete ultrametric space X is compact iff for any \(t>0\), \(X_t\) is finite.
Proof
Wan [92, Lem. 2.3] proves that whenever X is compact, \(X_t\) is finite for any \(t>0\).
Conversely, we assume that \(X_t\) is finite for any \(t>0\). We only need to prove that X is totally bounded. For any \(\varepsilon >0\), \(X_\varepsilon \) is a finite set and thus there exist \(x_1,\ldots ,x_n\in X\) such that \(X_\varepsilon =\{[x_1]_\varepsilon ,\ldots ,[x_n]_\varepsilon \}\). Now, for any \(x\in X\), there exists \(x_i\) for some \(i=1,\ldots ,n\) such that \(x\in [x_i]_\varepsilon \). This implies that \(u_X(x,x_i)\leqslant \varepsilon \). Therefore, the set \(\{x_1,\ldots ,x_n\}\subseteq X\) is an \(\varepsilon \)-net of X. Then, X is totally bounded and thus compact.\(\square \)
Lemma A.8
V(X) is the collection of all closed balls in X except for singletons \(\{x\}\) such that x is a cluster point in X.
Proof
Given any \(t>0\) and \(x\in X\), \([x]_t=B_t(x)=\{x'\!\in X\,{|}\, u_X(x,x')\leqslant t\}\). Therefore, V(X) is a collection of closed balls in X. Conversely, any closed ball \(B_t(x)\) with positive radius \(t>0\) coincides with \([x]_t\in \theta _X(t)\) and thus belongs to V(X). Now, for any singleton \(\{x\}=B_0(x)\), if x is not a cluster point, then there exists \(t>0\) such that \(B_t(x)=\{x\}\), which implies that \(\{x\}\in V(X)\). If x is a cluster point, then for any \(t>0\), \(\{x\}\subsetneqq B_t(x)=[x]_t\). This implies that \(\{x\}\ne [x]_t\) for all \(t>0\) and thus \(\{x\}\notin V(X)\). This concludes the proof.\(\square \)
Technical Details from Sect. 3
1.1 Proofs from Sect. 3.1
Next, we give the missing proofs of the results stated in Sect. 3.1.
1.1.1 Proof of Proposition 3.3
Part 1. This directly follows from the definitions of \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) and \(d_{\textrm{GW},p}^{\mathrm{\,sturm}}\) (see (8) and (4)).
Part 2. This simply follows from Jensen’s inequality.
Part 3. By Part 2, \(\{u_{\textrm{GW},n}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\}_{n\in {\mathbb {N}}}\) is an increasing sequence with a finite upper bound \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\). Therefore, \(L:= \lim _{\,n\rightarrow \infty }u_{\textrm{GW},n}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\) exists and \(L\leqslant u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\).
Next, we come to the opposite inequality. By Proposition B.1, there exist \(u_n\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu _n\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) such that
By Lemmas B.19 and B.21, the sequence \(\{u_n\}_{n\in {\mathbb {N}}}\) uniformly converges to some \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\{\mu _n\}_{n\in {\mathbb {N}}}\) weakly converges to some \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) (after taking appropriate subsequences of both sequences). Let
Let \(\varepsilon >0\) and let \(U=\{(x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\,{|}\,u(x,y)> M-\varepsilon \}\). Then, \(\mu (U)>0\). Since U is open, it follows that there exists a small \(\varepsilon _1>0\) such that \(\mu _n(U)>\mu (U)-\varepsilon _1>0\) for all n large enough (see e.g. [7, Thm. 2.1]). Moreover, by uniform convergence of the sequence \(\{u_n\}_{n\in {\mathbb {N}}}\), we have \(|u(x,y)-u_n(x,y)|\leqslant \varepsilon \) for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) when n is large enough. Therefore, we obtain for n large enough
Letting \(n\rightarrow \infty \), we obtain \(L\geqslant M-2\varepsilon \). Since \(\varepsilon >0\) is arbitrary, \(L\geqslant M\geqslant u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\).
1.1.2 Proof of Theorem 3.4
This section is devoted to the proof of Theorem 3.4. To this end, we first verify the existence of optimal metrics and optimal couplings in (15).
Proposition B.1
(Existence of optimal couplings) Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Then, there always exist \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) such that for \(1\leqslant p\leqslant \infty \),
Proof
The following proof is an adaptation of the proof of [83, Lem. 3.3]. We will only prove the claim for the case \(p<\infty \) since the case \(p=\infty \) can be shown in a similar manner. Let \(u_n\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu _n\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be such that
By Lemma B.19, \(\{\mu _n\}_{n\in {\mathbb {N}}}\) weakly converges (after taking an appropriate subsequence) to some \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). By Lemma B.21, \(\{u_n\}_{n\in {\mathbb {N}}}\) uniformly converges (after taking an appropriate subsequence) to some \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\). Then, it is easy to verify that
\(\square \)
As a direct consequence of the proposition, we get the subsequent result.
Corollary B.2
Fix \(1\leqslant p\leqslant \infty \). Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Then, there exist a compact ultrametric space Z and isometric embeddings \(\phi :X\hookrightarrow Z\) and \(\psi :Y\hookrightarrow Z\) such that
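Before we come to the proof of Theorem 3.4, it remains to establish another auxiliary result. We show that the Wasserstein pseudometric of order p on a compact pseudo-ultrametric space \((X,u_X)\) is a p-pseudometric for \(p\in [1,\infty )\) and a pseudo-ultrametric for \(p=\infty \), i.e., we prove for \(1\leqslant p<\infty \) that for all \(\alpha _1,\alpha _2,\alpha _3\in {\mathcal {P}}(X)\),
$$\begin{aligned} d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _1,\alpha _3)\leqslant \bigl (\bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _1,\alpha _2)\bigr )^p+\bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _2,\alpha _3)\bigr )^p\bigr )^{1/p}, \end{aligned}$$
and for \(p=\infty \) that for all \(\alpha _1,\alpha _2,\alpha _3\in {\mathcal {P}}(X)\)
$$\begin{aligned} d_{\textrm{W},\infty }^{\,(X,{u_{X}})}(\alpha _1,\alpha _3)\leqslant \max \hspace{0.55542pt}\bigl (d_{\textrm{W},\infty }^{\,(X,{u_{X}})}(\alpha _1,\alpha _2),d_{\textrm{W},\infty }^{\,(X,{u_{X}})}(\alpha _2,\alpha _3)\bigr ). \end{aligned}$$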
Lemma B.3
Let \((X,{u_{X}})\) be a compact pseudo-ultrametric space. Then, for \(1\leqslant p\leqslant \infty \) the p-Wasserstein metric \(d_{\textrm{W},p}^{\,(X,{u_{X}})}\) is a p-pseudometric on \({\mathcal {P}}(X)\). In particular, when \(p=\infty \), it is a pseudo-ultrametric on \({\mathcal {P}}(X)\).
Proof
We prove the statement by adapting the proof of the triangle inequality for the p-Wasserstein distance (see e.g., [90, Thm. 7.3]). We only prove the case when \(p<\infty \) whereas the case \(p=\infty \) follows by analogous arguments.
Let \(\alpha _1,\alpha _2,\alpha _3\in {\mathcal {P}}(X)\), denote by \(\mu _{12}\) an optimal transport plan between \(\alpha _1\) and \(\alpha _2\) and by \(\mu _{23}\) an optimal transport plan between \(\alpha _2\) and \(\alpha _3\) (see [91, Thm. 4.1] for the existence of \(\mu _{12}\) and \(\mu _{23}\)). Furthermore, let \(X_i\) be the support of \(\alpha _i\), \(1\leqslant i \leqslant 3\). Then, by the Gluing Lemma [90, Lem. 7.6] there exists a measure \(\mu \in {\mathcal {P}}(X_1\hspace{0.55542pt}{\times }\hspace{1.111pt}X_2\hspace{0.55542pt}{\times }\hspace{1.111pt}X_3)\) with marginals \(\mu _{12}\) on \(X_1\hspace{0.55542pt}{\times }\hspace{1.111pt}X_2\) and \(\mu _{23}\) on \(X_2\hspace{0.55542pt}{\times }\hspace{1.111pt}X_3\). Clearly, we obtain
Here, we used that \({u_{X}}\) is an ultrametric, i.e., in particular a p-metric [64, Prop. 2.11]. With this we obtain that
\(\square \)
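For orientation, the elementary estimate behind the p-triangle inequality of Lemma B.3 can be written out as follows. This is a sketch in the notation of the proof above, with \(\mu \) the glued measure on \(X_1\hspace{0.55542pt}{\times }\hspace{1.111pt}X_2\hspace{0.55542pt}{\times }\hspace{1.111pt}X_3\) and \(p<\infty \); it is not a verbatim restatement of the displayed formulas. For all \(x_1,x_2,x_3\in X\), the strong triangle inequality gives
$$\begin{aligned} (u_X(x_1,x_3))^p\leqslant \max \hspace{0.55542pt}\bigl ((u_X(x_1,x_2))^p,(u_X(x_2,x_3))^p\bigr )\leqslant (u_X(x_1,x_2))^p+(u_X(x_2,x_3))^p. \end{aligned}$$
Since the push forward of \(\mu \) to the first and third coordinates is a coupling of \(\alpha _1\) and \(\alpha _3\), integrating against \(\mu \) and using its marginals \(\mu _{12}\) and \(\mu _{23}\) yields
$$\begin{aligned} \bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _1,\alpha _3)\bigr )^p&\leqslant \int (u_X(x_1,x_3))^p\,\mu (dx_1\,dx_2\,dx_3)\\&\leqslant \int (u_X(x_1,x_2))^p\,\mu _{12}(dx_1\,dx_2)+\int (u_X(x_2,x_3))^p\,\mu _{23}(dx_2\,dx_3)\\&=\bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _1,\alpha _2)\bigr )^p+\bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _2,\alpha _3)\bigr )^p, \end{aligned}$$
where the last equality uses the optimality of \(\mu _{12}\) and \(\mu _{23}\).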
With Proposition B.1 and Lemma B.3 at our disposal we are now ready to prove Theorem 3.4 which states that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) is indeed a p-metric on \({\mathcal {U}}^{\textrm{w}}\).
Proof of Theorem 3.4
It is clear that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) is symmetric and that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}}) =0\) if \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). Furthermore, we remark that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\geqslant d_{\textrm{GW},p}^{\,\textrm{sturm}}({\mathcal {X}},{\mathcal {Y}})\) by Proposition 3.3. Since \(d_{\textrm{GW},p}^{\,\textrm{sturm}}({\mathcal {X}},{\mathcal {Y}})=0\) implies that \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\) ([84]), we have that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=0\) implies that \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). It remains to verify the p-triangle inequality. To this end, we only prove the case when \(p<\infty \) whereas the case \(p=\infty \) follows by analogous arguments.
Let \({\mathcal {X}},{\mathcal {Y}},{\mathcal {Z}}\in {\mathcal {U}}^{\textrm{w}}\). Suppose \(u_{XY}\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(u_{YZ}\in {\mathcal {D}}^{\textrm{ult}}({u_{Y}},{u_{Z}})\) are optimal metric couplings such that
Further, define \(u_{XYZ}\) on \(X\sqcup Y\sqcup Z\) as
Then, by [93, Lem. 1.1] \(u_{XYZ}\) is a pseudo-ultrametric on \(X\sqcup Y\sqcup Z\) that coincides with \(u_{XY}\) on \(X\sqcup Y\) and with \(u_{YZ}\) on \(Y\sqcup Z\). Thus by Lemma B.3 we obtain that
This gives the claim for \(p<\infty \). \(\square \)
1.1.3 Proof of Theorem 3.7
In order to prove Theorem 3.7, we will first establish the statement for finite ultrametric measure spaces. For this purpose, we need to introduce some notation. Given \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\), let \({\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\) denote the collection of all admissible pseudo-ultrametrics on \(X\sqcup Y\), where \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) is called admissible if there exists no \(u^*\!\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) such that \(u^*\ne u\) and \(u^*(x,y)\leqslant u(x,y)\) for all \(x,y\in X\sqcup Y\).
Lemma B.4
For any \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\), \({\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\ne \text{\O }\). Moreover,
Proof
If \(\{u_n\}_{n\in {\mathbb {N}}}\subseteq {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) is a decreasing sequence (with respect to pointwise inequality), it is easy to verify that \(u:= \inf _{\,n\in {\mathbb {N}}}u_n\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and thus u is a lower bound of \(\{u_n\}_{n\in {\mathbb {N}}}\). Then, by Zorn’s lemma \({\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\ne \text{\O }\). Therefore, we obtain the claim.\(\square \)
Combined with Example 3.6, the following result implies that each \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\) gives rise to an element in \({\mathcal {A}}\).
Lemma B.5
Given finite spaces \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\), for each \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\), \(u^{-1}(0)\ne \text{\O }\).
Proof
Assume otherwise that \(u^{-1}(0)=\text{\O }\). Let \((x_0,y_0)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) be such that \(u(x_0,y_0)=\min _{x\in X,y\in Y}u(x,y)\). The existence of \((x_0,y_0)\) is due to the finiteness of X and Y. We define \( u_{(x_0,y_0)}:X\sqcup Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\sqcup Y\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:
(i) \( u_{(x_0,y_0)}|_{X\times X}:= u_X\) and \( u_{(x_0,y_0)}|_{Y\times Y}:= u_Y\).
(ii) For \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\),
$$\begin{aligned} u_{(x_0,y_0)}(x,y):= \min \hspace{1.111pt}(u(x,y),\max \hspace{0.55542pt}(u_X(x,x_0),u_Y(y,y_0))). \end{aligned}$$
(iii) For any \((y,x)\in Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\), \( u_{(x_0,y_0)}(y,x):= u_{(x_0,y_0)}(x,y)\).
It is easy to verify that \(u_{(x_0,y_0)}\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\). Further, it is obvious that \(u_{(x_0,y_0)}(x_0,y_0)=0<u(x_0,y_0)\) and that \(u_{(x_0,y_0)}(x,y)\leqslant u(x,y)\) for all \(x,y\in X\sqcup Y\), which contradicts \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\). Therefore, \(u^{-1}(0)\ne \text{\O }\).\(\square \)
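The construction of \(u_{(x_0,y_0)}\) in the proof of Lemma B.5 can be tried on a small example. The sketch below is illustrative and uses hypothetical helper names (`is_pseudo_ultrametric`, `lower_coupling`, `assemble`). It starts from a coupling ultrametric whose cross block is a large constant, so that \(u^{-1}(0)=\text{\O }\), and applies rule (ii) of the proof with \((x_0,y_0)\) the first points of X and Y.

```python
import numpy as np


def is_pseudo_ultrametric(d, tol=1e-12):
    """Check symmetry, zero diagonal and the strong triangle inequality."""
    if not np.allclose(d, d.T) or np.any(np.abs(np.diag(d)) > tol):
        return False
    n = d.shape[0]
    for i in range(n):
        for j in range(n):
            # d(i, j) <= max(d(i, k), d(k, j)) must hold for every k.
            if d[i, j] > np.min(np.maximum(d[i, :], d[:, j])) + tol:
                return False
    return True


def lower_coupling(uX, uY, cross, x0, y0):
    """Cross block min(u(x, y), max(uX(x, x0), uY(y, y0))) from rule (ii)."""
    glue = np.maximum(uX[:, [x0]], uY[[y0], :])  # shape (|X|, |Y|)
    return np.minimum(cross, glue)


def assemble(uX, uY, cross):
    """Full matrix on the disjoint union of X and Y."""
    return np.block([[uX, cross], [cross.T, uY]])


uX = np.array([[0.0, 2.0], [2.0, 0.0]])
uY = np.array([[0.0, 3.0], [3.0, 0.0]])
cross = np.full((2, 2), 4.0)  # constant cross distances, no zeros

new_cross = lower_coupling(uX, uY, cross, 0, 0)
old = assemble(uX, uY, cross)
new = assemble(uX, uY, new_cross)

print(new_cross)
print(is_pseudo_ultrametric(old), is_pseudo_ultrametric(new))
```

In this example the new cross block is pointwise no larger than the old one and vanishes at \((x_0,y_0)\), so the original coupling was not admissible.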
Theorem B.6
Let \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\) be finite spaces. Then, we have for each \(p\in [1,\infty )\) that
Proof
By Lemma B.4, it suffices to prove that each \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\) induces \((A,\varphi )\in {\mathcal {A}}\) such that
Let \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\). Define \(A_0:= \{x\in X\,{|}\,\exists \, y\in Y \text { such that }u(x,y)=0\}\) (\(A_0\ne \text{\O }\) by Lemma B.5). By Example 3.6, the map \(\varphi _0:A_0\rightarrow Y\) taking x to y such that \(u(x,y)=0\) is a well-defined isometric embedding. This means in particular that \((A_0,\varphi _0)\in {\mathcal {A}}\).
If \(u(x,y)\geqslant u_{Z_{A_0}}(\phi ^X_{(A_0,\varphi _0)}(x),\psi ^Y_{(A_0,\varphi _0)}(y))\) holds for all \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), then we set \(A:= A_0\) and \(\varphi := \varphi _0\). This gives
Otherwise, there exists \((x,y)\in X\backslash A_0\hspace{1.111pt}{\times }\hspace{1.111pt}Y\backslash \varphi _0(A_0)\) such that
(if \(x\in A_0\) or \(y\in \varphi _0(A_0)\), then \(u(x,y)\geqslant u_{Z_{A_0}}\bigl (\phi ^X_{(A_0,\varphi _0)}(x),\psi ^Y_{(A_0,\varphi _0)}(y)\bigr )\) must hold). Let \((x_1,y_1)\in X\backslash A_0\hspace{0.55542pt}{\times }\hspace{1.111pt}Y\backslash \varphi _0(A_0)\) be such that
The existence of \((x_1,y_1)\) follows from finiteness of X and Y. It is easy to check that \(\varphi _0\) extends to an isometry from \(A_0\cup \{x_1\}\) to \(\varphi _0(A_0)\cup \{y_1\}\) by taking \(x_1\) to \(y_1\). We denote the new isometry \(\varphi _1\) and set \(A_1:= A_0\cup \{x_1\}\). If for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), we have that \(u(x,y)\geqslant u_{Z_{A_1}}\!(\phi ^X_{(A_1,\varphi _1)}(x),\psi ^Y_{(A_1,\varphi _1)}(y))\), then we define \(A:= A_1\) and \(\varphi := \varphi _1\). Otherwise, we continue the process to obtain \(A_2, A_3,\dots \). This process will eventually stop since we are considering finite spaces. Suppose the process stops at \(A_n\), then \(A:= A_n\) and \(\varphi := \varphi _n\) satisfy that \(u(x,y)\geqslant u_{Z_{A}}(\phi ^X_{(A,\varphi )}(x),\psi ^Y_{(A,\varphi )}(y))\) for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\). Therefore,
Since \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\) is arbitrary, this gives the claim.\(\square \)
As a direct consequence of Theorem B.6, for finite spaces it suffices, as claimed in Remark 3.8, to infimize in (24) over the collection of all maximal pairs \({\mathcal {A}}^*\!\subseteq {\mathcal {A}}\). Recall that a pair \((A,\varphi _1)\in {\mathcal {A}}\) is called maximal if, for all pairs \((B,\varphi _2)\in {\mathcal {A}}\) with \(A\subseteq B\) and \(\varphi _2|_A\!=\varphi _1\), we have \(A=B\).
Corollary B.7
Let \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\) be finite spaces. Then, we have for each \(p\in [1,\infty ]\) that
By proving Theorem B.6, we have verified Theorem 3.7 for finite ultrametric measure spaces. Next, we use Theorem B.6 and weighted quotients to prove Theorem 3.7 in general. Before we come to this, we need to establish the following two auxiliary results.
Lemma B.8
Let \(X\in {\mathcal {U}}\) be a compact ultrametric space. Let \(t>0\) and let \(p\in [1,\infty )\). Then, for any \(\alpha ,\beta \in {\mathcal {P}}(X)\), we have that
where \(\alpha _t\) is the push forward of \(\alpha \) under the canonical quotient map \(Q_t:X\rightarrow X_t\) taking \(x\in X\) to \([x]_t\in X_t\).
Proof
For any \(\mu _t\in {\mathcal {C}}(\alpha _t,\beta _t)\), it is easy to see that there exists \(\mu \in {\mathcal {C}}(\alpha ,\beta )\) such that \(\mu _t=( Q_t\hspace{0.55542pt}{\times }\hspace{1.111pt}Q_t)_\#\,\mu \) where \(Q_t\hspace{0.55542pt}{\times }\hspace{1.111pt}Q_t:X\hspace{1.111pt}{\times }\hspace{1.111pt}X\rightarrow X_t\hspace{0.55542pt}{\times }\hspace{1.111pt}X_t\) maps \((x,x')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}X\) to \(([x]_t,[x']_t)\). For example, suppose \(X_t=\{[x_1]_t,\ldots ,[x_n]_t\}\), then one can let
where \(\alpha |_{[x_i]_t}\) is the restriction of \(\alpha \) on \([x_i]_t\).
For any \(x,x'\!\in X\), we have that \(( u_X(x,x'))^p\leqslant ( u_{X_t}([x]_t,[x']_t))^p+t^p\). Then
Infimizing over all \(\mu _t\in {\mathcal {C}}(\alpha _t,\beta _t)\), we obtain the claim.\(\square \)
Lemma B.9
Let \({\mathcal {X}}\in {\mathcal {U}}^{\textrm{w}}\) and let \(p\in [1,\infty ]\). Then, for any \(t>0\), we have that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {X}})\leqslant t\). In particular, \(\lim _{\,t\rightarrow 0}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {X}})=0\).
Proof
It is obvious that \(({\mathcal {X}}_t)_t\cong _{\textrm{w}}{\mathcal {X}}_t\). Hence, it holds by Theorem 3.14 that \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {X}})\leqslant t\). By Proposition 3.3 we have that for any \(p\in [1,\infty ]\),
\(\square \)
With Lemmas B.8 and B.9 available, we can come to the proof of Theorem 3.7.
Proof of Theorem 3.7
It follows from the definition of \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) (see (8)) that
Hence, we focus on proving the opposite inequality. Given any \(t>0\), by Lemma A.7, both \({\mathcal {X}}_t\) and \({\mathcal {Y}}_t\) are finite spaces. By Theorem B.6 we have that
where \({\mathcal {A}}_t:= \{(A_t,\varphi _t)\mid \text{\O }\ne A_t\subseteq X_t \text { is closed and } \varphi _t:A_t\hookrightarrow Y_t \text { is an isometric embedding}\}\).
For any \((A_t,\varphi _t)\in {\mathcal {A}}_t\), assume that \(A_t=\{[x_1]_t^X\!,\ldots ,[x_n]_t^X\}\) and that \(\varphi _t([x_i]_t)=[y_i]_t\in Y_t\) for all \(i=1,\ldots ,n\). Let \(A:= \{x_1,\ldots ,x_n\}\). Then, the map \(\varphi :A\rightarrow Y\) defined by \(x_i\mapsto y_i\) for \(i=1,\ldots ,n\) is an isometric embedding. Therefore, \((A,\varphi )\in {\mathcal {A}}\).
Claim 1
\(((Z_A)_t,u_{(Z_A)_t})\cong ( Z_{A_t},u_{Z_{A_t}})\).
Proof of Claim 1
We define a map \(\Psi :(Z_A)_t\rightarrow Z_{A_t}\) by \([x]_t^{Z_A}\!\mapsto [x]_t^X\) for \(x\in X\) and \([y]_t^{Z_A}\!\mapsto [y]_t^Y\) for \(y\in Y\backslash \varphi (A)\). We first show that \(\Psi \) is well defined. For any \(x'\!\in X\), if \(u_{Z_A}(x,x')\leqslant t\), then obviously we have that \(u_X(x,x')=u_{Z_A}(x,x')\leqslant t\) and thus \([x]_t^X\!=[x']_t^X\). Now, assume that there exists \(y\in Y\backslash \varphi (A)\) such that \(u_{Z_A}(x,y)\leqslant t\), i.e., \([x]_t^{Z_A}\!=[y]_t^{Z_A}\). Then, by finiteness of A and definition of \(Z_A\), there exists \(x_i\in A\) such that \(u_{Z_A}(x,y)=\max \hspace{0.55542pt}( u_X(x,x_i),u_Y(\varphi (x_i),y))\leqslant t\). This gives that
However, this happens only if \(u_{Z_{A_t}}\!([x]_t^X\!,[y]_t^Y)=0\), that is, \([x]_t^X\) is identified with \([y]_t^Y\) under the map \(\varphi _t\). Therefore, \(\Psi \) is well defined. It is easy to see from the definition that \(\Psi \) is surjective. Thus, it suffices to show that \(\Psi \) is an isometric embedding to finish the proof. For any \(x,x'\!\in X\) such that \(u_X(x,x')>t\), we have that
Similarly, for any \(y,y'\!\in Y\backslash \varphi (A)\) such that \(u_Y(y,y')>t\), we have that
Now, consider \(x\in X\) and \(y\in Y\backslash \varphi (A)\). Assume that \(u_{Z_A}\!(x,y)>t\) (otherwise \([x]_t^{Z_A}\!=[y]_t^{Z_A}\)). Then, we have that
This implies that
Therefore, \(\Psi \) is an isometric embedding and thus we conclude the proof. \(\square \)
By Lemma B.8 we have that
Therefore,
Notice that the last inequality already holds when we only consider \((A,\varphi )\) corresponding to \((A_t,\varphi _t)\in {\mathcal {A}}_t\). By Lemma B.9, we have that
which concludes the proof. \(\square \)
1.2 Proofs from Sect. 3.2
In this section, we give the complete proofs of the results stated in Sect. 3.2.
1.2.1 Proof of Proposition 3.10
Part 1. This follows directly from the definitions of \(u_{\textrm{GW},p}\) and \(d_{\textrm{GW},p}\) (see (11) and (5)).
Part 2. By Jensen’s inequality we have that \({\textrm{dis}}^{\textrm{ult}}_p(\mu )\leqslant {\textrm{dis}}^{\textrm{ult}}_q(\mu )\) for any \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). Therefore, \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\leqslant u_{\textrm{GW},q}({\mathcal {X}},{\mathcal {Y}}) \).
Part 3. By Part 2 we know that \(\{u_{\textrm{GW},n}({\mathcal {X}},{\mathcal {Y}})\}_{n\in {\mathbb {N}}}\) is an increasing sequence with a finite upper bound \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\). Therefore, \(L:= \lim _{\,n\rightarrow \infty }u_{\textrm{GW},n}({\mathcal {X}},{\mathcal {Y}})\) exists and it holds \(L\leqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\).
To prove the opposite inequality, by Proposition B.10, there exists for each \(n\in {\mathbb {N}}\), \(\mu _n\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) such that
By Lemma B.19, \(\{\mu _n\}_{n\in {\mathbb {N}}}\) weakly converges (after taking an appropriate subsequence) to some \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). Let
and for a given \(\varepsilon >0\) let
Then, we have \(\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu (U)>0\). As \(\mu _n\) weakly converges to \(\mu \), we have that \(\mu _n\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _n\) weakly converges to \(\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu \). Since U is open, there exists a small \(\varepsilon _1>0\) such that \(\mu _n\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _n(U)>\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu (U)-\varepsilon _1>0\) for n large enough (see e.g. [7, Thm. 2.1]). Therefore,
Letting \(n\rightarrow \infty \), we obtain \(L\geqslant M-\varepsilon \). Since \(\varepsilon >0\) is arbitrary, we obtain \(L\geqslant M\geqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\).
1.2.2 Proof of Theorem 3.11
One main step to verify Theorem 3.11 is to demonstrate the existence of optimal couplings.
Proposition B.10
Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Then, for any \(p\in [1,\infty ]\), there always exists an optimal coupling \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) such that \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})=\mathrm{dis\hspace{0.55542pt}}_p^{\textrm{ult}}(\mu )\).
Proof
We will only prove the claim for the case \(p<\infty \) since the case \(p=\infty \) can be proven in a similar manner. Let \(\mu _n\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be such that
By Lemma B.19, \(\{\mu _n\}_{n\in {\mathbb {N}}}\) weakly converges to some \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) (after taking an appropriate subsequence). Then, by the boundedness and continuity of \(\Lambda _\infty ({u_{X}},{u_{Y}})\) on \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) (cf. Lemma B.22) as well as the weak convergence of \(\mu _n\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _n\), we have that
Hence, \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})=\mathrm{dis\hspace{0.55542pt}}_p^{\textrm{ult}}(\mu )\).\(\square \)
Based on Proposition B.10, it is straightforward to prove Theorem 3.11.
Proof of Theorem 3.11
It is clear that \(u_{\textrm{GW},p}\) is symmetric and that \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}}) =0\) if \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). Furthermore, we remark that \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\geqslant d_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\) by Proposition 3.10. Since \(d_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})=0\) implies that \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\) (see [60]), we have that \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})=0\) implies that \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). It remains to verify the p-triangle inequality. To this end, we only prove the case when \(p<\infty \) whereas the case \(p=\infty \) follows by analogous arguments.
Now let \({\mathcal {X}},{\mathcal {Y}},{\mathcal {Z}}\) be three ultrametric measure spaces. Let \(\mu _{XY}\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) and \(\mu _{YZ}\in {\mathcal {C}}({\mu _{Y}},\mu _Z)\) be optimal (cf. Proposition B.10). By the Gluing Lemma [90, Lem. 7.6], there exists a measure \(\mu _{XYZ}\in {\mathcal {P}}(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}Z)\) with marginals \(\mu _{XY}\) on \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) and \(\mu _{YZ}\) on \(Y\hspace{1.111pt}{\times }\hspace{1.111pt}Z\). Further, we define \(\mu _{XZ}=(\pi _{XZ})_\#\,\mu _{XYZ}\in {\mathcal {P}}(X\hspace{1.111pt}{\times }\hspace{1.111pt}Z)\), where \(\pi _{XZ}\) denotes the canonical projection \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}Z\rightarrow X\hspace{1.111pt}{\times }\hspace{1.111pt}Z\). Then
where the second inequality follows from the fact that \(\Lambda _\infty \) is an ultrametric on \({\mathbb {R}}_{\geqslant 0}\) (cf. [64, Exam. 2.7]) and the observation that an ultrametric is automatically a p-metric for any \(p\in [1,\infty ]\) [64, Prop. 2.11]. \(\square \)
1.2.3 Proof of Theorem 3.14
We first prove that
and then show that the infimum is attained.
Since \({\mathcal {X}}_0\cong _{\textrm{w}} {\mathcal {X}}\) and \({\mathcal {Y}}_0\cong _{\textrm{w}} {\mathcal {Y}}\), if \({\mathcal {X}}_0\cong _{\textrm{w}}{\mathcal {Y}}_0\), then \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\) and thus by Theorem 3.11
Now, assume that for some \(t>0\), \({\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\). By Lemma A.7, for some \(n\in {\mathbb {N}}\) we can write \({X}_t=\{[x_1]_t,\dots ,[x_n]_t\}\) and \({Y}_t=\{[y_1]_t,\dots ,[y_n]_t\}\) such that \(u_{X_t}([x_i]_t,[x_j]_t)=u_{Y_t}([y_i]_t,[y_j]_t)\) and \({\mu _{X}}([x_i]_t)={\mu _{Y}}([y_i]_t)\). Let \({\mu _{X}}^i:= {\mu _{X}}|_{[x_i]_t}\) and \({\mu _{Y}}^i:= {\mu _{Y}}|_{[y_i]_t}\) for all \(i=1,\dots ,n\). Let \(\mu := \sum _{i=1}^n{\mu _{X}}^i\hspace{0.55542pt}{\otimes }\hspace{1.111pt}{\mu _{Y}}^i\). It is easy to check that \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) and \(\textrm{supp}\hspace{0.55542pt}(\mu )=\bigcup _{i=1}^n[x_i]_t\hspace{0.55542pt}{\times }\hspace{1.111pt}[y_i]_t\). Assume \((x,y)\in [x_i]_t\hspace{0.55542pt}{\times }\hspace{1.111pt}[y_i]_t\) and \((x'\!,y')\in [x_j]_t\hspace{0.55542pt}{\times }\hspace{1.111pt}[y_j]_t\). If \(i\ne j\), then \(u_{X_t}([x_i]_t,[x_j]_t)=u_{Y_t}([y_i]_t,[y_j]_t)\) and thus
If \(i=j\), then \({u_{X}}(x,x'),{u_{Y}}(y,y')\leqslant t\) and thus \(\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\leqslant t\). In either case, we have that
Therefore, \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\leqslant \inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \).
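For finite spaces, the block coupling used in the first half of this proof can be written down explicitly. The sketch below is illustrative and makes one assumption explicit: each product of restricted measures is divided by the common block mass, so that the result has total mass one and the correct marginals. The function name `block_coupling` and the label arrays are hypothetical; blocks are assumed to be matched by equal labels and to have positive mass.

```python
import numpy as np


def block_coupling(muX, labX, muY, labY):
    """Couple muX and muY block by block.

    labX[x] and labY[y] are block labels; block i of X is matched with
    block i of Y, and matched blocks are assumed to have equal mass.
    """
    muX, muY = np.asarray(muX, float), np.asarray(muY, float)
    labX, labY = np.asarray(labX), np.asarray(labY)
    mu = np.zeros((len(muX), len(muY)))
    for i in np.unique(labX):
        a = np.where(labX == i, muX, 0.0)  # muX restricted to block i
        b = np.where(labY == i, muY, 0.0)  # muY restricted to block i
        mu += np.outer(a, b) / a.sum()     # normalise by the block mass
    return mu


muX, labX = [0.2, 0.3, 0.5], [0, 0, 1]
muY, labY = [0.5, 0.5], [0, 1]
mu = block_coupling(muX, labX, muY, labY)

print(mu)
print(mu.sum(axis=1), mu.sum(axis=0))
```

The support of the resulting matrix lies in the union of the matched blocks, which is the property used to bound \(\Lambda _\infty \) on the support of the coupling.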
Conversely, suppose \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) and let
By [60, Lem. 2.2], we know that \(\textrm{supp}\hspace{0.55542pt}(\mu )\) is a correspondence between X and Y. We define a map \(f_t:X_t\rightarrow Y_t\) by taking \([x]_t^X\!\in X_t\) to \([y]_t^Y\!\in Y_t\) such that \((x,y)\in \textrm{supp}\hspace{0.55542pt}(\mu )\). It is easy to check that \(f_t\) is well defined and moreover \(f_t\) is an isometry (see for example the proof of [64, Thm. 5.1]). Next, we prove that \(f_t\) is actually an isomorphism between \({\mathcal {X}}_t\) and \({\mathcal {Y}}_t\). For any \([x]^X_t\in X_t\), let \(y\in Y\) be such that \((x,y)\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }\) (in this case, \([y]^Y_t\!=f_t([x]^X_t)\)). If there exists \((x'\!,y')\in \textrm{supp}\hspace{0.55542pt}(\mu )\) such that \(x'\!\in [x]^X_t\) and \(y'\!\not \in [y]^Y_t\), then \(\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))={u_{Y}}(y,y')>t\), which is impossible. Consequently, \(\mu ([x]^X_t\hspace{1.111pt}{\times }\hspace{1.111pt}(Y\backslash [y]^Y_t))=0\) and similarly, \(\mu ((X\backslash [x]^X_t)\hspace{0.55542pt}{\times }\hspace{1.111pt}[y]^Y_t)=0\). This yields that
Therefore, \(f_t\) is an isomorphism between \({\mathcal {X}}_t\) and \({\mathcal {Y}}_t\). Hence, we have that \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\geqslant \inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \), and therefore \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})=\inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \).
Finally, we show that the infimum \(\inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \) is attained. Let \(\delta := \inf \lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \). If \(\delta >0\), let \(\{t_n\}_{n\in {\mathbb {N}}}\) be a decreasing sequence converging to \(\delta \) such that \({\mathcal {X}}_{t_n}\!\cong _{\textrm{w}} {\mathcal {Y}}_{t_n}\) for all \(t_n\). Since \({\mathcal {X}}_\delta \) and \({\mathcal {Y}}_\delta \) are finite, \({\mathcal {X}}_{t_n}\!={\mathcal {X}}_{\delta }\) and \({\mathcal {Y}}_{t_n}\!={\mathcal {Y}}_{\delta }\) when n is large enough. This immediately implies that \({\mathcal {X}}_\delta \cong _{\textrm{w}} {\mathcal {Y}}_\delta \). Now, if \(\delta =0\), then by (26) we have that \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})=\delta =0\). By Theorem 3.11, \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). This is equivalent to \({\mathcal {X}}_\delta \cong _{\textrm{w}}{\mathcal {Y}}_\delta \). Therefore, the infimum \(\inf \hspace{0.55542pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \) is always attained.
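For small finite spaces, the characterization of \(u_{\textrm{GW},\infty }\) as the smallest t with \({\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\) can be evaluated by brute force. The sketch below is not the polynomial time algorithm of the paper: it tests isomorphism of the quotients by enumerating permutations, which is only feasible for a handful of points. It assumes full support, that blocks are \(\{x'\,|\,u_X(x,x')\leqslant t\}\), and that block distances are read off from representatives. The names `quotient`, `isomorphic` and `u_gw_inf` are hypothetical. Only distance values occurring in X or Y are tested, since the quotients change only at these values.

```python
import itertools

import numpy as np


def quotient(u, mu, t):
    """Weighted t-quotient: block distance matrix and block masses."""
    reps, labels = [], []
    for x in range(len(mu)):
        for k, r in enumerate(reps):
            if u[x, r] <= t:
                labels.append(k)
                break
        else:
            labels.append(len(reps))
            reps.append(x)
    return u[np.ix_(reps, reps)], np.bincount(labels, weights=mu)


def isomorphic(u1, m1, u2, m2, tol=1e-9):
    """Brute-force test for a mass-preserving isometry (small inputs only)."""
    n = len(m1)
    if n != len(m2):
        return False
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        if (np.allclose(m1, m2[p], atol=tol)
                and np.allclose(u1, u2[np.ix_(p, p)], atol=tol)):
            return True
    return False


def u_gw_inf(uX, muX, uY, muY):
    """Smallest candidate t whose weighted quotients are isomorphic."""
    candidates = sorted(set(np.unique(uX)) | set(np.unique(uY)))
    for t in candidates:
        if isomorphic(*quotient(uX, muX, t), *quotient(uY, muY, t)):
            return float(t)


uX = np.array([[0.0, 1.0], [1.0, 0.0]])
uY = np.array([[0.0, 2.0], [2.0, 0.0]])
muX = np.array([0.5, 0.5])
muY = np.array([0.5, 0.5])

print(u_gw_inf(uX, muX, uY, muY))
```

The loop always terminates with a value, because at the largest candidate both quotients consist of a single point of mass one.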
1.2.4 Proof of Theorem 3.18
An important observation for the proof of Theorem 3.18 is that the snowflake transform relates the p-Wasserstein pseudometric on a pseudo-ultrametric space X with the 1-Wasserstein pseudometric on the space \(S_p(X)\), \(1\leqslant p<\infty \).
Lemma B.11
Given a pseudo-ultrametric space \((X,{u_{X}})\) and \(p\geqslant 1\), we have for any \(\alpha ,\beta \in {\mathcal {P}}(X)\) that \(d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha ,\beta )=(d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta ))^{1/p}\).
Remark B.12
Since \(S_p\hspace{0.55542pt}{\circ }\hspace{1.111pt}{u_{X}}\) and \({u_{X}}\) induce the same topology and thus the same Borel sets on X, \({\mathcal {P}}(X)={\mathcal {P}}(S_p(X))\) and thus the expression \(d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta )\) in the lemma is well defined.
Proof of Lemma B.11
Suppose \(\mu _1,\mu _2\in {\mathcal {C}}(\alpha ,\beta )\) are optimal for \(d_{\textrm{W},p}^X(\alpha ,\beta )\) and \(d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta )\), respectively (see Sect. B.5.1 for the existence of \(\mu _1\) and \(\mu _2\)). Then,
and
Therefore, \(d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha ,\beta )=(d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta ))^{1/p}\). \(\square \)
With Lemma B.11 at our disposal we can prove Theorem 3.18.
Proof of Theorem 3.18
Let \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). Then,
By infimizing over \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) on both sides, we obtain that \((u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}}))^p =u_{\textrm{GW},1}(S_p({\mathcal {X}}),S_p({\mathcal {Y}}))\).
To prove the second part of the claim, let \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\). By Lemma B.11 we have that
Finally, infimizing over \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) yields
\(\square \)
As a direct consequence of Theorem 3.18, we obtain the following relation between \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\) and \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) for \(p\in [1,\infty )\).
Corollary B.13
For each \(p\in [1,\infty )\), the metric space \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\) is isometric to the snowflake transform of \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\), i.e., \(S_p({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\cong ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}}) \).
Proof
Consider the snowflake transform map \(S_p:{\mathcal {U}}^{\textrm{w}}\!\rightarrow {\mathcal {U}}^{\textrm{w}}\) sending \(X\in {\mathcal {U}}^{\textrm{w}}\) to \(S_p(X)\in {\mathcal {U}}^{\textrm{w}}\). It is obvious that \(S_p\) is bijective. By Theorem 3.18, \(S_p\) is an isometry from \(S_p({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) to \( ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\). Therefore, \(S_p({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\cong ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}}) \).\(\square \)
1.3 Proofs from Sect. 3.3
In the following, we prove the remaining claims from Sect. 3.3.
1.3.1 Proof of Theorem 3.19
First, we focus on the statement for \(p=1\), i.e., on showing
Let \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be such that
The existence of u and \(\mu \) follows from Proposition B.1.
Claim 1
For any \((x,y),(x'\!,y')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), we have
Proof of Claim 1
We only need to show that
If \({u_{X}}(x,x')={u_{Y}}(y,y')\), then there is nothing to prove. Otherwise, we assume without loss of generality that \({u_{X}}(x,x')<{u_{Y}}(y,y')\). If \(\max \hspace{0.55542pt}(u(x,y),u(x'\!,y'))<{u_{Y}}(y,y')\), then by the strong triangle inequality we must have \(u(x,y')={u_{Y}}(y,y')=u(x'\!,y)\). However, \(u(x'\!,y)\leqslant \max \hspace{0.55542pt}({u_{X}}(x,x'),u(x,y))<{u_{Y}}(y,y')\), which leads to a contradiction. Therefore,
\(\square \)
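Claim 1 can be sanity-checked on small random examples. The sketch below assumes that the claim reads \(\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\leqslant \max \hspace{0.55542pt}(u(x,y),u(x'\!,y'))\), which is how it is used in the sequel. The joint ultrametric u on \(X\sqcup Y\) is generated as the subdominant ultrametric of a random dissimilarity; this generator and all function names are our own illustrative choices.

```python
import random


def lam_inf(a, b):
    """Lambda_infinity(a, b): max(a, b) if a != b, and 0 if a == b."""
    return 0 if a == b else max(a, b)


def subdominant_ultrametric(d):
    """Minimax-path closure (Floyd-Warshall with min and max) of a dissimilarity."""
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def random_joint_ultrametric(n_x, n_y, seed):
    """Random ultrametric on the disjoint union; X is indexed first, then Y."""
    rng = random.Random(seed)
    n = n_x + n_y
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = rng.randint(1, 4)  # small range forces ties
    return subdominant_ultrametric(d)


def claim1_holds(u, n_x):
    """Check the assumed inequality of Claim 1 for all pairs (x, y), (x2, y2)."""
    xs, ys = range(n_x), range(n_x, len(u))
    return all(
        lam_inf(u[x][x2], u[y][y2]) <= max(u[x][y], u[x2][y2])
        for x in xs for x2 in xs for y in ys for y2 in ys
    )
```

Restricting u to the two index blocks gives \({u_{X}}\) and \({u_{Y}}\), so every generated u lies in \({\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) for these restrictions.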
By Claim 1, we have that
Therefore, \(u_{\textrm{GW},1}({\mathcal {X}},{\mathcal {Y}})\leqslant 2\hspace{1.111pt}u_{\textrm{GW},1}^{\textrm{sturm}}({\mathcal {X}},{\mathcal {Y}})\).
Applying Theorem 3.18 and (27) yields that, for any \(p\in [1,\infty )\),
1.3.2 Proof of Results in Example 3.21
It follows from [60, Rem. 5.17] that
Then, by Proposition 3.3, we have that
Let \(\mu _n\) denote the uniform probability measure of \({\widehat{\Delta }}_n(1)\). Since \({\widehat{\Delta }}_n(1)\) has the constant interpoint distance 1, it is obvious that for any coupling \(\mu \in {\mathcal {C}}(\mu _n,\mu _{2n})\), \({\textrm{dis}}_p(\mu ) = {\textrm{dis}}^{\textrm{ult}}_p(\mu )\). This implies that \(u_{\textrm{GW},p}({\widehat{\Delta }}_n(1),{\widehat{\Delta }}_{2n}(1)) =2\hspace{1.111pt}d_{\textrm{GW},p}({\widehat{\Delta }}_n(1),{\widehat{\Delta }}_{2n}(1))\leqslant ({3}/({2n}))^{1/p}\).
1.3.3 Proof of Theorem 3.22
First, we prove that \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\geqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\). Indeed, for any \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\), we have that
where the first inequality follows from Claim 1 in the proof of Theorem 3.19. Then, by a standard limit argument, we conclude that \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\geqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\).
Next, we prove that \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant \min \hspace{0.88882pt}\{t\geqslant 0\,{|}\,{\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\}\). Let \(t> 0\) be such that \({\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\) and let \(\varphi :{{\mathcal {X}}}_t\rightarrow {{\mathcal {Y}}}_t\) denote such an isomorphism. Then, we define a function \(u:X\sqcup Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\sqcup Y\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:
-
1.
\(u|_{X\times X}:= u_X\) and \(u|_{Y\times Y}:= u_Y\);
-
2.
for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\),
$$\begin{aligned} u(x,y):= {\left\{ \begin{array}{ll} \,u_{Y_t}(\varphi ([x]_t^X),[y]_t^Y),&{}\text {if}\;\;\varphi ([x]_t^X)\ne [y]_t^Y,\\ \, t,&{}\text {if}\;\;\varphi ([x]_t^X)=[y]_t^Y; \end{array}\right. } \end{aligned}$$ -
3.
for any \((y,x)\in Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\), \(u(y,x):= u(x,y)\).
Then, it is easy to verify that \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and that u is actually an ultrametric. Let \(Z:= (X\sqcup Y,u)\). By Lemma 2.8, we have
We verify that \(d_{\textrm{W},\infty }^Z({\mu _{X}},{\mu _{Y}})\leqslant t\) next. It is obvious that \(Z_t\cong X_t\cong Y_t\). Write \(X_t=\{[x_i]_t^X\}_{i=1}^n\) and \(Y_t=\{[y_i]_t^Y\}_{i=1}^n\) such that \([y_i]_t^Y=\varphi ([x_i]_t^X)\) for each \(i=1,\ldots ,n\). Then, \([x_i]_t^{Z}\!=[y_i]_t^{Z}\) and \(Z_t=\{[x_i]_t^{Z}\,{|}\,i=1,\ldots ,n\}\). Since \(\varphi \) is an isomorphism, for any \(i=1,\dots ,n\) we have that \({\mu _{X}}([x_i]_t^X)={\mu _{Y}}([y_i]_t^Y)\) and thus \({\mu _{X}}([x_i]_t^{Z})={\mu _{Y}}([y_i]_t^{Z})={\mu _{Y}}([x_i]_t^{Z})\) when \({\mu _{X}}\) and \({\mu _{Y}}\) are regarded as pushforward measures under the inclusion maps \(X\hookrightarrow Z\) and \(Y\hookrightarrow Z\), respectively. Now for any \(B\in V(Z)\) (cf. Sect. 2.3), if \({\textrm{diam}}\hspace{0.55542pt}(B) \geqslant t\), then B is the union of certain \([x_i]_t^{Z}\)’s in \(Z_t\) and thus \({\mu _{X}}(B)={\mu _{Y}}(B)\). If \({\textrm{diam}}\hspace{0.55542pt}(B) < t\) and \({\textrm{diam}}\hspace{0.55542pt}(B^*) > t\), then there exists some \(x_i\) such that \(B=[x_i]_s^{Z}\) and \([x_i]_s^{Z}\!=[x_i]_t^{Z}\) where \(s:= {\textrm{diam}}\hspace{0.55542pt}(B) \). This implies that \({\mu _{X}}(B)={\mu _{Y}}(B)\). In consequence, we have that \(d_{\textrm{W},\infty }^Z({\mu _{X}},{\mu _{Y}})\leqslant t \) and thus \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant d_{\textrm{W},\infty }^{\,(X\sqcup Y,u)}({\mu _{X}},{\mu _{Y}})\leqslant t\). Therefore, \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant \inf \hspace{0.88882pt}\{t\geqslant 0\,{|}\,{\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\}\).
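For finite spaces, the quantity \(\min \hspace{0.88882pt}\{t\geqslant 0\,{|}\,{\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\}\) appearing in this bound can be evaluated by brute force. The sketch below is not the polynomial time algorithm of the paper. It assumes that \({\mathcal {X}}_t\) is the quotient under \(x\sim x'\) iff \({u_{X}}(x,x')\leqslant t\), equipped with the induced ultrametric and the pushforward measure, and it tests isomorphism by trying all bijections; all names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def quotient(u, mu, t):
    """Closed t-quotient of a finite ultrametric measure space (assumed convention)."""
    n = len(u)
    reps = []
    for x in range(n):
        if all(u[x][r] > t for r in reps):  # x starts a new class
            reps.append(x)
    u_t = [[u[a][b] for b in reps] for a in reps]
    mass = [sum(mu[x] for x in range(n) if u[x][r] <= t) for r in reps]
    return u_t, mass


def weighted_isomorphic(u_a, m_a, u_b, m_b):
    """Brute-force search for a measure-preserving isometry between finite spaces."""
    n = len(u_a)
    if n != len(u_b):
        return False
    for perm in permutations(range(n)):
        if all(m_a[i] == m_b[perm[i]] for i in range(n)) and all(
            u_a[i][j] == u_b[perm[i]][perm[j]]
            for i in range(n) for j in range(n)
        ):
            return True
    return False


def min_isomorphism_scale(u_x, mu_x, u_y, mu_y):
    """Smallest t >= 0 at which the two t-quotients are isomorphic."""
    # The quotients only change at distance values, so these candidates suffice.
    values = {d for row in u_x for d in row} | {d for row in u_y for d in row}
    for t in sorted({0} | values):
        if weighted_isomorphic(*quotient(u_x, mu_x, t), *quotient(u_y, mu_y, t)):
            return t
    return None  # not reached for probability measures: the last t collapses both


half = Fraction(1, 2)
scale = min_isomorphism_scale([[0, 1], [1, 0]], [half, half],
                              [[0, 2], [2, 0]], [half, half])
```

For two-point spaces with interpoint distances 1 and 2 and uniform measures, the search returns 2, since the quotients first agree when both spaces collapse to a single point. Masses are stored as `Fraction` values so that the equality tests are exact.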
Finally, by invoking Theorem 3.14, we conclude that
1.3.4 Proof of Theorem 3.23
We prove the result via an explicit construction. By Theorem 3.22, we have \(s=u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\). By Theorem 3.14, there exists an isomorphism \(\varphi :{\mathcal {X}}_s\rightarrow {\mathcal {Y}}_s\). Since \(s>0\), by Lemma A.7, both \({\mathcal {X}}_s\) and \({\mathcal {Y}}_s\) are finite spaces. Let \(X_s=\{[x_1]_s^X\!,\dots ,[x_n]_s^X\}\), \(Y_s=\{[y_1]_s^Y\!,\dots ,[y_n]_s^Y\}\) and assume \([y_i]_s^Y\!=\varphi ([x_i]_s^X)\) for each \(i=1,\ldots ,n\). Let \(A:= \{x_1,\dots ,x_n\}\) and define \(\phi :A\rightarrow Y\) by sending \(x_i\) to \(y_i\) for each \(i=1,\ldots ,n\). We prove that \((A,\phi )\) satisfies the conditions in the statement.
Since \(\varphi \) is an isomorphism, for any \(1\leqslant i<j\leqslant n\),
This implies that \(\phi :A\rightarrow Y\) is an isometric embedding and thus \((A,\phi )\in {\mathcal {A}}\).
It is obvious that \((Z_A)_s\) is isometric to both \(X_s\) and \(Y_s\). In fact, \([x_i]_s^{Z_A}=[y_i]_s^{Z_A}\) in \(Z_A\) for each \(i=1,\ldots ,n\) and \((Z_A)_s=\{[x_i]_s^{Z_A}\hspace{0.55542pt}{|}\,i=1,\ldots ,n\}\). Since \(\varphi \) is an isomorphism, for any \(i=1,\dots ,n\) we have that \({\mu _{X}}([x_i]_s^X)={\mu _{Y}}([y_i]_s^Y)\) and thus \({\mu _{X}}([x_i]_s^{Z_A})={\mu _{Y}}([y_i]_s^{Z_A})={\mu _{Y}}([x_i]_s^{Z_A})\) when \({\mu _{X}}\) and \({\mu _{Y}}\) are regarded as pushforward measures under the inclusion maps \(X\rightarrow Z_A\) and \(Y\rightarrow Z_A\), respectively. Now for any \(B\in V(Z_A)\) (cf. Sect. 2.3), if \({\textrm{diam}}\hspace{0.55542pt}(B) \geqslant s\), then B is the union of certain \([x_i]_s^{Z_A}\)’s and thus \({\mu _{X}}(B)={\mu _{Y}}(B)\). If otherwise \({\textrm{diam}}\hspace{0.55542pt}(B) < s\) and \({\textrm{diam}}\hspace{0.55542pt}(B^*) > s\), then there exists \(x_i\) such that \(B=[x_i]_t^{Z_A}\) and \([x_i]_t^{Z_A}\!=[x_i]_s^{Z_A}\) where \(t:= {\textrm{diam}}\hspace{0.55542pt}(B) \). This implies that \({\mu _{X}}(B)={\mu _{Y}}(B)\). By Lemma 2.8, we have \( d_{\textrm{W},\infty }^{Z_A}({\mu _{X}},{\mu _{Y}})\leqslant s\) and thus \( d_{\textrm{W},\infty }^{Z_A}({\mu _{X}},{\mu _{Y}})=s\) since \(d_{\textrm{W},\infty }^{Z_A}({\mu _{X}},{\mu _{Y}})\) is an upper bound for \(s=u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\) due to (8).
1.3.5 Proof of Theorem 3.25
In this section, we prove Theorem 3.25 by modifying the proof of [60, Prop. 5.3].
Lemma B.14
Let \((X,{u_{X}})\) and \((Y,{u_{Y}})\) be compact ultrametric spaces and let \(S\subseteq X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) be non-empty. Assume that \(\sup _{(x,y),(x'\!,y')\in S}\Lambda _\infty (u_X(x,x'),u_Y(y,y'))\leqslant \eta \). Define \(u_S:X\sqcup Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\sqcup Y\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:
-
(i)
\(u_S|_{X\times X}:= u_X\) and \(u_S|_{Y\times Y}:= u_Y\);
-
(ii)
for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), \(u_S(x,y):= \inf _{(x'\!,y')\in S}\max \hspace{0.55542pt}(u_X(x,x'),u_Y(y,y'),\eta )\);
-
(iii)
for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), \(u_S(y,x):= u_S(x,y)\).
Then, \(u_S\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(u_S(x,y)\leqslant \eta \) for all \((x,y)\in S\).
Proof
That \(u_S\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) essentially follows by [93, Lem. 1.1]. It remains to prove the second half of the statement. For \((x,y)\in S\), we set \((x'\!,y'):= (x,y)\). This yields
\(\square \)
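The construction in Lemma B.14 is easy to carry out for finite spaces. The following minimal sketch builds \(u_S\) from items (i) to (iii), taking \(\eta \) to be the smallest admissible value, namely the supremum in the assumption of the lemma. It then checks the strong triangle inequality on \(X\sqcup Y\). The function names and the indexing (points of X first, then points of Y) are our own conventions.

```python
def lam_inf(a, b):
    """Lambda_infinity(a, b): max(a, b) if a != b, and 0 if a == b."""
    return 0 if a == b else max(a, b)


def glue(u_x, u_y, s):
    """Build u_S on the disjoint union from a non-empty set s of pairs (x, y)."""
    eta = max(lam_inf(u_x[x][x2], u_y[y][y2]) for (x, y) in s for (x2, y2) in s)
    n_x, n_y = len(u_x), len(u_y)
    n = n_x + n_y
    u = [[0] * n for _ in range(n)]
    for i in range(n_x):          # item (i): restriction to X
        for j in range(n_x):
            u[i][j] = u_x[i][j]
    for i in range(n_y):          # item (i): restriction to Y
        for j in range(n_y):
            u[n_x + i][n_x + j] = u_y[i][j]
    for x in range(n_x):          # items (ii) and (iii): cross terms
        for y in range(n_y):
            val = min(max(u_x[x][x2], u_y[y][y2], eta) for (x2, y2) in s)
            u[x][n_x + y] = u[n_x + y][x] = val
    return u, eta


def satisfies_strong_triangle(u):
    """Check u(i, j) <= max(u(i, k), u(k, j)) for all triples."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j])
        for i in range(n) for j in range(n) for k in range(n)
    )


u_x = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
u_y = [[0, 2], [2, 0]]
u_s, eta = glue(u_x, u_y, [(0, 0), (1, 1)])
```

In this example the two pairs in S have mismatched distances 1 and 2, so \(\eta =2\) and every cross distance equals 2. Choosing S to be the pairs (0, 0) and (2, 1) instead gives \(\eta =0\), in which case \(u_S\) vanishes on S and is only a pseudo-ultrametric on the union.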
Proof of Theorem 3.25
Let \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be a coupling s.t. \(\Vert \Gamma _{X,Y}^\infty \Vert _{L^p(\mu \otimes \mu )}<\delta ^5\). Set \(\varepsilon := 4v_\delta (X)\leqslant 4\). By [60, Claim 10.1], there exist a positive integer \(N\leqslant [1/\delta ]\) and points \(x_1,\ldots ,x_N\) in X such that \(\min _{\,i\ne j}u_X(x_i,x_j)\geqslant {\varepsilon }/{2}\), \(\min _{\,i}{\mu _{X}}( B_\varepsilon ^X(x_i)) >\delta \) and \({\mu _{X}}\bigl (\bigcup _{i=1}^NB_\varepsilon ^X(x_i)\bigr )\geqslant 1-\varepsilon \).
Claim 1
For every \(i=1,\ldots ,N\) there exists \(y_i\in Y\) such that
Proof of Claim 1
Assume the claim is false for some i and let
Then, as \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) it holds
Consequently, we have that \(\mu (Q_i(y))\geqslant \delta ^2{\mu _{X}}( B_\varepsilon ^X(x_i)) \). Further, let
Clearly, it holds for \((x,y,x'\!,y')\in {\mathcal {Q}}_i\) that
Further, we have that \(\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu ({\mathcal {Q}}_i)\geqslant \delta ^4\). Indeed, it holds
However, this yields that
which contradicts \(\Vert \Gamma _{X,Y}^\infty \Vert _{L^p(\mu \otimes \mu )}<\delta ^5\). \(\square \)
Define for each \(i=1,\ldots ,N\), \(S_i:= B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}B_{2(\varepsilon +\delta )}^Y(y_i)\). Then, by Claim 1, \(\mu (S_i)\geqslant \delta (1-\delta ^2)\), for all \(i=1,\ldots ,N\).
Claim 2
\(\Gamma _{X,Y}^\infty (x_i,y_i,x_j,y_j)\leqslant 6\hspace{0.55542pt}(\varepsilon +\delta )\) for all \(i,j=1,\ldots ,N\).
Proof of Claim 2
Assume the claim fails for some \((i_0,j_0)\), i.e.,
Then, we have \(\Lambda _\infty (u_X(x_{i_0},x_{j_0}),u_Y(y_{i_0},y_{j_0}))=\max \hspace{0.88882pt}(u_X(x_{i_0},x_{j_0}),u_Y(y_{i_0},y_{j_0}))\). We assume without loss of generality that
Consider any \((x,y)\in S_{i_0}\) and \((x'\!,y')\in S_{j_0}\). By the strong triangle inequality and the fact that \(u_X(x_{i_0},x_{j_0})>6(\varepsilon +\delta )>\varepsilon \), it is easy to verify that \(u_X(x,x')=u_X(x_{i_0},x_{j_0})\). Moreover,
Therefore, \(\Gamma _{X,Y}^\infty (x,y,x'\!,y')=u_X(x,x')=u_X(x_{i_0},x_{j_0})= \Gamma _{X,Y}^\infty (x_{i_0},y_{i_0},x_{j_0},y_{j_0})>6\hspace{0.55542pt}(\varepsilon +\delta )>2\delta \). Consequently, we have that
However, for \(\delta \leqslant 1/2\), \(2\delta \hspace{0.55542pt}(\delta (1-\delta ^2))^2\geqslant 2\delta ^5\). This leads to a contradiction. \(\square \)
Consider \(S\subseteq X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) given by \(S:= \{(x_i,y_i)\,{|}\,i=1,\ldots ,N\}\). Let \(u_S\) be the ultrametric on \(X\sqcup Y\) given by Lemma B.14. By Claim 2,
Then, for all \(i=1,\ldots ,N\) we have that \(u_S(x_i,y_i)\leqslant 6(\varepsilon +\delta )\) and for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) we have that \(u_S(x,y)\leqslant \max \hspace{0.55542pt}({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) ,6(\varepsilon +\delta )) \leqslant \max \hspace{0.55542pt}({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) ,27)=:M'\). Here in the second inequality we use the assumption that \(\delta <{1}/{2}\) and the fact that \(\varepsilon =4\hspace{0.55542pt}v_\delta (X)\leqslant 4\).
Claim 3
Fix \(i\in \{1,\dots ,N\}\). Then, for all \((x,y)\in S_i\), it holds \(u_S(x,y)\leqslant 6\hspace{0.55542pt}(\varepsilon +\delta )\).
Proof of Claim 3
Let \((x,y)\in S_i\). Then, \({u_{X}}(x,x_i)\leqslant \varepsilon \) and \({u_{Y}}(y,y_i)\leqslant 2\hspace{0.55542pt}(\varepsilon +\delta )\). Then, by the strong triangle inequality for \(u_S\) we obtain
\(\square \)
Let \(L:= \bigcup _{i=1}^NS_i\). The next step is to estimate the mass of \(\mu \) in the complement of L.
Claim 4
\(\mu (X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\backslash L)\leqslant \varepsilon +\delta \).
Proof of Claim 4
For each \(i=1,\ldots ,N\), let
Then,
Hence, \(\mu (A_i)=\mu ( B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}Y) -\mu (S_i)={\mu _{X}}( B_\varepsilon ^X(x_i))-\mu (S_i)\), where the last equality follows from the fact that \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). By Claim 1, we have that \(\mu (S_i)\geqslant {\mu _{X}}( B_\varepsilon ^X(x_i)) (1-\delta ^2)\). Consequently, \(\mu (A_i)\leqslant {\mu _{X}}( B_\varepsilon ^X(x_i)) \delta ^2\). Notice that
Hence,
Here, the third inequality follows from the choice of the points \(x_1,\ldots ,x_N\) at the beginning of the proof and from the fact that \(N\leqslant [1/\delta ]\). \(\square \)
Now,
Since we have for any \(a,b\geqslant 0\) and \(p\geqslant 1\) that \(a^{1/p}+b^{1/p}\geqslant (a+b)^{1/p}\), we obtain
where we used \(\varepsilon =4v_\delta ({\mathcal {X}})\) and \(M:= 2\max \hspace{0.55542pt}({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) )+54\geqslant M'+27\). Since the roles of \({\mathcal {X}}\) and \({\mathcal {Y}}\) are symmetric, we have \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant (4\min \hspace{0.55542pt}(v_\delta ({\mathcal {X}}),v_\delta ({\mathcal {Y}}))+\delta )^{1/p}\hspace{0.55542pt}{\cdot }\hspace{1.111pt}M\). \(\square \)
1.4 Proofs from Sect. 3.4
This section contains the full proofs of the statements in Sect. 3.4.
1.4.1 Proof of Theorem 3.27
Part 1. We first prove that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is non-separable for each \(p\in [1,\infty ]\). Recall the notation from Example 3.5 and consider the family \(\{{\widehat{\Delta }}_2(a)\}_{a\in [1,2]}\).
Claim 1
For all \( a\ne b\in [1,2]\), \( u_{\textrm{GW},p}({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b))=2^{-{1/p}}\Lambda _\infty (a,b)\geqslant 2^{-{1/p}}\), where \(2^{-{1/\infty }}:= 1\).
Proof of Claim 1
First note by Theorem 4.1 that
It is easy to verify that \({\textbf{SLB}}_{p}^{\textrm{ult}}({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b))=2^{-{1/p}}\Lambda _\infty (a,b)\). On the other hand, consider the diagonal coupling between \(\mu _a\) and \(\mu _b\), then for \(p\in [1,\infty )\)
and for \(p=\infty \), \(u_{\textrm{GW},\infty }({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b))\leqslant \Lambda _\infty (a,b)\). This concludes the proof. \(\square \)
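The upper bound obtained from the diagonal coupling can be reproduced numerically. The sketch below assumes that \({\widehat{\Delta }}_2(a)\) is the two-point space with interpoint distance a and uniform measure, and that \({\textrm{dis}}_p^{\textrm{ult}}(\mu )\) is the \(L^p(\mu \otimes \mu )\) norm of \(\Lambda _\infty (u_X,u_Y)\); the function names are hypothetical.

```python
def lam_inf(a, b):
    """Lambda_infinity(a, b): max(a, b) if a != b, and 0 if a == b."""
    return 0 if a == b else max(a, b)


def ult_distortion(u_x, u_y, mu, p):
    """Ultrametric p-distortion of a finite coupling mu, for finite p >= 1."""
    total = sum(
        lam_inf(u_x[x][x2], u_y[y][y2]) ** p * m1 * m2
        for (x, y), m1 in mu.items()
        for (x2, y2), m2 in mu.items()
    )
    return total ** (1.0 / p)


def two_point_space(a):
    """Distance matrix of the two-point space with interpoint distance a."""
    return [[0, a], [a, 0]]


# Diagonal coupling between the uniform measures on the two spaces.
diagonal = {(0, 0): 0.5, (1, 1): 0.5}

a, b, p = 1.0, 2.0, 2
value = ult_distortion(two_point_space(a), two_point_space(b), diagonal, p)
expected = 2 ** (-1.0 / p) * lam_inf(a, b)
```

Only the two off-diagonal pairs of \(\mu \otimes \mu \) contribute, each with mass 1/4, which gives \((\Lambda _\infty (a,b)^p/2)^{1/p}\), so `value` and `expected` agree up to rounding.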
By Claim 1, we have that \(\{{\widehat{\Delta }}_2(a)\}_{a\in [1,2]}\) is an uncountable subset of \({\mathcal {U}}^{\textrm{w}}\) with pairwise distances at least \(2^{-{1}/{p}}\), which implies that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is non-separable.
Now for \(p\in [1,\infty )\), we show that \(u_{\textrm{GW},p}\) is not complete. Consider the family \(\{\Delta _{2^n}(1)\}_{n\in {\mathbb {N}}}\) of \(2^n\)-point spaces with unit interpoint distances. Endow each space \(\Delta _{2^n}(1)\) with the uniform measure \(\mu _n\) and denote the corresponding ultrametric measure space by \({\widehat{\Delta }}_{2^n}(1)\). It is proven in [84, Exam. 2.2] that \(\{{\widehat{\Delta }}_{2^n}(1)\}_{n\in {\mathbb {N}}}\) is a Cauchy sequence with respect to \(d_{\textrm{GW},p}\) without a compact metric measure space as limit. It is not hard to check that
Therefore, \(\{{\widehat{\Delta }}_{2^n}(1)\}_{n\in {\mathbb {N}}}\) is a Cauchy sequence with respect to \(u_{\textrm{GW},p}\) without limit in \({\mathcal {U}}^{\textrm{w}}\). This implies that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is not complete.
By Theorem 3.19 and the fact that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is not separable, \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) is not separable either. As for completeness, consider the subset \(X:= \{1-{1}/{n}\}_{n\in {\mathbb {N}}}\subseteq ({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\). By Lemma A.2, X is not a compact ultrametric space. Let \(\mu _0\in {\mathcal {P}}(X)\) be the probability measure defined as follows:
For each \(N\in {\mathbb {N}}\), let \(X_N:= \{1-{1}/{n}\,{|}\,n=1,\ldots ,N\}\). Since each \(X_N\) is finite, \((X_N,\Lambda _\infty )\) is a compact ultrametric space. Let \(\mu _N\in {\mathcal {P}}(X_N)\) be the probability measure defined as follows:
Then, it is easy to verify (e.g. via Theorem 3.7) that \(\{(X_N,\Lambda _\infty ,\mu _N)\}_{N\in {\mathbb {N}}}\) is a \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) Cauchy sequence with \((X,\Lambda _\infty ,\mu _0)\) being the limit. Since the set X is not compact, \((X,\Lambda _\infty ,\mu _0)\notin {\mathcal {U}}^{\textrm{w}}\) and thus \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) is not complete.
Part 2. That \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},\infty })\) is non-separable is already proved in Part 1. We prove completeness next. Given a Cauchy sequence \(\{{\mathcal {X}}_n=(X_n,u_n,\mu _n)\}_{n\in {\mathbb {N}}}\) with respect to \(u_{\textrm{GW},\infty }\), we have that the underlying ultrametric spaces \(\{X_n\}_{n\in {\mathbb {N}}}\) form a Cauchy sequence w.r.t. \(u_{\textrm{GH}}\) due to Corollary 3.16. Since \(({\mathcal {U}},u_{\textrm{GH}})\) is complete (see [93, Prop. 2.1]), there exists a compact ultrametric space \((X,u_X)\) such that \(\lim _{\,n\rightarrow \infty }u_{\textrm{GH}}(X_n,X)=0\).
Let \(\{\delta _n\}_{n\in {\mathbb {N}}}\) be a sequence of positive numbers converging to 0 such that \(\delta _n\geqslant u_{\textrm{GH}}(X_n,X)\). By Theorem 2.5, we have that \((X_n)_{\delta _n}\!\cong X_{\delta _n}\). Denote by \({\widehat{\mu }}_n\in {\mathcal {P}}(X_{\delta _n})\) the pushforward of \((\mu _n)_{\delta _n}\) under the isometry. Furthermore, we have by Lemma A.7 that \(X_{\delta _n}\) is finite and we let \(X_{\delta _n}=\{[x_1]_{\delta _n},\ldots ,[x_k]_{\delta _n}\}\) for \(x_1,\ldots ,x_k\in X\). Based on this, we define \(\nu _n:= \sum _{i=1}^k{\widehat{\mu }}_n([x_i]_{\delta _n})\hspace{1.111pt}{\cdot }\hspace{1.111pt}\delta _{x_i}\in {\mathcal {P}}(X) \), where \(\delta _{x_i}\) is the Dirac measure at \(x_i\). Since X is compact, \({\mathcal {P}}(X)\) is weakly compact. Therefore, the sequence \(\{\nu _n\}_{n\in {\mathbb {N}}}\) has a cluster point \(\nu \in {\mathcal {P}}(X)\).
Now we show that \({\mathcal {X}}:= (X,u_X,\nu )\) is a \(u_{\textrm{GW},\infty }\) cluster point of \(\{{\mathcal {X}}_{n}\}_{n\in {\mathbb {N}}}\) and thus the limit of \(\{{\mathcal {X}}_n\}_{n\in {\mathbb {N}}}\) (since \(\{{\mathcal {X}}_n\}_{n\in {\mathbb {N}}}\) is Cauchy). Without loss of generality, we assume that \(\{\nu _n\}_{n\in {\mathbb {N}}}\) weakly converges to \(\nu \). Fix any \(\varepsilon >0\). We need to show that \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {X}}_n)\leqslant \varepsilon \) when n is large enough. For any fixed \(x_*\!\in X\), \([x_*]_{\varepsilon }\) is both an open and closed ball in X. Therefore, \(\nu ([x_*]_{\varepsilon })=\lim _{\,n\rightarrow \infty }\nu _n([x_*]_{\varepsilon })\) (see e.g. [7, Thm. 2.1]). Since \(\delta _n\rightarrow 0\) as \(n\rightarrow \infty \), there exists \(N_1>0\) such that for any \(n>N_1\), \(\delta _n<\varepsilon \). We specify an isometry \(\varphi _n:(X_n)_{\delta _n}\!\rightarrow X_{\delta _n}\) that gives rise to the construction of \(\nu _n\). Then, we let \(\psi _n:(X_n)_\varepsilon \rightarrow X_\varepsilon \) be the isometry such that the following diagram commutes:
Assume that \([x_*]_\varepsilon ^X=\bigcup _{i=1}^l[x_i]_{\delta _n}^X\). Let \(x_*^n\in X_n\) be such that \(\psi _n([x_*^n]_\varepsilon ^{X_n})=[x_*]_\varepsilon ^X\) and let \(x_1^n,\ldots ,x_l^n\in X_n\) be such that \(\varphi _n([x_i^n]_{\delta _n}^{X_n})=[x_i]_{\delta _n}^X\) for each \(i=1,\ldots ,l\). Then, \([x^n_*]_\varepsilon ^{X_n}\!=\bigcup _{i=1}^l[x_i^n]_{\delta _n}^{X_n}\). Therefore,
Since \(\{{\mathcal {X}}_n\}_{n\in {\mathbb {N}}}\) is a Cauchy sequence, there exists \(N_2>0\) such that \(u_{\textrm{GW},\infty }({\mathcal {X}}_n,{\mathcal {X}}_m)<\varepsilon \) when \(n,m>N_2\). Then, by Theorem 3.14, \(({\mathcal {X}}_n)_\varepsilon \cong _{\textrm{w}}({\mathcal {X}}_m)_\varepsilon \) for all \(n,m>N_2\). By Lemma A.7, \((X_n)_\varepsilon \) is finite, and hence \((X_n)_\varepsilon \) has cardinality independent of n when \(n>N_2\). For all \(n>N_2\), we define the finite set \(A_n:= \{\mu _n([x^n]_\varepsilon ^{X_n})\,{|}\,x^n\in X_n\}\). \(A_n\) is independent of n since \(({\mathcal {X}}_n)_\varepsilon \cong _{\textrm{w}}({\mathcal {X}}_m)_\varepsilon \) for all \(n,m>N_2\). This implies that \(\mu _n([x^n_*]_\varepsilon ^{X_n})\) only takes values in the finite set \(A_n\). Combining this with the fact that \(\lim _{\,n\rightarrow \infty }\mu _n([x^n_*]_\varepsilon ^{X_n})=\lim _{\,n\rightarrow \infty }\nu _n([x_*]_\varepsilon ^X)=\nu ([x_*]_\varepsilon ^X)\) exists, there exists \(N_3>0\) such that when \(n>N_3\), \(\mu _n([x^n_*]_\varepsilon ^{X_n})\equiv C\) for some constant C. This implies that \(\nu ([x_*]_\varepsilon ^X)=\mu _n([x^n_*]_\varepsilon ^{X_n})\), when \(n>\max \hspace{0.55542pt}(N_1,N_2,N_3)\). Since \(X_\varepsilon \) is finite, there exists a common \(N>0\) such that for all \(n>N\) and for all \( [x_*]_\varepsilon \in X_\varepsilon \) we have \(\nu ([x_*]_\varepsilon ^X)=\mu _n([x^n_*]_\varepsilon ^{X_n}) \), where \([x^n_*]^{X_n}_\varepsilon =\psi ^{-1}_n([x_*]_\varepsilon ^X)\in (X_n)_\varepsilon \). This indicates that \(\nu _\varepsilon =(\psi _n)_\#\,(\mu _n)_\varepsilon \) when \(n>N\). Therefore, \({\mathcal {X}}_\varepsilon \cong _{\textrm{w}} ({\mathcal {X}}_n)_\varepsilon \) and thus \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {X}}_n)\leqslant \varepsilon \).
1.4.2 Proof of Proposition 3.28
Next, we prove Proposition 3.28. Before doing so, we recall some facts about p-metric and p-geodesic spaces.
Lemma B.15
([64, Prop. 7.30]) Given \(p\in [1,\infty )\), if X is a p-metric space, then X is not q-geodesic for all \(1\leqslant q<p\).
Lemma B.16
([64, Prop. 7.27]) Let X be a geodesic metric space. Then, for any \(p\geqslant 1\), \(S_{1/p}(X)\) is p-geodesic, where \(S_\alpha \) denotes the snowflake transform for \(\alpha >0\) (cf. Sect. 3.3).
For \(p=1\), the proof is based on the following property of the 1-Wasserstein space.
Lemma B.17
([9, Thm. 5.1]) Let X be a compact metric space. Then, the space \(W_1(X):= ({\mathcal {P}}(X),d_{\textrm{W},1}^X)\) is a geodesic space.
Based on the above results and Corollary B.2, the proof of Proposition 3.28 is straightforward.
Proof of Proposition 3.28
Let \({\mathcal {X}}\) and \({\mathcal {Y}}\) be two compact ultrametric measure spaces. First, we consider the case \(p=1\). By Corollary B.2, there exist a compact ultrametric space Z and isometric embeddings \(\phi :X\hookrightarrow Z\) and \(\psi :Y\hookrightarrow Z\) such that
The space \(W_1(Z)\) is geodesic (cf. Lemma B.17). Therefore, there exists a Wasserstein geodesic \({\widetilde{\gamma }}:[0,1]\rightarrow W_1(Z)\) connecting \(\phi _\#\,\mu _X\) and \(\psi _\#\,\mu _Y\). This induces a curve \(\gamma :[0,1]\rightarrow {\mathcal {U}}^{\textrm{w}}\) where for each \(t\in [0,1]\),
Note that \(\gamma (0)\cong _{\textrm{w}}{\mathcal {X}}\) and \(\gamma (1)\cong _{\textrm{w}}{\mathcal {Y}}\) and hence we simply replace \(\gamma (0)\) and \(\gamma (1)\) with \({\mathcal {X}}\) and \({\mathcal {Y}}\), respectively. Now, for each \(s,t\in [0,1]\), we have that
Therefore, \(\gamma \) is a geodesic connecting \({\mathcal {X}}\) and \({\mathcal {Y}}\) and thus \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\) is geodesic.
For the case \(p>1\), by Corollary B.13, \(S_p({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\cong ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\). This implies that \(S_{1/p}({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\cong ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\). Hence, by Lemma B.16, \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) is p-geodesic. \(\square \)
1.5 Technical Details from Sect. 3
In this section, we address various technical issues from Sect. 3.
1.5.1 The Wasserstein Pseudometric
Given a set X, a pseudometric is a symmetric function \(d_X:X\hspace{1.111pt}{\times }\hspace{1.111pt}X\rightarrow {\mathbb {R}}_{\geqslant 0}\) satisfying the triangle inequality and \(d_X(x,x)=0\) for all \(x\in X\). Note that if moreover \(d_X(x,y)=0\) implies \(x=y\), then \(d_X\) is a metric. There is a canonical identification on pseudometric spaces \((X,d_X)\): \(x\sim x'\) if \(d_X(x,x')=0\). Then, \(\sim \) is in fact an equivalence relation and we define the quotient space \({\widetilde{X}}=X/{\sim }\). Define a function \({\widetilde{d}}_X:{\widetilde{X}}\hspace{1.111pt}{\times }\hspace{1.111pt}{\widetilde{X}}\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:
\({\widetilde{d}}_X\) turns out to be a metric on \({\widetilde{X}}\). In the sequel, the metric space \(({\widetilde{X}},{\widetilde{d}}_X)\) is referred to as the metric space induced by the pseudometric space \((X,d_X)\). Note that \({\widetilde{d}}_X\) preserves the induced topology (see e.g. [41]) and thus the quotient map \(\Psi :X\rightarrow {\widetilde{X}}\) is continuous.
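For a finite pseudometric space, the induced metric space can be computed directly. The following minimal sketch assumes that the quotient distance is given by \({\widetilde{d}}_X([x],[x'])=d_X(x,x')\) for arbitrary representatives, which is well defined by the triangle inequality; the function name is hypothetical.

```python
def induced_metric_space(d):
    """Quotient a finite pseudometric (matrix d) by the relation d(x, x') == 0.

    Returns the distance matrix on class representatives together with the
    quotient map, given as a list sending each point to its class index.
    """
    reps = []        # one representative per equivalence class
    class_of = []    # quotient map Psi as a list of class indices
    for x in range(len(d)):
        for k, r in enumerate(reps):
            if d[x][r] == 0:
                class_of.append(k)
                break
        else:
            reps.append(x)
            class_of.append(len(reps) - 1)
    d_quot = [[d[a][b] for b in reps] for a in reps]
    return d_quot, class_of


# Points 0 and 1 are at pseudo-distance 0 and hence get identified.
d = [[0, 0, 3], [0, 0, 3], [3, 3, 0]]
d_quot, class_of = induced_metric_space(d)
```

In this example the three-point pseudometric space collapses to a two-point metric space with interpoint distance 3.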
Analogously to the Wasserstein distance, which is defined for probability measures on metric spaces, we define the Wasserstein pseudometric for measures on compact pseudometric spaces as done in [85]. Let \(\alpha ,\beta \in {\mathcal {P}}(X)\). Then, we define for \(p\in [1,\infty )\) the Wasserstein pseudometric of order p as
and for \(p=\infty \) as
It is easy to see that the Wasserstein pseudometric is closely related to the Wasserstein distance on the induced metric space. More precisely, one can show the following.
Lemma B.18
Let \((X,d_X)\) denote a compact pseudometric space, let \(\alpha ,\beta \in {\mathcal {P}}(X)\). Then, it follows for \(p\in [1,\infty ]\) that
and that the infimum in (28) (resp. in (29) if \(p=\infty \)) is attained for some \(\mu \in {\mathcal {C}}(\alpha ,\beta )\).
Proof
In the course of this proof we focus on the case \(p<\infty \) and remark that the case \(p=\infty \) follows by similar arguments. The quotient map allows us to define the map \(\theta :{\mathcal {C}}(\alpha ,\beta )\rightarrow {\mathcal {C}}(\Psi _\#\,{\alpha },\Psi _\#\,{\beta })\) via \(\mu \mapsto (\Psi \hspace{1.111pt}{\times }\hspace{1.111pt}\Psi )_\#\,\mu \). It is easy to see that \(\theta \) is well defined and surjective. Furthermore, it holds by construction that
for all \(\mu \in {\mathcal {C}}(\alpha ,\beta )\). Hence, (30) follows.
We come to the second part of the claim. By [91, Sect. 4] there exists an optimal coupling \({\widetilde{\mu }}^*\in {\mathcal {C}}(\Psi _\#\,{\alpha },\Psi _\#\,{\beta })\) such that
In consequence, we find using our previous results that for any \(\mu ^*\in \theta ^{-1}({\widetilde{\mu }}^*)\) it holds
This yields the claim.\(\square \)
1.5.2 Regularity of the Cost Functionals of \(u_{\textrm{GW},p}\) and \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\)
In the remainder of this section, we collect various technical results required to demonstrate the existence of optimizers in the definitions of \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) (see (8)) and \(u_{\textrm{GW},p}\) (see (11)).
Lemma B.19
Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Then, \({\mathcal {C}}({\mu _{X}},{\mu _{Y}})\subseteq {\mathcal {P}}(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y,\max \hspace{0.55542pt}(u_X,u_Y))\) is compact w.r.t. weak convergence.
Proof
The proof follows directly from [21, Lem. 2.2].\(\square \)
Lemma B.20
Let \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\). Let \(D_1\subseteq {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) be a non-empty subset satisfying the following: there exist \((x_0,y_0)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) and \(C>0\) such that \(u(x_0,y_0)\leqslant C\) for all \(u\in D_1\). Then, \(D_1\) is pre-compact with respect to uniform convergence.
Proof
Let \(\{u_n\}_{n\in {\mathbb {N}}}\subseteq D_1\) be a sequence. Note that \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\subseteq X\sqcup Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\sqcup Y\). Let \(v_n:= u_n|_{X\times Y}\). For any \(n\in {\mathbb {N}}\) and any \((x,y),(x'\!,y')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), we have that
This means that \(\{v_n\}_{n\in {\mathbb {N}}}\) is equicontinuous with respect to the ultrametric \(\max \hspace{0.55542pt}\{{u_{X}},{u_{Y}}\}\) on \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\). Now, since \(u_n(x_0,y_0)\leqslant C\), we have that for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\),
Consequently, \(\{v_n\}_{n\in {\mathbb {N}}}\) is uniformly bounded. By the Arzelà–Ascoli theorem ([47, Thm. 7 on p. 61]), each subsequence of \(\{v_n\}_{n\in {\mathbb {N}}}\) has a uniformly convergent subsequence. Hence, we assume without loss of generality that \(\{v_n\}_{n\in {\mathbb {N}}}\) converges to \(v:X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\rightarrow {\mathbb {R}}_{\geqslant 0}\).
Now, we define a symmetric function \(u:X\sqcup Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\sqcup Y\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:
-
(i)
\(u|_{X\times X}:= u_X\) and \(u|_{Y\times Y}:= u_Y\);
-
(ii)
\(u|_{X\times Y}:= v\); for \((y,x)\in Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\), we let \(u(y,x):= u(x,y)\).
It is easy to verify that \(u\in {\mathcal {D}}^{\textrm{ult}}(u_X,u_Y)\) and that u is a cluster point of the sequence \(\{u_n\}_{n\in {\mathbb {N}}}\). Therefore, \(D_1\) is pre-compact.\(\square \)
Lemma B.21
Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Let \(\{\mu _n\}_{n\in {\mathbb {N}}}\subseteq {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be a sequence weakly converging to \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). Let \(\{u_n\}_{n\in {\mathbb {N}}}\subseteq {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\). Suppose that there exist a non-decreasing sequence \(\{p_n\}_{n\in {\mathbb {N}}}\subseteq [1,\infty )\) and \(C>0\) such that for all \(n\in {\mathbb {N}}\),
Then, \(\{u_n\}_{n\in {\mathbb {N}}}\) uniformly converges to some \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) (up to taking a subsequence).
Proof
The following argument adapts the proof of [83, Lem. 3.3] to the current setting. For any \((x_0,y_0)\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }\), there exist \(\varepsilon ,\delta >0\) and \(N\in {\mathbb {N}}\) such that for all \(n\geqslant N\)
Therefore, \(\{u_n(x_0,y_0)\}_{n\geqslant N}\) is uniformly bounded. By Lemma B.20, we have that \(\{u_n\}_{n\in {\mathbb {N}}}\) has a uniformly convergent subsequence.\(\square \)
Lemma B.22
Let X, Y be ultrametric spaces, then
is continuous with respect to the product topology (induced by \(\max \hspace{0.55542pt}({u_{X}},{u_{Y}}, {u_{X}},{u_{Y}})\)).
Proof
Fix \((x,y,x'\!,y')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) and \(\varepsilon >0\). Choose \(0<\delta <\varepsilon \) such that \(\delta <u_X(x,x')\) if \(x\ne x'\) and \(\delta <u_Y(y,y')\) if \(y\ne y'\). Then, consider any point \((x_1,y_1,x_1',y_1')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) such that
For \({u_{X}}(x_1,x_1')\), we have the following two situations:
(i) \(x=x'\): \({u_{X}}(x_1,x_1')\leqslant \max \hspace{0.55542pt}({u_{X}}(x_1,x),{u_{X}}(x,x_1'))\leqslant \delta <\varepsilon \);
(ii) \(x\ne x'\): \({u_{X}}(x_1,x_1')\leqslant \max \hspace{0.55542pt}({u_{X}}(x_1,x),{u_{X}}(x,x'),{u_{X}}(x'\!,x_1'))={u_{X}}(x,x')\). Similarly, \({u_{X}}(x,x')\leqslant {u_{X}}(x_1,x_1')\) and thus \({u_{X}}(x,x')={u_{X}}(x_1,x_1')\).
A similar result holds for \({u_{Y}}(y_1,y_1')\).
This leads to four cases for \(\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))\):
(i) \(x=x'\), \(y=y'\): In this case we have \({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1')< \varepsilon \). Then,
$$\begin{aligned} \bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))&-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\\&=\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))\leqslant \varepsilon ; \end{aligned}$$
(ii) \(x=x'\), \(y\ne y'\): Now \({u_{X}}(x_1,x_1')<\varepsilon \) and \({u_{Y}}(y_1,y_1')={u_{Y}}(y,y')\). If \({u_{Y}}(y,y')\geqslant \varepsilon >{u_{X}}(x_1,x_1')\), then
$$\begin{aligned} \bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))&-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\\&=|{u_{Y}}(y,y')-{u_{Y}}(y,y')|=0. \end{aligned}$$
Otherwise \({u_{Y}}(y,y')<\varepsilon \), which implies that \(\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))\leqslant \varepsilon \) and \(\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))={u_{Y}}(y,y')\leqslant \varepsilon \). Therefore,
$$\begin{aligned}\bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\leqslant \varepsilon ;\end{aligned}$$
(iii) \(x\ne x'\), \(y=y'\): Analogously to (ii), we have
$$\begin{aligned}\bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\leqslant \varepsilon ;\end{aligned}$$
(iv) \(x\ne x'\), \(y\ne y'\): Now \({u_{X}}(x_1,x_1')={u_{X}}(x,x')\) and \({u_{Y}}(y_1,y_1')={u_{Y}}(y,y')\). Therefore,
$$\begin{aligned}\bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |=0.\end{aligned}$$
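In conclusion, whenever \({u_{X}}(x,x_1),{u_{Y}}(y,y_1),{u_{X}}(x'\!,x_1'),{u_{Y}}(y'\!,y_1')\leqslant \delta \) we have that
$$\begin{aligned}\bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\leqslant \varepsilon .\end{aligned}$$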
Therefore, \(\Lambda _\infty ({u_{X}},{u_{Y}})\) is continuous with respect to the metric \(\max \hspace{0.55542pt}({u_{X}},{u_{Y}}, {u_{X}},{u_{Y}})\).\(\square \)
1.5.3 \(u_{\textrm{GW},p}\) and the One Point Space
Below, we prove that \(u_{\textrm{GW},p}\), \(1\leqslant p\leqslant \infty \), between an arbitrary \({\mathcal {X}}\in {\mathcal {U}}^{\textrm{w}}\) and the one point ultrametric measure space \(*\) agrees with the p-diameter of \({\mathcal {X}}\), which is defined (see e.g., [60]) for \(1\leqslant p\leqslant \infty \) as \(\textrm{diam}_p({\mathcal {X}}):= \Vert u_X\Vert _{L^p({\mu _{X}}\otimes {\mu _{X}})}\).
Proposition B.23
Let \(*\in {\mathcal {U}}^{\textrm{w}}\) be the one-point space. Then, it holds for any \(1\leqslant p\leqslant \infty \) that \(u_{\textrm{GW},p}({\mathcal {X}},*) = \textrm{diam}_p({\mathcal {X}})\).
Proof
Note that in this case, for every \(x,x'\!\in X\), we have \(\Lambda _\infty (u_X(x,x'),u_*(*,*)) = \Lambda _\infty (u_X(x,x'),0) = u_X(x,x')\). Since \(\mu := \mu _X\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\delta _*\) is the unique coupling between \(\mu _X\) and \(\delta _*\), (10) leads to the claim.\(\square \)
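For a finite space, the p-diameter in Proposition B.23 reduces to a weighted sum over pairs of points. The following minimal sketch illustrates this, assuming the convention \(\Lambda _\infty (a,b)=\max (a,b)\) for \(a\ne b\) and \(\Lambda _\infty (a,a)=0\), which is consistent with the identity \(\Lambda _\infty (a,0)=a\) used in the proof. The function names `lambda_inf` and `diam_p` are hypothetical and not taken from the authors' code.

```python
from itertools import product


def lambda_inf(a, b):
    """Assumed convention: Lambda_infty(a, b) = max(a, b) if a != b, else 0."""
    return 0.0 if a == b else max(a, b)


def diam_p(u, mu, p):
    """p-diameter of a finite ultrametric measure space.

    u  : square matrix (list of lists) of pairwise distances
    mu : list of point masses summing to 1
    p  : a number >= 1, or float('inf')
    """
    n = len(mu)
    # Integrand Lambda_infty(u_X(x, x'), u_*(*, *)) with u_*(*, *) = 0,
    # paired with the product weight mu(x) * mu(x').
    vals = [(lambda_inf(u[i][j], 0.0), mu[i] * mu[j])
            for i, j in product(range(n), repeat=2)]
    if p == float("inf"):
        # Essential supremum: only pairs of positive mass count.
        return max(v for v, w in vals if w > 0)
    return sum(w * v ** p for v, w in vals) ** (1.0 / p)


# Three points at mutual distance 1 with the uniform measure.
u = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
mu = [1 / 3, 1 / 3, 1 / 3]
d1 = diam_p(u, mu, 1)
d_inf = diam_p(u, mu, float("inf"))
```

Here six of the nine ordered pairs lie at distance 1, each with weight 1/9, so the 1-diameter is 2/3, while the \(\infty \)-diameter is the usual diameter 1.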
Technical Details from Sect. 4
1.1 Proofs from Sect. 4
In this section, we state the full proofs of the results from Sect. 4.
1.1.1 Proof of Theorem 4.1
Part 1. We observe that for any point x in an ultrametric space X, there always exists \(x'\!\in X\) such that \({u_{X}}(x,x')={\textrm{diam}}\hspace{0.55542pt}(X) \) (see [27]). Since by assumption \(\mu _X\) is fully supported, \(s_{X,\infty }\equiv {\textrm{diam}}\hspace{0.55542pt}(X) \) is a constant function. Therefore, \(\Lambda _\infty (s_{X,\infty }(x),s_{Y,\infty }(y))\equiv \Lambda _\infty ({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) )\) for all \(x\in X\) and \(y\in Y\). This implies that \({\textbf{FLB}}_{\infty }^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})=\Lambda _\infty ({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) )\). By [64, Cor. 5.3] and Corollary 3.16, we have that
Part 2. The proof for \(d_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\geqslant \textbf{TLB}_p({\mathcal {X}},{\mathcal {Y}})\) in [60, Sect. 6] can be used essentially without any change for showing \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\geqslant {\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\). Hence, it remains to show that \({\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\geqslant {\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\):
Proposition C.1
Let \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\) and let \(p\in [1,\infty ]\). Then, \( {\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\geqslant {\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\).
In order to prove Proposition C.1, we need the following technical lemma.
Lemma C.2
Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\in {\mathcal {U}}^{\textrm{w}}\). Then, \({\textrm{spec}}\hspace{0.55542pt}(X):= \{{u_{X}}(x,x')\,{|}\, x,x'\!\in X\}\) is a compact subset of \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\).
Proof
By Lemma A.7, we have that for each \(t>0\), \(X_t\) is a finite set. Let \(\{t_n\}_{n=1}^\infty \) be a positive sequence decreasing to 0. Then, it is easy to see that \({\textrm{spec}}\hspace{0.55542pt}(X)=\bigcup _{n=1}^\infty {\textrm{spec}}\hspace{0.55542pt}(X_{t_n})\). Since each \({\textrm{spec}}\hspace{0.55542pt}(X_{t_n})\) is a finite set, \({\textrm{spec}}\hspace{0.55542pt}(X)\) is a countable set.
Now, pick any \(0\ne t\in {\textrm{spec}}\hspace{0.55542pt}(X)\). Suppose t is a cluster point of \({\textrm{spec}}\hspace{0.55542pt}(X)\). Then, there exist infinitely many \(s\in {\textrm{spec}}\hspace{0.55542pt}(X)\) greater than t/2. However, this would result in \(X_{t/2}\) being an infinite set, which contradicts the fact that \(X_{t/2}\) is finite. Therefore, 0 is the only possible cluster point of \({\textrm{spec}}\hspace{0.55542pt}(X)\). By Lemma A.2, we have that \({\textrm{spec}}\hspace{0.55542pt}(X)\) is compact.\(\square \)
Next we demonstrate Proposition C.1 and hence finish the proof of Theorem 4.1.
Proof of Proposition C.1
We first prove the case when \(p<\infty \). Let \(dh_{\mathcal {X}}(x):= {u_{X}}(x,\hspace{0.55542pt}{\cdot }\hspace{1.111pt})_\#\,{\mu _{X}}\) and let \(dh_{\mathcal {Y}}(y):= {u_{Y}}(y,\hspace{0.55542pt}{\cdot }\hspace{1.111pt})_\#\,{\mu _{Y}}\). Further, define
Lemma C.2 implies that the set \(S:= {\textrm{spec}}\hspace{0.55542pt}(X)\cup {\textrm{spec}}\hspace{0.55542pt}(Y)\) is a compact subset of \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\). It is easy to see that \(\textrm{supp}\hspace{1.111pt}(dh_{\mathcal {X}}),\textrm{supp}\hspace{0.55542pt}(dh_{\mathcal {Y}}),\textrm{supp}\hspace{0.55542pt}(dH_{\mathcal {X}}),\textrm{supp}\hspace{0.55542pt}(dH_{\mathcal {Y}})\subseteq S\subseteq {\mathbb {R}}_{\geqslant 0}\). Now, recall by Proposition 4.4 that \({\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})=d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dH_{\mathcal {X}},dH_{\mathcal {Y}})\) and
Further, we observe for any \(x\in X\) and \(y\in Y\) that
For the remainder of this proof, the metric on \(S\subseteq {\mathbb {R}}_{\geqslant 0}\) is always given by \(\Lambda _\infty \). Additionally, \({\mathcal {P}}(S)\) denotes the set of probability measures on S and we equip \({\mathcal {P}}(S)\) with the Borel \(\sigma \)-field with respect to the topology induced by weak convergence.
Claim 1
There is a measurable choice \((x,y)\mapsto \pi ^*_{xy}\) such that for each \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), \(\pi ^*_{xy}\) is an optimal transport plan between \(dh_{\mathcal {X}}(x)\) and \(dh_{\mathcal {Y}}(y)\).
Proof of Claim 1
Since both \(\Lambda _1\) and \(\Lambda _\infty \) induce the same topology on S, and thus the same Borel sets on S, \(d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _1)}\) and \(d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}\) metrize the same weak topology on \({\mathcal {P}}(S)\). By [61, Rem. 2.5], the following two maps are continuous with respect to the weak topology and thus measurable:
Since S is compact, the space \(({\mathcal {P}}(S),d_{\textrm{W},p}^{\,(S,\Lambda _\infty )})\) is separable [91, Thm. 6.18]. This yields that \({\mathscr {B}}( {\mathcal {P}}(S)\hspace{1.111pt}{\times }\hspace{1.111pt}{\mathcal {P}}(S))={\mathscr {B}}( {\mathcal {P}}(S))\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mathscr {B}}( {\mathcal {P}}(S))\) [33, Prop. 1.5]. Hence, the product \(\Phi :X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\rightarrow {\mathcal {P}}(S)\hspace{1.111pt}{\times }\hspace{1.111pt}{\mathcal {P}}(S)\) of \(\Phi _1\) and \(\Phi _2\), defined by \((x,y)\mapsto (dh_{\mathcal {X}}(x),dh_{\mathcal {Y}}(y))\) is measurable [33, Prop. 2.4]. Then, a direct application of [91, Cor. 5.22] gives the claim. \(\square \)
Now, for every \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\), we have that
by Fubini’s Theorem, where \(\bar{\mu }\in {\mathcal {P}}(S\hspace{1.111pt}{\times }\hspace{1.111pt}S)\) is defined as
for any measurable \(A\subseteq S\hspace{1.111pt}{\times }\hspace{1.111pt}S\). We remark that by Claim 1 the measure \(\bar{\mu }\) is well defined. Next, we verify that \(\bar{\mu }\in {\mathcal {C}}(dH_{\mathcal {X}},dH_{\mathcal {Y}})\). For any measurable \(A\subseteq (S,\Lambda _\infty )\) we have
where we have applied the marginal constraints for \(\pi _{xy}\) and \(\mu \). Further, (i) follows by the change-of-variables formula. Analogous arguments give that \(\bar{\mu }(S\hspace{1.111pt}{\times }\hspace{1.111pt}B)=dH_{\mathcal {Y}}(B)\) for any measurable \(B\subseteq S\). Thus, we conclude that for every \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\)
This gives the claim for \(p<\infty \).
Next, we prove the assertion for the case \(p=\infty \). Note that for any \(p<\infty \)
where the inequality holds since \(d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}\!\leqslant d^{\,(S,\Lambda _{\infty })}_{\textrm{W},\infty }\) and \(\Vert \hspace{1.111pt}{\cdot }\hspace{1.111pt}\Vert _{L^p(\mu )}\leqslant \Vert \hspace{1.111pt}{\cdot }\hspace{1.111pt}\Vert _{L^\infty (\mu )}\).
By [35, Prop. 3] we have that
Therefore,
\(\square \)
1.1.2 Proof of Proposition 4.4
We only prove the first statement for \(p\in [1,\infty )\). The case \(p=\infty \) as well as the second statement can be proven in a similar manner.
By directly using the change-of-variables formula, we have the following:
where
maps \((x,x'\!,y,y')\) to \(({u_{X}}(x,x'),{u_{Y}}(y,y'))\). By Lemma A.5,
Therefore,
1.1.3 An Example: \({\textbf{SLB}}_{}^{\textrm{ult}}\) vs. \({\textbf{TLB}}_{}^{\textrm{ult}}\)
We will demonstrate that there are ultrametric measure spaces \({\mathcal {X}}_1\) and \({\mathcal {X}}_2\) such that \({\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}}_1,{\mathcal {X}}_2)=0\), while \({\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}}_1,{\mathcal {X}}_2)>0\).
Consider the three point space \(\Delta _3(1)=(\{x_1,x_2,x_3\},u)\) where \(u(x_i,x_j)=1\) whenever \(i\ne j\). Construct two probability measures \(\mu _1:= \frac{2}{3}\delta _{x_1}+\frac{1}{6}\delta _{x_2}+\frac{1}{6}\delta _{x_3}\) and \(\mu _2:= \frac{1}{3}\delta _{x_1}+\bigl (\frac{1}{3}-\frac{1}{2\sqrt{3}}\bigr )\hspace{0.55542pt}\delta _{x_2}+ \bigl (\frac{1}{3}+\frac{1}{2\sqrt{3}}\bigr )\hspace{0.55542pt}\delta _{x_3}\). We then let \({\mathcal {X}}_1:= (\Delta _3(1),\mu _1)\) and \({\mathcal {X}}_2:= (\Delta _3(1),\mu _2)\). Obviously, \(u_\#\,(\mu _1\hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu _1)=u_\#\,(\mu _2\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _2)=\delta _0/2+\delta _1/2\). Then, by Proposition 4.4 we immediately have that \({\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}}_1,{\mathcal {X}}_2)=0\) for any \(p\in [1,\infty ]\). Now, note that \(u(x_1,\hspace{0.55542pt}{\cdot }\hspace{1.111pt})_\#\,\mu _1={2}\delta _0/3+\delta _1/3\), which is different from \(u(x_i,\hspace{0.55542pt}{\cdot }\hspace{1.111pt})_\#\,\mu _2\) for each \(i=1,2,3\). This implies (by Proposition 4.4) that \({\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}}_1,{\mathcal {X}}_2)>0\) for any \(p\in [1,\infty ]\).
Note that this example works as well for showing that \({\textbf{TLB}}_{p}({\mathcal {X}}_1,{\mathcal {X}}_2)>{\textbf{SLB}}_{p}({\mathcal {X}}_1,{\mathcal {X}}_2)=0\).
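The two push-forward computations in the example above can be checked numerically. The following sketch (with hypothetical helper names `global_distribution` and `local_distribution`) computes \(u_\#\,(\mu _i\otimes \mu _i)\) and the local distributions \(u(x_j,\cdot )_\#\,\mu _i\) for the three-point space \(\Delta _3(1)\). It only verifies the distributions themselves and does not compute \({\textbf{SLB}}_{p}^{\textrm{ult}}\) or \({\textbf{TLB}}_{p}^{\textrm{ult}}\).

```python
from collections import defaultdict
from math import sqrt


def global_distribution(u, mu):
    """Push-forward of the product measure mu x mu under u, as {distance: mass}."""
    out = defaultdict(float)
    for i, mi in enumerate(mu):
        for j, mj in enumerate(mu):
            out[u[i][j]] += mi * mj
    return dict(out)


def local_distribution(u, mu, i):
    """Push-forward of mu under u(x_i, .), as {distance: mass}."""
    out = defaultdict(float)
    for j, mj in enumerate(mu):
        out[u[i][j]] += mj
    return dict(out)


# Delta_3(1): three points at mutual distance 1.
u = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]

r = 1 / (2 * sqrt(3))
mu1 = [2 / 3, 1 / 6, 1 / 6]
mu2 = [1 / 3, 1 / 3 - r, 1 / 3 + r]

g1 = global_distribution(u, mu1)  # mass at distance 0 is the sum of squared weights
g2 = global_distribution(u, mu2)

# Mass that each local distribution places at distance 0.
local1_at_zero = [local_distribution(u, mu1, i)[0.0] for i in range(3)]
local2_at_zero = [local_distribution(u, mu2, i)[0.0] for i in range(3)]
```

Both global distributions place mass 1/2 at 0 and 1/2 at 1, since the squared weights of each measure sum to 1/2. The local distribution at \(x_1\) under \(\mu _1\) places mass 2/3 at 0, whereas under \(\mu _2\) the three local distributions place masses \(1/3\), \(1/3-1/(2\sqrt{3})\) and \(1/3+1/(2\sqrt{3})\) at 0, none of which equals 2/3.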
Technical Details from Sect. 5
1.1 Technical Details from Sect. 5.2
Here, we list the precise results for the comparisons of the spaces \({\mathcal {X}}_i\), \(1\leqslant i\leqslant 4\), illustrated in Fig. 7. They are gathered in Tables 1 and 2.
1.2 Technical Details from Sect. 5.3
Here, we state more results for the comparison of the ultrametric measure spaces illustrated in Fig. 7 and give the precise construction of the ultrametric spaces \(Z_{k,t}^i\), \(2\leqslant k\leqslant 5\), \(t=0,0.2,0.4,0.6\), \(1\leqslant i\leqslant 15\).
The ultrametric measure spaces from Fig. 7. See Table 3 for the results of comparing the ultrametric dissimilarity spaces in Fig. 7 based on \(d_{\textrm{GW},1}\) and \({\textbf{SLB}}_{1}\).
Construction of \(Z_k\). For each \(k=2,3,4,5\) we first draw a sample with \(100\hspace{1.111pt}{\times }\hspace{1.111pt}k\) points from the mixture distribution \(\sum _{i=1}^k U[1.5(i-1),1.5(i-1)+1]/k\), where U[a, b] denotes the uniform distribution on [a, b]. For each sample, we employ the single linkage algorithm to create a dendrogram, which then induces an ultrametric on the given sample. We further draw a 30-point subspace from each ultrametric space and denote it by \(Z_k\). These four spaces have similar diameter values between 0.5 and 0.6. Each space \(Z_k\) is equipped with the uniform probability measure and the resulting ultrametric measure space is denoted by \({\mathcal {Z}}_{k}=( Z_{k},u_{Z_k},\mu _{Z_k}) \), \(k=2,3,4,5\). We remark that k can be regarded as the number of blocks in the dendrogram representation of the obtained ultrametric measure spaces (see the top row of Fig. 8 for a visualization of three 3-block spaces).
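The construction of \(Z_k\) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it reads the sampling distribution as an equal-weight mixture of k uniform blocks \(U[1.5\hspace{0.55542pt}i,1.5\hspace{0.55542pt}i+1]\), \(i=0,\dots ,k-1\), and it replaces a general single linkage routine by the elementary fact that, for points on the real line, the single linkage ultrametric between two points equals the largest gap between consecutive sample points lying between them. All function names are hypothetical.

```python
import random


def sample_blocks(k, n, rng):
    """Draw n points from the assumed equal-weight mixture of U[1.5*i, 1.5*i + 1]."""
    pts = [1.5 * rng.randrange(k) + rng.random() for _ in range(n)]
    return sorted(pts)


def single_linkage_ultrametric_1d(sorted_pts):
    """Single linkage ultrametric of sorted points on the real line.

    u[i][j] is the largest gap between consecutive points from index i to j.
    """
    n = len(sorted_pts)
    gaps = [sorted_pts[i + 1] - sorted_pts[i] for i in range(n - 1)]
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        running = 0.0
        for j in range(i + 1, n):
            running = max(running, gaps[j - 1])
            u[i][j] = u[j][i] = running
    return u


def build_Z(k, rng, per_block=100, sub=30):
    """Return (distance matrix, uniform weights) of a 30-point subspace Z_k."""
    pts = sample_blocks(k, per_block * k, rng)
    u = single_linkage_ultrametric_1d(pts)
    idx = sorted(rng.sample(range(len(pts)), sub))  # random 30-point subspace
    u_sub = [[u[a][b] for b in idx] for a in idx]
    mu = [1.0 / sub] * sub  # uniform probability measure
    return u_sub, mu


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
u_Z3, mu_Z3 = build_Z(3, rng)
diam_Z3 = max(max(row) for row in u_Z3)
```

Under this reading, consecutive blocks are separated by an empty interval of length 0.5, so the diameter of the sampled space is at least 0.5 as soon as two blocks are represented in the subspace, in line with the diameter values reported above.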
Perturbations at level t. Given a perturbation level \(t\geqslant 0\) and an ultrametric space X, we consider the quotient space \(X_t\). Each equivalence class \([x]_t\subseteq X\) is an ultrametric subspace of X. If \(|[x]_t|>1\), we let \(m:= | {\textrm{spec}}\hspace{0.55542pt}([x]_t)|-1\) and write \({\textrm{spec}}\hspace{0.55542pt}([x]_t)=\{0<s_1<\cdots <s_m\}\). Let \(\delta := {\textrm{diam}}\hspace{0.55542pt}([x]_t) \). We generate m uniformly distributed numbers from \([0, t-\delta ]\) and sort them in ascending order to obtain \(a_1\leqslant \cdots \leqslant a_m\). We then perturb \(u_{X}|_{[x]_t\hspace{1.111pt}{\times }\hspace{1.111pt}[x]_t}\) by replacing \(s_i\) with \(s_i+a_i\) for each \(i=1,\ldots ,m\). We do the same for all equivalence classes \([x]_t\) and thus obtain a new ultrametric on X.
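The perturbation step can be sketched as follows for a finite ultrametric given as a distance matrix. This is a minimal illustration, assuming that the classes of \(X_t\) are given by the relation \(u_X(x,x')\leqslant t\); the names `classes_at_level` and `perturb` are hypothetical. Since the shifts \(a_i\) are sorted, the map \(s_i\mapsto s_i+a_i\) is non-decreasing and bounded by t, so distances within a class stay at most t and the strong triangle inequality is preserved.

```python
import random


def classes_at_level(u, t):
    """Equivalence classes of x ~ x' iff u(x, x') <= t (assumed convention)."""
    n = len(u)
    seen, classes = set(), []
    for i in range(n):
        if i not in seen:
            cls = [j for j in range(n) if u[i][j] <= t]  # closed ball = class
            seen.update(cls)
            classes.append(cls)
    return classes


def perturb(u, t, rng):
    """Perturb an ultrametric inside each class of X_t as described above."""
    v = [row[:] for row in u]
    for cls in classes_at_level(u, t):
        if len(cls) < 2:
            continue
        # Non-zero spectrum s_1 < ... < s_m of the class.
        spec = sorted({u[a][b] for a in cls for b in cls if a != b})
        delta = spec[-1]  # diameter of the class
        # m uniform numbers in [0, t - delta], sorted ascendingly.
        shifts = sorted(rng.uniform(0.0, t - delta) for _ in spec)
        new_value = {s: s + a for s, a in zip(spec, shifts)}
        for a in cls:
            for b in cls:
                if a != b:
                    v[a][b] = new_value[u[a][b]]
    return v


# Toy space: classes {0, 1, 2} and {3, 4} at level t = 0.5, at mutual distance 1.
u = [
    [0.0, 0.1, 0.3, 1.0, 1.0],
    [0.1, 0.0, 0.3, 1.0, 1.0],
    [0.3, 0.3, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.0, 0.2],
    [1.0, 1.0, 1.0, 0.2, 0.0],
]
v = perturb(u, 0.5, random.Random(1))
```

Distances between different classes are left untouched, so only the lower part of the dendrogram (below level t) is changed.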
About this article
Cite this article
Mémoli, F., Munk, A., Wan, Z. et al. The Ultrametric Gromov–Wasserstein Distance. Discrete Comput Geom 70, 1378–1450 (2023). https://doi.org/10.1007/s00454-023-00583-0
DOI: https://doi.org/10.1007/s00454-023-00583-0