The Ultrametric Gromov–Wasserstein Distance

Discrete & Computational Geometry

Abstract

We investigate compact ultrametric measure spaces which form a subset \(\mathcal {U}^{\textrm{w}}\) of the collection of all metric measure spaces \(\mathcal {M}^{\textrm{w}}\). In analogy with the notion of the ultrametric Gromov–Hausdorff distance on the collection of ultrametric spaces \(\mathcal {U}\), we define ultrametric versions of two metrics on \(\mathcal {U}^{\textrm{w}}\), namely of Sturm’s Gromov–Wasserstein distance of order p and of the Gromov–Wasserstein distance of order p. We study the basic topological and geometric properties of these distances as well as their relation and derive for \(p=\infty \) a polynomial time algorithm for their calculation. Further, several lower bounds for both distances are derived and some of our results are generalized to the case of finite ultra-dissimilarity spaces. Finally, we study the relation between the Gromov–Wasserstein distance and its ultrametric version (as well as the relation between the corresponding lower bounds) in simulations and apply our findings for phylogenetic tree shape comparisons.

Data Availability

The code and datasets generated during and/or analyzed during the current study are available in https://github.com/ndag/uGW and from http://dx.doi.org/10.5061/dryad.3r8v1.

Notes

  1. The term “p-distortion” is not used in [59, 60]. However, the quantity \({\textrm{dis}}_p(\mu )\) is introduced as \(J_p(\mu )\) in both references.

  2. Here “approximation” is meant in the sense that one can write code which will locally minimize the functional. There are in general no theoretical guarantees that these algorithms will converge to a global minimum.

  3. A cluster point x in a topological space X is a point such that every neighborhood of x contains infinitely many points of X.

  4. A pseudo-ultrametric is a pseudometric which satisfies the strong triangle inequality (cf. (6)); see Sect. B.5.1 for the definition and further discussion on pseudometrics.

  5. The algorithm can be sped up via a binary search process which we do not include for simplicity of presentation.

References

  1. Adelson-Welsky, G.M., Kronrode, A.S.: Sur les lignes de niveau des fonctions continues possédant des dérivées partielles. C. R. (Doklady) Acad. Sci. URSS (N.S.) 49, 235–237 (1945)

  2. Agarwal, P.K., Fox, K., Nath, A., Sidiropoulos, A., Wang, Y.: Computing the Gromov-Hausdorff distance for metric trees. ACM Trans. Algorithms 14(2), 24 (2018)

  3. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Pearson Education, London (1974)

  4. Alvarez-Melis, D., Jaakkola, T.: Gromov–Wasserstein alignment of word embedding spaces. In: 2018 Conference on Empirical Methods in Natural Language Processing (Brussels 2018), pp. 1881–1890. Association for Computational Linguistics (2018)

  5. Bartal, Y.: Probabilistic approximation of metric spaces and its algorithmic applications. In: 37th Annual Symposium on Foundations of Computer Science (Burlington 1996), pp. 184–193. IEEE, Los Alamitos (1996)

  6. Billera, L.J., Holmes, S.P., Vogtmann, K.: Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27(4), 733–767 (2001)

  7. Billingsley, P.: Convergence of Probability Measures. Probability and Statistics. Wiley, New York (2013)

  8. Bonneel, N., Rabin, J., Peyré, G., Pfister, H.: Sliced and Radon Wasserstein barycenters of measures. J. Math. Imaging Vision 51(1), 22–45 (2015)

  9. Bottou, L., Arjovsky, M., Lopez-Paz, D., Oquab, M.: Geometrical insights for implicit generative modeling. In: Rozonoer, L., et al. (eds.) Braverman Readings in Machine Learning. Lecture Notes in Computer Science, vol. 11100, pp. 229–268. Springer, Cham (2018)

  10. Brinkman, D., Olver, P.J.: Invariant histograms. Am. Math. Mon. 119(1), 4–24 (2012)

  11. Bronstein, A.M., Bronstein, M.M., Bruckstein, A.M., Kimmel, R.: Partial similarity of objects, or how to compare a centaur to a horse. Int. J. Comput. Vis. 84(2), 163–183 (2009)

  12. Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Efficient computation of isometry-invariant distances between surfaces. SIAM J. Sci. Comput. 28(5), 1812–1836 (2006)

  13. Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching. Proc. Natl. Acad. Sci. USA 103(5), 1168–1172 (2006)

  14. Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Topology-invariant similarity of nonrigid shapes. Int. J. Comput. Vis. 81(3), 281–301 (2009)

  15. Bronstein, A.M., Bronstein, M.M., Kimmel, R., Mahmoudi, M., Sapiro, G.: A Gromov–Hausdorff framework with diffusion geometry for topologically-robust non-rigid shape matching. Int. J. Comput. Vis. 89(2–3), 266–286 (2010)

  16. Brown, P., Pullan, W., Yang, Y., Zhou, Y.: Fast and accurate non-sequential protein structure alignment using a new asymmetric linear sum assignment heuristic. Bioinformatics 32(3), 370–377 (2016)

  17. Bunne, C., Alvarez-Melis, D., Krause, A., Jegelka, S.: Learning generative models across incomparable spaces. In: 36th International Conference on Machine Learning (Long Beach 2019), pp. 851–861. PMLR (2019)

  18. Carlsson, G., Mémoli, F.: Characterization, stability and convergence of hierarchical clustering methods. J. Mach. Learn. Res. 11, 1425–1470 (2010)

  19. Chazal, F., Cohen-Steiner, D., Guibas, L.J., Mémoli, F., Oudot, S.Y.: Gromov–Hausdorff stable signatures for shapes using persistence. In: 7th Symposium on Geometry Processing (Berlin 2009), pp. 1393–1403. ACM, New York (2009)

  20. Chen, J., Safro, I.: Algebraic distance on graphs. SIAM J. Sci. Comput. 33(6), 3468–3490 (2011)

  21. Chowdhury, S., Mémoli, F.: The Gromov-Wasserstein distance between networks and stable network invariants. Inf. Inference 8(4), 757–787 (2019)

  22. Chowdhury, S., Needham, T.: Generalized spectral clustering via Gromov–Wasserstein learning. In: 24th International Conference on Artificial Intelligence and Statistics (San Diego 2021), pp. 712–720. PMLR (2021)

  23. Colijn, C., Plazzotta, G.: A metric on phylogenetic tree shapes. Syst. Biol. 67(1), 113–126 (2018)

  24. David, G., Semmes, S.W.: Fractured Fractals and Broken Dreams: Self-Similar Geometry Through Metric and Measure. Oxford Lecture Series in Mathematics and its Applications, vol. 7. Oxford University Press, New York (1997)

  25. Do Ba, K., Nguyen, H.L., Nguyen, H.N., Rubinfeld, R.: Sublinear time algorithms for Earth mover’s distance. Theory Comput. Syst. 48(2), 428–442 (2011)

  26. Dong, Y., Sawin, W.: COPT: Coordinated optimal transport on graphs. In: Advances in Neural Information Processing Systems, vol. 33, pp. 19327–19338. Curran Associates, Red Hook (2020)

  27. Dordovskyi, D., Dovgoshey, O., Petrov, E.: Diameter and diametrical pairs of points in ultrametric spaces. \(p\)-Adic Numbers Ultrametric Anal. Appl. 3(4), 253–262 (2011)

  28. Dudley, R.M.: Real Analysis and Probability. CRC Press, Boca Raton (2017)

  29. Edwards, D.A.: The structure of superspace. In: Studies in Topology (Charlotte 1974), pp. 121–133. Academic Press, New York (1975)

  30. Evans, S.N.: Probability and Real Trees. Lectures from the 35th Summer School on Probability Theory (Saint-Flour 2005). Springer, Berlin (2008)

  31. Evans, S.N., Matsen, F.A.: The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples. J. R. Stat. Soc. Ser. B. Stat. Methodol. 74(3), 569–592 (2012)

  32. Fakcharoenphol, J., Rao, S., Talwar, K.: A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci. 69(3), 485–497 (2004)

  33. Folland, G.B.: Real Analysis: Modern Techniques and their Applications. 2nd edn. Pure and Applied Mathematics (New York). Wiley, New York (1999)

  34. Gellert, M., Hossain, M.F., Berens, F.J.F., Bruhn, L.W., Urbainsky, C., Liebscher, V., Lillig, C.H.: Substrate specificity of thioredoxins and glutaredoxins—towards a functional classification. Heliyon 5(12), e02943 (2019)

  35. Givens, C.R., Shortt, R.M.: A class of Wasserstein metrics for probability distributions. Mich. Math. J. 31(2), 231–240 (1984)

  36. Greven, A., Pfaffelhuber, P., Winter, A.: Convergence in distribution of random metric measure spaces (\(\Lambda \)-coalescent measure trees). Probab. Theory Relat. Fields 145(1–2), 285–322 (2009)

  37. Grindstaff, G., Owen, M.: Representations of partial leaf sets in phylogenetic tree space. SIAM J. Appl. Algebra Geom. 3(4), 691–720 (2019)

  38. Gromov, M.: Groups of polynomial growth and expanding maps (with an appendix by Jacques Tits). Inst. Hautes Études Sci. Publ. Math. 53, 53–78 (1981)

  39. Hein, J.: Reconstructing evolution of sequences subject to recombination using parsimony. Math. Biosci. 98(2), 185–200 (1990)

  40. Holm, L., Sander, C.: Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233(1), 123–138 (1993)

  41. Howes, N.R.: Modern Analysis and Topology. Springer, Berlin (2012)

  42. Jain, A.K., Dorai, C.: 3D object recognition: representation and matching. Stat. Comput. 10(2), 167–182 (2000)

  43. Jardine, N., Sibson, R.: Mathematical Taxonomy. Wiley Series in Probability and Mathematical Statistics, Wiley, London (1971)

  44. Kantorovich, L.: On the translocation of masses. C. R. (Doklady) Acad. Sci. URSS (N.S.) 37, 199–201 (1942)

  45. Kantorovich, L.V., Rubinstein, G.S.: On a space of completely additive functions. Vestnik Leningrad. Univ. 13(7), 52–59 (1958) (in Russian)

  46. Kloeckner, B.R.: A geometric study of Wasserstein spaces: ultrametrics. Mathematika 61(1), 162–178 (2015)

  47. Kolmogorov, A.N., Fomin, S.V.: Elements of the Theory of Functions and Functional Analysis, vol. 1. Graylock Press, Rochester (1957)

  48. Kolouri, S., Nadjahi, K., Simsekli, U., Badeau, R., Rohde, G.: Generalized sliced Wasserstein distances. In: Advances in Neural Information Processing Systems, vol. 32, pp. 261–272. Curran Associates, Red Hook (2019)

  49. Kufareva, I., Abagyan, R.: Methods of protein structure comparison. Methods Mol. Biol. 857, 231–257 (2012)

  50. Kuo, H.-Y., Su, H.-R., Lai, S.-H., Wu, C.-C.: 3D object detection and pose estimation from depth image for robotic bin picking. In: 2014 IEEE International Conference on Automation Science and Engineering (New Taipei 2014), pp. 1264–1269. IEEE (2014)

  51. Lafond, M., El-Mabrouk, N., Huber, K.T., Moulton, V.: The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics. Theoret. Comput. Sci. 760, 15–34 (2019)

  52. Lambert, A., Uribe Bravo, G.: The comb representation of compact ultrametric spaces. \(p\)-Adic Numbers Ultrametric Anal. Appl. 9(1), 22–38 (2017)

  53. Le, T., Ho, N., Yamada, M.: Computationally Efficient Tree Variants of Gromov–Wasserstein (2019). arXiv:1910.04462

  54. Le, T., Yamada, M., Fukumizu, K., Cuturi, M.: Tree-sliced variants of Wasserstein distances. In: 33rd Conference on Neural Information Processing Systems (Vancouver 2019), pp. 12304–12315. Curran Associates, Red Hook (2019)

  55. Liebscher, V.: New Gromov-inspired metrics on phylogenetic tree space. Bull. Math. Biol. 80(3), 493–518 (2018)

  56. Lowe, D.G.: Local feature view clustering for 3D object recognition. In: 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Kauai 2001), pp. I–I. IEEE (2001)

  57. Mallows, C.L.: A note on asymptotic joint normality. Ann. Math. Stat. 43, 508–515 (1972)

  58. McGregor, A., Stubbs, D.: Sketching Earth-Mover distance on graph metrics. In: Approximation, Randomization, and Combinatorial Optimization (Berkeley 2013). Lecture Notes in Computer Science, vol. 8096, pp. 274–286. Springer, Heidelberg (2013)

  59. Mémoli, F.: On the use of Gromov–Hausdorff distances for shape comparison. In: Eurographics Symposium on Point-Based Graphics (Prague 2007). The Eurographics Association (2007). https://doi.org/10.2312/SPBG/SPBG07/081-090

  60. Mémoli, F.: Gromov–Wasserstein distances and the metric approach to object matching. Found. Comput. Math. 11(4), 417–487 (2011)

  61. Mémoli, F., Needham, T.: Distance distributions and inverse problems for metric measure spaces. Stud. Appl. Math. 149(4), 943–1001 (2022)

  62. Mémoli, F., Sapiro, G.: Comparing point clouds. In: 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing (Nice 2004), pp. 32–40. ACM, New York (2004)

  63. Mémoli, F., Smith, Z., Wan, Z.: The Gromov–Hausdorff distance between ultrametric spaces: its structure and computation. J. Comput. Geom. (to appear). arXiv:2110.03136

  64. Mémoli, F., Wan, Z.: On \(p\)-metric spaces and the \(p\)-Gromov–Hausdorff distance. \(p\)-Adic Numbers Ultrametric Anal. Appl. 14(3), 173–223 (2022)

  65. Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond. World Scientific Lecture Notes in Physics, vol. 9. World Scientific, Teaneck (1987)

  66. Morozov, D., Beketayev, K., Weber, G.H.: Interleaving distance between merge trees. TopoInVis’13. https://www.mrzv.org/publications/interleaving-distance-merge-trees/manuscript/

  67. Nies, T.G., Staudt, T., Munk, A.: Transport dependency: Optimal transport based dependency measures (2021). arXiv:2105.02073

  68. Osada, R., Funkhouser, T., Chazelle, B., Dobkin, D.: Shape distributions. ACM Trans. Graph. 21(4), 807–832 (2002)

  69. Owen, M., Provan, J.S.: A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(1), 2–13 (2011)

  70. Papazov, C., Haddadin, S., Parusel, S., Krieger, K., Burschka, D.: Rigid 3D geometry matching for grasping of known objects in cluttered scenes. Intern. J. Robotics Res. 31(4), 538–553 (2012)

  71. Pardalos, P.M., Vavasis, S.A.: Quadratic programming with one negative eigenvalue is NP-hard. J. Global Optim. 1(1), 15–22 (1991)

  72. Peyré, G., Cuturi, M., Solomon, J.: Gromov–Wasserstein averaging of kernel and distance matrices. In: 33rd International Conference on Machine Learning (New York 2016), pp. 2664–2672. JMLR (2016)

  73. Qiu, D.: Geometry of non-Archimedean Gromov–Hausdorff distance. \(p\)-Adic Numbers Ultrametric Anal. Appl. 1(4), 317–337 (2009)

  74. Rammal, R., Toulouse, G., Virasoro, M.A.: Ultrametricity for physicists. Rev. Mod. Phys. 58(3), 765–788 (1986)

  75. Reeb, G.: Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique. C. R. Acad. Sci. Paris 222, 847–849 (1946)

  76. Robinson, D.F.: Comparison of labeled trees with valency three. J. Comb. Theory Ser. B 11(2), 105–119 (1971)

  77. Robinson, D.F., Foulds, L.R.: Comparison of phylogenetic trees. Math. Biosci. 53(1–2), 131–147 (1981)

  78. Rubner, Y., Tomasi, C., Guibas, L.J.: The Earth Mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40(2), 99–121 (2000)

  79. Scetbon, M., Peyré, G., Cuturi, M.: Linear-time Gromov–Wasserstein distances using low rank couplings and costs. In: 39th International Conference on Machine Learning (Baltimore 2022), pp. 19347–19365. PMLR (2022)

  80. Schmiedl, F.: Computational aspects of the Gromov–Hausdorff distance and its application in non-rigid shape matching. Discrete Comput. Geom. 57(4), 854–880 (2017)

  81. Semmes, S.: An introduction to the geometry of ultrametric spaces (2007). arXiv:0711.0709

  82. Semple, C., Steel, M.: Phylogenetics. Oxford Lecture Series in Mathematics and Its Applications, vol. 24. Oxford University Press, New York (2003)

  83. Sturm, K.-T.: On the geometry of metric measure spaces. I. Acta Math. 196(1), 65–131 (2006)

  84. Sturm, K.T.: The space of spaces: Curvature bounds and gradient flows on the space of metric measure spaces (2012). arXiv:1208.0434

  85. Thorsley, D., Klavins, E.: Model reduction of stochastic processes using Wasserstein pseudometrics. In: 2008 American Control Conference (Seattle 2008), pp. 1374–1381. IEEE (2008)

  86. Titouan, V., Courty, N., Tavenard, R., Flamary, R.: Optimal transport for structured data with application on graphs. In: 36th International Conference on Machine Learning (Long Beach 2019), pp. 6275–6284. PMLR (2019)

  87. Touli, E.F., Wang, Y.: FPT-algorithms for computing Gromov–Hausdorff and interleaving distances between trees. In: 27th Annual European Symposium on Algorithms (Munich 2019). Leibniz Int. Proc. Inform., vol. 144, # 83. Leibniz-Zent. Inform., Wadern (2019)

  88. Vallender, S.S.: Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18(4), 784–786 (1974)

  89. Vayer, T., Flamary, R., Tavenard, R., Chapel, L., Courty, N.: Sliced Gromov–Wasserstein (2019). arXiv:1905.10124

  90. Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58. American Mathematical Society, Providence (2003)

  91. Villani, C.: Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften, vol. 338. Springer, Berlin (2008)

  92. Wan, Z.: A novel construction of Urysohn universal ultrametric space via the Gromov–Hausdorff ultrametric. Topology Appl. 300, # 107759 (2021)

  93. Zarichnyi, I.: Gromov–Hausdorff ultrametric (2005). arXiv:math/0511437

Acknowledgements

F.M. and A.M. thank the Mathematisches Forschungsinstitut Oberwolfach. Conversations which eventually led to this project were initiated during the 2019 workshop “Statistical and Computational Aspects of Learning with Complex Structure”.

Author information

Corresponding author

Correspondence to Facundo Mémoli.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Editor in Charge: Kenneth Clarkson

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

F.M. and Z.W. acknowledge funding from the National Science Foundation under grants CCF 1740761, DMS 1723003, and RI 1901360. A.M. and C.W. gratefully acknowledge support by the DFG Research Training Group 2088, CRC 1456 project A04 and Cluster of Excellence MBExC 2067.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 1299 KB)

Appendices

Technical Details from Sect. 2

1.1 Proofs from Sect. 2

In this section we give the proofs of various results from Sect. 2.

1.1.1 Proof of Theorem 2.2

Recall that for a given \(\theta \in {\mathcal {D}}(X)\), we define \(u_\theta :X\hspace{1.111pt}{\times }\hspace{1.111pt}X\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:

$$\begin{aligned}u_\theta (x,x'):= \inf \hspace{1.111pt}\bigl \{t\geqslant 0 \mid x \text { and }x'\text { belong to the same block of }\theta (t)\bigr \}. \end{aligned}$$

It is easy to verify that \(u_\theta \) is an ultrametric. For any Cauchy sequence \(\{x_n\}_{n\in {\mathbb {N}}}\) in \((X,u_\theta )\), let \(D_i:= \sup _{\,m,n\geqslant i}u_\theta (x_m,x_n)\) for each \(i\in {\mathbb {N}}\). Then, each \(D_i<\infty \) and \(\lim _{\,i\rightarrow \infty }D_i=0\). By definition of \(u_\theta \), for each \(i\in {\mathbb {N}} \) the set \(\{x_n\}_{n=i}^\infty \) is contained in the block \([x_i]_{D_i}\in \theta (D_i)\). Let \(X_i:= [x_i]_{D_i}\) for each \(i\in {\mathbb {N}} \). Then, obviously we have that \(X_j\subseteq X_i\) for any \(1\leqslant i<j\). By condition (vii) in Definition 2.1, we have that \(\bigcap _{\,i\in {\mathbb {N}}}X_i\ne \text{\O }\). Choose \(x_*\!\in \bigcap _{\,i\in {\mathbb {N}}}X_i\), then it is easy to verify that \(x_*\!=\lim _{\,n\rightarrow \infty }x_n\) and thus \((X,u_\theta )\) is a complete space. To prove that \((X,u_\theta )\) is a compact space, we need to verify that for each \(t>0\), \(X_t\) is a finite space (cf. Lemma A.7). Since \(\theta (t)\) is finite by condition (vi) in Definition 2.1, we have that \(X_t=\{[x]_t\,{|}\,x\in X\}=\theta (t)\) is finite and thus X is compact. Therefore, we have proved that \(u_\theta \!\in {\mathcal {U}}(X)\). Based on this, the map \(\Upsilon _X:{\mathcal {D}}(X)\rightarrow {\mathcal {U}}(X)\) defined by \(\theta \mapsto u_\theta \) is well defined.

Now given \(u\in {\mathcal {U}}(X)\), we define a map \(\theta _u:[0,\infty )\rightarrow \textbf{Part}\hspace{0.55542pt}(X)\) as follows: for each \(t\geqslant 0\), consider the equivalence relation \(\sim _t\) with respect to u, i.e., \(x\sim _t x'\) iff \(u(x,x')\leqslant t\). This is the same equivalence relation defined in Sect. 2.2 for introducing quotient ultrametric spaces. We then let \(\theta _u(t)\) be the partition induced by \(\sim _t\), i.e., \(\theta _u(t)=X_t\). It is not hard to show that \(\theta _u\) satisfies conditions (i)–(v) in Definition 2.1. Since X is compact, \(\theta _u(t)=X_t\) is finite for each \(t>0\) and thus \(\theta _u\) satisfies condition (vi) in Definition 2.1. Now, let \(\{t_n\}_{n\in {\mathbb {N}}}\) be a decreasing sequence such that \(\lim _{\,n\rightarrow \infty }t_n=0\) and let \(X_n\in \theta _u(t_n)\) be such that for any \(1\leqslant n<m\), \(X_m\subseteq X_n\). Since each \(X_n=[x_n]_{t_n}\) for some \(x_n\in X\), \(X_n\) is a compact subset of X. Since X is also complete, we have that \(\bigcap _{\,n\in {\mathbb {N}}}X_n\ne \text{\O }\). Therefore, \(\theta _u\) satisfies condition (vii) in Definition 2.1 and thus \(\theta _u\in {\mathcal {D}}(X)\). Then, we define the map \(\Delta _X:{\mathcal {U}}(X)\rightarrow {\mathcal {D}}(X)\) by \(u\mapsto \theta _u\).

It is easy to check that \(\Delta _X\) is the inverse of \(\Upsilon _X\) and thus we have established that \(\Upsilon _X:{\mathcal {D}}(X)\rightarrow {\mathcal {U}}(X)\) is bijective.
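The two maps in this proof can be traced on a finite example. The following is a minimal sketch, assuming a finite ultrametric given as a symmetric matrix `u`; the function names `theta_u` and `u_theta` are illustrative and not taken from the paper or its code repository. On a finite space the infimum defining \(u_\theta \) is attained at one of the finitely many distance values, so the sketch only scans those thresholds.

```python
from itertools import combinations

def theta_u(u, t):
    """Partition of {0, ..., n-1} induced by x ~_t x' iff u[x][x'] <= t."""
    n = len(u)
    blocks, seen = set(), set()
    for x in range(n):
        if x in seen:
            continue
        # Because u is an ultrametric, ~_t is an equivalence relation,
        # so the closed ball around x is exactly the class of x.
        block = frozenset(y for y in range(n) if u[x][y] <= t)
        seen |= block
        blocks.add(block)
    return blocks

def u_theta(theta, thresholds, n):
    """Smallest threshold t at which x and x' lie in the same block of theta(t)."""
    u = [[0.0] * n for _ in range(n)]
    for x, y in combinations(range(n), 2):
        t_min = min(t for t in thresholds
                    if any(x in b and y in b for b in theta(t)))
        u[x][y] = u[y][x] = t_min
    return u

# A three-point ultrametric space: points 0 and 1 merge at t = 1, all merge at t = 3.
u = [[0, 1, 3],
     [1, 0, 3],
     [3, 3, 0]]
thresholds = sorted({u[i][j] for i in range(3) for j in range(3)})
recovered = u_theta(lambda t: theta_u(u, t), thresholds, 3)
```

Here `recovered` agrees with `u`, which illustrates on this example that applying \(\Delta _X\) and then \(\Upsilon _X\) returns the original ultrametric.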

1.1.2 Proof of Lemma 2.8

First, to verify that the right-hand side of (12) is well defined, we prove that the following supremum is attained:

$$\begin{aligned}\sup _{\begin{array}{c} B\in V(X)\backslash \{X\}\\ \alpha (B)\ne \beta (B) \end{array}}\!\!{\textrm{diam}}\hspace{0.55542pt}(B^*) .\end{aligned}$$

Fix any \(B_0\in V(X)\backslash \{X\}\) such that \(\alpha (B_0)\ne \beta (B_0)\). Then, it is obvious that \({\textrm{diam}}\hspace{0.55542pt}(B^*_0) >0\). By Lemma A.7, \(X_{{\textrm{diam}}\hspace{0.55542pt}(B^*_0) }\) is finite. So there are only finitely many \(B\in V(X)\backslash \{X\}\) such that \({\textrm{diam}}\hspace{0.55542pt}(B) \geqslant {\textrm{diam}}\hspace{0.55542pt}(B^*_0) \) and thus \({\textrm{diam}}\hspace{0.55542pt}(B^*) \geqslant {\textrm{diam}}\hspace{0.55542pt}(B^*_0) \). This implies that the supremum above is attained and thus

$$\begin{aligned} \sup _{\begin{array}{c} B\in V(X)\backslash \{X\}\\ \alpha (B)\ne \beta (B) \end{array}}\!\!{\textrm{diam}}\hspace{0.55542pt}(B^*) \, =\!\max _{\begin{array}{c} B\in V(X)\backslash \{X\}\\ \alpha (B)\ne \beta (B) \end{array}}\!\!{\textrm{diam}}\hspace{0.55542pt}(B^*) . \end{aligned}$$
(22)

Let \(B_1\) denote the maximizer in (22) and let \(\delta := {\textrm{diam}}\hspace{0.55542pt}(B_1^*) \). It is easy to see that for any \(x\in X\), \(\alpha ([x]_\delta )=\beta ([x]_\delta )\).

By Strassen’s theorem (see for example [28, Thm. 11.6.2]),

$$\begin{aligned} d_{\textrm{W},\infty }(\alpha ,\beta )=\inf \hspace{1.111pt}\bigl \{r\geqslant 0\mid \text {for any closed subset }A\subseteq X,\,\alpha (A)\leqslant \beta (A^r)\bigr \}, \end{aligned}$$
(23)

where \(A^r:= \{x\in X\,{|}\,u_X(x,A)\leqslant r\}\).

Since \(\alpha (B_1)\ne \beta (B_1)\), we assume without loss of generality that \(\alpha (B_1)>\beta (B_1)\). By definition of \(B_1^*\), it is obvious that \((B_1)^\delta =B_1^*\) (recall: \(\delta := {\textrm{diam}}\hspace{0.55542pt}(B_1^*) \)) and \((B_1)^r=B_1\) for all \(0\leqslant r<\delta \). Therefore, \(\alpha (B_1)\leqslant \beta ((B_1)^r)\) only when \(r\geqslant \delta \). By (23), this implies that \(d_{\textrm{W},\infty }(\alpha ,\beta )\geqslant \delta \). Conversely, for any closed set A, we have that \(A^\delta =\bigcup _{x\in A}[x]_\delta \). For two closed balls in ultrametric spaces, either one includes the other or they have no intersection. Therefore, there exists a subset \(S\subseteq A\) such that \([x]_\delta \cap [x']_\delta =\text{\O }\) for all \(x,x'\!\in S\) and \(x\ne x'\), and that \(A^\delta =\bigsqcup _{\,x\in S}[x]_\delta \). Then, \(\alpha (A)\leqslant \alpha (A^\delta )=\sum _{x\in S}\alpha ([x]_\delta )=\sum _{x\in S}\beta ([x]_\delta )=\beta (A^\delta )\). Hence, \(d_{\textrm{W},\infty }(\alpha ,\beta )\leqslant \delta \) and thus we conclude the proof.
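On a small finite ultrametric space, both characterizations used in this proof can be evaluated and compared. The sketch below is an illustration under stated assumptions: the space is a distance matrix `u`, measures are weight lists, the names `w_inf_balls` and `w_inf_strassen` are hypothetical, and a tolerance `tol` stands in for exact comparison of masses. The Strassen check enumerates all subsets, so it is only feasible for very small spaces.

```python
from itertools import combinations

def balls(u):
    """All closed balls of a finite ultrametric space, i.e. the set V(X)."""
    n = len(u)
    radii = {u[i][j] for i in range(n) for j in range(n)}
    return {frozenset(y for y in range(n) if u[x][y] <= r)
            for x in range(n) for r in radii}

def mass(mu, B):
    return sum(mu[x] for x in B)

def w_inf_balls(u, alpha, beta, tol=1e-12):
    """max diam(B*) over balls B != X with alpha(B) != beta(B)."""
    V = balls(u)
    X = frozenset(range(len(u)))
    best = 0.0
    for B in V:
        if B != X and abs(mass(alpha, B) - mass(beta, B)) > tol:
            # Balls containing B are nested, so the smallest proper
            # superset in V is the parent B*.
            parent = min((C for C in V if B < C), key=len)
            best = max(best, max(u[x][y] for x in parent for y in parent))
    return best

def w_inf_strassen(u, alpha, beta, tol=1e-12):
    """Smallest r with alpha(A) <= beta(A^r) for every nonempty subset A."""
    n = len(u)
    radii = sorted({u[i][j] for i in range(n) for j in range(n)})
    subsets = [A for k in range(1, n + 1) for A in combinations(range(n), k)]
    for r in radii:
        if all(mass(alpha, A)
               <= sum(beta[y] for y in range(n)
                      if min(u[y][a] for a in A) <= r) + tol
               for A in subsets):
            return r
    return radii[-1]

u = [[0, 1, 3],
     [1, 0, 3],
     [3, 3, 0]]
alpha = [0.5, 0.5, 0.0]
beta = [0.5, 0.0, 0.5]
```

For this example both functions return 3: half of the mass must travel from point 1 to point 2, which lie at distance 3.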

1.2 Technical Details from Sect. 2

In this section, we address various technical issues from Sect. 2.

1.2.1 Synchronized Rooted Trees

A synchronized rooted tree is a combinatorial tree \(T=(V,E)\) with a root \(o\in V\) and a height function \(h:V\rightarrow [0,\infty )\) such that \(h^{-1}(0)\) coincides with the leaf set and \(h(v)< h(v^*)\) for each \(v\in V\backslash \{o\}\), where \(v^*\) is the parent of v. Just as Theorem 2.2 establishes a correspondence between ultrametric spaces and dendrograms, an ultrametric space X uniquely determines a synchronized rooted tree \(T_X\) [46].

Given \((X,{u_{X}})\in {\mathcal {U}}\), recall from Sect. 2.3 that \(V(X):= \bigcup _{t>0}\theta _X(t)\) and that for each \(B\in V(X)\backslash \{X\}\), \(B^*\) denotes the smallest element in V(X) properly containing B. The existence of \(B^*\) is guaranteed by the following lemma:

Lemma A.1

Let \(X\in {\mathcal {U}}\). For each \(B\in V(X)\) such that \(B\ne X\), there exists \(B^*\!\in V(X)\) such that \(B^*\!\ne B\) and \(B^*\!\subseteq B'\) for all \(B'\!\in V(X)\) with \(B\subsetneqq B'\).

Proof

Let \(\delta := {\textrm{diam}}\hspace{0.55542pt}(B) \). Let \(x\in B\), then \(B=[x]_\delta \). By Lemma A.7, \(X_\delta \) is a finite set. Consider \(\delta ^*\!:= \min \hspace{0.88882pt}\{u_{X_\delta }([x]_\delta ,[x']_\delta )\,{|}\,[x']_\delta \ne [x]_\delta \}\). Let \(B^*\!:= [x]_{\delta ^*}\), then \(B^*\) is the smallest element in V(X) properly containing B under inclusion. Indeed, \(B^*\!\ne B\) and if \(B\subseteq B'\) for some \(B'\!\in V(X)\), then \(B'\!=[x]_r\) for some \(r> \delta \). It is easy to see that for all \(\delta<r<\delta ^*\), \([x]_r=[x]_\delta \). Therefore, if \(B'\!\ne B\), we must have that \(r\geqslant \delta ^*\) and thus \(B^*\!=[x]_{\delta ^*}\subseteq [x]_r=B'\).\(\square \)

Now, we construct the synchronized rooted tree \(T_X\) corresponding to X via the proper dendrogram \(\theta _X\) associated with \({u_{X}}\). We first define a combinatorial tree \(T_X=(V_X,E_X)\) as follows: we let \(V_X:= V(X)\); for any distinct \(B,B'\in V_X\), we let \((B,B')\in E_X\) iff either \(B=(B')^*\) or \(B'\!=B^*\). We choose \(X\in V_X\) to be the root of \(T_X\), then any \(B\ne X\) in \(V_X\) has a unique parent \(B^*\). We define \(h_X:V_X\rightarrow [0,\infty )\) such that \(h_X(B):= {{\textrm{diam}}\hspace{0.55542pt}(B) }/{2}\) for any \(B\in V_X\). Now, \(T_X\) endowed with the root X and the height function \(h_X\) is a synchronized rooted tree. It is easy to see that X can be isometrically identified with \(h_X^{-1}(0)\) of the so-called metric completion of \(T_X\) (see [46, Sect. 2.3] for details). With this construction Lemma 2.7 follows directly from [46, Lem. 3.1].
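For a finite ultrametric space the construction of \(T_X\) can be carried out directly. The following is a minimal sketch, assuming the space is given as a distance matrix `u`; the names `build_tree` and `lca_height` are illustrative. The final check relies on the assumption that the tree metric between two leaves equals twice the height of their lowest common ancestor, which is how the isometric identification of X with \(h_X^{-1}(0)\) is read here.

```python
def build_tree(u):
    """Vertices V(X), root X, parent map B -> B*, and heights diam(B)/2."""
    n = len(u)
    radii = {u[i][j] for i in range(n) for j in range(n)}
    V = {frozenset(y for y in range(n) if u[x][y] <= r)
         for x in range(n) for r in radii}
    root = frozenset(range(n))
    # Balls containing B form a chain, so the smallest proper superset is B*.
    parent = {B: min((C for C in V if B < C), key=len)
              for B in V if B != root}
    height = {B: max(u[x][y] for x in B for y in B) / 2 for B in V}
    return V, root, parent, height

def lca_height(x, y, parent, height):
    """Height of the smallest ball containing both x and y."""
    B = frozenset({x})
    while y not in B:
        B = parent[B]
    return height[B]

u = [[0, 1, 3],
     [1, 0, 3],
     [3, 3, 0]]
V, root, parent, height = build_tree(u)
```

In this example the tree has five vertices (three leaves, the ball {0, 1}, and the root), heights strictly increase towards the root, and `2 * lca_height(x, y, parent, height)` reproduces `u[x][y]` for every pair of distinct points.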

1.3 \(d^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}_{\textrm{W},p}\) Between Compactly Supported Measures

Next, we demonstrate that Theorem 2.9 extends naturally to the case of compactly supported probability measures in \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\). For this purpose, it is important to note that compact subsets of \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\) have a very particular structure as shown by the next lemma.

Lemma A.2

Let \(X\subseteq ({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\). X is a compact subset iff X is either a finite set or a countable set containing 0 and with 0 being the unique cluster point (w.r.t. the usual Euclidean distance \(\Lambda _1\)).

Proof

If X is finite, then obviously X is compact. Assume that X is a countable set containing 0 with 0 being the unique cluster point (w.r.t. the usual Euclidean distance \(\Lambda _1\)). If \(\{x_n\}_{n\in {\mathbb {N}}}\subseteq X\) is a Cauchy sequence with respect to \(\Lambda _\infty \), then either \(x_n\) is eventually constant or \(\lim _{n\rightarrow \infty }x_n=0\). In either case, the limit of \(\{x_n\}_{n\in {\mathbb {N}}}\) belongs to X and thus X is complete. Now for any \(\varepsilon >0\), by Lemma A.7, \(X_\varepsilon \) is a finite set. Denote \(X_\varepsilon =\{[x_1]_\varepsilon ,\ldots ,[x_n]_\varepsilon \}\). Then, \(\{x_1,\ldots ,x_n\}\) is a finite \(\varepsilon \)-net of X. Therefore, X is totally bounded and thus X is compact.

Now, assume that X is compact. Then, for any \(\varepsilon >0\), \(X_\varepsilon \) is a finite set. Suppose \(X_\varepsilon =\{[x_1]_\varepsilon ,\ldots ,[x_n]_\varepsilon \}\) where \(0\leqslant x_1<x_2<\cdots <x_n\). Further, we have that \(\Lambda _\infty (x_i,x_j)=x_j\) whenever \(1\leqslant i<j\leqslant n\). This implies that

  (i) \(x_i>\varepsilon \) for all \(2\leqslant i\leqslant n\);

  (ii) \([x_i]_\varepsilon =\{x_i\}\) for all \(2\leqslant i\leqslant n\).

Therefore, \(X\cap (\varepsilon ,\infty )=\{x_2,\ldots ,x_n\}\) is a finite set. Since \(\varepsilon >0\) is arbitrary, X is at most countable and has no cluster point (w.r.t. the Euclidean distance \(\Lambda _1\)) other than 0. If X is countable, then 0 must be a cluster point and by compactness of X, we have that \(0\in X\). \(\square \)
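The structure described in this proof can be observed numerically on a truncation of the set \(\{0\}\cup \{1/k\,|\,k\in {\mathbb {N}}\}\). The sketch below is only an illustration: it assumes \(\Lambda _\infty (a,b)=\max (a,b)\) for \(a\ne b\) and 0 otherwise (consistent with \(\Lambda _\infty (x_i,x_j)=x_j\) for \(x_i<x_j\) above), uses exact rationals to avoid rounding issues, and the names `lam_inf` and `quotient` are hypothetical.

```python
from fractions import Fraction

def lam_inf(a, b):
    """Assumed form of the ultrametric on the non-negative reals."""
    return Fraction(0) if a == b else max(a, b)

def quotient(points, eps):
    """Blocks of the relation x ~ x' iff lam_inf(x, x') <= eps."""
    blocks = []
    for x in sorted(points):
        for block in blocks:
            # One representative suffices because ~ is an equivalence relation.
            if lam_inf(block[0], x) <= eps:
                block.append(x)
                break
        else:
            blocks.append([x])
    return blocks

# Truncation of {0} together with {1/k : k = 1, 2, ...} at k = 1000.
X = [Fraction(0)] + [Fraction(1, k) for k in range(1, 1001)]
blocks = quotient(X, Fraction(1, 10))
```

With \(\varepsilon =1/10\), the quotient has ten blocks: one block collecting 0 and every point not exceeding \(\varepsilon \), and one singleton for each of the nine points above \(\varepsilon \), in line with items (i) and (ii).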

Based on the special structure of compact subsets of \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\), we derive the following extension of Theorem 2.9.

Theorem A.3

(\(d^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}_{\textrm{W},p}\) between compactly supported measures) Let \(X:= \{0\}\cup \{x_i\hspace{0.55542pt}{|}\,i\in {\mathbb {N}}\}\subseteq {\mathbb {R}}_{\geqslant 0}\) such that \(0<\ldots< x_n<x_{n-1}<\ldots <x_1\) and 0 is the only cluster point w.r.t. the usual Euclidean distance. Let \(\alpha ,\beta \in {\mathcal {P}}(X)\). Let \(\alpha _i:= \alpha (\{x_i\})\) for \(i\in {\mathbb {N}}\) and \(\alpha _0:= \alpha (\{0\})\). Similarly, let \(\beta _i:= \beta (\{x_i\})\) and \(\beta _0:= \beta (\{0\})\). Then for \(p\in [1,\infty )\),

Let \(F_\alpha \) and \(F_\beta \) be the cumulative distribution functions of \(\alpha \) and \(\beta \), respectively. Then,

$$\begin{aligned}d_{\textrm{W},\infty }^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}(\alpha ,\beta )=\max \hspace{1.111pt}\Bigl (\max _{\begin{array}{c} 2\leqslant i<\infty \\ F_\alpha (x_i)\ne F_\beta (x_i) \end{array}}\!\!x_{i-1},\max _{\begin{array}{c} 1\leqslant i<\infty \\ \alpha _i\ne \beta _i \end{array}}\!\!x_i\Bigr ).\end{aligned}$$

Proof

Note that \(V(X)=\{\{0\}\cup \{x_j\,{|}\,j\geqslant i\}\,{|}\,i\in {\mathbb {N}}\}\cup \{\{x_i\}\,{|}\,i\in {\mathbb {N}}\}\) (recall that each set corresponds to a closed ball). Thus, we conclude by applying Lemmas 2.7 and  2.8. \(\square \)

1.3.1 Closed-Form Solution for \(d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}\)

In this section, we will derive the subsequent theorem.

Theorem A.4

Given \(1\leqslant p,q <\infty \) and two compactly supported probability measures \(\alpha \) and \(\beta \) on \({\mathbb {R}}_{\geqslant 0}\), we have that

$$\begin{aligned}d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha ,\beta )\leqslant \biggl (\int _0^1\Lambda _q(F_\alpha ^{-1}(t),F_\beta ^{-1}(t))^p\hspace{0.55542pt}dt\biggr )^{\!1/p}.\end{aligned}$$

When \(q\leqslant p\), the equality holds whereas when \(q>p\), the equality does not hold in general.

One important ingredient for the proof is the following direct adaptation of [67, Lem. 1].

Lemma A.5

Let X and Y be two Polish metric spaces and let \(f:X\rightarrow {\mathbb {R}}\) and \(g:Y\rightarrow {\mathbb {R}}\) be measurable maps. Denote by \(f\hspace{1.111pt}{\times }\hspace{1.111pt}g:X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\rightarrow {\mathbb {R}}^2\) the map \((x,y)\mapsto (f(x),g(y))\). Then, for any \({\mu _{X}}\in {\mathcal {P}}(X)\) and \({\mu _{Y}}\in {\mathcal {P}}(Y)\),

$$\begin{aligned}(f\hspace{1.111pt}{\times }\hspace{1.111pt}g)_\#\,{\mathcal {C}}({\mu _{X}},{\mu _{Y}})={\mathcal {C}}(f_\#\,{\mu _{X}},g_\#\,{\mu _{Y}}). \end{aligned}$$

Based on Lemma A.5, we show the following auxiliary result.

Lemma A.6

Let \(1\leqslant q\leqslant p<\infty \). Assume that \(\alpha \) and \(\beta \) are compactly supported probability measures on \({\mathbb {R}}_{\geqslant 0}\). Then,

$$\begin{aligned}\bigl (d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha ,\beta )\bigr )^p =\bigl (d_{\textrm{W},{p}/{q}}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _1)}((S_q)_\#\,\alpha ,(S_q)_\#\,\beta )\bigr )^{p/q}, \end{aligned}$$

where \(S_q:{\mathbb {R}}_{\geqslant 0}\rightarrow {\mathbb {R}}_{\geqslant 0}\) taking x to \(x^q\) is the q-snowflake transform defined in Sect. 3.3.

Proof

Since \({p}/{q}\geqslant 1\) and by Lemma A.5 we have that

$$\begin{aligned} \bigl (d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha ,\beta )\bigr )^p&= \inf _{\mu \in {\mathcal {C}}(\alpha ,\beta )}\int _{{\mathbb {R}}_{\geqslant 0}\times {\mathbb {R}}_{\geqslant 0}}(\Lambda _q(x,y))^p\hspace{1.111pt}\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\nonumber \\&=\inf _{\mu \in {\mathcal {C}}(\alpha ,\beta )} \int _{{\mathbb {R}}_{\geqslant 0}\times {\mathbb {R}}_{\geqslant 0}}\!\!|S_q(x)-S_q(y)|^{p/q}\hspace{1.111pt}\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\nonumber \\ {}&=\inf _{\mu \in {\mathcal {C}}(\alpha ,\beta )} \int _{{\mathbb {R}}_{\geqslant 0}\times {\mathbb {R}}_{\geqslant 0}}\!\!|s-t|^{p/q}\hspace{0.55542pt}(S_q\hspace{0.55542pt}{\times }\hspace{1.111pt}S_q)_\#\,\mu (ds\hspace{1.111pt}{\times }\hspace{1.111pt}dt)\nonumber \\&=\bigl (d_{\textrm{W},{p}/{q}}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _1)}((S_q)_\#\,\alpha ,(S_q)_\#\,\beta )\bigr )^{\!p/q}. \end{aligned}$$

\(\square \)

With Lemma A.6 at our disposal, we can demonstrate Theorem A.4.

Proof of Theorem A.4

We first note that

$$\begin{aligned} d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha ,\beta )=\inf _{(\xi ,\eta )}({\mathbb {E}}(\Lambda _q(\xi ,\eta )^p))^{1/p}, \end{aligned}$$

where the infimum is taken over pairs of random variables \(\xi \) and \(\eta \) with marginal distributions \(\alpha \) and \(\beta \), respectively. Moreover, if \(\zeta \) is a random variable uniformly distributed on [0, 1], then \(F_\alpha ^{-1}(\zeta )\) has distribution function \(F_\alpha \) and \(F_\beta ^{-1}(\zeta )\) has distribution function \(F_\beta \) (see for example [88]). Choosing \(\xi =F_\alpha ^{-1}(\zeta )\) and \(\eta =F_\beta ^{-1}(\zeta )\), we have

$$\begin{aligned} d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha ,\beta )\leqslant ({\mathbb {E}}(\Lambda _q(\xi ,\eta )^p))^{1/p} =\biggl (\int _0^1\Lambda _q(F_\alpha ^{-1}(t),F_\beta ^{-1}(t))^p\hspace{1.111pt}dt\biggr )^{\!1/p}. \end{aligned}$$

Next, we assume that \(q\leqslant p\). By Lemma A.6, we have that

$$\begin{aligned} \bigl (d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha ,\beta )\bigr )^p =\bigl (d_{\textrm{W},{p/q}}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _1)}((S_q)_\#\,\alpha ,(S_q)_\#\,\beta )\bigr )^{p/q}. \end{aligned}$$

Then,

$$\begin{aligned}\bigl (d_{\textrm{W},{p/q}}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _1)}((S_q)_\#\,\alpha ,(S_q)_\#\,\beta )\bigr )^{p/q}=\int _0^1|F_{\alpha ,q}^{-1}(t)-F_{\beta ,q}^{-1}(t)|^{p/q}\hspace{1.111pt}dt, \end{aligned}$$

where \(F_{\alpha ,q}\) and \(F_{\beta ,q}\) are the distribution functions of \((S_q)_\#\,\alpha \) and \((S_q)_\#\,\beta \), respectively. Since \(S_q\) is increasing, it is easy to verify that \(F_{\alpha ,q}^{-1}(t)=(F_\alpha ^{-1}(t))^q\) and \(F_{\beta ,q}^{-1}(t)=(F_\beta ^{-1}(t))^q\). Therefore,

$$\begin{aligned}d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha ,\beta )= \biggl (\int _0^1\Lambda _q(F_\alpha ^{-1}(t),F_\beta ^{-1}(t))^p\hspace{0.55542pt}dt\biggr )^{\!1/p}.\end{aligned}$$

Finally, we demonstrate that for \(q>p\) the equality does not hold in general. We first consider the extreme case \(p=1\) and \(q=\infty \) (though we require \(q<\infty \) in the assumptions of the theorem, we relax this for now). Let \(\alpha _0= \delta _1/2+ \delta _2/2\) and \(\beta _0 = \delta _2/2+\delta _3/2\) where \(\delta _x\) means the Dirac measure at point \(x\in {\mathbb {R}}_{\geqslant 0}\). Then, we have that

$$\begin{aligned}d_{\textrm{W},1}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}(\alpha _0,\beta _0)=\frac{3}{2}<\frac{5}{2}=\int _0^1\Lambda _\infty (F_\alpha ^{-1}(t),F_\beta ^{-1}(t))\,dt. \end{aligned}$$

It is not hard to see that both \(d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha _0,\beta _0)\) and

$$\begin{aligned} \biggl (\int _0^1\Lambda _q(F_\alpha ^{-1}(t),F_\beta ^{-1}(t))^p\hspace{1.111pt}dt\biggr )^{\!1/p} \end{aligned}$$

are continuous with respect to \(p\in [1,\infty )\) and \(q\in [1,\infty ]\). Then, for p close to 1 and \(q<\infty \) large enough, and in particular, \(p<q\), we have that

$$\begin{aligned} d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _q)}(\alpha _0,\beta _0)< \biggl (\int _0^1\Lambda _q(F_\alpha ^{-1}(t),F_\beta ^{-1}(t))^p\hspace{1.111pt}dt\biggr )^{\!1/p}. \end{aligned}$$

\(\square \)
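Both regimes of Theorem A.4 can be checked numerically on small examples. The following Python sketch is illustrative only and is not taken from the code accompanying the paper. It assumes measures that are uniform on finitely many atoms, so that an optimal coupling can be found among permutations, and it uses the convention \(\Lambda _\infty (x,y)=\max (x,y)\) for \(x\ne y\), which is consistent with the values in the proof. The helper names `lambda_q`, `lambda_inf`, `transport_cost` and `quantile_cost` are hypothetical.

```python
from itertools import permutations

def lambda_q(x, y, q):
    """Lambda_q(x, y) = |x^q - y^q|^(1/q) for finite q >= 1."""
    return abs(x**q - y**q) ** (1.0 / q)

def lambda_inf(x, y):
    """Assumed convention: Lambda_infinity(x, y) = max(x, y) if x != y, else 0."""
    return 0.0 if x == y else float(max(x, y))

def transport_cost(xs, ys, dist, p):
    """Brute-force d_{W,p} between the uniform measures on the atoms xs and ys."""
    n = len(xs)
    best = min(
        sum(dist(xs[i], ys[s[i]]) ** p for i in range(n))
        for s in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)

def quantile_cost(xs, ys, dist, p):
    """Right-hand side of Theorem A.4: couple the sorted atoms in order."""
    xs, ys = sorted(xs), sorted(ys)
    total = sum(dist(x, y) ** p for x, y in zip(xs, ys))
    return (total / len(xs)) ** (1.0 / p)

# Case q <= p (here p = 3, q = 2): the two quantities should agree.
d2 = lambda x, y: lambda_q(x, y, 2)
xs, ys = [0.5, 1.0, 2.0], [0.2, 1.5, 1.7]
print(transport_cost(xs, ys, d2, 3), quantile_cost(xs, ys, d2, 3))

# Case p = 1, q = infinity: the measures alpha_0 and beta_0 from the proof.
print(transport_cost([1, 2], [2, 3], lambda_inf, 1))  # 1.5
print(quantile_cost([1, 2], [2, 3], lambda_inf, 1))   # 2.5
```

In the second case the optimal coupling leaves the mass at 2 in place and moves the mass at 1 to 3, whereas the quantile coupling moves 1 to 2 and 2 to 3, which is more expensive under \(\Lambda _\infty \).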

1.3.2 Miscellaneous

In the remainder of this section, we collect several technical results that find implicit or explicit usage throughout Sect. 2.

Lemma A.7

A complete ultrametric space X is compact iff for any \(t>0\), \(X_t\) is finite.

Proof

Wan [92, Lem. 2.3] proves that whenever X is compact, \(X_t\) is finite for any \(t>0\).

Conversely, we assume that \(X_t\) is finite for any \(t>0\). We only need to prove that X is totally bounded. For any \(\varepsilon >0\), \(X_\varepsilon \) is a finite set and thus there exist \(x_1,\ldots ,x_n\in X\) such that \(X_\varepsilon =\{[x_1]_\varepsilon ,\ldots ,[x_n]_\varepsilon \}\). Now, for any \(x\in X\), there exists \(x_i\) for some \(i=1,\ldots ,n\) such that \(x\in [x_i]_\varepsilon \). This implies that \(u_X(x,x_i)\leqslant \varepsilon \). Therefore, the set \(\{x_1,\ldots ,x_n\}\subseteq X\) is an \(\varepsilon \)-net of X. Then, X is totally bounded and thus compact.\(\square \)
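For a finite ultrametric space, the argument above can be traced in a few lines of Python. The sketch below is an illustration under stated assumptions: the four-point space and the function name `quotient_classes` are hypothetical. It computes the classes \([x]_t=\{x'\,{|}\,u_X(x,x')\leqslant t\}\) and checks that one representative per class yields a t-net.

```python
def quotient_classes(points, u, t):
    """Return the classes [x]_t = {x' : u(x, x') <= t} of a finite ultrametric space."""
    classes = []
    for x in points:
        for cls in classes:
            # By the strong triangle inequality, testing one representative suffices.
            if u(x, cls[0]) <= t:
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

# Hypothetical example: four points with an ultrametric given by a lookup table.
dist = {frozenset("ab"): 1.0, frozenset("cd"): 2.0,
        frozenset("ac"): 3.0, frozenset("ad"): 3.0,
        frozenset("bc"): 3.0, frozenset("bd"): 3.0}
u = lambda x, y: 0.0 if x == y else dist[frozenset((x, y))]
points = ["a", "b", "c", "d"]

classes = quotient_classes(points, u, 1.5)
net = [cls[0] for cls in classes]  # one representative per class
covered = all(any(u(x, r) <= 1.5 for r in net) for x in points)
print(classes, net, covered)
```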

Lemma A.8

V(X) is the collection of all closed balls in X except for singletons \(\{x\}\) such that x is a cluster point in X.

Proof

Given any \(t>0\) and \(x\in X\), \([x]_t=B_t(x)=\{x'\!\in X\,{|}\, u_X(x,x')\leqslant t\}\). Therefore, V(X) is a collection of closed balls in X. Conversely, any closed ball \(B_t(x)\) with positive radius \(t>0\) coincides with \([x]_t\in \theta _X(t)\) and thus belongs to V(X). Now, for any singleton \(\{x\}=B_0(x)\), if x is not a cluster point, then there exists \(t>0\) such that \(B_t(x)=\{x\}\) which implies that \(\{x\}\in V(X)\). If x is a cluster point, then for any \(t>0\), \(\{x\}\subsetneqq B_t(x)=[x]_t\). This implies that \(\{x\}\ne [x]_t\) for all \(t>0\) and thus \(\{x\}\notin V(X)\). This concludes the proof.\(\square \)

Technical Details from Sect. 3

1.1 Proofs from Sect. 3.1

Next, we give the missing proofs of the results stated in Sect. 3.1.

1.1.1 Proof of Proposition 3.3

Part 1. This directly follows from the definitions of \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) and \(d_{\textrm{GW},p}^{\mathrm{\,sturm}}\) (see (8) and (4)).

Part 2. This simply follows from Jensen’s inequality.

Part 3. By Part 2, \(\{u_{\textrm{GW},n}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\}_{n\in {\mathbb {N}}}\) is an increasing sequence with a finite upper bound \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\). Therefore, \(L:= \lim _{\,n\rightarrow \infty }u_{\textrm{GW},n}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\) exists and \(L\leqslant u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\).

Next, we come to the opposite inequality. By Proposition B.1, there exist \(u_n\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu _n\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) such that

$$\begin{aligned}\biggl (\int _{X\times Y}(u_n(x,y))^n\mu _n(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/n}\!\!= \,u_{\textrm{GW},n}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

By Lemmas B.19 and  B.21, the sequence \(\{u_n\}_{n\in {\mathbb {N}}}\) uniformly converges to some \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\{\mu _n\}_{n\in {\mathbb {N}}}\) weakly converges to some \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) (after taking appropriate subsequences of both sequences). Let

$$\begin{aligned} M:= \!\sup _{(x,y)\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }}\!\!u(x,y). \end{aligned}$$

Let \(\varepsilon >0\) and let \(U=\{(x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\,{|}\,u(x,y)> M-\varepsilon \}\). Then, \(\mu (U)>0\). Since U is open, it follows that there exists a small \(\varepsilon _1>0\) such that \(\mu _n(U)>\mu (U)-\varepsilon _1>0\) for all n large enough (see e.g. [7, Thm. 2.1]). Moreover, by uniform convergence of the sequence \(\{u_n\}_{n\in {\mathbb {N}}}\), we have \(|u(x,y)-u_n(x,y)|\leqslant \varepsilon \) for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) when n is large enough. Therefore, we obtain for n large enough

$$\begin{aligned} \biggl (\int _{X\times Y}(u_n(x,y))^n\mu _n(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/n}&\geqslant (\mu _n(U))^{1/n}(M-2\varepsilon )\\ {}&\geqslant (\mu (U)-\varepsilon _1)^{1/n}(M-2\varepsilon ). \end{aligned}$$

Letting \(n\rightarrow \infty \), we obtain \(L\geqslant M-2\varepsilon \). Since \(\varepsilon >0\) is arbitrary, \(L\geqslant M\geqslant u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\).
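The mechanism behind Part 3 is that \(L^n\)-norms increase to the supremum over the support as \(n\rightarrow \infty \), even when the maximum carries little mass. The following Python sketch illustrates this in a simplified setting with a fixed discrete measure and a fixed function (in the proof both \(u_n\) and \(\mu _n\) vary with n). The values, the weights and the name `lp_norm` are hypothetical.

```python
def lp_norm(values, weights, n):
    """(sum_i w_i * v_i^n)^(1/n) for a discrete probability measure with masses w_i."""
    m = max(values)
    if m == 0:
        return 0.0
    # Factor out the maximum to avoid overflow for large n.
    return m * sum(w * (v / m) ** n for v, w in zip(values, weights)) ** (1.0 / n)

values = [0.5, 1.0, 3.0]     # hypothetical values of u on the support of mu
weights = [0.6, 0.39, 0.01]  # hypothetical masses; the maximum carries little mass
norms = [lp_norm(values, weights, n) for n in (1, 2, 10, 100, 1000)]
print(norms)  # increasing and approaching max(values) = 3.0
```

The factor \((\mu (U)-\varepsilon _1)^{1/n}\) in the proof plays the role of the mass 0.01 raised to the power 1/n here: it tends to 1, so the small mass near the supremum does not matter in the limit.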

1.1.2 Proof of Theorem 3.4

This section is devoted to the proof of Theorem 3.4. To this end, we will first verify the existence of optimal metrics and optimal couplings in (15).

Proposition B.1

(Existence of optimal couplings) Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Then, there always exist \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) such that for \(1\leqslant p\leqslant \infty \),

$$\begin{aligned}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=\Vert u\Vert _{L^p(\mu )}.\end{aligned}$$

Proof

The following proof is a suitable adaptation of the proof of [83, Lem. 3.3]. We will only prove the claim for the case \(p<\infty \) since the case \(p=\infty \) can be shown in a similar manner. Let \(u_n\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu _n\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be such that

$$\begin{aligned}\biggl (\int _{X\times Y}(u_n(x,y))^p\mu _n(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p}\!\!\leqslant \, u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})+\frac{1}{n}\hspace{0.55542pt}. \end{aligned}$$

By Lemma B.19, \(\{\mu _n\}_{n\in {\mathbb {N}}}\) weakly converges (after taking an appropriate subsequence) to some \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). By Lemma B.21, \(\{u_n\}_{n\in {\mathbb {N}}}\) uniformly converges (after taking an appropriate subsequence) to some \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\). Then, it is easy to verify that

$$\begin{aligned} \biggl (\int _{X\times Y}(u(x,y))^p\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p}\!\!\leqslant \, u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

\(\square \)

As a direct consequence of the proposition, we get the subsequent result.

Corollary B.2

Fix \(1\leqslant p\leqslant \infty \). Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Then, there exist a compact ultrametric space Z and isometric embeddings \(\phi :X\hookrightarrow Z\) and \(\psi :Y\hookrightarrow Z\) such that

$$\begin{aligned}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=d_{\textrm{W},p}^Z(\phi _\#\,{\mu _{X}},\psi _\#\,{\mu _{Y}}).\end{aligned}$$

Before we come to the proof of Theorem 3.4, it remains to establish another auxiliary result. We verify that the Wasserstein pseudometric of order p on a compact pseudo-ultrametric space \((X,u_X)\) is for \(p\in [1,\infty )\) a p-pseudometric and for \(p=\infty \) a pseudo-ultrametric, i.e., we prove for \(1\leqslant p<\infty \) that for all \(\alpha _1,\alpha _2,\alpha _3\in {\mathcal {P}}(X)\),

$$\begin{aligned} d_{\textrm{W},p}^{\,(X,u_X)}(\alpha _1,\alpha _3)\leqslant \Bigl (\bigl (d_{\textrm{W},p}^{\,(X,u_X)}(\alpha _1,\alpha _2)\bigr )^p+\bigl (d_{\textrm{W},p}^{\,(X,u_X)}(\alpha _2,\alpha _3)\bigr )^p \Bigr )^{1/p}\end{aligned}$$

and for \(p=\infty \) that for all \(\alpha _1,\alpha _2,\alpha _3\in {\mathcal {P}}(X)\),

$$\begin{aligned}d_{\textrm{W},p}^{\,(X,u_X)}(\alpha _1,\alpha _3)\leqslant \max \hspace{1.111pt}\bigl (d_{\textrm{W},p}^{\,(X,u_X)}(\alpha _1,\alpha _2),d_{\textrm{W},p}^{\,(X,u_X)}(\alpha _2,\alpha _3) \bigr ).\end{aligned}$$

Lemma B.3

Let \((X,{u_{X}})\) be a compact pseudo-ultrametric space. Then, for \(1\leqslant p\leqslant \infty \) the p-Wasserstein metric \(d_{\textrm{W},p}^{\,(X,{u_{X}})}\) is a p-pseudometric on \({\mathcal {P}}(X)\). In particular, when \(p=\infty \), it is a pseudo-ultrametric on \({\mathcal {P}}(X)\).

Proof

We prove the statement by adapting the proof of the triangle inequality for the p-Wasserstein distance (see e.g., [90, Thm. 7.3]). We only prove the case when \(p<\infty \) whereas the case \(p=\infty \) follows by analogous arguments.

Let \(\alpha _1,\alpha _2,\alpha _3\in {\mathcal {P}}(X)\), denote by \(\mu _{12}\) an optimal transport plan between \(\alpha _1\) and \(\alpha _2\) and by \(\mu _{23}\) an optimal transport plan between \(\alpha _2\) and \(\alpha _3\) (see [91, Thm. 4.1] for the existence of \(\mu _{12}\) and \(\mu _{23}\)). Furthermore, let \(X_i\) be the support of \(\alpha _i\), \(1\leqslant i \leqslant 3\). Then, by the Gluing Lemma [90, Lem. 7.6] there exists a measure \(\mu \in {\mathcal {P}}(X_1\hspace{0.55542pt}{\times }\hspace{1.111pt}X_2\hspace{0.55542pt}{\times }\hspace{1.111pt}X_3)\) with marginals \(\mu _{12}\) on \(X_1\hspace{0.55542pt}{\times }\hspace{1.111pt}X_2\) and \(\mu _{23}\) on \(X_2\hspace{0.55542pt}{\times }\hspace{1.111pt}X_3\). Clearly, we obtain

$$\begin{aligned} \bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _1,\alpha _3)\bigr )^p&\leqslant \int _{X_1\times X_2\times X_3}\! {u_{X}}^p(x,z)\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy\hspace{1.111pt}{\times }\hspace{1.111pt}dz)\\ {}&\leqslant \int _{X_1\times X_2\times X_3}\!\bigl ( {u_{X}}^p(x,y)+{u_{X}}^p(y,z)\bigr )\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy\hspace{1.111pt}{\times }\hspace{1.111pt}dz). \end{aligned}$$

Here, we used that \({u_{X}}\) is an ultrametric, i.e., in particular a p-metric [64, Prop. 2.11]. With this we obtain that

$$\begin{aligned} \bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _1,\alpha _3)\bigr )^p&\leqslant \int _{X_1\times X_2}\! {u_{X}}^p(x,y)\hspace{1.111pt}\mu _{12}\hspace{1.111pt}(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\nonumber \\&\quad + \int _{X_2\times X_3}\! {u_{X}}^p(y,z)\hspace{1.111pt}\mu _{23}\hspace{1.111pt}(dy\hspace{1.111pt}{\times }\hspace{1.111pt}dz)\nonumber \\&= \bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _1,\alpha _2)\bigr )^p +\bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha _2,\alpha _3)\bigr )^p. \end{aligned}$$

\(\square \)

With Proposition B.1 and Lemma B.3 at our disposal we are now ready to prove Theorem 3.4 which states that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) is indeed a p-metric on \({\mathcal {U}}^{\textrm{w}}\).

Proof of Theorem 3.4

It is clear that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) is symmetric and that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}}) =0\) if \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). Furthermore, we remark that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\geqslant d_{\textrm{GW},p}^{\,\textrm{sturm}}({\mathcal {X}},{\mathcal {Y}})\) by Proposition 3.3. Since \(d_{\textrm{GW},p}^{\,\textrm{sturm}}({\mathcal {X}},{\mathcal {Y}})=0\) implies that \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\) ([84]), we have that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=0\) implies that \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). It remains to verify the p-triangle inequality. To this end, we only prove the case when \(p<\infty \) whereas the case \(p=\infty \) follows by analogous arguments.

Let \({\mathcal {X}},{\mathcal {Y}},{\mathcal {Z}}\in {\mathcal {U}}^{\textrm{w}}\). Suppose \(u_{XY}\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(u_{YZ}\in {\mathcal {D}}^{\textrm{ult}}({u_{Y}},{u_{Z}})\) are optimal metric couplings such that

$$\begin{aligned} \bigl (u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\bigr )^p&=\bigl (d_{\textrm{W},p}^{\,(X\sqcup Y,u_{XY})}({\mu _{X}},{\mu _{Y}})\bigr )^p, \\ \bigl (u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {Y}},{\mathcal {Z}})\bigr )^p&=\bigl (d_{\textrm{W},p}^{\,(Y\sqcup Z,u_{YZ})}({\mu _{Y}},\mu _Z)\bigr )^p. \end{aligned}$$

Further, define \(u_{XYZ}\) on \(X\sqcup Y\sqcup Z\) as

$$\begin{aligned}u_{XYZ}(x_1,x_2)={\left\{ \begin{array}{ll} \,u_{XY}(x_1,x_2), &{}x_1,x_2\in X\sqcup Y,\\ \, u_{YZ}(x_1,x_2), &{}x_1,x_2\in Y\sqcup Z,\\ \, \inf \hspace{1.111pt}\{\max \hspace{0.55542pt}(u_{XY}(x_1,y),u_{YZ}(y,x_2))\,{|}\,y\in Y\}, &{}x_1\in X,\;x_2\in Z,\\ \, \inf \hspace{1.111pt}\{\max \hspace{0.55542pt}(u_{XY}(x_2,y),u_{YZ}(y,x_1))\,{|}\,y\in Y\}, &{}x_1\in Z,\;x_2\in X. \end{array}\right. }\end{aligned}$$

Then, by [93, Lem. 1.1] \(u_{XYZ}\) is a pseudo-ultrametric on \(X\sqcup Y\sqcup Z\) that coincides with \(u_{XY}\) on \(X\sqcup Y\) and with \(u_{YZ}\) on \(Y\sqcup Z\). Thus by Lemma B.3 we obtain that

$$\begin{aligned} \bigl (u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Z}})\bigr )^p&\leqslant \bigl (d_{\textrm{W},p}^{\,(X\sqcup Y\sqcup Z,u_{XYZ})}({\mu _{X}},\mu _Z)\bigr )^p \\ {}&\leqslant \bigl (d_{\textrm{W},p}^{\,(X\sqcup Y\sqcup Z,u_{XYZ})}({\mu _{X}},{\mu _{Y}})\bigr )^p\!+\bigl (d_{\textrm{W},p}^{\,(X\sqcup Y\sqcup Z,u_{XYZ})}({\mu _{Y}},\mu _Z)\bigr )^p\\&= \bigl (d_{\textrm{W},p}^{\,(X\sqcup Y,u_{XY})}({\mu _{X}},{\mu _{Y}})\bigr )^p\!+\bigl (d_{\textrm{W},p}^{\,( Y\sqcup Z,u_{YZ})}({\mu _{Y}},\mu _Z)\bigr )^p \\ {}&=\bigl (u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\bigr )^p\!+\bigl (u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {Y}},{\mathcal {Z}})\bigr )^p. \end{aligned}$$

This gives the claim for \(p<\infty \). \(\square \)
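The gluing construction of \(u_{XYZ}\) used in this proof can be made concrete for finite spaces. The following Python sketch is only an illustration: the three small spaces, the distance values and the helper names `make_metric`, `glue` and `is_pseudo_ultrametric` are hypothetical. It builds the glued function from two couplings that agree on Y and checks the strong triangle inequality by brute force.

```python
from itertools import product

def make_metric(table):
    """Symmetric lookup with zero diagonal from a dict keyed by unordered pairs."""
    return lambda a, b: 0.0 if a == b else table[frozenset((a, b))]

def glue(X, Y, Z, u_xy, u_yz):
    """Glue u_xy (on X and Y) and u_yz (on Y and Z) along Y, as in the definition of u_XYZ."""
    XY, YZ = set(X) | set(Y), set(Y) | set(Z)
    def u(a, b):
        if a in XY and b in XY:
            return u_xy(a, b)
        if a in YZ and b in YZ:
            return u_yz(a, b)
        # Remaining case: one point lies in X and the other in Z.
        x, z = (a, b) if a in X else (b, a)
        return min(max(u_xy(x, y), u_yz(y, z)) for y in Y)
    return u

def is_pseudo_ultrametric(points, u):
    """Check zero diagonal, symmetry and the strong triangle inequality on all triples."""
    return (all(u(a, a) == 0 for a in points)
            and all(u(a, b) == u(b, a) for a, b in product(points, repeat=2))
            and all(u(a, c) <= max(u(a, b), u(b, c))
                    for a, b, c in product(points, repeat=3)))

X, Y, Z = ["x"], ["y1", "y2"], ["z"]
u_xy = make_metric({frozenset(("x", "y1")): 1.0, frozenset(("x", "y2")): 2.0,
                    frozenset(("y1", "y2")): 2.0})
u_yz = make_metric({frozenset(("y1", "z")): 2.0, frozenset(("y2", "z")): 0.5,
                    frozenset(("y1", "y2")): 2.0})
u_xyz = glue(X, Y, Z, u_xy, u_yz)
print(u_xyz("x", "z"), is_pseudo_ultrametric(X + Y + Z, u_xyz))  # 2.0 True
```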

1.1.3 Proof of Theorem 3.7

In order to prove Theorem 3.7, we will first establish the statement for finite ultrametric measure spaces. For this purpose, we need to introduce some notation. Given \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\), let \({\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\) denote the collection of all admissible pseudo-ultrametrics on \(X\sqcup Y\), where \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) is called admissible, if there exists no \(u^*\!\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) such that \(u^*\ne u\) and \(u^*(x,y)\leqslant u(x,y)\) for all \(x,y\in X\sqcup Y\).

Lemma B.4

For any \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\), \({\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\ne \text{\O }\). Moreover,

$$\begin{aligned}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\,=\!\!\inf _{u \in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})}\! d_{\textrm{W},p}^{\,(X\sqcup Y,u)}({\mu _{X}},{\mu _{Y}}).\end{aligned}$$

Proof

If \(\{u_n\}_{n\in {\mathbb {N}}}\subseteq {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) is a decreasing sequence (with respect to pointwise inequality), it is easy to verify that \(u:= \inf _{\,n\in {\mathbb {N}}}u_n\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and thus u is a lower bound of \(\{u_n\}_{n\in {\mathbb {N}}}\). Then, by Zorn’s lemma \({\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\ne \text{\O }\). Therefore, we obtain the claim.\(\square \)

Combined with Example 3.6, the following result implies that each \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\) gives rise to an element in \({\mathcal {A}}\).

Lemma B.5

Given finite spaces \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\), for each \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\), \(u^{-1}(0)\ne \text{\O }\).

Proof

Assume otherwise that \(u^{-1}(0)=\text{\O }\). Let \((x_0,y_0)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) be such that \(u(x_0,y_0)=\min _{x\in X,y\in Y}u(x,y)\). The existence of \((x_0,y_0)\) is due to the finiteness of X and Y. We define \( u_{(x_0,y_0)}:X\sqcup Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\sqcup Y\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:

  (i) \( u_{(x_0,y_0)}|_{X\times X}:= u_X\) and \( u_{(x_0,y_0)}|_{Y\times Y}:= u_Y\).

  (ii) For \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\),

    $$\begin{aligned} u_{(x_0,y_0)}(x,y):= \min \hspace{1.111pt}(u(x,y),\max \hspace{0.55542pt}(u_X(x,x_0),u_Y(y,y_0))). \end{aligned}$$

  (iii) For any \((y,x)\in Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\), \( u_{(x_0,y_0)}(y,x):= u_{(x_0,y_0)}(x,y)\).

It is easy to verify that \(u_{(x_0,y_0)}\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\). Further, it is obvious that \(u_{(x_0,y_0)}(x_0,y_0)=0<u(x_0,y_0)\) and that \(u_{(x_0,y_0)}(x,y)\leqslant u(x,y)\) for all \(x,y\in X\sqcup Y\), which contradicts \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\). Therefore, \(u^{-1}(0)\ne \text{\O }\).\(\square \)
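The modification \(u_{(x_0,y_0)}\) used in this proof is easy to evaluate on a toy example. The following Python sketch is illustrative only: the spaces \(X=\{a_1,a_2\}\), \(Y=\{b\}\), the distance values and the name `lower_coupling` are hypothetical. It shows that the modified cross distances are pointwise no larger than u and vanish at the minimizing pair.

```python
def lower_coupling(u, u_X, u_Y, x0, y0):
    """Cross distances of u_{(x0,y0)}: min(u(x, y), max(u_X(x, x0), u_Y(y, y0)))."""
    return lambda x, y: min(u(x, y), max(u_X(x, x0), u_Y(y, y0)))

# Hypothetical example: X = {a1, a2} with u_X(a1, a2) = 3 and Y = {b}.
u_X = lambda x, xp: 0.0 if x == xp else 3.0
u_Y = lambda y, yp: 0.0
cross = {("a1", "b"): 1.0, ("a2", "b"): 3.0}  # u restricted to X x Y, without zeros
u = lambda x, y: cross[(x, y)]

x0, y0 = min(cross, key=cross.get)  # a minimizer of u on X x Y
v = lower_coupling(u, u_X, u_Y, x0, y0)
print({pair: v(*pair) for pair in cross})
```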

Theorem B.6

Let \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\) be finite spaces. Then, we have for each \(p\in [1,\infty )\) that

$$\begin{aligned} u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=\!\!\inf _{(A,\varphi )\in {\mathcal {A}}}\! d_{\textrm{W},p}^{Z_A} \bigl ({(\phi ^X_{(A,\varphi )})}_\#\,{\mu _{X}},{(\psi ^Y_{(A,\varphi )})}_\#\,{\mu _{Y}}\bigr ).\end{aligned}\tag{24}$$

Proof

By Lemma B.4, it suffices to prove that each \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\) induces some \((A,\varphi )\in {\mathcal {A}}\) such that

$$\begin{aligned} d_{\textrm{W},p}^{\,(X\sqcup Y,u)}({\mu _{X}},{\mu _{Y}})\geqslant d_{\textrm{W},p}^{Z_{A}}\bigl ({(\phi ^X_{(A,\varphi )})}_\#\,{\mu _{X}},{(\psi ^Y_{(A,\varphi )})}_\#\,{\mu _{Y}}\bigr ).\end{aligned}$$

Let \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\). Define \(A_0:= \{x\in X\,{|}\,\exists \, y\in Y \text { such that }u(x,y)=0\}\) (\(A_0\ne \text{\O }\) by Lemma B.5). By Example 3.6, the map \(\varphi _0:A_0\rightarrow Y\) taking x to y such that \(u(x,y)=0\) is a well-defined isometric embedding. This means in particular that \((A_0,\varphi _0)\in {\mathcal {A}}\).

If \(u(x,y)\geqslant u_{Z_{A_0}}(\phi ^X_{(A_0,\varphi _0)}(x),\psi ^Y_{(A_0,\varphi _0)}(y))\) holds for all \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), then we set \(A:= A_0\) and \(\varphi := \varphi _0\). This gives

$$\begin{aligned} d_{\textrm{W},p}^{\,(X\sqcup Y,u)}({\mu _{X}},{\mu _{Y}})\geqslant d_{\textrm{W},p}^{Z_{A}}\bigl ({(\phi ^X_{(A,\varphi )})}_\#\,{\mu _{X}},{(\psi ^Y_{(A,\varphi )})}_\#\,{\mu _{Y}}\bigr ).\end{aligned}$$

Otherwise, there exists \((x,y)\in X\backslash A_0\hspace{1.111pt}{\times }\hspace{1.111pt}Y\backslash \varphi _0(A_0)\) such that

$$\begin{aligned}u(x,y)< u_{Z_{A_0}}\bigl (\phi ^X_{(A_0,\varphi _0)}(x),\psi ^Y_{(A_0,\varphi _0)}(y)\bigr )\end{aligned}$$

(if \(x\in A_0\) or \(y\in \varphi _0(A_0)\), then \(u(x,y)\geqslant u_{Z_{A_0}}\bigl (\phi ^X_{(A_0,\varphi _0)}(x),\psi ^Y_{(A_0,\varphi _0)}(y)\bigr )\) must hold). Let \((x_1,y_1)\in X\backslash A_0\hspace{0.55542pt}{\times }\hspace{1.111pt}Y\backslash \varphi _0(A_0)\) be such that

$$\begin{aligned} u(x_1,y_1)=\min \hspace{1.111pt}\left\{ u(x,y) \ \Bigg | \ \begin{array}{l} (x,y)\in X\backslash A_0\hspace{0.55542pt}{\times }\hspace{1.111pt}Y\backslash \varphi _0(A_0) \text { and }\\ u(x,y)< u_{Z_{A_0}}\!\bigl (\phi ^X_{(A_0,\varphi _0)}(x),\psi ^Y_{(A_0,\varphi _0)}(y)\bigr )\end{array} \right\} >0. \end{aligned}$$

The existence of \((x_1,y_1)\) follows from finiteness of X and Y. It is easy to check that \(\varphi _0\) extends to an isometry from \(A_0\cup \{x_1\}\) to \(\varphi _0(A_0)\cup \{y_1\}\) by taking \(x_1\) to \(y_1\). We denote the new isometry \(\varphi _1\) and set \(A_1:= A_0\cup \{x_1\}\). If for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), we have that \(u(x,y)\geqslant u_{Z_{A_1}}\!(\phi ^X_{(A_1,\varphi _1)}(x),\psi ^Y_{(A_1,\varphi _1)}(y))\), then we define \(A:= A_1\) and \(\varphi := \varphi _1\). Otherwise, we continue the process to obtain \(A_2, A_3,\dots \). This process will eventually stop since we are considering finite spaces. Suppose the process stops at \(A_n\), then \(A:= A_n\) and \(\varphi := \varphi _n\) satisfy that \(u(x,y)\geqslant u_{Z_{A}}(\phi ^X_{(A,\varphi )}(x),\psi ^Y_{(A,\varphi )}(y))\) for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\). Therefore,

$$\begin{aligned} d_{\textrm{W},p}^{\,(X\sqcup Y,u)}({\mu _{X}},{\mu _{Y}})\geqslant d_{\textrm{W},p}^{Z_{A}}\bigl ({(\phi ^X_{(A,\varphi )})}_\#\,{\mu _{X}},{(\psi ^Y_{(A,\varphi )})}_\#\,{\mu _{Y}}\bigr ).\end{aligned}$$

Since \(u\in {\mathcal {D}}^{\textrm{ult}}_{\textrm{adm}}({u_{X}},{u_{Y}})\) is arbitrary, this gives the claim.\(\square \)

As a direct consequence of Theorem B.6, we obtain that it is sufficient, as claimed in Remark 3.8, for finite spaces to infimize in (24) over the collection of all maximal pairs \({\mathcal {A}}^*\!\subseteq {\mathcal {A}}\). Recall that a pair \((A,\varphi _1)\in {\mathcal {A}}\) is called maximal if for all pairs \((B,\varphi _2)\in {\mathcal {A}}\) with \(A\subseteq B\) and \(\varphi _2|_A\!=\varphi _1\) it holds that \(A=B\).

Corollary B.7

Let \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\) be finite spaces. Then, we have for each \(p\in [1,\infty ]\) that

$$\begin{aligned} u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\,=\!\!\inf _{(A,\varphi )\in {\mathcal {A}}^*}d_{\textrm{W},p}^{Z_A} \bigl ({(\phi ^X_{(A,\varphi )})}_\#\,{\mu _{X}},{(\psi ^Y_{(A,\varphi )})}_\#\,{\mu _{Y}}\bigr ).\end{aligned}\tag{25}$$

By proving Theorem B.6, we have verified Theorem 3.7 for finite ultrametric measure spaces. Then, we will use Theorem B.6 and weighted quotients to demonstrate Theorem 3.7. However, before we come to this, we need to establish the following two auxiliary results.

Lemma B.8

Let \(X\in {\mathcal {U}}\) be a compact ultrametric space. Let \(t>0\) and let \(p\in [1,\infty )\). Then, for any \(\alpha ,\beta \in {\mathcal {P}}(X)\), we have that

$$\begin{aligned}\bigl ( d_{\textrm{W},p}^{X_t}(\alpha _t,\beta _t)\bigr )^p\geqslant \bigl ( d_{\textrm{W},p}^{X}(\alpha ,\beta )\bigr )^p-t^p,\end{aligned}$$

where \(\alpha _t\) is the push forward of \(\alpha \) under the canonical quotient map \(Q_t:X\rightarrow X_t\) taking \(x\in X\) to \([x]_t\in X_t\).

Proof

For any \(\mu _t\in {\mathcal {C}}(\alpha _t,\beta _t)\), it is easy to see that there exists \(\mu \in {\mathcal {C}}(\alpha ,\beta )\) such that \(\mu _t=( Q_t\hspace{0.55542pt}{\times }\hspace{1.111pt}Q_t)_\#\,\mu \) where \(Q_t\hspace{0.55542pt}{\times }\hspace{1.111pt}Q_t:X\hspace{1.111pt}{\times }\hspace{1.111pt}X\rightarrow X_t\hspace{0.55542pt}{\times }\hspace{1.111pt}X_t\) maps \((x,x')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}X\) to \(([x]_t,[x']_t)\). For example, suppose \(X_t=\{[x_1]_t,\ldots ,[x_n]_t\}\), then one can let

$$\begin{aligned}\mu := \!\sum _{i,j=1}^n\mu _t(([x_i]_t,[x_j]_t))\,\frac{\alpha |_{[x_i]_t}}{\alpha ([x_i]_t)}\hspace{1.111pt}{\otimes }\hspace{1.111pt}\frac{\beta |_{[x_j]_t}}{\beta ([x_j]_t)}\hspace{0.55542pt},\end{aligned}$$

where \(\alpha |_{[x_i]_t}\) is the restriction of \(\alpha \) on \([x_i]_t\).

For any \(x,x'\!\in X\), we have that \(( u_X(x,x'))^p\leqslant ( u_{X_t}([x]_t,[x']_t))^p+t^p\). Then

$$\begin{aligned} \bigl ( d_{\textrm{W},p}^{X}(\alpha ,\beta )\bigr )^p&\leqslant \int _{X\times X}( u_X(x,x'))^p\hspace{1.111pt}\mu (dx\hspace{1.111pt}{\times }\hspace{1.111pt}dx')\\&\leqslant \int _{X\times X}\bigl (\bigl ( u_{X_t}([x]_t,[x']_t)\bigr )^p\!+t^p\bigr )\hspace{1.111pt}\mu (dx\hspace{1.111pt}{\times }\hspace{1.111pt}dx')\\&=\int _{X\times X}\bigl ( u_{X_t}(Q_t(x),Q_t(x'))\bigr )^p\hspace{1.111pt}\mu (dx\hspace{1.111pt}{\times }\hspace{1.111pt}dx')+t^p\\&=\int _{X_t\times X_t}\bigl ( u_{X_t}([x]_t,[x']_t)\bigr )^p\hspace{1.111pt}\mu _t(d[x]_t\hspace{1.111pt}{\times }\hspace{1.111pt}d[x']_t)+t^p. \end{aligned}$$

Infimizing over all \(\mu _t\in {\mathcal {C}}(\alpha _t,\beta _t)\), we obtain the claim.\(\square \)
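The lifting of a coupling \(\mu _t\) on \(X_t\hspace{1.111pt}{\times }\hspace{1.111pt}X_t\) to a coupling \(\mu \) on \(X\hspace{1.111pt}{\times }\hspace{1.111pt}X\) used in this proof can be verified on a small finite example. The following Python sketch is an illustration only: the four-point space, the measures and the name `lift_coupling` are hypothetical, and it assumes that every class has positive mass under both measures. Exact rational arithmetic is used so that the marginals can be compared without rounding.

```python
from fractions import Fraction

def lift_coupling(classes, alpha, beta, mu_t):
    """Lift a coupling mu_t of the quotient measures to a coupling of alpha and beta.

    Assumes every class has positive alpha-mass and positive beta-mass.
    """
    mass = lambda m, cls: sum(m[x] for x in cls)
    mu = {}
    for i, ci in enumerate(classes):
        for j, cj in enumerate(classes):
            for x in ci:
                for y in cj:
                    mu[(x, y)] = (mu_t[i][j]
                                  * alpha[x] / mass(alpha, ci)
                                  * beta[y] / mass(beta, cj))
    return mu

classes = [["a", "b"], ["c", "d"]]  # hypothetical classes [x]_t
alpha = {"a": Fraction(1, 10), "b": Fraction(3, 10),
         "c": Fraction(2, 10), "d": Fraction(4, 10)}
beta = {x: Fraction(1, 4) for x in "abcd"}
# A coupling of the quotient measures (4/10, 6/10) and (1/2, 1/2).
mu_t = [[Fraction(4, 10), Fraction(0)], [Fraction(1, 10), Fraction(5, 10)]]

mu = lift_coupling(classes, alpha, beta, mu_t)
row = {x: sum(mu[(x, y)] for y in beta) for x in alpha}
col = {y: sum(mu[(x, y)] for x in alpha) for y in beta}
print(row == alpha, col == beta)  # True True
```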

Lemma B.9

Let \({\mathcal {X}}\in {\mathcal {U}}^{\textrm{w}}\) and let \(p\in [1,\infty ]\). Then, for any \(t>0\), we have that \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {X}})\leqslant t\). In particular, \(\lim _{\,t\rightarrow 0}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {X}})=0\).

Proof

It is obvious that \(({\mathcal {X}}_t)_t\cong _{\textrm{w}}{\mathcal {X}}_t\). Hence, it holds by Theorem 3.14 that \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {X}})\leqslant t\). By Proposition 3.3 we have that for any \(p\in [1,\infty ]\),

$$\begin{aligned} u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {X}})\leqslant u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {X}})\leqslant t. \end{aligned}$$

\(\square \)

With Lemmas B.8 and  B.9 available, we can come to the proof of Theorem 3.7.

Proof of Theorem 3.7

It follows from the definition of \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) (see (8)) that

$$\begin{aligned}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\,\leqslant \!\inf _{(A,\varphi )\in {\mathcal {A}}}d_{\textrm{W},p}^{Z_A} \bigl ({(\phi ^X_{(A,\varphi )})}_\#\,{\mu _{X}},{(\psi ^Y_{(A,\varphi )})}_\#\,{\mu _{Y}}\bigr ). \end{aligned}$$

Hence, we focus on proving the opposite inequality. Given any \(t>0\), by Lemma A.7, both \({\mathcal {X}}_t\) and \({\mathcal {Y}}_t\) are finite spaces. By Theorem B.6 we have that

$$\begin{aligned}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {Y}}_t)=\inf _{(A_t,\varphi _t)\in {\mathcal {A}}_t}d_{\textrm{W},p}^{Z_{A_t}} \bigl ({(\phi ^{X_t}_{(A_t,\varphi _t)})}_\#\,({\mu _{X}})_t,{(\psi ^{Y_t}_{(A_t,\varphi _t)})}_\#\,({\mu _{Y}})_t\bigr ),\end{aligned}$$

where \({\mathcal {A}}_t:= \{(A_t,\varphi _t)\mid \emptyset \ne A_t\subseteq X_t \text { is closed and } \varphi _t:A_t\hookrightarrow Y_t \text { is an isometric embedding}\}\).

For any \((A_t,\varphi _t)\in {\mathcal {A}}_t\), assume that \(A_t=\{[x_1]_t^X\!,\ldots ,[x_n]_t^X\}\) and that \(\varphi _t([x_i]_t)=[y_i]_t\in Y_t\) for all \(i=1,\ldots ,n\). Let \(A:= \{x_1,\ldots ,x_n\}\). Then, the map \(\varphi :A\rightarrow Y\) defined by \(x_i\mapsto y_i\) for \(i=1,\ldots ,n\) is an isometric embedding. Therefore, \((A,\varphi )\in {\mathcal {A}}\).

Claim 1

\(((Z_A)_t,u_{(Z_A)_t})\cong ( Z_{A_t},u_{Z_{A_t}})\).

Proof of Claim 1

We define a map \(\Psi :(Z_A)_t\rightarrow Z_{A_t}\) by \([x]_t^{Z_A}\!\mapsto [x]_t^X\) for \(x\in X\) and \([y]_t^{Z_A}\!\mapsto [y]_t^Y\) for \(y\in Y\backslash \varphi (A)\). We first show that \(\Psi \) is well defined. For any \(x'\!\in X\), if \(u_{Z_A}(x,x')\leqslant t\), then obviously we have that \(u_X(x,x')=u_{Z_A}(x,x')\leqslant t\) and thus \([x]_t^X\!=[x']_t^X\). Now, assume that there exists \(y\in Y\backslash \varphi (A)\) such that \(u_{Z_A}(x,y)\leqslant t\), i.e., \([x]_t^{Z_A}\!=[y]_t^{Z_A}\). Then, by finiteness of A and definition of \(Z_A\), there exists \(x_i\in A\) such that \(u_{Z_A}(x,y)=\max \hspace{0.55542pt}( u_X(x,x_i),u_Y(\varphi (x_i),y))\leqslant t\). This gives that

$$\begin{aligned}u_{Z_{A_t}}\!\bigl ([x]_t^X\!,[y]_t^Y\bigr )\leqslant \max \hspace{1.111pt}\bigl ( u_{X_t}\bigl ([x]_t^X\!,[x_i]_t^X\bigr ),u_{Y_t}\bigl ([\varphi (x_i)]_t^Y\!,[y]_t^Y\bigr )\bigr )\leqslant t.\end{aligned}$$

However, this happens only if \(u_{Z_{A_t}}\!([x]_t^X\!,[y]_t^Y)=0\), that is, \([x]_t^X\) is identified with \([y]_t^Y\) under the map \(\varphi _t\). Therefore, \(\Psi \) is well defined. It is easy to see from the definition that \(\Psi \) is surjective. Thus, it suffices to show that \(\Psi \) is an isometric embedding to finish the proof. For any \(x,x'\!\in X\) such that \(u_X(x,x')>t\), we have that

$$\begin{aligned} u_{(Z_A)_t}\!\bigl ([x]_t^{Z_A}\!,[x']_t^{Z_A}\bigr )&= u_{Z_A}(x,x')\\ {}&=u_X(x,x')=u_{X_t}\bigl ([x]_t^{X}\!,[x']_t^{X}\bigr )=u_{Z_{A_t}}\!\bigl ([x]_t^{X}\!,[x']_t^{X}\bigr ). \end{aligned}$$

Similarly, for any \(y,y'\!\in Y\backslash \varphi (A)\) such that \(u_Y(y,y')>t\), we have that

$$\begin{aligned}u_{(Z_A)_t}\bigl ([y]_t^{Z_A}\!,[y']_t^{Z_A}\bigr ) =u_{Z_{A_t}}\bigl ([y]_t^{Y}\!,[y']_t^{Y}\bigr ).\end{aligned}$$

Now, consider \(x\in X\) and \(y\in Y\backslash \varphi (A)\). Assume that \(u_{Z_A}\!(x,y)>t\) (otherwise \([x]_t^{Z_A}\!=[y]_t^{Z_A}\)). Then, we have that

$$\begin{aligned} u_{Z_A}\!( x,y)=\!\min _{i=1,\ldots ,n}\max \hspace{0.55542pt}( u_{X}( x,x_i),u_{Y}( \varphi (x_i),y))>t. \end{aligned}$$

This implies that

$$\begin{aligned} u_{Z_{A_t}}\!\bigl ( [x]_t^X\!,[y]_t^Y\bigr )&=\!\min _{i=1,\ldots ,n}\max \hspace{0.55542pt}\bigl ( u_{X_t}\bigl ( [x]_t^X\!,[x_i]_t^X\bigr ), u_{Y_t}\bigl ( \varphi _t([x_i]_t^X),[y]_t^Y\bigr )\bigr )\\&=\!\min _{i=1,\ldots ,n}\max \hspace{0.55542pt}( u_{X}( x,x_i),u_{Y}( \varphi (x_i),y))\\&=u_{Z_A}( x,y)=u_{(Z_A)_t}\!\bigl ( [x]_t^{Z_A}\!,[y]_t^{Z_A}\bigr ). \end{aligned}$$

Therefore, \(\Psi \) is an isometric embedding and thus we conclude the proof. \(\square \)

By Lemma B.8 we have that

$$\begin{aligned} \Bigl ( d_{\textrm{W},p}^{Z_{A_t}} \bigl ({\bigl (\phi ^{X_t}_{(A_t,\varphi _t)}\bigr )}_\#&\, ({\mu _{X}})_t,{\bigl (\psi ^{Y_t}_{(A_t,\varphi _t)}\bigr )}_\#\,({\mu _{Y}})_t\bigr )\Bigr )^p \\ {}&\geqslant \Bigl ( d_{\textrm{W},p}^{Z_{A}}\bigl ({\bigl (\phi ^{X}_{(A,\varphi )}\bigr )}_\#\, {\mu _{X}},{\bigl (\psi ^{Y}_{(A,\varphi )}\bigr )}_\#\,{\mu _{Y}}\bigr )\Bigr )^p\!-t^p. \end{aligned}$$

Therefore,

$$\begin{aligned} u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {Y}}_t)&=\!\inf _{(A_t,\varphi _t)\in {\mathcal {A}}_t}d_{\textrm{W},p}^{Z_{A_t}} \bigl ({\bigl (\phi ^{X_t}_{(A_t,\varphi _t)}\bigr )}_\#\,({\mu _{X}})_t,{\bigl (\psi ^{Y_t}_{(A_t,\varphi _t)}\bigr )}_\#\,({\mu _{Y}})_t\bigr )\\&\geqslant \!\inf _{(A,\varphi )\in {\mathcal {A}}}\Bigl (\bigl ( d_{\textrm{W},p}^{Z_{A}}\bigl ({\bigl (\phi ^{X}_{(A,\varphi )}\bigr )}_\#\, {\mu _{X}},{\bigl (\psi ^{Y}_{(A,\varphi )}\bigr )}_\#\,{\mu _{Y}}\bigr )\bigr )^p-t^p\Bigr )^{1/p}. \end{aligned}$$

Notice that the last inequality already holds when we only consider \((A,\varphi )\) corresponding to \((A_t,\varphi _t)\in {\mathcal {A}}_t\). By Lemma B.9, we have that

$$\begin{aligned} u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})&=\lim _{t\rightarrow 0}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}}_t,{\mathcal {Y}}_t)\\ {}&\geqslant \!\inf _{(A,\varphi )\in {\mathcal {A}}} d_{\textrm{W},p}^{Z_{A}}\bigl ({\bigl (\phi ^{X}_{(A,\varphi )}\bigr )}_\#\,{\mu _{X}},{\bigl (\psi ^{Y}_{(A,\varphi )}\bigr )}_\#\,{\mu _{Y}}\bigr ), \end{aligned}$$

which concludes the proof. \(\square \)
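For finite spaces, the glued space \(Z_A\) appearing in the proof above can be computed directly. The following Python sketch assumes, in line with the formula displayed in the proof of Claim 1, that \(u_{Z_A}\) restricts to \(u_X\) on \(X\), to \(u_Y\) on \(Y\backslash \varphi (A)\), and equals \(\min _i\max (u_X(x,x_i),u_Y(\varphi (x_i),y))\) across the two parts. The function names `glue_along` and `is_ultrametric`, the matrix encoding, and the toy data are illustrative assumptions and are not taken from the authors' code.

```python
import itertools

def glue_along(uX, uY, A, phi):
    """Glue X and Y along an isometric embedding phi: A -> Y (hypothetical helper).

    uX, uY: distance matrices; A: list of indices in X; phi: dict index in A -> index in Y.
    Returns the distance matrix on X followed by the points of Y not in phi(A),
    together with the list of those remaining Y indices.
    """
    n = len(uX)
    image = {phi[a] for a in A}
    rest_y = [y for y in range(len(uY)) if y not in image]
    size = n + len(rest_y)
    u = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            u[i][j] = uX[i][j]
    for a, y in enumerate(rest_y):
        for b, y2 in enumerate(rest_y):
            u[n + a][n + b] = uY[y][y2]
        for x in range(n):
            # min over glue points x_i of max(u_X(x, x_i), u_Y(phi(x_i), y))
            d = min(max(uX[x][xi], uY[phi[xi]][y]) for xi in A)
            u[x][n + a] = u[n + a][x] = d
    return u, rest_y

def is_ultrametric(u):
    """Check the strong triangle inequality on all triples."""
    idx = range(len(u))
    return all(u[i][k] <= max(u[i][j], u[j][k])
               for i, j, k in itertools.product(idx, repeat=3))

# Toy data (assumed): X has 3 points, Y has 3 points, A = {x0, x2}.
uX = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
uY = [[0.0, 2.0, 0.5], [2.0, 0.0, 2.0], [0.5, 2.0, 0.0]]
A = [0, 2]
phi = {0: 0, 2: 1}          # u_X(x0, x2) = 2 = u_Y(y0, y1), so phi is isometric
u_Z, rest_y = glue_along(uX, uY, A, phi)
glued_is_ultrametric = is_ultrametric(u_Z)
```

In this toy example only \(y_2\) survives in \(Y\backslash \varphi (A)\), and its distances to \(x_0,x_1,x_2\) are obtained from the min-max formula.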

1.2 Proofs from Sect. 3.2

In this section, we give the complete proofs of the results stated in Sect. 3.2.

1.2.1 Proof of Proposition 3.10

Part 1. This follows directly from the definitions of \(u_{\textrm{GW},p}\) and \(d_{\textrm{GW},p}\) (see (11) and (5)).

Part 2. Let \(p\leqslant q\). By Jensen’s inequality we have that \({\textrm{dis}}^{\textrm{ult}}_p(\mu )\leqslant {\textrm{dis}}^{\textrm{ult}}_q(\mu )\) for any \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). Therefore, \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\leqslant u_{\textrm{GW},q}({\mathcal {X}},{\mathcal {Y}}) \).

Part 3. By Part 2 we know that \(\{u_{\textrm{GW},n}({\mathcal {X}},{\mathcal {Y}})\}_{n\in {\mathbb {N}}}\) is an increasing sequence with the finite upper bound \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\). Therefore, \(L:= \lim _{\,n\rightarrow \infty }u_{\textrm{GW},n}({\mathcal {X}},{\mathcal {Y}})\) exists and it holds that \(L\leqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\).

To prove the opposite inequality, by Proposition B.10, there exists for each \(n\in {\mathbb {N}}\), \(\mu _n\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) such that

$$\begin{aligned}\Vert \Gamma _{X,Y}^\infty \Vert _{L^n(\mu _n\otimes \mu _n)}= u_{\textrm{GW},n}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

By Lemma B.19, \(\{\mu _n\}_{n\in {\mathbb {N}}}\) weakly converges (after taking an appropriate subsequence) to some \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). Let

$$\begin{aligned} M:=\! \sup _{(x,y),(x'\!,y')\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }}\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y')) \end{aligned}$$

and for a given \(\varepsilon >0\) let

$$\begin{aligned} U=\bigl \{((x,y),(x'\!,y'))\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\mid \Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))> M-\varepsilon \bigr \}. \end{aligned}$$

Then, we have \(\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu (U)>0\). As \(\mu _n\) weakly converges to \(\mu \), we have that \(\mu _n\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _n\) weakly converges to \(\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu \). Since U is open, there exists a small \(\varepsilon _1>0\) such that \(\mu _n\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _n(U)>\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu (U)-\varepsilon _1>0\) for n large enough (see e.g. [7, Thm. 2.1]). Therefore,

$$\begin{aligned}\Vert \Gamma _{X,Y}^\infty \Vert _{L^n(\mu _n\otimes \mu _n)}\geqslant (\mu _n\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _n(U))^{1/n}(M-\varepsilon )\geqslant (\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu (U)-\varepsilon _1)^{1/n}(M-\varepsilon ).\end{aligned}$$

Letting \(n\rightarrow \infty \), we obtain \(L\geqslant M-\varepsilon \). Since \(\varepsilon >0\) is arbitrary, we obtain \(L\geqslant M\geqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\).
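The mechanism behind Parts 2 and 3 can be illustrated for a fixed coupling on finite spaces, where \(\Lambda _\infty ({u_{X}},{u_{Y}})\) takes finitely many values \(v_k\) with weights \(w_k\) under \(\mu \otimes \mu \). The Python sketch below uses assumed toy values and a hypothetical helper `ln_norm`; it only shows that the \(L^n\) norms increase with \(n\) and approach the supremum over the support, and it does not perform the infimum over couplings.

```python
def ln_norm(values, weights, n):
    """L^n norm of a function taking values[k] with probability weights[k]."""
    return sum(w * v ** n for v, w in zip(values, weights)) ** (1.0 / n)

# Assumed toy distribution of Lambda_infinity under mu (x) mu.
values = [0.0, 1.0, 2.0]
weights = [0.5, 0.25, 0.25]

orders = [1, 2, 4, 8, 64]
norms = [ln_norm(values, weights, n) for n in orders]

# Supremum over the support (values carrying positive weight).
sup_norm = max(v for v, w in zip(values, weights) if w > 0)
```

The list `norms` is nondecreasing and stays below `sup_norm`, which mirrors the inequality \(L\leqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\) and the factor \((\mu \otimes \mu (U))^{1/n}\rightarrow 1\) used in the proof.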

1.2.2 Proof of Theorem 3.11

One main step to verify Theorem 3.11 is to demonstrate the existence of optimal couplings.

Proposition B.10

Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Then, for any \(p\in [1,\infty ]\), there always exists an optimal coupling \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) such that \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})=\mathrm{dis\hspace{0.55542pt}}_p^{\textrm{ult}}(\mu )\).

Proof

We will only prove the claim for the case \(p<\infty \) since the case \(p=\infty \) can be proven in a similar manner. Let \(\mu _n\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be such that

$$\begin{aligned}\Vert \Lambda _\infty (u_X,u_Y)\Vert _{L^p(\mu _n\otimes \mu _n)}\leqslant u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})+\frac{1}{n}\hspace{0.55542pt}. \end{aligned}$$

By Lemma B.19, \(\{\mu _n\}_{n\in {\mathbb {N}}}\) weakly converges to some \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) (after taking an appropriate subsequence). Then, by the boundedness and continuity of \(\Lambda _\infty ({u_{X}},{u_{Y}})\) on \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) (cf. Lemma B.22) as well as the weak convergence of \(\mu _n\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _n\), we have that

$$\begin{aligned}\mathrm{dis\hspace{0.55542pt}}_p^{\textrm{ult}}(\mu )=\lim _{n\rightarrow \infty }\mathrm{dis\hspace{0.55542pt}}_p^{\textrm{ult}}(\mu _n)\leqslant u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

Hence, \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})=\mathrm{dis\hspace{0.55542pt}}_p^{\textrm{ult}}(\mu )\).\(\square \)

Based on Proposition B.10, it is straightforward to prove Theorem 3.11.

Proof of Theorem 3.11

It is clear that \(u_{\textrm{GW},p}\) is symmetric and that \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}}) =0\) if \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). Furthermore, we remark that \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\geqslant d_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\) by Proposition 3.10. Since \(d_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})=0\) implies that \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\) (see [60]), we have that \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})=0\) implies that \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). It remains to verify the p-triangle inequality. To this end, we only prove the case when \(p<\infty \) whereas the case \(p=\infty \) follows by analogous arguments.

Now let \({\mathcal {X}},{\mathcal {Y}},{\mathcal {Z}}\) be three ultrametric measure spaces. Let \(\mu _{XY}\in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) and \(\mu _{YZ}\in {\mathcal {C}}({\mu _{Y}},\mu _Z)\) be optimal (cf. Proposition B.10). By the Gluing Lemma [90, Lem. 7.6], there exists a measure \(\mu _{XYZ}\in {\mathcal {P}}(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}Z)\) with marginals \(\mu _{XY}\) on \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) and \(\mu _{YZ}\) on \(Y\hspace{1.111pt}{\times }\hspace{1.111pt}Z\). Further, we define \(\mu _{XZ}=(\pi _{XZ})_\#\,\mu _{XYZ}\in {\mathcal {P}}(X\hspace{1.111pt}{\times }\hspace{1.111pt}Z)\), where \(\pi _{XZ}\) denotes the canonical projection \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}Z\rightarrow X\hspace{1.111pt}{\times }\hspace{1.111pt}Z\). By construction, \(\mu _{XZ}\in {\mathcal {C}}({\mu _{X}},\mu _Z)\). Then

$$\begin{aligned}&(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Z}}))^p\leqslant \Vert \Lambda _\infty ({u_{X}},u_Z)\Vert ^p_{L^p(\mu _{XZ}\otimes \mu _{XZ}) } \\ {}&\qquad \quad =\Vert \Lambda _\infty ({u_{X}},u_Z)\Vert ^p_{L^p(\mu _{XYZ}\otimes \mu _{XYZ} )}\\&\qquad \quad \leqslant \Vert \Lambda _\infty ({u_{X}},u_Y)\Vert ^p_{L^p(\mu _{XYZ}\otimes \mu _{XYZ}) }+\Vert \Lambda _\infty ({u_{Y}},u_Z)\Vert ^p_{L^p(\mu _{XYZ}\otimes \mu _{XYZ} )}\\&\qquad \quad = \Vert \Lambda _\infty ({u_{X}},u_Y)\Vert ^p_{L^p(\mu _{XY}\otimes \mu _{XY}) }+\Vert \Lambda _\infty ({u_{Y}},u_Z)\Vert ^p_{L^p(\mu _{YZ}\otimes \mu _{YZ} )}\\&\qquad \quad =(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}}))^p+(u_{\textrm{GW},p}({\mathcal {Y}},{\mathcal {Z}}))^p, \end{aligned}$$

where the second inequality follows from the fact that \(\Lambda _\infty \) is an ultrametric on \({\mathbb {R}}_{\geqslant 0}\) (cf. [64, Exam. 2.7]) and the observation that an ultrametric is automatically a p-metric for any \(p\in [1,\infty ]\) [64, Prop. 2.11]. \(\square \)
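The two facts about \(\Lambda _\infty \) used in the last step can be checked numerically on a finite sample. The Python sketch below assumes the definition \(\Lambda _\infty (a,b)=\max (a,b)\) for \(a\ne b\) and \(\Lambda _\infty (a,b)=0\) for \(a=b\), as recalled from the main text; the helper names and the sample values are illustrative assumptions.

```python
import itertools

def lambda_inf(a, b):
    """Lambda_infinity(a, b): max(a, b) if a != b, and 0 if a == b (assumed definition)."""
    return 0.0 if a == b else max(a, b)

def strong_triangle_holds(sample):
    """Strong triangle inequality of lambda_inf on all triples of the sample."""
    return all(lambda_inf(a, c) <= max(lambda_inf(a, b), lambda_inf(b, c))
               for a, b, c in itertools.product(sample, repeat=3))

def p_triangle_holds(sample, p):
    """p-triangle inequality d(a,c)^p <= d(a,b)^p + d(b,c)^p on all triples."""
    return all(lambda_inf(a, c) ** p <= lambda_inf(a, b) ** p + lambda_inf(b, c) ** p
               for a, b, c in itertools.product(sample, repeat=3))

sample = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
ultra_ok = strong_triangle_holds(sample)
p_ok = all(p_triangle_holds(sample, p) for p in (1, 2, 5))
```

A finite check of this kind is of course no substitute for [64, Exam. 2.7] and [64, Prop. 2.11]; it only makes the inequality applied pointwise under the integral concrete.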

1.2.3 Proof of Theorem 3.14

We first prove that

$$\begin{aligned} u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})=\inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \end{aligned}$$
(26)

and then show that the infimum is attainable.

Since \({\mathcal {X}}_0\cong _{\textrm{w}} {\mathcal {X}}\) and \({\mathcal {Y}}_0\cong _{\textrm{w}} {\mathcal {Y}}\), if \({\mathcal {X}}_0\cong _{\textrm{w}}{\mathcal {Y}}_0\), then \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\) and thus by Theorem 3.11

$$\begin{aligned}u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})=0=\inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace .\end{aligned}$$

Now, assume that for some \(t>0\), \({\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\). By Lemma A.7, for some \(n\in {\mathbb {N}}\) we can write \({X}_t=\{[x_1]_t,\dots ,[x_n]_t\}\) and \({Y}_t=\{[y_1]_t,\dots ,[y_n]_t\}\) such that \(u_{X_t}([x_i]_t,[x_j]_t)=u_{Y_t}([y_i]_t,[y_j]_t)\) and \({\mu _{X}}([x_i]_t)={\mu _{Y}}([y_i]_t)\). Let \({\mu _{X}}^i:= {\mu _{X}}|_{[x_i]_t}\) and \({\mu _{Y}}^i:= {\mu _{Y}}|_{[y_i]_t}\) for all \(i=1,\dots ,n\). Let \(\mu := \sum _{i=1}^n\frac{1}{{\mu _{X}}([x_i]_t)}\,{\mu _{X}}^i\hspace{0.55542pt}{\otimes }\hspace{1.111pt}{\mu _{Y}}^i\), where the normalization by \({\mu _{X}}([x_i]_t)={\mu _{Y}}([y_i]_t)\) ensures that \(\mu \) has marginals \({\mu _{X}}\) and \({\mu _{Y}}\). It is easy to check that \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) and \(\textrm{supp}\hspace{0.55542pt}(\mu )=\bigcup _{i=1}^n[x_i]_t\hspace{0.55542pt}{\times }\hspace{1.111pt}[y_i]_t\). Assume \((x,y)\in [x_i]_t\hspace{0.55542pt}{\times }\hspace{1.111pt}[y_i]_t\) and \((x'\!,y')\in [x_j]_t\hspace{0.55542pt}{\times }\hspace{1.111pt}[y_j]_t\). If \(i\ne j\), then \(u_{X_t}([x_i]_t,[x_j]_t)=u_{Y_t}([y_i]_t,[y_j]_t)\) and thus

$$\begin{aligned}\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))=\Lambda _\infty (u_{X_t}([x_i]_t,[x_j]_t),u_{Y_t}([y_i]_t,[y_j]_t))=0.\end{aligned}$$

If \(i=j\), then \({u_{X}}(x,x'),{u_{Y}}(y,y')\leqslant t\) and thus \(\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\leqslant t\). In either case, we have that

$$\begin{aligned} u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\,\leqslant \! \sup _{(x,y),(x'\!,y')\in \textrm{supp}\hspace{0.55542pt}(\mu )}\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\leqslant t. \end{aligned}$$

Therefore, \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\leqslant \inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \).

Conversely, suppose \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) and let

$$\begin{aligned} t:= \!\sup _{(x,y),(x'\!,y')\in \textrm{supp}\hspace{0.55542pt}(\mu )}\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y')). \end{aligned}$$

By [60, Lem. 2.2], we know that \(\textrm{supp}\hspace{0.55542pt}(\mu )\) is a correspondence between X and Y. We define a map \(f_t:X_t\rightarrow Y_t\) by taking \([x]_t^X\!\in X_t\) to \([y]_t^Y\!\in Y_t\) such that \((x,y)\in \textrm{supp}\hspace{0.55542pt}(\mu )\). It is easy to check that \(f_t\) is well defined and moreover \(f_t\) is an isometry (see for example the proof of [64, Thm. 5.1]). Next, we prove that \(f_t\) is actually an isomorphism between \({\mathcal {X}}_t\) and \({\mathcal {Y}}_t\). For any \([x]^X_t\in X_t\), let \(y\in Y\) be such that \((x,y)\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }\) (in this case, \([y]^Y_t\!=f_t([x]^X_t)\)). If there exists \((x'\!,y')\in \textrm{supp}\hspace{0.55542pt}(\mu )\) such that \(x'\!\in [x]^X_t\) and \(y'\!\not \in [y]^Y_t\), then \(\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))={u_{Y}}(y,y')>t\), which is impossible. Consequently, \(\mu ([x]^X_t\hspace{1.111pt}{\times }\hspace{1.111pt}(Y\backslash [y]^Y_t))=0\) and similarly, \(\mu ((X\backslash [x]^X_t)\hspace{0.55542pt}{\times }\hspace{1.111pt}[y]^Y_t)=0\). This yields that

$$\begin{aligned}{\mu _{X}}([x]^X_t)=\mu ([x]^X_t\hspace{0.55542pt}{\times }\hspace{1.111pt}Y)=\mu ([x]^X_t\hspace{0.55542pt}{\times }\hspace{1.111pt}[y]^Y_t)=\mu (X\hspace{1.111pt}{\times }\hspace{1.111pt}[y]^Y_t)={\mu _{Y}}([y]^Y_t).\end{aligned}$$

Therefore, \(f_t\) is an isomorphism between \({\mathcal {X}}_t\) and \({\mathcal {Y}}_t\). Since \(\mu \) was arbitrary, we have that \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\geqslant \inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \) and therefore \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})=\inf \hspace{1.111pt}\lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \).

Finally, we show that the infimum in (26) is attained. Let \(\delta := \inf \lbrace t\geqslant 0 \,{|}\,{\mathcal {X}}_t \cong _{\textrm{w}} {\mathcal {Y}}_t\rbrace \). If \(\delta >0\), let \(\{t_n\}_{n\in {\mathbb {N}}}\) be a decreasing sequence converging to \(\delta \) such that \({\mathcal {X}}_{t_n}\!\cong _{\textrm{w}} {\mathcal {Y}}_{t_n}\) for all \(n\in {\mathbb {N}}\). Since \({\mathcal {X}}_\delta \) and \({\mathcal {Y}}_\delta \) are finite, \({\mathcal {X}}_{t_n}\!={\mathcal {X}}_{\delta }\) and \({\mathcal {Y}}_{t_n}\!={\mathcal {Y}}_{\delta }\) when n is large enough. This immediately implies that \({\mathcal {X}}_\delta \cong _{\textrm{w}} {\mathcal {Y}}_\delta \). Now, if \(\delta =0\), then by (26) we have that \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})=\delta =0\). By Theorem 3.11, \({\mathcal {X}}\cong _{\textrm{w}}{\mathcal {Y}}\). This is equivalent to \({\mathcal {X}}_\delta \cong _{\textrm{w}}{\mathcal {Y}}_\delta \). Therefore, the infimum in (26) is always attained.
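For small finite spaces, the characterization just proved can be turned into a naive computation: the quotients \({\mathcal {X}}_t\) and \({\mathcal {Y}}_t\) only change at values attained by \(u_X\) or \(u_Y\), so it suffices to test \(t=0\) and these finitely many values in increasing order. The Python sketch below is a brute-force illustration under these assumptions; it searches over all permutations and is therefore not the polynomial time algorithm of the paper. The function names, the matrix encoding, and the toy data are hypothetical.

```python
import itertools
import math

def quotient(u, mu, t):
    """Closed t-quotient of a finite ultrametric measure space.

    Points at distance <= t are merged; returns (distance matrix, class masses).
    """
    reps, label = [], []
    for i in range(len(u)):
        for k, r in enumerate(reps):
            if u[i][r] <= t:      # same class as representative r
                label.append(k)
                break
        else:
            label.append(len(reps))
            reps.append(i)
    u_t = [[u[a][b] for b in reps] for a in reps]
    mu_t = [sum(m for m, lab in zip(mu, label) if lab == k) for k in range(len(reps))]
    return u_t, mu_t

def isomorphic(u1, m1, u2, m2):
    """Brute-force search for a measure-preserving isometry (factorial time)."""
    n = len(u1)
    if n != len(u2):
        return False
    for perm in itertools.permutations(range(n)):
        masses_ok = all(math.isclose(m1[i], m2[perm[i]], abs_tol=1e-12) for i in range(n))
        if masses_ok and all(u1[i][j] == u2[perm[i]][perm[j]]
                             for i in range(n) for j in range(n)):
            return True
    return False

def u_gw_inf(uX, muX, uY, muY):
    """Smallest candidate t with isomorphic t-quotients."""
    candidates = sorted({0.0} | {d for row in uX for d in row} | {d for row in uY for d in row})
    for t in candidates:
        if isomorphic(*quotient(uX, muX, t), *quotient(uY, muY, t)):
            return t
    return None  # not reached for probability measures: the largest t gives one-point spaces

# Toy data (assumed): merging x0 and x1 at scale 1 turns X into a copy of Y.
uX = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
muX = [0.25, 0.25, 0.5]
uY = [[0.0, 2.0], [2.0, 0.0]]
muY = [0.5, 0.5]
result = u_gw_inf(uX, muX, uY, muY)
```

In this example the quotients first become isomorphic at \(t=1\), which is the value returned.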

1.2.4 Proof of Theorem 3.18

An important observation for the proof of Theorem 3.18 is that the snowflake transform relates the p-Wasserstein pseudometric on a pseudo-ultrametric space X with the 1-Wasserstein pseudometric on the space \(S_p(X)\), \(1\leqslant p<\infty \).

Lemma B.11

Given a pseudo-ultrametric space \((X,{u_{X}})\) and \(p\geqslant 1\), we have for any \(\alpha ,\beta \in {\mathcal {P}}(X)\) that \(d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha ,\beta )=(d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta ))^{1/p}\).

Remark B.12

Since \(S_p\hspace{0.55542pt}{\circ }\hspace{1.111pt}{u_{X}}\) and \({u_{X}}\) induce the same topology and thus the same Borel sets on X, \({\mathcal {P}}(X)={\mathcal {P}}(S_p(X))\) and thus the expression \(d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta )\) in the lemma is well defined.

Proof of Lemma B.11

Suppose \(\mu _1,\mu _2\in {\mathcal {C}}(\alpha ,\beta )\) are optimal for \(d_{\textrm{W},p}^X(\alpha ,\beta )\) and \(d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta )\), respectively (see Sect. B.5.1 for the existence of \(\mu _1\) and \(\mu _2\)). Then,

$$\begin{aligned} \bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha ,\beta )\bigr )^p&=\int _{X\times X}({u_{X}}(x,y))^p\hspace{1.111pt}\mu _1(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\ {}&=\int _{X\times X}S_p({u_{X}})(x,y)\hspace{1.111pt}\mu _1(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\geqslant d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta ), \end{aligned}$$

and

$$\begin{aligned} d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta )&=\int _{X\times X}S_p({u_{X}})(x,y)\,\mu _2(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\ {}&=\int _{X\times X}({u_{X}}(x,y))^p\hspace{1.111pt}\mu _2(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\geqslant \bigl (d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha ,\beta )\bigr )^p. \end{aligned}$$

Therefore, \(d_{\textrm{W},p}^{\,(X,{u_{X}})}(\alpha ,\beta )=(d_{\textrm{W},1}^{\,S_p(X)}(\alpha ,\beta ))^{1/p}\). \(\square \)

With Lemma B.11 at our disposal we can prove Theorem 3.18.

Proof of Theorem 3.18

Let \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). Then,

$$\begin{aligned} \Vert \Lambda _\infty ({u_{X}},{u_{Y}})\Vert _{L^p(\mu \otimes \mu )}^p=\Vert \Lambda _\infty ({u_{X}}^p\!,{u_{Y}}^p)\Vert _{L^1(\mu \otimes \mu )}. \end{aligned}$$

By infimizing over \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) on both sides, we obtain that \((u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}}))^p =u_{\textrm{GW},1}(S_p({\mathcal {X}}),S_p({\mathcal {Y}}))\).

To prove the second part of the claim, let \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\). By Lemma B.11 we have that

$$\begin{aligned}\bigl (d_{\textrm{W},p}^{\,(X\sqcup Y,u)}({\mu _{X}},{\mu _{Y}})\bigr )^p= d_{\textrm{W},1}^{\,(S_p(X)\sqcup S_p(Y),S_p(u))}({\mu _{X}},{\mu _{Y}}).\end{aligned}$$

Finally, infimizing over \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) yields

$$\begin{aligned} \bigl (u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\bigr )^p=u_{\textrm{GW},1}^{\mathrm{\,sturm}}(S_p({\mathcal {X}}),S_p({\mathcal {Y}})). \end{aligned}$$

\(\square \)
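The first identity in the proof rests on the pointwise relation \(\Lambda _\infty (a,b)^p=\Lambda _\infty (a^p,b^p)\), which holds because \(s\mapsto s^p\) is strictly increasing on \({\mathbb {R}}_{\geqslant 0}\). The Python sketch below checks the resulting equality of distortions for one fixed coupling on a toy example; the helper names, the dictionary encoding of the coupling, and the data are assumptions made for illustration only.

```python
import math

def lambda_inf(a, b):
    """Lambda_infinity(a, b): max(a, b) if a != b, and 0 if a == b (assumed definition)."""
    return 0.0 if a == b else max(a, b)

def dis_ult_pow(uX, uY, mu, p):
    """p-th power of the ultrametric p-distortion of a finite coupling mu.

    mu is a dict {(x, y): mass}; the double sum runs over mu (x) mu.
    """
    total = 0.0
    for (x, y), w in mu.items():
        for (x2, y2), w2 in mu.items():
            total += lambda_inf(uX[x][x2], uY[y][y2]) ** p * w * w2
    return total

def snowflake(u, p):
    """Snowflake transform S_p applied entrywise to a distance matrix."""
    return [[d ** p for d in row] for row in u]

# Toy data (assumed).
uX = [[0.0, 1.5, 2.0], [1.5, 0.0, 2.0], [2.0, 2.0, 0.0]]
uY = [[0.0, 2.0], [2.0, 0.0]]
mu = {(0, 0): 0.25, (1, 0): 0.25, (2, 1): 0.5}
p = 3

lhs = dis_ult_pow(uX, uY, mu, p)                              # dis_p^ult(mu)^p
rhs = dis_ult_pow(snowflake(uX, p), snowflake(uY, p), mu, 1)  # dis_1^ult(mu) after S_p
identity_holds = math.isclose(lhs, rhs)
```

Since the identity holds for every coupling, taking the infimum over \({\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) on both sides gives the first statement of Theorem 3.18.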

As a direct consequence of Theorem 3.18, we obtain the following relation between \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\) and \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) for \(p\in [1,\infty )\).

Corollary B.13

For each \(p\in [1,\infty )\), the metric space \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\) is isometric to the snowflake transform of \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\), i.e., \(S_p({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\cong ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}}) \).

Proof

Consider the snowflake transform map \(S_p:{\mathcal {U}}^{\textrm{w}}\!\rightarrow {\mathcal {U}}^{\textrm{w}}\) sending \(X\in {\mathcal {U}}^{\textrm{w}}\) to \(S_p(X)\in {\mathcal {U}}^{\textrm{w}}\). It is obvious that \(S_p\) is bijective. By Theorem 3.18, \(S_p\) is an isometry from \(S_p({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) to \( ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\). Therefore, \(S_p({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\cong ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}}) \).\(\square \)

1.3 Proofs from Sect. 3.3

In the following, we prove the remaining claims from Sect. 3.3.

1.3.1 Proof of Theorem 3.19

First, we focus on the statement for \(p=1\), i.e., on showing

$$\begin{aligned} u_{\textrm{GW},1}({\mathcal {X}},{\mathcal {Y}})\leqslant 2\hspace{1.111pt}u_{\textrm{GW},1}^{\textrm{sturm}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$
(27)

Let \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be such that

$$\begin{aligned} u_{\textrm{GW},1}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=\int u(x,y)\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy). \end{aligned}$$

The existence of u and \(\mu \) follows from Proposition B.1.

Claim 1

For any \((x,y),(x'\!,y')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), we have

$$\begin{aligned}\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\leqslant \max \hspace{0.55542pt}(u(x,y),u(x'\!,y'))\leqslant u(x,y)+u(x'\!,y'). \end{aligned}$$

Proof of Claim 1

We only need to show that

$$\begin{aligned} \Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\leqslant \max \hspace{0.55542pt}(u(x,y),u(x'\!,y')). \end{aligned}$$

If \({u_{X}}(x,x')={u_{Y}}(y,y')\), then there is nothing to prove. Otherwise, we assume without loss of generality that \({u_{X}}(x,x')<{u_{Y}}(y,y')\). If \(\max \hspace{0.55542pt}(u(x,y),u(x'\!,y'))<{u_{Y}}(y,y')\), then by the strong triangle inequality we must have \(u(x,y')={u_{Y}}(y,y')=u(x'\!,y)\). However, \(u(x'\!,y)\leqslant \max \hspace{0.55542pt}({u_{X}}(x,x'),u(x,y))<{u_{Y}}(y,y')\), which leads to a contradiction. Therefore,

$$\begin{aligned} \Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\leqslant \max \hspace{0.55542pt}(u(x,y),u(x'\!,y')). \end{aligned}$$

\(\square \)
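Claim 1 can also be tested numerically. The Python sketch below draws random ultrametrics \(u\) on a finite disjoint union \(X\sqcup Y\), takes \(u_X\) and \(u_Y\) to be the restrictions of \(u\), and checks the inequality for all pairs. The random ultrametric is produced by a minimax path closure of random symmetric weights, which is a choice made here for illustration and not a construction from the paper; all names are hypothetical.

```python
import itertools
import random

def lambda_inf(a, b):
    """Lambda_infinity(a, b): max(a, b) if a != b, and 0 if a == b (assumed definition)."""
    return 0.0 if a == b else max(a, b)

def random_ultrametric(size, rng):
    """Subdominant ultrametric of random symmetric weights (minimax path closure)."""
    u = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            u[i][j] = u[j][i] = rng.choice([1.0, 2.0, 3.0, 4.0])
    # Floyd-Warshall style update with (min, max) in place of (min, +).
    for k in range(size):
        for i in range(size):
            for j in range(size):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u

def claim_holds(u, n):
    """X = indices below n, Y = the remaining indices, both with the restriction of u."""
    X, Y = range(n), range(n, len(u))
    return all(lambda_inf(u[x][x2], u[y][y2]) <= max(u[x][y], u[x2][y2])
               for x, x2 in itertools.product(X, repeat=2)
               for y, y2 in itertools.product(Y, repeat=2))

rng = random.Random(0)  # fixed seed for reproducibility
all_ok = all(claim_holds(random_ultrametric(7, rng), 3) for _ in range(50))
```

Integrating the pointwise inequality against \(\mu \otimes \mu \) then gives the factor 2 in (27).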

By Claim 1, we have that

$$\begin{aligned}&\iint _{X\times Y \times X\times Y}\!\!\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\,\mu (dx'\hspace{0.55542pt}{\times }\hspace{1.111pt}dy')\\&\quad \leqslant \int _{X\times Y}\!\!u(x,y)\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)+\int _{X\times Y}\!\! u(x'\!,y')\,\mu (dx'\hspace{0.55542pt}{\times }\hspace{1.111pt}dy')\leqslant 2\hspace{1.111pt}u_{\textrm{GW},1}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

Therefore, \(u_{\textrm{GW},1}({\mathcal {X}},{\mathcal {Y}})\leqslant 2\hspace{1.111pt}u_{\textrm{GW},1}^{\textrm{sturm}}({\mathcal {X}},{\mathcal {Y}})\).

Applying Theorem 3.18 and (27) yields that for any \(p\in [1,\infty )\)

$$\begin{aligned} u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})&=(u_{\textrm{GW},1}(S_p({\mathcal {X}}),S_p({\mathcal {Y}})))^{1/p}\\ {}&\leqslant (2\hspace{1.111pt}u_{\textrm{GW},1}^{\mathrm{\,sturm}}(S_p({\mathcal {X}}),S_p({\mathcal {Y}})))^{1/p}=2^{1/p}\hspace{1.111pt}u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

1.3.2 Proof of Results in Example 3.21

It follows from [60, Rem. 5.17] that

$$\begin{aligned} d_{\textrm{GW},p}^{\mathrm{\,sturm}}\bigl ({\widehat{\Delta }}_n(1),{\widehat{\Delta }}_{2n}(1)\bigr )&\geqslant \frac{1}{4}\hspace{0.55542pt},\\ d_{\textrm{GW},p}\bigl ({\widehat{\Delta }}_n(1),{\widehat{\Delta }}_{2n}(1)\bigr )&\leqslant \frac{1}{2}\,\biggl (\frac{3}{2n}\biggr )^{\!1/p}. \end{aligned}$$

Then, by Proposition 3.3, we have that

$$\begin{aligned}u_{\textrm{GW},p}^{\mathrm{\,sturm}}\bigl ({\widehat{\Delta }}_n(1),{\widehat{\Delta }}_{2n}(1)\bigr ) \geqslant d_{\textrm{GW},p}^{\mathrm{\,sturm}}\bigl ({\widehat{\Delta }}_n(1),{\widehat{\Delta }}_{2n}(1)\bigr )\geqslant \frac{1}{4}\hspace{0.55542pt}.\end{aligned}$$

Let \(\mu _n\) denote the uniform probability measure of \({\widehat{\Delta }}_n(1)\). Since \({\widehat{\Delta }}_n(1)\) has the constant interpoint distance 1, it is obvious that for any coupling \(\mu \in {\mathcal {C}}(\mu _n,\mu _{2n})\), \({\textrm{dis}}_p(\mu ) = {\textrm{dis}}^{\textrm{ult}}_p(\mu )\). This implies that \(u_{\textrm{GW},p}({\widehat{\Delta }}_n(1),{\widehat{\Delta }}_{2n}(1)) =2\hspace{1.111pt}d_{\textrm{GW},p}({\widehat{\Delta }}_n(1),{\widehat{\Delta }}_{2n}(1))\leqslant ({3}/({2n}))^{1/p}\).
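The decay of \(u_{\textrm{GW},p}\) in this example can be made concrete with an explicit coupling. The Python sketch below assumes that \({\widehat{\Delta }}_n(1)\) is the n-point space with all interpoint distances equal to 1 and uniform measure, and uses the coupling that splits the mass of each point of \({\widehat{\Delta }}_n(1)\) evenly between two points of \({\widehat{\Delta }}_{2n}(1)\). This coupling is an illustrative choice and not necessarily the one behind the bound quoted from [60]; the function names are hypothetical.

```python
def lambda_inf(a, b):
    """Lambda_infinity(a, b): max(a, b) if a != b, and 0 if a == b (assumed definition)."""
    return 0.0 if a == b else max(a, b)

def dis_ult_p(n, p):
    """Ultrametric p-distortion of the 'split in two' coupling of Delta_n and Delta_2n."""
    # Point x of Delta_n is coupled with points 2x and 2x+1 of Delta_2n, mass 1/(2n) each.
    mu = {(x, 2 * x + k): 1.0 / (2 * n) for x in range(n) for k in (0, 1)}
    total = 0.0
    for (x, y), w in mu.items():
        for (x2, y2), w2 in mu.items():
            d_x = 0.0 if x == x2 else 1.0   # constant interpoint distance 1
            d_y = 0.0 if y == y2 else 1.0
            total += lambda_inf(d_x, d_y) ** p * w * w2
    return total ** (1.0 / p)

value_p1 = dis_ult_p(4, 1)
value_p2 = dis_ult_p(4, 2)
```

For this coupling the only contributing pairs are those with \(x=x'\) and \(y\ne y'\), which gives the distortion \((1/(2n))^{1/p}\). This is an upper bound for \(u_{\textrm{GW},p}\) that is consistent with the estimate \((3/(2n))^{1/p}\) above and tends to 0 as \(n\rightarrow \infty \), whereas the Sturm version stays above 1/4.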

1.3.3 Proof of Theorem 3.22

First, we prove that \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\geqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\). Indeed, for any \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\), we have that

$$\begin{aligned} \sup _{(x,y)\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }}\!\! u(x,y)&=\!\!\sup _{(x,y),(x'\!,y')\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }}\!\!\max \hspace{0.55542pt}(u(x,y),u(x'\!,y'))\\&\geqslant \!\!\sup _{(x,y),(x'\!,y')\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }}\!\!\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y')) \geqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}}), \end{aligned}$$

where the first inequality follows from Claim 1 in the proof of Theorem 3.19. Then, by a standard limit argument, we conclude that \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\geqslant u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\).

Next, we prove that \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant \min \hspace{0.88882pt}\{t\geqslant 0\,{|}\,{\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\}\). Let \(t> 0\) be such that \({\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\) and let \(\varphi :{{\mathcal {X}}}_t\rightarrow {{\mathcal {Y}}}_t\) denote such an isomorphism. Then, we define a function \(u:(X\sqcup Y)\hspace{1.111pt}{\times }\hspace{1.111pt}(X\sqcup Y)\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:

  1. \(u|_{X\times X}:= u_X\) and \(u|_{Y\times Y}:= u_Y\);

  2. for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\),

    $$\begin{aligned} u(x,y):= {\left\{ \begin{array}{ll} \,u_{Y_t}(\varphi ([x]_t^X),[y]_t^Y),&{}\text {if}\;\;\varphi ([x]_t^X)\ne [y]_t^Y,\\ \, t,&{}\text {if}\;\;\varphi ([x]_t^X)=[y]_t^Y; \end{array}\right. } \end{aligned}$$

  3. for any \((y,x)\in Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\), \(u(y,x):= u(x,y)\).

Then, it is easy to verify that \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and that u is actually an ultrametric. Let \(Z:= (X\sqcup Y,u)\). By Lemma 2.8, we have

$$\begin{aligned}u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant d_{\textrm{W},\infty }^Z({\mu _{X}},{\mu _{Y}})\,=\!\max _{\begin{array}{c} B\in V(Z)\backslash \{Z\}\\ {\mu _{X}}(B)\ne {\mu _{Y}}(B) \end{array}}\!\!{\textrm{diam}}\hspace{0.55542pt}(B^*) . \end{aligned}$$

Next, we verify that \(d_{\textrm{W},\infty }^Z({\mu _{X}},{\mu _{Y}})\leqslant t\). It is obvious that \(Z_t\cong X_t\cong Y_t\). Write \(X_t=\{[x_i]_t^X\}_{i=1}^n\) and \(Y_t=\{[y_i]_t^Y\}_{i=1}^n\) such that \([y_i]_t^Y=\varphi ([x_i]_t^X)\) for each \(i=1,\ldots ,n\). Then, \([x_i]_t^{Z}\!=[y_i]_t^{Z}\) and \(Z_t=\{[x_i]_t^{Z}\,{|}\,i=1,\ldots ,n\}\). Since \(\varphi \) is an isomorphism, for any \(i=1,\dots ,n\) we have that \({\mu _{X}}([x_i]_t^X)={\mu _{Y}}([y_i]_t^Y)\) and thus \({\mu _{X}}([x_i]_t^{Z})={\mu _{Y}}([y_i]_t^{Z})={\mu _{Y}}([x_i]_t^{Z})\) when \({\mu _{X}}\) and \({\mu _{Y}}\) are regarded as pushforward measures under the inclusion maps \(X\hookrightarrow Z\) and \(Y\hookrightarrow Z\), respectively. Now for any \(B\in V(Z)\) (cf. Sect. 2.3), if \({\textrm{diam}}\hspace{0.55542pt}(B) \geqslant t\), then B is the union of certain \([x_i]_t^{Z}\)’s in \(Z_t\) and thus \({\mu _{X}}(B)={\mu _{Y}}(B)\). If \({\textrm{diam}}\hspace{0.55542pt}(B) < t\) and \({\textrm{diam}}\hspace{0.55542pt}(B^*) > t\), then there exists some \(x_i\) such that \(B=[x_i]_s^{Z}\) and \([x_i]_s^{Z}\!=[x_i]_t^{Z}\) where \(s:= {\textrm{diam}}\hspace{0.55542pt}(B) \). This implies that \({\mu _{X}}(B)={\mu _{Y}}(B)\). In consequence, we have that \(d_{\textrm{W},\infty }^Z({\mu _{X}},{\mu _{Y}})\leqslant t \) and thus \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant d_{\textrm{W},\infty }^{\,(X\sqcup Y,u)}({\mu _{X}},{\mu _{Y}})\leqslant t\). Therefore, \(u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant \inf \hspace{0.88882pt}\{t\geqslant 0\,{|}\,{\mathcal {X}}_t\cong _{\textrm{w}} {\mathcal {Y}}_t\}\).

Finally, by invoking Theorem 3.14, we conclude that

$$\begin{aligned} u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$
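The function \(u\) constructed in the proof is easy to build for finite spaces. The Python sketch below assembles \(u\) on \(X\sqcup Y\) from the three defining rules and checks the strong triangle inequality on a toy example. The encoding of the quotient classes by integer labels, the dictionary `phi` between class labels, and all function names are assumptions made for illustration.

```python
import itertools

def glue_by_quotient_iso(uX, uY, cls_x, cls_y, phi, t):
    """Build u on X followed by Y from an isomorphism phi of the t-quotients.

    cls_x[x] and cls_y[y] are the class labels of x in X_t and of y in Y_t;
    phi maps class labels of X_t to class labels of Y_t.
    """
    n, m = len(uX), len(uY)
    rep_y = {}
    for y, c in enumerate(cls_y):
        rep_y.setdefault(c, y)          # one representative per class of Y_t
    u = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            u[i][j] = uX[i][j]          # rule 1 on X x X
    for i in range(m):
        for j in range(m):
            u[n + i][n + j] = uY[i][j]  # rule 1 on Y x Y
    for x in range(n):
        for y in range(m):
            c = phi[cls_x[x]]
            # rule 2: quotient distance if the classes differ, t otherwise
            d = t if c == cls_y[y] else uY[rep_y[c]][y]
            u[x][n + y] = u[n + y][x] = d   # rule 3: symmetry
    return u

def is_ultrametric(u):
    """Check the strong triangle inequality on all triples."""
    idx = range(len(u))
    return all(u[i][k] <= max(u[i][j], u[j][k])
               for i, j, k in itertools.product(idx, repeat=3))

# Toy data (assumed): at t = 1 the classes of X are {x0, x1} and {x2}.
uX = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
uY = [[0.0, 2.0], [2.0, 0.0]]
u = glue_by_quotient_iso(uX, uY, cls_x=[0, 0, 1], cls_y=[0, 1], phi={0: 0, 1: 1}, t=1.0)
u_is_ultrametric = is_ultrametric(u)
```

In the resulting matrix, points of matched classes lie at distance exactly \(t\), so the classes \([x_i]_t^Z\) and \([y_i]_t^Z\) coincide as used in the proof.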

1.3.4 Proof of Theorem 3.23

We prove the result via an explicit construction. By Theorem 3.22, we have \(s=u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\). By Theorem 3.14, there exists an isomorphism \(\varphi :{\mathcal {X}}_s\rightarrow {\mathcal {Y}}_s\). Since \(s>0\), by Lemma A.7, both \({\mathcal {X}}_s\) and \({\mathcal {Y}}_s\) are finite spaces. Let \(X_s=\{[x_1]_s^X\!,\dots ,[x_n]_s^X\}\), \(Y_s=\{[y_1]_s^Y\!,\dots ,[y_n]_s^Y\}\) and assume \([y_i]_s^Y\!=\varphi ([x_i]_s^X)\) for each \(i=1,\ldots ,n\). Let \(A:= \{x_1,\dots ,x_n\}\) and define \(\phi :A\rightarrow Y\) by sending \(x_i\) to \(y_i\) for each \(i=1,\ldots ,n\). We prove that \((A,\phi )\) satisfies the conditions in the statement.

Since \(\varphi \) is an isomorphism, for any \(1\leqslant i<j\leqslant n\),

$$\begin{aligned} {u_{Y}}(y_i,y_j)&=u_{Y_s}([y_i]_s^Y\!,[y_j]_s^Y)\\ {}&= u_{Y_s}(\varphi ([x_i]_s^X),\varphi ([x_j]_s^X))=u_{X_s}([x_i]_s^X\!,[x_j]_s^X)={u_{X}}(x_i,x_j). \end{aligned}$$

This implies that \(\phi :A\rightarrow Y\) is an isometric embedding and thus \((A,\phi )\in {\mathcal {A}}\).

It is obvious that \((Z_A)_s\) is isometric to both \(X_s\) and \(Y_s\). In fact, \([x_i]_s^{Z_A}=[y_i]_s^{Z_A}\) in \(Z_A\) for each \(i=1,\ldots ,n\) and \((Z_A)_s=\{[x_i]_s^{Z_A}\hspace{0.55542pt}{|}\,i=1,\ldots ,n\}\). Since \(\varphi \) is an isomorphism, for any \(i=1,\dots ,n\) we have that \({\mu _{X}}([x_i]_s^X)={\mu _{Y}}([y_i]_s^Y)\) and thus \({\mu _{X}}([x_i]_s^{Z_A})={\mu _{Y}}([y_i]_s^{Z_A})={\mu _{Y}}([x_i]_s^{Z_A})\) when \({\mu _{X}}\) and \({\mu _{Y}}\) are regarded as pushforward measures under the inclusion maps \(X\rightarrow Z_A\) and \(Y\rightarrow Z_A\), respectively. Now for any \(B\in V(Z_A)\) (cf. Sect. 2.3), if \({\textrm{diam}}\hspace{0.55542pt}(B) \geqslant s\), then B is the union of certain \([x_i]_s^{Z_A}\)’s and thus \({\mu _{X}}(B)={\mu _{Y}}(B)\). If otherwise \({\textrm{diam}}\hspace{0.55542pt}(B) < s\) and \({\textrm{diam}}\hspace{0.55542pt}(B^*) > s\), then there exists \(x_i\) such that \(B=[x_i]_t^{Z_A}\) and \([x_i]_t^{Z_A}\!=[x_i]_s^{Z_A}\) where \(t:= {\textrm{diam}}\hspace{0.55542pt}(B) \). This implies that \({\mu _{X}}(B)={\mu _{Y}}(B)\). By Lemma 2.8, we have \( d_{\textrm{W},\infty }^{Z_A}({\mu _{X}},{\mu _{Y}})\leqslant s\) and thus \( d_{\textrm{W},\infty }^{Z_A}({\mu _{X}},{\mu _{Y}})=s\) since \(d_{\textrm{W},\infty }^{Z_A}({\mu _{X}},{\mu _{Y}})\) is an upper bound for \(s=u_{\textrm{GW},\infty }^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\) due to (8).

1.3.5 Proof of Theorem 3.25

In this section, we prove Theorem 3.25 by modifying the proof of [60, Prop. 5.3].

Lemma B.14

Let \((X,{u_{X}})\) and \((Y,{u_{Y}})\) be compact ultrametric spaces and let \(S\subseteq X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) be non-empty. Assume that \(\sup _{(x,y),(x'\!,y')\in S}\Lambda _\infty (u_X(x,x'),u_Y(y,y'))\leqslant \eta \). Define \(u_S:X\sqcup Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\sqcup Y\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:

  1. (i)

    \(u_S|_{X\times X}:= u_X\) and \(u_S|_{Y\times Y}:= u_Y\);

  2. (ii)

    for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), \(u_S(x,y):= \inf _{(x'\!,y')\in S}\max \hspace{0.55542pt}(u_X(x,x'),u_Y(y,y'),\eta )\);

  3. (iii)

    for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), \(u_S(y,x):= u_S(x,y)\).

Then, \(u_S\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) and \(u_S(x,y)\leqslant \eta \) for all \((x,y)\in S\).

Proof

That \(u_S\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) essentially follows by [93, Lem. 1.1]. It remains to prove the second half of the statement. For \((x,y)\in S\), we set \((x'\!,y'):= (x,y)\). This yields

$$\begin{aligned} u_S(x,y)\leqslant \max \hspace{0.88882pt}(u_X(x,x'),u_Y(y,y'),\eta )=\max \hspace{0.55542pt}(0,0,\eta )=\eta . \end{aligned}$$

\(\square \)
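The construction of Lemma B.14 can be checked numerically on small finite spaces. The sketch below is a hedged illustration, not the authors' implementation: it builds \(u_S\) on \(X\sqcup Y\) from items (i) to (iii), takes \(\eta \) to be the supremum appearing in the assumption, and tests the strong triangle inequality by brute force. It assumes \(\Lambda _\infty (a,b)=\max (a,b)\) for \(a\ne b\) and \(\Lambda _\infty (a,a)=0\); the helper names `glue`, `lam_inf` and `is_ultrametric` are hypothetical.

```python
from itertools import product

def lam_inf(a, b):
    # Assumed definition: Lambda_infty(a, b) = max(a, b) if a != b, else 0.
    return 0 if a == b else max(a, b)

def glue(uX, uY, S, eta):
    """Distance u_S on the disjoint union (X first, then Y), items (i)-(iii)."""
    nX, nY = len(uX), len(uY)
    n = nX + nY
    u = [[0] * n for _ in range(n)]
    for i, j in product(range(nX), repeat=2):
        u[i][j] = uX[i][j]                      # (i) restriction to X
    for i, j in product(range(nY), repeat=2):
        u[nX + i][nX + j] = uY[i][j]            # (i) restriction to Y
    for x, y in product(range(nX), range(nY)):
        val = min(max(uX[x][xs], uY[y][ys], eta) for xs, ys in S)
        u[x][nX + y] = val                      # (ii)
        u[nX + y][x] = val                      # (iii) symmetry
    return u

def is_ultrametric(u):
    n = len(u)
    return all(u[i][k] <= max(u[i][j], u[j][k])
               for i, j, k in product(range(n), repeat=3))

# Hypothetical example data.
uX = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
uY = [[0, 2], [2, 0]]
S = [(0, 0), (2, 1)]
eta = max(lam_inf(uX[x][x2], uY[y][y2]) for (x, y), (x2, y2) in product(S, repeat=2))
uS = glue(uX, uY, S, eta)
```

In this example \(\eta =0\), so \(u_S\) is a pseudo-ultrametric that identifies each pair in \(S\); the brute-force check confirms the strong triangle inequality and \(u_S(x,y)\leqslant \eta \) on \(S\).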

Proof of Theorem 3.25

Let \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be a coupling s.t. \(\Vert \Gamma _{X,Y}^\infty \Vert _{L^p(\mu \otimes \mu )}<\delta ^5\). Set \(\varepsilon := 4v_\delta (X)\leqslant 4\). By [60, Claim 10.1], there exist a positive integer \(N\leqslant [1/\delta ]\) and points \(x_1,\ldots ,x_N\) in X such that \(\min _{\,i\ne j}u_X(x_i,x_j)\geqslant {\varepsilon }/{2}\), \(\min _{\,i}{\mu _{X}}( B_\varepsilon ^X(x_i)) >\delta \) and \({\mu _{X}}\bigl (\bigcup _{i=1}^NB_\varepsilon ^X(x_i)\bigr )\geqslant 1-\varepsilon \).

Claim 1

For every \(i=1,\ldots ,N\) there exists \(y_i\in Y\) such that

$$\begin{aligned}\mu \bigl (B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}B_{2(\varepsilon +\delta )}^Y(y_i)\bigr )\geqslant (1-\delta ^2)\hspace{1.111pt}{\mu _{X}}( B_\varepsilon ^X(x_i)). \end{aligned}$$

Proof of Claim 1

Assume the claim is false for some i and let

$$\begin{aligned} Q_i(y)=B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}( Y\backslash B_{2(\varepsilon +\delta )}^Y(y)). \end{aligned}$$

Then, as \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) it holds

$$\begin{aligned} {\mu _{X}}( B_\varepsilon ^X(x_i))&=\mu \bigl ( B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}Y\bigr )\\ {}&=\mu \bigl ( B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}B_{2(\varepsilon +\delta )}^Y(y)\bigr )+\mu \bigl ( B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}\bigl ( Y\backslash B_{2(\varepsilon +\delta )}^Y(y)\bigr )\bigr ). \end{aligned}$$

Consequently, we have that \(\mu (Q_i(y))\geqslant \delta ^2{\mu _{X}}( B_\varepsilon ^X(x_i)) \). Further, let

$$\begin{aligned}{\mathcal {Q}}_i:= \bigl \{(x,y,x'\!,y')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\mid x,x'\!\in B_\varepsilon ^X(x_i),\, {u_{Y}}(y,y')\geqslant 2(\varepsilon +\delta )\bigr \}.\end{aligned}$$

Clearly, it holds for \((x,y,x'\!,y')\in {\mathcal {Q}}_i\) that

$$\begin{aligned}\Gamma _{X,Y}^\infty (x,y,x'\!,y')=\Lambda _\infty ( u_X(x,x'),u_Y(y,y'))=u_Y(y,y')\geqslant 2\delta .\end{aligned}$$

Further, we have that \(\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu ({\mathcal {Q}}_i)\geqslant \delta ^4\). Indeed, it holds

$$\begin{aligned} \mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu ({\mathcal {Q}}_i)&=\int _{B_\varepsilon ^X(x_i)\times Y}\!\int _{Q_i(y)}\!1\,\mu (dx'\hspace{0.55542pt}{\times }\hspace{1.111pt}dy')\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\ {}&=\int _{B_\varepsilon ^X(x_i)\times Y}\!\mu (Q_i(y))\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\&={\mu _{X}}( B_\varepsilon ^X(x_i)) \int _Y\!\mu (Q_i(y))\,{\mu _{Y}}(dy) \geqslant ({\mu _{X}}( B_\varepsilon ^X(x_i)) )^2\hspace{1.111pt}\delta ^2 \geqslant \delta ^4. \end{aligned}$$

However, this yields that

$$\begin{aligned} \Vert \Gamma ^\infty _{X,Y}\Vert _{L^p(\mu \otimes \mu )}&\geqslant \Vert \Gamma ^\infty _{X,Y}\Vert _{L^1(\mu \otimes \mu )}\\ {}&\geqslant \Vert \Gamma ^\infty _{X,Y}\mathbb {1}_{{\mathcal {Q}}_i}\Vert _{L^1(\mu \otimes \mu )}\geqslant 2\delta \hspace{1.111pt}{\cdot }\hspace{1.111pt}\mu \hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu ({\mathcal {Q}}_i)\geqslant 2\delta ^5, \end{aligned}$$

which contradicts \(\Vert \Gamma _{X,Y}^\infty \Vert _{L^p(\mu \otimes \mu )}<\delta ^5\). \(\square \)

Define for each \(i=1,\ldots ,N\), \(S_i:= B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}B_{2(\varepsilon +\delta )}^Y(y_i)\). Then, by Claim 1, \(\mu (S_i)\geqslant \delta (1-\delta ^2)\), for all \(i=1,\ldots ,N\).

Claim 2

\(\Gamma _{X,Y}^\infty (x_i,y_i,x_j,y_j)\leqslant 6\hspace{0.55542pt}(\varepsilon +\delta )\) for all \(i,j=1,\ldots ,N\).

Proof of Claim 2

Assume the claim fails for some \((i_0,j_0)\), i.e.,

$$\begin{aligned} \Lambda _\infty (u_X(x_{i_0},x_{j_0}),u_Y(y_{i_0},y_{j_0}))>6\hspace{0.55542pt}(\varepsilon +\delta )>0. \end{aligned}$$

Then, we have \(\Lambda _\infty (u_X(x_{i_0},x_{j_0}),u_Y(y_{i_0},y_{j_0}))=\max \hspace{0.88882pt}(u_X(x_{i_0},x_{j_0}),u_Y(y_{i_0},y_{j_0}))\). We assume without loss of generality that

$$\begin{aligned}u_X(x_{i_0},x_{j_0})=\Lambda _\infty (u_X(x_{i_0},x_{j_0}),u_Y(y_{i_0},y_{j_0}))> u_Y(y_{i_0},y_{j_0}).\end{aligned}$$

Consider any \((x,y)\in S_{i_0}\) and \((x'\!,y')\in S_{j_0}\). By the strong triangle inequality and the fact that \(u_X(x_{i_0},x_{j_0})>6(\varepsilon +\delta )>\varepsilon \), it is easy to verify that \(u_X(x,x')=u_X(x_{i_0},x_{j_0})\). Moreover,

$$\begin{aligned} u_Y(y,y')&\leqslant \max \hspace{1.111pt}\bigl (u_Y(y,y_{i_0}),u_Y(y_{i_0},y_{j_0}),u_Y(y_{j_0},y')\bigr )\\&< \max \hspace{0.55542pt}\bigl (2\hspace{0.55542pt}(\varepsilon +\delta ), u_X(x_{i_0},x_{j_0}),2\hspace{0.55542pt}(\varepsilon +\delta )\bigr )=u_X(x_{i_0},x_{j_0})=u_X(x,x'). \end{aligned}$$

Therefore, \(\Gamma _{X,Y}^\infty (x,y,x'\!,y')=u_X(x,x')=u_X(x_{i_0},x_{j_0})= \Gamma _{X,Y}^\infty (x_{i_0},y_{i_0},x_{j_0},y_{j_0})>6\hspace{0.55542pt}(\varepsilon +\delta )>2\delta \). Consequently, we have that

$$\begin{aligned} \Vert \Gamma _{X,Y}^\infty \Vert _{L^p(\mu \otimes \mu )}&\geqslant \Vert \Gamma _{X,Y}^\infty \Vert _{L^1(\mu \otimes \mu )}\\ {}&\geqslant \Vert \Gamma _{X,Y}^\infty \mathbb {1}_{S_{i_0}}\mathbb {1}_{S_{j_0}}\Vert _{L^1(\mu \otimes \mu )}\geqslant 2\hspace{1.111pt}\delta \mu (S_{i_0})\hspace{1.111pt}\mu (S_{j_0}) >2\hspace{1.111pt}\delta (\delta (1-\delta ^2))^2. \end{aligned}$$

However, for \(\delta \leqslant 1/2\), \(2\delta \hspace{0.55542pt}(\delta (1-\delta ^2))^2\geqslant 2\delta ^5\). This leads to a contradiction. \(\square \)
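The elementary inequality closing the proof of Claim 2 reduces to \(1-\delta ^2\geqslant \delta \), which holds for \(\delta \leqslant (\sqrt{5}-1)/2\approx 0.618\) and in particular for \(\delta \leqslant 1/2\). A small exact-arithmetic check on a grid (an illustrative sketch; the grid and the function name are arbitrary choices) reads as follows.

```python
from fractions import Fraction

def claim2_bound_holds(delta):
    """Check 2*delta*(delta*(1 - delta^2))^2 >= 2*delta^5 exactly."""
    lhs = 2 * delta * (delta * (1 - delta**2)) ** 2
    rhs = 2 * delta**5
    return lhs >= rhs

# Rational grid in (0, 1/2]; exact arithmetic avoids rounding issues.
grid = [Fraction(k, 1000) for k in range(1, 501)]
all_hold = all(claim2_bound_holds(d) for d in grid)
```

The restriction on \(\delta \) is needed: the same check fails, for instance, at \(\delta =7/10\).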

Consider \(S\subseteq X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) given by \(S:= \{(x_i,y_i)\,{|}\,i=1,\ldots ,N\}\). Let \(u_S\) be the ultrametric on \(X\sqcup Y\) given by Lemma B.14. By Claim 2,

$$\begin{aligned} \sup _{(x,y),(x'\!,y')\in S}\Gamma _{X,Y}^\infty (x,y,x'\!,y')\leqslant 6\hspace{0.55542pt}(\varepsilon +\delta ). \end{aligned}$$

Then, for all \(i=1,\ldots ,N\) we have that \(u_S(x_i,y_i)\leqslant 6(\varepsilon +\delta )\) and for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) we have that \(u_S(x,y)\leqslant \max \hspace{0.55542pt}({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) ,6(\varepsilon +\delta )) \leqslant \max \hspace{0.55542pt}({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) ,27)=:M'\). Here in the second inequality we use the assumption that \(\delta <{1}/{2}\) and the fact that \(\varepsilon =4\hspace{0.55542pt}v_\delta (X)\leqslant 4\).

Claim 3

Fix \(i\in \{1,\dots ,N\}\). Then, for all \((x,y)\in S_i\), it holds \(u_S(x,y)\leqslant 6\hspace{0.55542pt}(\varepsilon +\delta )\).

Proof of Claim 3

Let \((x,y)\in S_i\). Then, \({u_{X}}(x,x_i)\leqslant \varepsilon \) and \({u_{Y}}(y,y_i)\leqslant 2\hspace{0.55542pt}(\varepsilon +\delta )\). Hence, by the strong triangle inequality for \(u_S\), we obtain

$$\begin{aligned} u_S(x,y)&\leqslant \max \hspace{1.111pt}\{{u_{X}}(x,x_i),{u_{Y}}(y,y_i), u_S(x_i,y_i)\}\\ {}&\leqslant \max \hspace{1.111pt}\{\varepsilon ,2\hspace{0.55542pt}(\varepsilon +\delta ),6\hspace{0.55542pt}(\varepsilon +\delta )\}\leqslant 6\hspace{0.55542pt}(\varepsilon +\delta ). \end{aligned}$$

\(\square \)

Let \(L:= \bigcup _{i=1}^NS_i\). The next step is to estimate the mass of \(\mu \) in the complement of L.

Claim 4

\(\mu (X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\backslash L)\leqslant \varepsilon +\delta \).

Proof of Claim 4

For each \(i=1,\ldots ,N\), let

$$\begin{aligned} A_i:= B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}( Y\backslash B_{2(\varepsilon +\delta )}^Y(y_i)). \end{aligned}$$

Then,

$$\begin{aligned}A_i=\bigl ( B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}Y\bigr ) \backslash \bigl ( B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}B_{2(\varepsilon +\delta )}^Y(y_i)\bigr )=\bigl ( B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}Y\bigr ) \backslash S_i.\end{aligned}$$

Hence, \(\mu (A_i)=\mu ( B_\varepsilon ^X(x_i)\hspace{1.111pt}{\times }\hspace{1.111pt}Y) -\mu (S_i)={\mu _{X}}( B_\varepsilon ^X(x_i))-\mu (S_i)\), where the last equality follows from the fact that \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). By Claim 1, we have that \(\mu (S_i)\geqslant {\mu _{X}}( B_\varepsilon ^X(x_i)) (1-\delta ^2)\). Consequently, \(\mu (A_i)\leqslant {\mu _{X}}( B_\varepsilon ^X(x_i)) \delta ^2\). Notice that

$$\begin{aligned}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\backslash L\subseteq \biggl (X\mathbin {\Big \backslash } \bigcup _{i=1}^NB_\varepsilon ^X(x_i)\biggr )\hspace{1.111pt}{\times }\hspace{1.111pt}Y\cup \biggl (\,\bigcup _{i=1}^N A_i\biggr ). \end{aligned}$$

Hence,

$$\begin{aligned} \mu (X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\backslash L)&\leqslant {\mu _{X}}\biggl (X\mathbin {\Big \backslash } \bigcup _{i=1}^NB_\varepsilon ^X(x_i)\biggr )+\sum _{i=1}^N\mu (A_i)\\&\leqslant 1-{\mu _{X}}\biggl (\,\bigcup _{i=1}^NB_\varepsilon ^X(x_i)\biggr )+\sum _{i=1}^N\delta ^2{\mu _{X}}( B_\varepsilon ^X(x_i))\leqslant \varepsilon +N\hspace{1.111pt}{\cdot }\hspace{1.111pt}\delta ^2\leqslant \varepsilon +\delta . \end{aligned}$$

Here, the third inequality follows from the choice of the points \(x_i\) at the beginning of the proof and from the fact that \(N\leqslant [1/\delta ]\). \(\square \)

Now,

$$\begin{aligned} \int _{X\times Y}(u_S(x,y))^p\hspace{1.111pt}\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)&=\biggl (\int _L+\int _{X\times Y\backslash L}\biggr ) (u_S(x,y))^p\hspace{1.111pt}\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\ {}&\leqslant (6\hspace{0.55542pt}(\varepsilon +\delta ))^p+{M'}^p\hspace{0.55542pt}{\cdot }\hspace{1.66656pt}(\varepsilon +\delta ). \end{aligned}$$

Since we have for any \(a,b\geqslant 0\) and \(p\geqslant 1\) that \(a^{1/p}+b^{1/p}\geqslant (a+b)^{1/p}\), we obtain

$$\begin{aligned} u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})&\leqslant (\varepsilon +\delta )^{1/p}\bigl (6(\varepsilon +\delta )^{1-{1/p}}+M'\bigr )\\ {}&\leqslant (\varepsilon +\delta )^{1/p}(27+M') \leqslant (4v_\delta ({\mathcal {X}})+\delta )^{1/p}\hspace{0.55542pt}{\cdot }\hspace{1.66656pt}M, \end{aligned}$$

where we used \(\varepsilon =4v_\delta ({\mathcal {X}})\) and \(M:= 2\max \hspace{0.55542pt}({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) )+54\geqslant M'+27\). Since the roles of \({\mathcal {X}}\) and \({\mathcal {Y}}\) are symmetric, we have \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})\leqslant (4\min \hspace{0.55542pt}(v_\delta ({\mathcal {X}}),v_\delta ({\mathcal {Y}}))+\delta )^{1/p}\hspace{0.55542pt}{\cdot }\hspace{1.111pt}M\). \(\square \)
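The chain of estimates at the end of the proof can be sanity-checked in floating point. The sketch below is an illustration only: it evaluates the integral bound and the three successive upper bounds for sample values of \(\varepsilon \leqslant 4\), \(\delta <1/2\), \(p\geqslant 1\) and of the larger diameter, and tests that they are ordered as claimed up to a small tolerance. The function name, the parameter `diam_max` and the sampled values are assumptions made for the example.

```python
from itertools import product

def check_final_estimate(eps, delta, p, diam_max):
    """Numerically test the last chain of inequalities in the proof.

    diam_max stands for max(diam(X), diam(Y)); all inputs are floats.
    """
    s = eps + delta
    m_prime = max(diam_max, 27.0)               # M' from the proof
    integral_bound = (6 * s) ** p + m_prime**p * s
    step1 = s ** (1 / p) * (6 * s ** (1 - 1 / p) + m_prime)
    step2 = s ** (1 / p) * (27 + m_prime)
    step3 = s ** (1 / p) * (2 * diam_max + 54)  # M = 2 max(diam) + 54
    tol = 1e-9 * max(1.0, step3)                # tolerance for rounding
    return (integral_bound ** (1 / p) <= step1 + tol
            and step1 <= step2 + tol
            and step2 <= step3 + tol)

samples = product([0.0, 1.0, 4.0], [0.01, 0.25, 0.49], [1, 2, 5, 10], [0.5, 10.0, 100.0])
all_ok = all(check_final_estimate(e, d, p, dm) for e, d, p, dm in samples)
```

The first comparison is the subadditivity \((a+b)^{1/p}\leqslant a^{1/p}+b^{1/p}\) with \(a=(6(\varepsilon +\delta ))^p\) and \(b={M'}^p(\varepsilon +\delta )\).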

1.4 Proofs from Sect. 3.4

The subsequent section contains the full proofs of the statements in Sect. 3.4.

1.4.1 Proof of Theorem 3.27

Part 1. We first prove that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is non-separable for each \(p\in [1,\infty ]\). Recall the notation of Example 3.5 and consider the family \(\{{\widehat{\Delta }}_2(a)\}_{a\in [1,2]}\).

Claim 1

For all \( a\ne b\in [1,2]\), \( u_{\textrm{GW},p}({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b))=2^{-{1/p}}\Lambda _\infty (a,b)\geqslant 2^{-{1/p}}\), where \(2^{-{1/\infty }}:= 1\).

Proof of Claim 1

First note by Theorem 4.1 that

$$\begin{aligned}u_{\textrm{GW},p} ({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b))\geqslant {\textbf{SLB}}_{p}^{\textrm{ult}}({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b)).\end{aligned}$$

It is easy to verify that \({\textbf{SLB}}_{p}^{\textrm{ult}}({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b))=2^{-{1/p}}\Lambda _\infty (a,b)\). On the other hand, consider the diagonal coupling between \(\mu _a\) and \(\mu _b\), then for \(p\in [1,\infty )\)

$$\begin{aligned}u_{\textrm{GW},p} ({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b))\leqslant \biggl (2\cdot \Lambda _\infty (a,b)^p\cdot \frac{1}{2}\cdot \frac{1}{2}\biggr )^{\!1/p}=2^{-{1/p}}\Lambda _\infty (a,b), \end{aligned}$$

and for \(p=\infty \), \(u_{\textrm{GW},\infty }({\widehat{\Delta }}_2(a),{\widehat{\Delta }}_2(b))\leqslant \Lambda _\infty (a,b)\). This concludes the proof. \(\square \)
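The upper bound obtained from the diagonal coupling can be reproduced numerically. The following sketch assumes that the cost of a coupling \(\mu \) is \(\Vert \Gamma _{X,Y}^\infty \Vert _{L^p(\mu \otimes \mu )}\) with \(\Gamma _{X,Y}^\infty (x,y,x'\!,y')=\Lambda _\infty (u_X(x,x'),u_Y(y,y'))\), as used in the proofs above; the names `ult_distortion` and `two_point` are hypothetical. It evaluates a single coupling and therefore only yields an upper bound, not the infimum.

```python
import math

def lam_inf(a, b):
    # Assumed definition: Lambda_infty(a, b) = max(a, b) if a != b, else 0.
    return 0.0 if a == b else float(max(a, b))

def ult_distortion(uX, uY, coupling, p):
    """L^p(mu x mu) norm of Gamma^infty for a finitely supported coupling.

    coupling: dict mapping (x, y) index pairs to masses.
    """
    support = [(xy, m) for xy, m in coupling.items() if m > 0]
    values = [(lam_inf(uX[x][x2], uY[y][y2]), m * m2)
              for (x, y), m in support for (x2, y2), m2 in support]
    if math.isinf(p):
        return max(v for v, _ in values)
    return sum(v**p * w for v, w in values) ** (1.0 / p)

def two_point(a):
    """Two-point space with interpoint distance a."""
    return [[0.0, a], [a, 0.0]]

# Diagonal coupling between the uniform measures on the two-point spaces.
diag = {(0, 0): 0.5, (1, 1): 0.5}
a, b = 1.0, 2.0
upper_p2 = ult_distortion(two_point(a), two_point(b), diag, 2)
upper_inf = ult_distortion(two_point(a), two_point(b), diag, math.inf)
```

For \(a=1\), \(b=2\) and \(p=2\) the value agrees with \(2^{-1/2}\Lambda _\infty (1,2)=\sqrt{2}\), and for \(p=\infty \) it equals \(\Lambda _\infty (1,2)=2\).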

By Claim 1, we have that \(\{{\widehat{\Delta }}_2(a)\}_{a\in [1,2]}\) is an uncountable subset of \({\mathcal {U}}^{\textrm{w}}\) with pairwise distance greater than \(2^{-{1}/{p}}\), which implies that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is non-separable.

Now for \(p\in [1,\infty )\), we show that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is not complete. Consider the family \(\{\Delta _{2^n}(1)\}_{n\in {\mathbb {N}}}\) of \(2^n\)-point spaces with unit interpoint distances. Endow each space \(\Delta _{2^n}(1)\) with the uniform measure \(\mu _n\) and denote the corresponding ultrametric measure space by \({\widehat{\Delta }}_{2^n}(1)\). It is proven in [84, Exam. 2.2] that \(\{{\widehat{\Delta }}_{2^n}(1)\}_{n\in {\mathbb {N}}}\) is a Cauchy sequence with respect to \(d_{\textrm{GW},p}\) without a compact metric measure space as limit. It is not hard to check that

$$\begin{aligned}u_{\textrm{GW},p}({\widehat{\Delta }}_{2^m}(1),{\widehat{\Delta }}_{2^n}(1))=2d_{\textrm{GW},p}({\widehat{\Delta }}_{2^m}(1),{\widehat{\Delta }}_{2^n}(1)),\quad \text {for all}\;\; n,m\in {\mathbb {N}}. \end{aligned}$$

Therefore, \(\{{\widehat{\Delta }}_{2^n}(1)\}_{n\in {\mathbb {N}}}\) is a Cauchy sequence with respect to \(u_{\textrm{GW},p}\) without limit in \({\mathcal {U}}^{\textrm{w}}\). This implies that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is not complete.

By Theorem 3.19 and the fact that \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p})\) is not separable, \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) is not separable. As for completeness, consider the subset \(X:= \{1-{1}/{n}\}_{n\in {\mathbb {N}}}\subseteq ({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\). By Lemma A.2, \(X\) is not a compact ultrametric space. Let \(\mu _0\in {\mathcal {P}}(X)\) be the probability measure defined as follows:

$$\begin{aligned}\mu _0\biggl (\biggl \{1-\frac{1}{n}\biggr \}\biggr ):= 2^{-n},\quad \text {for all}\;\; n\in {\mathbb {N}}. \end{aligned}$$

For each \(N\in {\mathbb {N}}\), let \(X_N:= \{1-{1}/{n}\,{|}\,n=1,\ldots ,N\}\). Since each \(X_N\) is finite, \((X_N,\Lambda _\infty )\) is a compact ultrametric space. Let \(\mu _N\in {\mathcal {P}}(X_N)\) be the probability measure defined as follows:

$$\begin{aligned}\mu _N\biggl (\biggl \{1-\frac{1}{n}\biggr \}\biggr ):= {\left\{ \begin{array}{ll}\, 2^{-n},&{} 1\leqslant n<N,\\ \,2^{-N+1},&{} n=N. \end{array}\right. }\end{aligned}$$

Then, it is easy to verify (e.g. via Theorem 3.7) that \(\{(X_N,\Lambda _\infty ,\mu _N)\}_{N\in {\mathbb {N}}}\) is a \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) Cauchy sequence with \((X,\Lambda _\infty ,\mu _0)\) being the limit. Since the set X is not compact, \((X,\Lambda _\infty ,\mu _0)\notin {\mathcal {U}}^{\textrm{w}}\) and thus \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) is not complete.
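The measures \(\mu _N\) can be examined with exact arithmetic. The sketch below is an illustration under stated assumptions, not the argument via Theorem 3.7: it checks that each \(\mu _N\) has total mass 1 and evaluates one natural coupling between \(\mu _N\) and \(\mu _{N+1}\) inside \(X_{N+1}\), which keeps the common mass in place and moves mass \(2^{-N}\) from \(1-1/N\) to \(1-1/(N+1)\) at cost \(\Lambda _\infty (1-1/N,1-1/(N+1))=1-1/(N+1)\). For \(p<\infty \) this gives an upper bound on the distance between consecutive spaces that decays geometrically in \(N\). The function names are hypothetical.

```python
from fractions import Fraction

def mu_N(N):
    """Masses of mu_N, indexed by n (the point is 1 - 1/n)."""
    masses = {n: Fraction(1, 2**n) for n in range(1, N)}
    masses[N] = Fraction(1, 2 ** (N - 1))
    return masses

def moved_mass(N):
    """Mass that must leave its location when passing from mu_N to mu_{N+1}."""
    a, b = mu_N(N), mu_N(N + 1)
    return sum(max(a.get(n, 0) - b.get(n, 0), 0) for n in range(1, N + 2))

def transport_bound(N, p):
    """Cost of the coupling described above, for finite p."""
    cost = 1 - Fraction(1, N + 1)   # Lambda_infty of two distinct points is the larger one
    return float(moved_mass(N)) ** (1 / p) * float(cost)

total_masses = [sum(mu_N(N).values()) for N in range(1, 12)]
```

Here `moved_mass(N)` equals \(2^{-N}\), so `transport_bound(N, p)` is at most \(2^{-N/p}\).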

Part 2. That \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},\infty })\) is non-separable is already proved in Part 1. We prove completeness next. Given a Cauchy sequence \(\{{\mathcal {X}}_n=(X_n,u_n,\mu _n)\}_{n\in {\mathbb {N}}}\) with respect to \(u_{\textrm{GW},\infty }\), we have that the underlying ultrametric spaces \(\{X_n\}_{n\in {\mathbb {N}}}\) form a Cauchy sequence w.r.t. \(u_{\textrm{GH}}\) due to Corollary 3.16. Since \(({\mathcal {U}},u_{\textrm{GH}})\) is complete (see [93, Prop. 2.1]), there exists a compact ultrametric space \((X,u_X)\) such that \(\lim _{\,n\rightarrow \infty }u_{\textrm{GH}}(X_n,X)=0\).

Let \(\{\delta _n\}_{n\in {\mathbb {N}}}\) be a sequence of positive numbers converging to 0 such that \(\delta _n\geqslant u_{\textrm{GH}}(X_n,X)\). By Theorem 2.5, we have that \((X_n)_{\delta _n}\!\cong X_{\delta _n}\). Denote by \({\widehat{\mu }}_n\in {\mathcal {P}}(X_{\delta _n})\) the pushforward of \((\mu _n)_{\delta _n}\) under the isometry. Furthermore, we have by Lemma A.7 that \(X_{\delta _n}\) is finite and we let \(X_{\delta _n}=\{[x_1]_{\delta _n},\ldots ,[x_k]_{\delta _n}\}\) for \(x_1,\ldots ,x_k\in X\). Based on this, we define \(\nu _n:= \sum _{i=1}^k{\widehat{\mu }}_n([x_i]_{\delta _n})\hspace{1.111pt}{\cdot }\hspace{1.111pt}\delta _{x_i}\in {\mathcal {P}}(X) \), where \(\delta _{x_i}\) is the Dirac measure at \(x_i\). Since X is compact, \({\mathcal {P}}(X)\) is weakly compact. Therefore, the sequence \(\{\nu _n\}_{n\in {\mathbb {N}}}\) has a cluster point \(\nu \in {\mathcal {P}}(X)\).

Now we show that \({\mathcal {X}}:= (X,u_X,\nu )\) is a \(u_{\textrm{GW},\infty }\) cluster point of \(\{{\mathcal {X}}_{n}\}_{n\in {\mathbb {N}}}\) and thus the limit of \(\{{\mathcal {X}}_n\}_{n\in {\mathbb {N}}}\) (since \(\{{\mathcal {X}}_n\}_{n\in {\mathbb {N}}}\) is Cauchy). Without loss of generality, we assume that \(\{\nu _n\}_{n\in {\mathbb {N}}}\) weakly converges to \(\nu \). Fix any \(\varepsilon >0\); we need to show that \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {X}}_n)\leqslant \varepsilon \) when \(n\) is large enough. For any fixed \(x_*\!\in X\), \([x_*]_{\varepsilon }\) is both an open and closed ball in \(X\). Therefore, \(\nu ([x_*]_{\varepsilon })=\lim _{\,n\rightarrow \infty }\nu _n([x_*]_{\varepsilon })\) (see e.g. [7, Thm. 2.1]). Since \(\delta _n\rightarrow 0\) as \(n\rightarrow \infty \), there exists \(N_1>0\) such that for any \(n>N_1\), \(\delta _n<\varepsilon \). We specify an isometry \(\varphi _n:(X_n)_{\delta _n}\!\rightarrow X_{\delta _n}\) that gives rise to the construction of \(\nu _n\). Then, we let \(\psi _n:(X_n)_\varepsilon \rightarrow X_\varepsilon \) be the isometry such that the following diagram commutes:

figure c

Assume that \([x_*]_\varepsilon ^X=\bigcup _{i=1}^l[x_i]_{\delta _n}^X\). Let \(x_*^n\in X_n\) be such that \(\psi _n([x_*^n]_\varepsilon ^{X_n})=[x_*]_\varepsilon ^X\) and let \(x_1^n,\ldots ,x_l^n\in X_n\) be such that \(\varphi _n([x_i^n]_{\delta _n}^{X_n})=[x_i]_{\delta _n}^X\) for each \(i=1,\ldots ,l\). Then, \([x^n_*]_\varepsilon ^{X_n}\!=\bigcup _{i=1}^l[x_i^n]_{\delta _n}^{X_n}\). Therefore,

$$\begin{aligned} \nu _n([x_*]_\varepsilon ^X)&=\sum _{i=1}^l\nu _n([x_i]_{\delta _n}^X)\\ {}&=\sum _{i=1}^l{\widehat{\mu }}_n([x_i]_{\delta _n}^X) =\sum _{i=1}^l{\mu }_n\bigl ([x_i^n]_{\delta _n}^{X_n}\bigr )=\mu _n\bigl ([x^n_*]_\varepsilon ^{X_n}\bigr ). \end{aligned}$$

Since \(\{{\mathcal {X}}_n\}_{n\in {\mathbb {N}}}\) is a Cauchy sequence, there exists \(N_2>0\) such that \(u_{\textrm{GW},\infty }({\mathcal {X}}_n,{\mathcal {X}}_m)<\varepsilon \) when \(n,m>N_2\). Then, by Theorem 3.14, \(({\mathcal {X}}_n)_\varepsilon \cong _{\textrm{w}}({\mathcal {X}}_m)_\varepsilon \) for all \(n,m>N_2\). By Lemma A.7, \((X_n)_\varepsilon \) is finite, and hence \((X_n)_\varepsilon \) has cardinality independent of \(n\) when \(n>N_2\). For all \(n>N_2\), we define the finite set \(A_n:= \{\mu _n([x^n]_\varepsilon ^{X_n})\,{|}\,x^n\in X_n\}\). The set \(A_n\) is independent of \(n\) since \(({\mathcal {X}}_n)_\varepsilon \cong _{\textrm{w}}({\mathcal {X}}_m)_\varepsilon \) for all \(n,m>N_2\). This implies that \(\mu _n([x^n_*]_\varepsilon ^{X_n})\) only takes values in the finite set \(A_n\). Combining this with the fact that \(\lim _{\,n\rightarrow \infty }\mu _n([x^n_*]_\varepsilon ^{X_n})=\lim _{\,n\rightarrow \infty }\nu _n([x_*]_\varepsilon ^X)=\nu ([x_*]_\varepsilon ^X)\) exists, there exists \(N_3>0\) such that when \(n>N_3\), \(\mu _n([x^n_*]_\varepsilon ^{X_n})\equiv C\) for some constant \(C\). This implies that \(\nu ([x_*]_\varepsilon ^X)=\mu _n([x^n_*]_\varepsilon ^{X_n})\), when \(n>\max \hspace{0.55542pt}(N_1,N_2,N_3)\). Since \(X_\varepsilon \) is finite, there exists a common \(N>0\) such that for all \(n>N\) and for all \( [x_*]_\varepsilon \in X_\varepsilon \) we have \(\nu ([x_*]_\varepsilon ^X)=\mu _n([x^n_*]_\varepsilon ^{X_n}) \), where \([x^n_*]^{X_n}_\varepsilon =\psi ^{-1}_n([x_*]_\varepsilon ^X)\in (X_n)_\varepsilon \). This indicates that \(\nu _\varepsilon =(\psi _n)_\#\,(\mu _n)_\varepsilon \) when \(n>N\). Therefore, \({\mathcal {X}}_\varepsilon \cong _{\textrm{w}} ({\mathcal {X}}_n)_\varepsilon \) and thus \(u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {X}}_n)\leqslant \varepsilon \).

1.4.2 Proof of Proposition 3.28

Next, we will demonstrate Proposition 3.28. However, before we come to this we recall some facts about p-metric and p-geodesic spaces.

Lemma B.15

([64, Prop. 7.30]) Given \(p\in [1,\infty )\), if X is a p-metric space, then X is not q-geodesic for all \(1\leqslant q<p\).

Lemma B.16

([64, Prop. 7.27]) Let X be a geodesic metric space. Then, for any \(p\geqslant 1\), \(S_{1/p}(X)\) is p-geodesic, where \(S_\alpha \) denotes the snowflake transform for \(\alpha >0\) (cf. Sect. 3.3).

For \(p=1\), the proof is based on the following property of the 1-Wasserstein space.

Lemma B.17

([9, Thm. 5.1]) Let X be a compact metric space. Then, the space \(W_1(X):= ({\mathcal {P}}(X),d_{\textrm{W},1}^X)\) is a geodesic space.

Based on the above results and Corollary B.2, the proof of Proposition 3.28 is straightforward.

Proof of Proposition 3.28

Let \({\mathcal {X}}\) and \({\mathcal {Y}}\) be two compact ultrametric measure spaces. First, we consider the case \(p=1\). By Corollary B.2, there exist a compact ultrametric space Z and isometric embeddings \(\phi :X\hookrightarrow Z\) and \(\psi :Y\hookrightarrow Z\) such that

$$\begin{aligned}u_{\textrm{GW},1}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}})=d_{\textrm{W},1}^Z(\phi _\#\,{\mu _{X}},\psi _\#\,{\mu _{Y}}).\end{aligned}$$

The space \(W_1(Z)\) is geodesic (cf. Lemma B.17). Therefore, there exists a Wasserstein geodesic \({\widetilde{\gamma }}:[0,1]\rightarrow W_1(Z)\) connecting \(\phi _\#\,\mu _X\) and \(\psi _\#\,\mu _Y\). This induces a curve \(\gamma :[0,1]\rightarrow {\mathcal {U}}^{\textrm{w}}\) where for each \(t\in [0,1]\),

$$\begin{aligned} \gamma (t):= \bigl (\textrm{supp}\hspace{1.111pt}({\widetilde{\gamma }}(t)),u|_{\textrm{supp}({\widetilde{\gamma }}(t))\hspace{1.111pt}{\times }\hspace{1.111pt}\textrm{supp}({\widetilde{\gamma }}(t))},{\widetilde{\gamma }}(t)\bigr ). \end{aligned}$$

Note that \(\gamma (0)\cong _{\textrm{w}}{\mathcal {X}}\) and \(\gamma (1)\cong _{\textrm{w}}{\mathcal {Y}}\) and hence we simply replace \(\gamma (0)\) and \(\gamma (1)\) with \({\mathcal {X}}\) and \({\mathcal {Y}}\), respectively. Now, for each \(s,t\in [0,1]\), we have that

$$\begin{aligned} u_{\textrm{GW},1}^{\mathrm{\,sturm}}(\gamma (s),\gamma (t))&\leqslant d_{\textrm{W},1}^{Z}({\widetilde{\gamma }}(s),{\widetilde{\gamma }}(t))\\ {}&=|s-t|\hspace{1.111pt}d_{\textrm{W},1}^{Z}({\widetilde{\gamma }}(0),{\widetilde{\gamma }}(1))=|s-t|\hspace{1.111pt}u_{\textrm{GW},1}^{\mathrm{\,sturm}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

Therefore, \(\gamma \) is a geodesic connecting \({\mathcal {X}}\) and \({\mathcal {Y}}\) and thus \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\) is geodesic.

For the case \(p>1\), by Corollary B.13, \(S_p({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\cong ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\). This implies that \(S_{1/p}({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},1}^{\mathrm{\,sturm}})\cong ({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\). Hence, by Lemma B.16, \(({\mathcal {U}}^{\textrm{w}}\!,u_{\textrm{GW},p}^{\mathrm{\,sturm}})\) is p-geodesic. \(\square \)

1.5 Technical Details from Sect. 3

In this section, we address various technical issues from Sect. 3.

1.5.1 The Wasserstein Pseudometric

Given a set X, a pseudometric is a symmetric function \(d_X:X\hspace{1.111pt}{\times }\hspace{1.111pt}X\rightarrow {\mathbb {R}}_{\geqslant 0}\) satisfying the triangle inequality and \(d_X(x,x)=0\) for all \(x\in X\). Note that if moreover \(d_X(x,y)=0\) implies \(x=y\), then \(d_X\) is a metric. There is a canonical identification on pseudometric spaces \((X,d_X)\): \(x\sim x'\) if \(d_X(x,x')=0\). Then, \(\sim \) is in fact an equivalence relation and we define the quotient space \({\widetilde{X}}=X/{\sim }\). Define a function \({\widetilde{d}}_X:{\widetilde{X}}\hspace{1.111pt}{\times }\hspace{1.111pt}{\widetilde{X}}\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:

$$\begin{aligned}{\widetilde{d}}_X([x],[x']):= {\left\{ \begin{array}{ll} \,d_X(x,x')&{}\text {if}\;\;d_X(x,x')\ne 0,\\ \,0&{}\text {otherwise}. \end{array}\right. }\end{aligned}$$

\({\widetilde{d}}_X\) turns out to be a metric on \({\widetilde{X}}\). In the sequel, the metric space \(({\widetilde{X}},{\widetilde{d}}_X)\) is referred to as the metric space induced by the pseudometric space \((X,d_X)\). Note that \({\widetilde{d}}_X\) preserves the induced topology (see e.g. [41]) and thus the quotient map \(\Psi :X\rightarrow {\widetilde{X}}\) is continuous.
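For a finite pseudometric space given by a distance matrix, the induced metric space can be computed as follows. This is a minimal sketch of the identification \(x\sim x'\) iff \(d_X(x,x')=0\); the function name `metric_quotient` and the example matrix are hypothetical.

```python
def metric_quotient(d):
    """Induced metric space of a finite pseudometric space.

    d : symmetric matrix with zero diagonal satisfying the triangle inequality.
    Returns the equivalence classes and the distance matrix between them.
    """
    classes = []
    for i in range(len(d)):
        for c in classes:
            # By the triangle inequality, one representative decides membership.
            if d[i][c[0]] == 0:
                c.append(i)
                break
        else:
            classes.append([i])
    # The value does not depend on the chosen representatives.
    dq = [[d[a[0]][b[0]] for b in classes] for a in classes]
    return classes, dq

# Hypothetical example: points 0 and 1 are at pseudo-distance zero.
d = [[0, 0, 2],
     [0, 0, 2],
     [2, 2, 0]]
classes, dq = metric_quotient(d)
```

In the example the first two points are identified, and the quotient is a two-point metric space with distance 2.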

Analogously to the Wasserstein distance, which is defined for probability measures on metric spaces, we define the Wasserstein pseudometric for measures on compact pseudometric spaces as done in [85]. Let \(\alpha ,\beta \in {\mathcal {P}}(X)\). Then, we define for \(p\in [1,\infty )\) the Wasserstein pseudometric of order p as

$$\begin{aligned} d_{\textrm{W},p}^{\,(X,d_X)}(\alpha ,\beta ):= \biggl (\inf _{\mu \in {\mathcal {C}}(\alpha ,\beta )}\int _{X\times X} (d_X(x,y))^p\hspace{1.111pt}\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p} \end{aligned}$$
(28)

and for \(p=\infty \) as

$$\begin{aligned} d_{\textrm{W},\infty }^{\,(X,d_X)}(\alpha ,\beta ):= \!\inf _{\mu \in {\mathcal {C}}(\alpha ,\beta )}\sup _{(x,y)\in \textrm{supp}(\mu )}\!\! d_X(x,y). \end{aligned}$$
(29)

It is easy to see that the Wasserstein pseudometric is closely related to the Wasserstein distance on the induced metric space. More precisely, one can show the following.

Lemma B.18

Let \((X,d_X)\) denote a compact pseudometric space, let \(\alpha ,\beta \in {\mathcal {P}}(X)\). Then, it follows for \(p\in [1,\infty ]\) that

$$\begin{aligned} d_{\textrm{W},p}^{\,(X,d_X)}(\alpha ,\beta )=d_{\textrm{W},p}^{\,({\widetilde{X}},{\widetilde{d}}_X)}(\Psi _\#\,{\alpha },\Psi _\#\,{\beta }) \end{aligned}$$
(30)

and that the infimum in (28) (resp. in (29) if \(p=\infty \)) is attained for some \(\mu \in {\mathcal {C}}(\alpha ,\beta )\).

Proof

In the course of this proof we focus on the case \(p<\infty \) and remark that the case \(p=\infty \) follows by similar arguments. The quotient map allows us to define the map \(\theta :{\mathcal {C}}(\alpha ,\beta )\rightarrow {\mathcal {C}}(\Psi _\#\,{\alpha },\Psi _\#\,{\beta })\) via \(\mu \mapsto (\Psi \hspace{1.111pt}{\times }\hspace{1.111pt}\Psi )_\#\,\mu \). It is easy to see that \(\theta \) is well defined and surjective. Furthermore, it holds by construction that

$$\begin{aligned}\int _{X\times X}({d}_X(x,y))^p\hspace{1.111pt}{\mu }(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)=\int _{{\widetilde{X}}\times {\widetilde{X}}} ({\widetilde{d}}_X(x,y))^p\hspace{1.111pt}\theta ({\mu })(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\end{aligned}$$

for all \(\mu \in {\mathcal {C}}(\alpha ,\beta )\). Hence, (30) follows.

We come to the second part of the claim. By [91, Sect. 4] there exists an optimal coupling \({\widetilde{\mu }}^*\in {\mathcal {C}}(\Psi _\#\,{\alpha },\Psi _\#\,{\beta })\) such that

$$\begin{aligned}d_{\textrm{W},p}^{\,({\widetilde{X}},{\widetilde{d}}_X)}(\Psi _\#\,{\alpha },\Psi _\#\,{\beta })=\biggl (\int _{{\widetilde{X}}\times {\widetilde{X}}} ({\widetilde{d}}_X(x,y))^p\,{\widetilde{\mu }}^*(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p}.\end{aligned}$$

In consequence, we find using our previous results that for any \(\mu ^*\in \theta ^{-1}({\widetilde{\mu }}^*)\) it holds

$$\begin{aligned} d_{\textrm{W},p}^{\,({\widetilde{X}},{\widetilde{d}}_X)}(\Psi _\#\,{\alpha },\Psi _\#\,{\beta })&=\biggl (\int _{{\widetilde{X}}\times {\widetilde{X}}} ({\widetilde{d}}_X(x,y))^p\hspace{1.111pt}{\widetilde{\mu }}^*(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p}\\&=\biggl (\int _{{X}\times {X}} ({d}_X(x,y))^p\hspace{1.111pt}\mu ^*(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p}\!\!=d_{\textrm{W},p}^{\,(X,d_X)}(\alpha ,\beta ).\end{aligned}$$

This yields the claim.\(\square \)

1.5.2 Regularity of the Cost Functionals of \(u_{\textrm{GW},p}\) and \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\)

In the remainder of this section, we collect various technical results required to demonstrate the existence of optimizers in the definitions of \(u_{\textrm{GW},p}^{\mathrm{\,sturm}}\) (see (8)) and \(u_{\textrm{GW},p}\) (see (11)).

Lemma B.19

Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Then, the set of couplings \({\mathcal {C}}({\mu _{X}},{\mu _{Y}})\subseteq {\mathcal {P}}(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y,\max \hspace{0.55542pt}(u_X,u_Y))\) is compact w.r.t. weak convergence.

Proof

The proof follows directly from [21, Lem. 2.2].\(\square \)

Lemma B.20

Let \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\). Let \(D_1\subseteq {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) be a non-empty subset satisfying the following: there exist \((x_0,y_0)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) and \(C>0\) such that \(u(x_0,y_0)\leqslant C\) for all \(u\in D_1\). Then, \(D_1\) is pre-compact with respect to uniform convergence.

Proof

Let \(\{u_n\}_{n\in {\mathbb {N}}}\subseteq D_1\) be a sequence. Note that \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\subseteq (X\sqcup Y)\hspace{1.111pt}{\times }\hspace{1.111pt}(X\sqcup Y)\). Let \(v_n:= u_n|_{X\times Y}\). For any \(n\in {\mathbb {N}}\) and any \((x,y),(x'\!,y')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), we have that

$$\begin{aligned} |u_n(x,y)-u_n(x'\!,y')|&\leqslant u_X(x,x')+u_Y(y,y')\\ {}&\leqslant 2\max \hspace{0.55542pt}(u_X,u_Y)((x,y),(x'\!,y')). \end{aligned}$$

This means that \(\{v_n\}_{n\in {\mathbb {N}}}\) is equicontinuous with respect to the ultrametric \(\max \hspace{0.55542pt}\{{u_{X}},{u_{Y}}\}\) on \(X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\). Now, since \(u_n(x_0,y_0)\leqslant C\), we have that for any \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\),

$$\begin{aligned} u_n(x,y)&\leqslant 2\max \hspace{0.55542pt}(u_X,u_Y)((x,y),(x_0,y_0))+u_n(x_0,y_0)\\ {}&\leqslant 2\max \hspace{0.55542pt}({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) )+C. \end{aligned}$$

Consequently, \(\{v_n\}_{n\in {\mathbb {N}}}\) is uniformly bounded. By the Arzelà–Ascoli theorem ([47, Thm. 7 on p. 61]), each subsequence of \(\{v_n\}_{n\in {\mathbb {N}}}\) has a uniformly convergent subsequence. Hence, we assume without loss of generality that \(\{v_n\}_{n\in {\mathbb {N}}}\) converges to \(v:X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\rightarrow {\mathbb {R}}_{\geqslant 0}\).

Now, we define a symmetric function \(u:(X\sqcup Y)\hspace{1.111pt}{\times }\hspace{1.111pt}(X\sqcup Y)\rightarrow {\mathbb {R}}_{\geqslant 0}\) as follows:

  1. (i)

    \(u|_{X\times X}:= u_X\) and \(u|_{Y\times Y}:= u_Y\);

  2. (ii)

    \(u|_{X\times Y}:= v\); for \((y,x)\in Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\), we let \(u(y,x):= u(x,y)\).

It is easy to verify that \(u\in {\mathcal {D}}^{\textrm{ult}}(u_X,u_Y)\) and that u is a cluster point of the sequence \(\{u_n\}_{n\in {\mathbb {N}}}\). Therefore, \(D_1\) is pre-compact.\(\square \)

Lemma B.21

Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\) and \({\mathcal {Y}}={(Y,{u_{Y}},{\mu _{Y}}) }\) be compact ultrametric measure spaces. Let \(\{\mu _n\}_{n\in {\mathbb {N}}}\subseteq {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) be a sequence weakly converging to \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\). Let \(\{u_n\}_{n\in {\mathbb {N}}}\subseteq {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\). Suppose that there exist a non-decreasing sequence \(\{p_n\}_{n\in {\mathbb {N}}}\subseteq [1,\infty )\) and \(C>0\) such that for all \(n\in {\mathbb {N}}\),

$$\begin{aligned} \biggl (\int _{X\times Y}(u_n(x,y))^{p_n}\mu _n(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p_n}\!\!\leqslant \, C.\end{aligned}$$

Then, \(\{u_n\}_{n\in {\mathbb {N}}}\) uniformly converges to some \(u\in {\mathcal {D}}^{\textrm{ult}}({u_{X}},{u_{Y}})\) (up to taking a subsequence).

Proof

The following argument adapts the proof of [83, Lem. 3.3] to the current setting. For any \((x_0,y_0)\in {{\textrm{supp}}\hspace{0.55542pt}(\mu ) }\), there exist \(\varepsilon ,\delta >0\) and \(N\in {\mathbb {N}}\) such that for all \(n\geqslant N\)

$$\begin{aligned} C&\geqslant \biggl (\int _{X\times Y}(u_n(x,y))^{p_n}\mu _n(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p_n}\\ {}&\geqslant \int _{X\times Y}u_n(x,y)\hspace{1.111pt}\mu _n\hspace{1.111pt}(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\&\geqslant \int _{B_\varepsilon ^X(x_0)\times B_\varepsilon ^Y(y_0)}\! u_n(x,y)\,\mu _n(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\ {}&\geqslant \int _{B_\varepsilon ^X(x_0)\times B_\varepsilon ^Y(y_0)}(u_n(x_0,y_0)-2\varepsilon )\,\mu _n(dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\&\geqslant (u_n(x_0,y_0)-2\varepsilon )\bigl (\mu ( B_\varepsilon ^X(x_0)\hspace{1.111pt}{\times }\hspace{1.111pt}B_\varepsilon ^Y(y_0))-\delta \bigr ). \end{aligned}$$

Therefore, \(\{u_n(x_0,y_0)\}_{n\geqslant N}\) is uniformly bounded. By Lemma B.20, we have that \(\{u_n\}_{n\in {\mathbb {N}}}\) has a uniformly convergent subsequence.\(\square \)

Lemma B.22

Let X and Y be ultrametric spaces. Then,

$$\begin{aligned} \Lambda _\infty ({u_{X}},{u_{Y}}):X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\rightarrow {\mathbb {R}}_{\geqslant 0}\end{aligned}$$

is continuous with respect to the product topology (induced by \(\max \hspace{0.55542pt}({u_{X}},{u_{Y}}, {u_{X}},{u_{Y}})\)).

Proof

Fix \((x,y,x'\!,y')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) and \(\varepsilon >0\). Choose \(0<\delta <\varepsilon \) such that \(\delta <u_X(x,x')\) if \(x\ne x'\) and \(\delta <u_Y(y,y')\) if \(y\ne y'\). Then, consider any point \((x_1,y_1,x_1',y_1')\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\) such that

$$\begin{aligned} {u_{X}}(x,x_1),{u_{Y}}(y,y_1),{u_{X}}(x'\!,x_1'),{u_{Y}}(y'\!,y_1')\leqslant \delta . \end{aligned}$$

For \({u_{X}}(x_1,x_1')\), we have the following two situations:

  1. (i)

    \(x=x'\): \({u_{X}}(x_1,x_1')\leqslant \max \hspace{0.55542pt}({u_{X}}(x_1,x),{u_{X}}(x,x_1'))\leqslant \delta <\varepsilon \);

  2. (ii)

    \(x\ne x'\): \({u_{X}}(x_1,x_1')\leqslant \max \hspace{0.55542pt}({u_{X}}(x_1,x),{u_{X}}(x,x'),{u_{X}}(x'\!,x_1'))={u_{X}}(x,x')\). Similarly, \({u_{X}}(x,x')\leqslant {u_{X}}(x_1,x_1')\) and thus \({u_{X}}(x,x')={u_{X}}(x_1,x_1')\).

A similar result holds for \({u_{Y}}(y_1,y_1')\).

This leads to four cases for \(\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))\):

  1. (i)

    \(x=x'\), \(y=y'\): In this case we have \({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1')< \varepsilon \). Then,

    $$\begin{aligned} \bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))&-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\\&=\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))\leqslant \varepsilon ; \end{aligned}$$
  2. (ii)

    \(x=x'\), \(y\ne y'\): Now \({u_{X}}(x_1,x_1')<\varepsilon \) and \({u_{Y}}(y_1,y_1')={u_{Y}}(y,y')\). If \({u_{Y}}(y,y')\geqslant \varepsilon >{u_{X}}(x_1,x_1')\), then

    $$\begin{aligned} \bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))&-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\\&=|{u_{Y}}(y,y')-{u_{Y}}(y,y')|=0. \end{aligned}$$

    Otherwise \({u_{Y}}(y,y')<\varepsilon \), which implies that \(\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))\leqslant \varepsilon \) and \(\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))={u_{Y}}(y,y')\leqslant \varepsilon \). Therefore,

    $$\begin{aligned}\bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\leqslant \varepsilon ;\end{aligned}$$
  3. (iii)

    \(x\ne x'\), \(y=y'\): Similarly to (ii), we have

    $$\begin{aligned}\bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\leqslant \varepsilon ;\end{aligned}$$
  4. (iv)

    \(x\ne x'\), \(y\ne y'\): Now \({u_{X}}(x_1,x_1')={u_{X}}(x,x')\) and \({u_{Y}}(y_1,y_1')={u_{Y}}(y,y')\). Therefore,

    $$\begin{aligned}\bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |=0.\end{aligned}$$

In conclusion, whenever \({u_{X}}(x,x_1),{u_{Y}}(y,y_1),{u_{X}}(x'\!,x_1'),{u_{Y}}(y'\!,y_1')\leqslant \delta \) we have that

$$\begin{aligned}\bigl |\Lambda _\infty ({u_{X}}(x_1,x_1'),{u_{Y}}(y_1,y_1'))-\Lambda _\infty ({u_{X}}(x,x'),{u_{Y}}(y,y'))\bigr |\leqslant \varepsilon .\end{aligned}$$

Therefore, \(\Lambda _\infty ({u_{X}},{u_{Y}})\) is continuous with respect to the metric \(\max \hspace{0.55542pt}({u_{X}},{u_{Y}}, {u_{X}},{u_{Y}})\).\(\square \)
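The case analysis above reflects the fact that \(\Lambda _\infty \) is not continuous for the Euclidean topology on \({\mathbb {R}}_{\geqslant 0}\), so the ultrametric structure of the arguments has to be used. The following minimal Python sketch illustrates this, assuming the convention \(\Lambda _\infty (a,b)=\max \hspace{0.55542pt}(a,b)\) for \(a\ne b\) and \(\Lambda _\infty (a,a)=0\) that the proof relies on. The function names and the sample grid are hypothetical and chosen only for illustration.

```python
from itertools import product


def lambda_inf(a: float, b: float) -> float:
    """Lambda_infty(a, b): max(a, b) if a != b, and 0 if a == b."""
    return 0.0 if a == b else max(a, b)


def satisfies_strong_triangle(values) -> bool:
    """Check Lambda_infty(a, c) <= max(Lambda_infty(a, b), Lambda_infty(b, c))
    for all triples taken from a finite set of non-negative reals."""
    return all(
        lambda_inf(a, c) <= max(lambda_inf(a, b), lambda_inf(b, c))
        for a, b, c in product(values, repeat=3)
    )


grid = [0.0, 0.25, 0.5, 1.0, 2.0]  # hypothetical sample of distance values
print(satisfies_strong_triangle(grid))

# Two arguments that are close in the Euclidean sense but far apart in Lambda_infty:
print(abs(1.0 - (1.0 + 1e-9)), lambda_inf(1.0, 1.0 + 1e-9))
```

In an ultrametric space, by contrast, a sufficiently small perturbation of \((x,x')\) with \(x\ne x'\) does not change \({u_{X}}(x,x')\) at all, which is exactly what cases (ii) to (iv) of the proof exploit.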

1.5.3 \(u_{\textrm{GW},p}\) and the One Point Space

Below, we prove that \(u_{\textrm{GW},p}\), \(1\leqslant p\leqslant \infty \), between an arbitrary \({\mathcal {X}}\in {\mathcal {U}}^{\textrm{w}}\) and the one-point ultrametric measure space \(*\) agrees with the p-diameter of \({\mathcal {X}}\) (see e.g., [60]), which is defined for \(1\leqslant p\leqslant \infty \) as \(\textrm{diam}_p({\mathcal {X}}):= \Vert d_X\Vert _{L^p({\mu _{X}}\otimes {\mu _{X}})}\).

Proposition B.23

Let \(*\in {\mathcal {U}}^{\textrm{w}}\) be the one-point space. Then, it holds for any \(1\leqslant p\leqslant \infty \) that \(u_{\textrm{GW},p}({\mathcal {X}},*) = \textrm{diam}_p({\mathcal {X}})\).

Proof

Note that in this case, for every \(x,x'\!\in X\), we have \(\Lambda _\infty (u_X(x,x'),u_*(*,*)) = \Lambda _\infty (u_X(x,x'),0) = u_X(x,x')\). Since, moreover, \(\mu := \mu _X\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\delta _*\) is the unique coupling between \(\mu _X\) and \(\delta _*\), (10) leads to the claim.\(\square \)
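For a finite space the p-diameter is a finite sum, so Proposition B.23 makes \(u_{\textrm{GW},p}({\mathcal {X}},*)\) directly computable. The sketch below is one possible way to evaluate it, assuming the space is given by a distance matrix `u` and a probability vector `mu`. These names and the three-point example are hypothetical and not taken from the authors' code.

```python
import math


def p_diameter(u, mu, p=1.0):
    """p-diameter of a finite (ultra)metric measure space.

    u  : square matrix (list of lists) of pairwise distances
    mu : list of point masses summing to 1
    p  : exponent in [1, inf]; use math.inf for the essential supremum
    """
    n = len(mu)
    if math.isinf(p):
        # Only pairs of points with positive mass contribute to the L^infty norm.
        return max(u[i][j] for i in range(n) for j in range(n)
                   if mu[i] > 0 and mu[j] > 0)
    total = sum((u[i][j] ** p) * mu[i] * mu[j]
                for i in range(n) for j in range(n))
    return total ** (1.0 / p)


# Hypothetical example: three points at mutual distance 1 with the uniform measure.
u = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
mu = [1 / 3, 1 / 3, 1 / 3]
for p in (1.0, 2.0, math.inf):
    print(p, p_diameter(u, mu, p))
```

In this example six of the nine ordered pairs are at distance 1, each with mass 1/9, so the value for p = 1 is 2/3.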

Technical Details from Sect. 4

1.1 Proofs from Sect. 4

In this section, we state the full proofs of the results from Sect. 4.

1.1.1 Proof of Theorem 4.1

Part 1. We observe that for any point x in an ultrametric space X, there always exists \(x'\!\in X\) such that \({u_{X}}(x,x')={\textrm{diam}}\hspace{0.55542pt}(X) \) (see [27]). Since by assumption \(\mu _X\) is fully supported, \(s_{X,\infty }\equiv {\textrm{diam}}\hspace{0.55542pt}(X) \) is a constant function. Therefore, \(\Lambda _\infty (s_{X,\infty }(x),s_{Y,\infty }(y))\equiv \Lambda _\infty ({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) )\) for all \(x\in X\) and \(y\in Y\). This implies that \({\textbf{FLB}}_{\infty }^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})=\Lambda _\infty ({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) )\). By [64, Cor. 5.3] and Corollary 3.16, we have that

$$\begin{aligned}u_{\textrm{GW},\infty }({\mathcal {X}},{\mathcal {Y}})\geqslant u_{\textrm{GH}}(X,Y)\geqslant \Lambda _\infty ({\textrm{diam}}\hspace{0.55542pt}(X) ,{\textrm{diam}}\hspace{0.55542pt}(Y) )={\textbf{FLB}}_{\infty }^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

Part 2. The proof for \(d_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\geqslant \textbf{TLB}_p({\mathcal {X}},{\mathcal {Y}})\) in [60, Sect. 6] can be used essentially without any change for showing \(u_{\textrm{GW},p}({\mathcal {X}},{\mathcal {Y}})\geqslant {\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\). Hence, it remains to show that \({\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\geqslant {\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\):

Proposition C.1

Let \({\mathcal {X}},{\mathcal {Y}}\in {\mathcal {U}}^{\textrm{w}}\) and let \(p\in [1,\infty ]\). Then, \( {\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\geqslant {\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\).

In order to prove Proposition C.1, we need the following technical lemma.

Lemma C.2

Let \({\mathcal {X}}={(X,{u_{X}},{\mu _{X}}) }\in {\mathcal {U}}^{\textrm{w}}\). Then, \({\textrm{spec}}\hspace{0.55542pt}(X):= \{{u_{X}}(x,x')\,{|}\, x,x'\!\in X\}\) is a compact subset of \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\).

Proof

By Lemma A.7, we have that for each \(t>0\), \(X_t\) is a finite set. Let \(\{t_n\}_{n=1}^\infty \) be a positive sequence decreasing to 0. Then, it is easy to see that \({\textrm{spec}}\hspace{0.55542pt}(X)=\bigcup _{n=1}^\infty {\textrm{spec}}\hspace{0.55542pt}(X_{t_n})\). Since each \({\textrm{spec}}\hspace{0.55542pt}(X_{t_n})\) is a finite set, \({\textrm{spec}}\hspace{0.55542pt}(X)\) is a countable set.

Now, pick any \(0\ne t\in {\textrm{spec}}\hspace{0.55542pt}(X)\). Suppose t is a cluster point in \({\textrm{spec}}\hspace{0.55542pt}(X)\). Then, there exist infinitely many \(s\in {\textrm{spec}}\hspace{0.55542pt}(X)\) greater than t/2. However, this would result in \(X_{t/2}\) being an infinite set, which contradicts the fact that \(X_{t/2}\) is finite. Therefore, 0 is the only possible cluster point of \({\textrm{spec}}\hspace{0.55542pt}(X)\). By Lemma A.2, we have that \({\textrm{spec}}\hspace{0.55542pt}(X)\) is compact.\(\square \)
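For a finite space the objects used in this proof can be written down explicitly. The following sketch computes \({\textrm{spec}}\hspace{0.55542pt}(X)\), the classes of \(X_t\), and \({\textrm{spec}}\hspace{0.55542pt}(X_t)\). It assumes the convention that \(x\) and \(x'\) are identified in \(X_t\) if and only if \({u_{X}}(x,x')\leqslant t\), and that distinct classes keep the distance of their representatives. Both conventions are assumptions of this sketch, and the four-point matrix is a hypothetical example.

```python
def spectrum(u):
    """Sorted list of the distinct values taken by the distance matrix u."""
    return sorted({value for row in u for value in row})


def quotient_classes(u, t):
    """Classes of the relation x ~ x' iff u(x, x') <= t (assumed convention).

    For an ultrametric this relation is an equivalence relation, so collecting
    the closed ball of radius t around each unassigned point gives a partition.
    """
    n = len(u)
    assigned = [False] * n
    classes = []
    for i in range(n):
        if assigned[i]:
            continue
        block = [j for j in range(n) if u[i][j] <= t]
        for j in block:
            assigned[j] = True
        classes.append(block)
    return classes


def quotient_spectrum(u, t):
    """Spectrum of the quotient X_t, read off from one representative per class."""
    reps = [block[0] for block in quotient_classes(u, t)]
    return sorted({0.0} | {u[a][b] for a in reps for b in reps if a != b})


# Hypothetical four-point ultrametric space.
u = [[0.0, 0.5, 2.0, 2.0],
     [0.5, 0.0, 2.0, 2.0],
     [2.0, 2.0, 0.0, 1.0],
     [2.0, 2.0, 1.0, 0.0]]
print(spectrum(u))
for t in (0.5, 1.0, 2.0):
    print(t, quotient_classes(u, t), quotient_spectrum(u, t))
```

In this example each \({\textrm{spec}}\hspace{0.55542pt}(X_t)\) consists of 0 together with the values of \({\textrm{spec}}\hspace{0.55542pt}(X)\) exceeding t, which is the mechanism behind the union formula in the proof.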

Next we demonstrate Proposition C.1 and hence finish the proof of Theorem 4.1.

Proof of Proposition C.1

We first prove the case when \(p<\infty \). Let \(dh_{\mathcal {X}}(x):= {u_{X}}(x,\hspace{0.55542pt}{\cdot }\hspace{1.111pt})_\#\,{\mu _{X}}\) and let \(dh_{\mathcal {Y}}(y):= {u_{Y}}(y,\hspace{0.55542pt}{\cdot }\hspace{1.111pt})_\#\,{\mu _{Y}}\). Further, define

$$\begin{aligned} dH_{\mathcal {X}}:= ({u_{X}})_\#\,({\mu _{X}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{X}}), \quad dH_{\mathcal {Y}}:= ({u_{Y}})_\#\,({\mu _{Y}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{Y}}). \end{aligned}$$

Lemma C.2 implies that the set \(S:= {\textrm{spec}}\hspace{0.55542pt}(X)\cup {\textrm{spec}}\hspace{0.55542pt}(Y)\) is a compact subset of \(({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )\). It is easy to see that \(\textrm{supp}\hspace{1.111pt}(dh_{\mathcal {X}}),\textrm{supp}\hspace{0.55542pt}(dh_{\mathcal {Y}}),\textrm{supp}\hspace{0.55542pt}(dH_{\mathcal {X}}),\textrm{supp}\hspace{0.55542pt}(dH_{\mathcal {Y}})\subseteq S\subseteq {\mathbb {R}}_{\geqslant 0}\). Now, recall by Proposition 4.4 that \({\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})=d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dH_{\mathcal {X}},dH_{\mathcal {Y}})\) and

$$\begin{aligned}{\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})=\biggl (\inf _{\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})}\int _{X\times Y}\!\bigl (d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dh_{\mathcal {X}}(x),dh_{\mathcal {Y}}(y))\bigr )^p\hspace{1.111pt}\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\biggr )^{\!1/p}\!.\end{aligned}$$

Further, we observe for any \(x\in X\) and \(y\in Y\) that

$$\begin{aligned} d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dh_{\mathcal {X}}(x),dh_{\mathcal {Y}}(y))=\!\!\inf _{\pi _{xy}\in {\mathcal {C}}(dh_{\mathcal {X}}(x),dh_{\mathcal {Y}}(y))}\biggl (\int _{S\times S}\!\!\Lambda _\infty ^p(s,t)\,\pi _{xy}(ds\hspace{1.111pt}{\times }\hspace{1.111pt}dt)\biggr )^{\!1/p}\!. \end{aligned}$$

For the remainder of this proof, the metric on \(S\subseteq {\mathbb {R}}_{\geqslant 0}\) is always given by \(\Lambda _\infty \). Additionally, \({\mathcal {P}}(S)\) denotes the set of probability measures on S and we equip \({\mathcal {P}}(S)\) with the Borel \(\sigma \)-field with respect to the topology induced by weak convergence.

Claim 1

There is a measurable choice \((x,y)\mapsto \pi ^*_{xy}\) such that for each \((x,y)\in X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\), \(\pi ^*_{xy}\) is an optimal transport plan between \(dh_{\mathcal {X}}(x)\) and \(dh_{\mathcal {Y}}(y)\).

Proof of Claim 1

Since both \(\Lambda _1\) and \(\Lambda _\infty \) induce the same topology on S, and thus the same Borel sets on S, \(d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _1)}\) and \(d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}\) metrize the same weak topology on \({\mathcal {P}}(S)\). By [61, Rem. 2.5], the following two maps are continuous with respect to the weak topology and thus measurable:

$$\begin{aligned}\Phi _1:X\rightarrow {\mathcal {P}}(S),\; x\mapsto dh_{\mathcal {X}}(x)\;\;\text{ and }\;\;\Phi _2:Y\rightarrow {\mathcal {P}}(S),\;\; y\mapsto dh_{\mathcal {Y}}(y).\end{aligned}$$

Since S is compact, the space \(({\mathcal {P}}(S),d_{\textrm{W},p}^{\,(S,\Lambda _\infty )})\) is separable [91, Thm. 6.18]. This yields that \({\mathscr {B}}( {\mathcal {P}}(S)\hspace{1.111pt}{\times }\hspace{1.111pt}{\mathcal {P}}(S))={\mathscr {B}}( {\mathcal {P}}(S))\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mathscr {B}}( {\mathcal {P}}(S))\) [33, Prop. 1.5]. Hence, the product \(\Phi :X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\rightarrow {\mathcal {P}}(S)\hspace{1.111pt}{\times }\hspace{1.111pt}{\mathcal {P}}(S)\) of \(\Phi _1\) and \(\Phi _2\), defined by \((x,y)\mapsto (dh_{\mathcal {X}}(x),dh_{\mathcal {Y}}(y))\) is measurable [33, Prop. 2.4]. Then, a direct application of [91, Cor. 5.22] gives the claim. \(\square \)

Now, we have for every \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\) that

$$\begin{aligned} \int _{X\times Y}&\!\bigl (d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dh_{\mathcal {X}}(x),dh_{\mathcal {Y}}(y))\bigr )^p\hspace{1.111pt}\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\&=\int _{X\times Y}\!\int _{S\times S}\Lambda _\infty ^p(s,t)\,\pi ^*_{xy}(ds\hspace{1.111pt}{\times }\hspace{1.111pt}dt)\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\&=\int _{S\times S}\!\Lambda ^p_\infty (s,t)\,\bar{\mu }(ds\hspace{1.111pt}{\times }\hspace{1.111pt}dt), \end{aligned}$$

by Fubini’s Theorem, where \(\bar{\mu }\in {\mathcal {P}}(S\hspace{1.111pt}{\times }\hspace{1.111pt}S)\) is defined as

$$\begin{aligned} \bar{\mu }(A):= \int _{X\times Y}\!\pi ^*_{xy}(A)\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy) \end{aligned}$$

for any measurable \(A\subseteq S\hspace{1.111pt}{\times }\hspace{1.111pt}S\). We remark that by Claim 1 the measure \(\bar{\mu }\) is well defined. Next, we verify that \(\bar{\mu }\in {\mathcal {C}}(dH_{\mathcal {X}},dH_{\mathcal {Y}})\). For any measurable \(A\subseteq (S,\Lambda _\infty )\) we have

$$\begin{aligned} \bar{\mu }(A\hspace{1.111pt}{\times }\hspace{1.111pt}S)&=\int _{X\times Y}\!\pi ^*_{xy}(A\hspace{1.111pt}{\times }\hspace{1.111pt}S)\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\ {}&=\int _{X\times Y}\!dh_{\mathcal {X}}(x)(A)\,\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)\\ {}&=\int _{X}\!dh_{\mathcal {X}}(x)(A)\,{\mu _{X}}(dx)\\&\overset{\mathrm{(i)}}{=}\int _X\!\int _X\!\mathbb {1}_{\{{u_{X}}(x,x')\in A\}}\,{\mu _{X}}(dx')\,{\mu _{X}}(dx)=dH_{\mathcal {X}}(A), \end{aligned}$$

where we have applied the marginal constraints for \(\pi ^*_{xy}\) and \(\mu \). Further, (i) follows by the change-of-variables formula. Analogous arguments give that \(\bar{\mu }(S\hspace{1.111pt}{\times }\hspace{1.111pt}B)=dH_{\mathcal {Y}}(B)\) for any measurable \(B\subseteq S\). Thus, we conclude that for every \(\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})\)

$$\begin{aligned} \int _{X\times Y}\!\bigl (d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dh_{\mathcal {X}}(x),dh_{\mathcal {Y}}(y))\bigr )^p\hspace{1.111pt}\mu (dx\hspace{0.83328pt}{\times }\hspace{0.83328pt}dy)=\int _{S\times S}\!\Lambda ^p_\infty (s,t)\,\bar{\mu }(ds\hspace{1.111pt}{\times }\hspace{1.111pt}dt)\\ \geqslant \!\! \inf _{\pi \in {\mathcal {C}}(dH_{\mathcal {X}},dH_{\mathcal {Y}})}\int _{S\times S}\!\Lambda ^p_\infty (s,t)\,\pi (ds\hspace{1.111pt}{\times }\hspace{1.111pt}dt)=\bigl (d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dH_{\mathcal {X}},dH_{\mathcal {Y}})\bigr )^p. \end{aligned}$$

This gives the claim for \(p<\infty \).

Next, we prove the assertion for the case \(p=\infty \). Note that for any \(p<\infty \)

$$\begin{aligned} {\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})&= \inf _{\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})} \bigl \Vert d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dh_{\mathcal {X}}(\hspace{1.111pt}{\cdot }\hspace{1.111pt}),dh_{\mathcal {Y}}(\hspace{1.111pt}{\cdot }\hspace{1.111pt}))\bigr \Vert _{L^p(\mu )}\\&\leqslant \inf _{\mu \in {\mathcal {C}}({\mu _{X}},{\mu _{Y}})}\bigl \Vert d^{\,(S,\Lambda _{\infty })}_{\textrm{W},\infty }(dh_{\mathcal {X}}(\hspace{1.111pt}{\cdot }\hspace{1.111pt}),dh_{\mathcal {Y}}(\hspace{1.111pt}{\cdot }\hspace{1.111pt}))\bigr \Vert _{L^\infty (\mu )}={\textbf{TLB}}_{\infty }^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}}), \end{aligned}$$

where the inequality holds since \(d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}\!\leqslant d^{\,(S,\Lambda _{\infty })}_{\textrm{W},\infty }\) and \(\Vert \hspace{1.111pt}{\cdot }\hspace{1.111pt}\Vert _{L^p(\mu )}\leqslant \Vert \hspace{1.111pt}{\cdot }\hspace{1.111pt}\Vert _{L^\infty (\mu )}\).

By [35, Prop. 3] we have that

$$\begin{aligned} {\textbf{SLB}}_{\infty }^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})&=d^{\,(S,\Lambda _{\infty })}_{\textrm{W},\infty }(dH_{\mathcal {X}},dH_{\mathcal {Y}})\\ {}&=\lim _{p\rightarrow \infty }d^{\,(S,\Lambda _{\infty })}_{\textrm{W},p}(dH_{\mathcal {X}},dH_{\mathcal {Y}})=\lim _{p\rightarrow \infty }{\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

Therefore,

$$\begin{aligned} {\textbf{SLB}}_{\infty }^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})&=\lim _{p\rightarrow \infty }{\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\nonumber \\&\leqslant \limsup _{ p\rightarrow \infty }{\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\leqslant {\textbf{TLB}}_{\infty }^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}}). \end{aligned}$$

\(\square \)

1.1.2 Proof of Proposition 4.4

We only prove the first statement for \(p\in [1,\infty )\). The case \(p=\infty \) as well as the second statement can be proven in a similar manner.

By directly using the change-of-variables formula, we have the following:

$$\begin{aligned} \bigl ({\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\bigr )^p&=\!\!\inf _{\gamma \in {\mathcal {C}}({\mu _{X}}\otimes {\mu _{X}},{\mu _{Y}}\otimes {\mu _{Y}})}\Vert \Lambda _\infty ({u_{X}},{u_{Y}})\Vert ^p_{L^p(\gamma )}\\&=\!\!\inf _{\gamma \in {\mathcal {C}}({\mu _{X}}\otimes {\mu _{X}},{\mu _{Y}}\otimes {\mu _{Y}})}\Vert \Lambda _\infty \Vert ^p_{L^p(({u_{X}}\times {u_{Y}})_\#\,\gamma )}, \end{aligned}$$

where

$$\begin{aligned} {u_{X}}\hspace{1.111pt}{\times }\hspace{1.111pt}{u_{Y}}:X\hspace{1.111pt}{\times }\hspace{1.111pt}X\hspace{1.111pt}{\times }\hspace{1.111pt}Y\hspace{1.111pt}{\times }\hspace{1.111pt}Y\rightarrow {\mathbb {R}}_{\geqslant 0}\hspace{1.111pt}{\times }\hspace{1.111pt}{\mathbb {R}}_{\geqslant 0}\end{aligned}$$

maps \((x,x'\!,y,y')\) to \(({u_{X}}(x,x'),{u_{Y}}(y,y'))\). By Lemma A.5,

$$\begin{aligned} ({u_{X}}\hspace{1.111pt}{\times }\hspace{1.111pt}{u_{Y}})_\#\,{\mathcal {C}}({\mu _{X}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{X}},{\mu _{Y}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{Y}})={\mathcal {C}}(({u_{X}})_\#\,({\mu _{X}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{X}}),({u_{Y}})_\#\,({\mu _{Y}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{Y}})). \end{aligned}$$

Therefore,

$$\begin{aligned} \bigl ({\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}},{\mathcal {Y}})\bigr )^p&=\!\!\inf _{\gamma \in {\mathcal {C}}({\mu _{X}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{X}},{\mu _{Y}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{Y}})}\int _{{\mathbb {R}}_{\geqslant 0}\times {\mathbb {R}}_{\geqslant 0}}\!\!(\Lambda _\infty (s,t))^p\,({u_{X}}\hspace{1.111pt}{\times }\hspace{1.111pt}{u_{Y}})_\#\,\gamma (ds\hspace{1.111pt}{\times }\hspace{1.111pt}dt)\\&=\!\!\inf _{{\widetilde{\gamma }}\in {\mathcal {C}}(({u_{X}})_\#\,({\mu _{X}}\otimes {\mu _{X}}),({u_{Y}})_\#\,({\mu _{Y}}\otimes {\mu _{Y}}))}\int _{{\mathbb {R}}_{\geqslant 0}\times {\mathbb {R}}_{\geqslant 0}}\!\!(\Lambda _\infty (s,t))^p\hspace{1.111pt}{\widetilde{\gamma }}(ds\hspace{1.111pt}{\times }\hspace{1.111pt}dt)\\&= \bigl (d_{\textrm{W},p}^{\,({\mathbb {R}}_{\geqslant 0},\Lambda _\infty )}\bigl (({u_{X}})_\#\hspace{1.111pt}({\mu _{X}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{X}}),({u_{Y}})_\#\hspace{1.111pt}({\mu _{Y}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{Y}})\bigr )\bigr )^p. \end{aligned}$$
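The push-forward step in this argument can be checked by hand on finite spaces. The sketch below takes the product coupling \(\gamma =({\mu _{X}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{X}})\hspace{1.111pt}{\otimes }\hspace{1.111pt}({\mu _{Y}}\hspace{1.111pt}{\otimes }\hspace{1.111pt}{\mu _{Y}})\), pushes it forward under \({u_{X}}\hspace{1.111pt}{\times }\hspace{1.111pt}{u_{Y}}\), and confirms that the marginals of the result are the two global distance distributions. It only illustrates one inclusion of the identity from Lemma A.5 for a single coupling. All names and both two-point spaces are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def global_distance_distribution(u, mu):
    """(u)_# (mu x mu), returned as a dict {distance value: mass}."""
    out = defaultdict(float)
    for i, j in product(range(len(mu)), repeat=2):
        out[u[i][j]] += mu[i] * mu[j]
    return dict(out)


def pushforward_of_product_coupling(uX, muX, uY, muY):
    """(u_X x u_Y)_# gamma for the product coupling gamma, as {(s, t): mass}."""
    out = defaultdict(float)
    for i, j in product(range(len(muX)), repeat=2):
        for k, l in product(range(len(muY)), repeat=2):
            out[(uX[i][j], uY[k][l])] += muX[i] * muX[j] * muY[k] * muY[l]
    return dict(out)


def marginals(joint):
    """First and second marginal of a finitely supported measure on pairs."""
    first, second = defaultdict(float), defaultdict(float)
    for (s, t), mass in joint.items():
        first[s] += mass
        second[t] += mass
    return dict(first), dict(second)


# Hypothetical two-point spaces with different distances and weights.
uX, muX = [[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5]
uY, muY = [[0.0, 2.0], [2.0, 0.0]], [0.25, 0.75]

first, second = marginals(pushforward_of_product_coupling(uX, muX, uY, muY))
dHX = global_distance_distribution(uX, muX)
dHY = global_distance_distribution(uY, muY)
print(first, dHX)
print(second, dHY)
```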

1.1.3 An Example: \({\textbf{SLB}}_{}^{\textrm{ult}}\) vs. \({\textbf{TLB}}_{}^{\textrm{ult}}\)

We will demonstrate that there are ultrametric measure spaces \({\mathcal {X}}_1\) and \({\mathcal {X}}_2\) such that \({\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}}_1,{\mathcal {X}}_2)=0\), while \({\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}}_1,{\mathcal {X}}_2)>0\).

Consider the three point space \(\Delta _3(1)=(\{x_1,x_2,x_3\},u)\) where \(u(x_i,x_j)=1\) whenever \(i\ne j\). Construct two probability measures \(\mu _1:= \frac{2}{3}\delta _{x_1}+\frac{1}{6}\delta _{x_2}+\frac{1}{6}\delta _{x_3}\) and \(\mu _2:= \frac{1}{3}\delta _{x_1}+\bigl (\frac{1}{3}-\frac{1}{2\sqrt{3}}\bigr )\hspace{0.55542pt}\delta _{x_2}+ \bigl (\frac{1}{3}+\frac{1}{2\sqrt{3}}\bigr )\hspace{0.55542pt}\delta _{x_3}\). We then let \({\mathcal {X}}_1:= (\Delta _3(1),\mu _1)\) and \({\mathcal {X}}_2:= (\Delta _3(1),\mu _2)\). Obviously, \(u_\#\,(\mu _1\hspace{1.111pt}{\otimes }\hspace{1.111pt}\mu _1)=u_\#\,(\mu _2\hspace{0.55542pt}{\otimes }\hspace{1.111pt}\mu _2)=\delta _0/2+\delta _1/2\). Then, by Proposition 4.4 we immediately have that \({\textbf{SLB}}_{p}^{\textrm{ult}}({\mathcal {X}}_1,{\mathcal {X}}_2)=0\) for any \(p\in [1,\infty ]\). Now, note that \(u(x_1,\hspace{0.55542pt}{\cdot }\hspace{1.111pt})_\#\,\mu _1={2}\delta _0/3+\delta _1/3\), which is different from \(u(x_i,\hspace{0.55542pt}{\cdot }\hspace{1.111pt})_\#\,\mu _2\) for each \(i=1,2,3\). This implies (by Proposition 4.4) that \({\textbf{TLB}}_{p}^{\textrm{ult}}({\mathcal {X}}_1,{\mathcal {X}}_2)>0\) for any \(p\in [1,\infty ]\).

Note that this example works as well for showing that \({\textbf{TLB}}_{p}({\mathcal {X}}_1,{\mathcal {X}}_2)>{\textbf{SLB}}_{p}({\mathcal {X}}_1,{\mathcal {X}}_2)=0\).
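The two facts this example rests on are easy to confirm numerically: the global distance distributions of \(\mu _1\) and \(\mu _2\) coincide, whereas the local distribution at \(x_1\) under \(\mu _1\) matches none of the local distributions under \(\mu _2\). The following sketch checks both in floating point. The helper names and the tolerance are choices made here for illustration, and the code does not compute \({\textbf{TLB}}_{p}^{\textrm{ult}}\) itself.

```python
import math
from collections import defaultdict


def global_distribution(u, mu):
    """u_# (mu x mu) as a dict {distance value: mass}."""
    out = defaultdict(float)
    for i, mi in enumerate(mu):
        for j, mj in enumerate(mu):
            out[u[i][j]] += mi * mj
    return dict(out)


def local_distribution(u, mu, i):
    """u(x_i, .)_# mu as a dict {distance value: mass}."""
    out = defaultdict(float)
    for j, mj in enumerate(mu):
        out[u[i][j]] += mj
    return dict(out)


def same_distribution(a, b, tol=1e-12):
    """Compare two finitely supported measures up to a small tolerance."""
    return set(a) == set(b) and all(
        math.isclose(a[k], b[k], abs_tol=tol) for k in a)


# The three-point space Delta_3(1) with the two measures from the text.
u = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
r = 1 / (2 * math.sqrt(3))
mu1 = [2 / 3, 1 / 6, 1 / 6]
mu2 = [1 / 3, 1 / 3 - r, 1 / 3 + r]

print(same_distribution(global_distribution(u, mu1), global_distribution(u, mu2)))
target = local_distribution(u, mu1, 0)
print([same_distribution(target, local_distribution(u, mu2, i)) for i in range(3)])
```

The agreement of the global distributions comes down to both measures having squared masses that sum to 1/2.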

Technical Details from Sect. 5

1.1 Technical Details from Sect. 5.2

Here, we list the precise results for the comparisons of the spaces \({\mathcal {X}}_i\), \(1\leqslant i\leqslant 4\), illustrated in Fig. 7. They are gathered in Tables 1 and 2.

Table 1 Comparison of different ultrametric measure spaces I:
Table 2 Comparison of different ultrametric measure spaces II:

1.2 Technical Details from Sect. 5.3

Here, we state more results for the comparison of the ultrametric measure spaces illustrated in Fig. 7 and give the precise construction of the ultrametric spaces \(Z_{k,t}^i\), \(2\leqslant k\leqslant 5\), \(t=0,0.2,0.4,0.6\), \(1\leqslant i\leqslant 15\).

The ultrametric measure spaces from Fig. 7. See Table 3 for the results of comparing the ultrametric dissimilarity spaces in Fig. 7 based on \(d_{\textrm{GW},1}\) and \({\textbf{SLB}}_{1}\).

Table 3 Comparison of different ultrametric measure spaces III:

Construction of \(Z_k\). For each \(k=2,3,4,5\) we first draw a sample with \(100\hspace{1.111pt}{\times }\hspace{1.111pt}k\) points from the distribution \(\sum _{i=0}^k U[1.5(k-1),1.5(k-1)+1]/k\), where U[a, b] denotes the uniform distribution on [a, b]. For each sample, we employ the single linkage algorithm to create a dendrogram, which then induces an ultrametric on the given sample. We further draw a 30-point subspace from each ultrametric space and denote it by \(Z_k\). These four spaces have similar diameter values between 0.5 and 0.6. Each space \(Z_k\) is equipped with the uniform probability measure and the resulting ultrametric measure space is denoted by \({\mathcal {Z}}_{k}=( Z_{k},u_{Z_k},\mu _{Z_k}) \), \(k=2,3,4,5\). We remark that k can be regarded as the number of blocks in the dendrogram representation of the obtained ultrametric measure spaces (see the top row of Fig. 8 for a visualization of three 3-block spaces).
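The construction can be sketched in a few lines of Python. This is not the authors' code. It assumes that the intended sampling distribution is an equal-weight mixture of k uniform blocks \(U[1.5(i-1),1.5(i-1)+1]\), \(i=1,\ldots ,k\), which is one reading of the formula above that is consistent with k blocks and diameters slightly above 0.5. It also uses the fact that for points on the real line the single linkage ultrametric between two points equals the largest gap between consecutive sample points lying between them. All function names, the seed and the default parameters are hypothetical.

```python
import random


def sample_mixture(k, n, rng):
    """Draw n points from an equal-weight mixture of k unit-length uniform blocks.

    Assumed reading of the text: block i (0-based) is U[1.5 * i, 1.5 * i + 1].
    """
    points = []
    for _ in range(n):
        block = rng.randrange(k)  # block index 0, ..., k-1, chosen uniformly
        left = 1.5 * block
        points.append(rng.uniform(left, left + 1.0))
    return points


def single_linkage_ultrametric_1d(points):
    """Single linkage ultrametric of a sample on the real line.

    Returns the sorted points and the matrix u, where u[a][b] is the largest
    gap between consecutive sorted points between positions a and b.
    """
    xs = sorted(points)
    n = len(xs)
    u = [[0.0] * n for _ in range(n)]
    for a in range(n):
        largest_gap = 0.0
        for b in range(a + 1, n):
            largest_gap = max(largest_gap, xs[b] - xs[b - 1])
            u[a][b] = u[b][a] = largest_gap
    return xs, u


def build_Z(k, rng, points_per_block=100, subspace_size=30):
    """Sample, cluster, and restrict to a random subspace with uniform measure."""
    points = sample_mixture(k, points_per_block * k, rng)
    xs, u = single_linkage_ultrametric_1d(points)
    keep = sorted(rng.sample(range(len(xs)), subspace_size))
    u_Z = [[u[a][b] for b in keep] for a in keep]
    mu_Z = [1.0 / subspace_size] * subspace_size  # uniform probability measure
    return u_Z, mu_Z


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
u_Z3, mu_Z3 = build_Z(3, rng)
print(len(u_Z3), max(max(row) for row in u_Z3))
```

For samples in higher dimensions the gap shortcut no longer applies and a general single linkage routine would be needed instead.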

Perturbations at level t. Given a perturbation level \(t\geqslant 0\) and an ultrametric space X, we consider the quotient space \(X_t\). Each equivalence class \([x]_t\subseteq X\) is an ultrametric subspace of X. If \(|[x]_t|>1\), we let \(m:= | {\textrm{spec}}\hspace{0.55542pt}([x]_t)|-1\) and write \({\textrm{spec}}\hspace{0.55542pt}([x]_t)=\{0<s_1<\cdots <s_m\}\). Let \(\delta := {\textrm{diam}}\hspace{0.55542pt}([x]_t) \). We generate m uniformly distributed numbers from \([0, t-\delta ]\) and sort them in ascending order to obtain \(a_1\leqslant \cdots \leqslant a_m\). We then perturb \(u_{X}|_{[x]_t\hspace{1.111pt}{\times }\hspace{1.111pt}[x]_t}\) by replacing \(s_i\) with \(s_i+a_i\) for each \(i=1,\ldots ,m\). We do the same for all equivalence classes \([x]_t\) and thus obtain a new ultrametric on X.
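The perturbation step can be sketched as follows for a finite space given by a distance matrix. The sketch assumes that the classes \([x]_t\) are the closed balls of radius t, that is, \(x\sim x'\) if and only if \(u_X(x,x')\leqslant t\). Since the shifts are sorted and the \(s_i\) are strictly increasing, the replacement is strictly increasing on each class and the new values stay at most t, so the strong triangle inequality is preserved. Function names, the seed and the four-point example are hypothetical.

```python
import random


def perturb_at_level(u, t, rng):
    """Return a perturbed copy of the ultrametric matrix u at level t."""
    n = len(u)
    new_u = [row[:] for row in u]
    assigned = [False] * n
    for i in range(n):
        if assigned[i]:
            continue
        block = [j for j in range(n) if u[i][j] <= t]  # the class [x_i]_t (assumed convention)
        for j in block:
            assigned[j] = True
        # Positive part of the spectrum of the class: s_1 < ... < s_m.
        positive = sorted({u[a][b] for a in block for b in block if u[a][b] > 0})
        if not positive:  # singleton class: nothing to perturb
            continue
        delta = positive[-1]  # diameter of the class
        shifts = sorted(rng.uniform(0.0, t - delta) for _ in positive)
        moved = {s: s + a for s, a in zip(positive, shifts)}
        for a in block:
            for b in block:
                if u[a][b] > 0:
                    new_u[a][b] = moved[u[a][b]]
    return new_u


def is_ultrametric(u):
    """Check the strong triangle inequality for all triples."""
    n = len(u)
    return all(u[a][c] <= max(u[a][b], u[b][c])
               for a in range(n) for b in range(n) for c in range(n))


# Hypothetical four-point ultrametric space, perturbed at level t = 1.
u = [[0.0, 0.5, 2.0, 2.0],
     [0.5, 0.0, 2.0, 2.0],
     [2.0, 2.0, 0.0, 1.0],
     [2.0, 2.0, 1.0, 0.0]]
new_u = perturb_at_level(u, 1.0, random.Random(1))
print(new_u[0][1], new_u[2][3], new_u[0][2], is_ultrametric(new_u))
```

In the example the class of the first two points has diameter 0.5, so its distance moves to a value in [0.5, 1], while the class of the last two points already has diameter t and is left unchanged.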



Cite this article

Mémoli, F., Munk, A., Wan, Z. et al. The Ultrametric Gromov–Wasserstein Distance. Discrete Comput Geom 70, 1378–1450 (2023). https://doi.org/10.1007/s00454-023-00583-0
