Coarse embeddability of Wasserstein space and the space of persistence diagrams

We prove an equivalence between open questions about the embeddability of the space of persistence diagrams and the space of probability distributions (i.e., Wasserstein space). It is known that for many natural metrics, no coarse embedding of either of these two spaces into Hilbert space exists. Some cases remain open, however. In particular, whether coarse embeddings exist with respect to the $p$-Wasserstein distance for $1\leq p\leq 2$ remains an open question for the space of persistence diagrams and for Wasserstein space on the plane. In this paper, we show that embeddability for persistence diagrams implies embeddability for Wasserstein space on $\mathbb{R}^2$, with the converse holding when $p>1$. To prove this, we show that finite subsets of Wasserstein space uniformly coarsely embed into the space of persistence diagrams, and vice versa (when $p>1$).


INTRODUCTION
In this paper, we consider embeddings of two kinds of non-linear data: persistence diagrams and probability distributions. A persistence diagram is an unordered set of points in the plane which arises as a summary of the topological information in a dataset (e.g. a point cloud or a grayscale image). Persistence diagrams have been shown to capture important information in applications involving image data [7], geospatial data [10], time series data [15], and more. The set of persistence diagrams is endowed with a family of natural metrics called Wasserstein distances (see Section 2). In practice, the analysis of persistence diagrams is hampered by the fact that the space of persistence diagrams is not readily identifiable with a subset of Euclidean space, which limits the use of classical machine learning and statistical techniques. Hence, many vectorizations (i.e. maps from the set of persistence diagrams to a Hilbert space) have been introduced in recent years. Examples of vectorizations for persistence diagrams include persistence landscapes [4], persistence images [1] and persistence curves [8]. Ideally, one would like these maps to be isometric embeddings; unfortunately, it can be shown that no isometric embedding of the space of persistence diagrams into Hilbert space exists. Worse, even if one relaxes the isometry condition (say, to that of a coarse embedding), such an embedding has still been proven not to exist in most cases (see Section 3 for a survey of such results).
This paper also studies probability distributions as objects. The space of all (Borel, finite moment) probability distributions on $\mathbb{R}^n$ is equipped with a family of metrics, also called Wasserstein distances. Optimal transport and Wasserstein distances between distributions have been applied in a variety of areas including economics, machine learning, computer graphics and fluid dynamics. Like persistence diagrams, probability distributions with Wasserstein metrics are also difficult to embed into Euclidean space (see Section 3).
Despite the large number of negative results regarding embeddings, some important cases remain open. In the case of persistence diagrams, it is not known whether the set of persistence diagrams with the $p$-Wasserstein metric coarsely embeds into Hilbert space for $1 \le p \le 2$. In the case of probability distributions, it is not known whether the set of probability distributions on $\mathbb{R}^2$ with the $p$-Wasserstein metric coarsely embeds into Hilbert space for the same range of $p$ values. These spaces are somewhat similar in that persistence diagrams can be thought of as discrete distributions, and one might expect that the answers to these two open questions should be related. In this paper, we confirm this by leveraging a result of Nowak on coarse embeddings of finite subsets [12]. In particular, we show that all finite sets of distributions with the $p$-Wasserstein metric uniformly coarsely embed into the space of persistence diagrams with the $p$-Wasserstein metric (Proposition 18). If $p > 1$, we obtain the other direction: that finite sets of persistence diagrams embed into Wasserstein space (Proposition 20). As a corollary, we obtain that if the space of persistence diagrams with the $p$-Wasserstein metric embeds into Hilbert space, for $1 \le p < \infty$, then so does the space of distributions (with finite $p$-th moment) in $\mathbb{R}^2$ with the $p$-Wasserstein metric; the converse holds if $p > 1$ (Theorem 23).

PRELIMINARIES
2.1. Wasserstein Space. We recall the following basic notions in optimal transport from [14].
Definition 1. Let $(X, d_X)$ be a complete separable metric space and let $P_p(X)$ denote the space of all Borel probability measures on $X$ with finite $p$-th moments, for $1 \le p < \infty$. The $p$-Wasserstein distance between $\alpha, \beta \in P_p(X)$ is given by
$$W_p(\alpha, \beta) = \left( \inf_{\pi \in U(\alpha, \beta)} \int_{X^2} d_X(x, y)^p \, d\pi(x, y) \right)^{1/p},$$
where $U(\alpha, \beta)$ is the set of Borel probability measures on $X^2$ with marginals $\alpha$ and $\beta$. The metric space $(P_p(X), W_p)$, which we will usually denote simply as $P_p(X)$, is called the Wasserstein $p$-space over $(X, d_X)$.
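While the $p$-Wasserstein distance is defined for general measures, in this paper we will be particularly focused on discrete measures. In this case there is an equivalent formulation of the distance which will be useful. Let $\alpha = \sum_{i=1}^{n} a_i \delta_{x_i}$ and $\beta = \sum_{j=1}^{m} b_j \delta_{y_j}$ be discrete measures. The above definition reduces to
$$W_p(\alpha, \beta) = \left( \min_{P \in U(a, b)} \sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij} \, d_X(x_i, y_j)^p \right)^{1/p}.$$
Here $U(a, b)$ is the set of $n \times m$ matrices $P$ with non-negative entries such that $P \mathbf{1}_m = a$ and $P^T \mathbf{1}_n = b$. Moreover, if one additionally assumes that $\alpha$ and $\beta$ have rational coefficients then, by rewriting $\alpha = \sum_{j=1}^{N} \frac{1}{N} \delta_{x'_j}$ and $\beta = \sum_{j=1}^{N} \frac{1}{N} \delta_{y'_j}$, it follows from the Birkhoff-von Neumann theorem that the distance can be re-expressed as
$$W_p(\alpha, \beta) = \left( \min_{\sigma \in P(N)} \frac{1}{N} \sum_{j=1}^{N} d_X(x'_j, y'_{\sigma(j)})^p \right)^{1/p},$$
where $P(N)$ is the set of permutations on $\{1, \dots, N\}$ [14].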
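The permutation form of the distance for uniformly weighted atoms can be evaluated directly on small examples. The following is a minimal sketch, not taken from [14]: the function names `linf` and `wasserstein_uniform` are hypothetical, the ground distance is assumed to be $\ell_\infty$ on the plane, and the search is brute force over all permutations, so it is only practical for a handful of atoms.

```python
from itertools import permutations


def linf(x, y):
    """l-infinity distance between two points of the plane (assumed ground metric)."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def wasserstein_uniform(xs, ys, p=2.0, dist=linf):
    """W_p between the uniform measures on two equal-length lists of atoms.

    Implements (min over permutations of (1/N) * sum d(x_j, y_sigma(j))^p)^(1/p)
    by exhaustive search, so the cost grows like N!.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("both measures need the same positive number of atoms")
    n = len(xs)
    best = min(
        sum(dist(xs[j], ys[sigma[j]]) ** p for j in range(n))
        for sigma in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


# Two uniform measures with three atoms each; they share two atoms.
alpha = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
beta = [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
print(wasserstein_uniform(alpha, beta, p=1.0))  # about 0.333: one atom moves distance 1
```

For larger $N$ the same minimum is an assignment problem, which a dedicated solver handles in polynomial time; the exhaustive search is used here only to keep the sketch dependency-free.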
2.2. The Space of Persistence Diagrams. Persistence diagrams typically appear in topological data analysis as a way to store topological information from a sequence of complexes.
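Definition 2. A persistence diagram is a finite multiset of points in $\mathbb{R}^2_{<} = \{(x, y) \in \mathbb{R}^2 : x < y\}$.
Note that other definitions exist in the literature which allow, for example, countable multisets, or which for notational convenience define a persistence diagram to include infinitely many copies of the diagonal (e.g. [13]). We work with the above definition as it is the most convenient context for our results. Persistence diagrams are often also allowed to include points of the form $(b, \infty)$. In this paper we do not allow infinite values, as we want all our distances to be finite. In applications, infinite values are often removed or replaced for the same reason. We now define Wasserstein distances between persistence diagrams.
Definition 3. A partial matching between two persistence diagrams $D_1$ and $D_2$ is a triple $(D_1', D_2', \pi)$ such that $D_1' \subseteq D_1$, $D_2' \subseteq D_2$, and $\pi : D_1' \to D_2'$ is a bijection.
Definition 4. Let $1 \le p < \infty$. The $p$-cost of a partial matching $(D_1', D_2', \pi)$ is
$$\mathrm{cost}_p(\pi) = \left( \sum_{x \in D_1'} d(x, \pi(x))^p + \sum_{x \in D_1 \setminus D_1'} d(x, \Delta)^p + \sum_{y \in D_2 \setminus D_2'} d(y, \Delta)^p \right)^{1/p},$$
where $\Delta = \{(t, t) : t \in \mathbb{R}\}$ is the diagonal and $d(x, \Delta) = \inf_{z \in \Delta} d(x, z)$. For $p = \infty$, the cost is the maximum of the same terms.
The distance $d$ in the above definition is a distance between points in the plane. Common choices are an $\ell_q$ distance, for $q \ge 1$, or the $\ell_\infty$ distance. Since all $\ell_q$ distances on $\mathbb{R}^2$ are bi-Lipschitz equivalent, our results do not depend on the particular choice of $d$; when necessary we assume the $\ell_\infty$ distance. Now, for two persistence diagrams $D_1, D_2$ we define the distance function
$$W_p(D_1, D_2) = \inf \{ \mathrm{cost}_p(\pi) : (D_1', D_2', \pi) \text{ is a partial matching between } D_1 \text{ and } D_2 \}.$$
Note that $W_p$ is a metric on persistence diagrams, which we call the $p$-Wasserstein metric on persistence diagrams.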
Definition 5. Let $\mathcal{D}$ denote the collection of persistence diagrams. The metric space $(\mathcal{D}, W_p)$ is called the space of persistence diagrams with the $p$-Wasserstein distance.
We use $W_p$ to denote both the Wasserstein distance between probability distributions and the Wasserstein distance between persistence diagrams; the meaning will always be clear from the arguments. We also adopt the convention that $\mathbb{R}^2$ is always equipped with the $\ell_\infty$ distance; other $\ell_q$ distances are bi-Lipschitz equivalent, so we do not lose any generality with this choice.
Remark 6. To each persistence diagram $D$ we can naturally associate the corresponding empirical distribution in the plane, i.e. the uniform distribution on the points (possibly with multiplicity) in $D$. Note that the Wasserstein distance between two persistence diagrams is not the same as the Wasserstein distance between the corresponding empirical distributions, as a result of the partial matchings and the use of the diagonal as a universal point to match to.
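The difference pointed out in Remark 6 can be seen on a two-point example. The sketch below is illustrative only: the function names are hypothetical, the ground distance is $\ell_\infty$, the diagram distance is taken to be the usual infimum of $p$-costs over partial matchings with unmatched points sent to the diagonal, and both distances are computed by brute force. Partial matchings are encoded by padding each diagram with one "diagonal slot" per point of the other diagram.

```python
from itertools import permutations


def linf(x, y):
    """l-infinity distance between two points of the plane."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def dist_to_diagonal(x):
    """l-infinity distance from the point (b, d) to the diagonal {(t, t)}."""
    return abs(x[1] - x[0]) / 2.0


def diagram_wasserstein(d1, d2, p=1.0):
    """W_p between two small persistence diagrams, by exhaustive search.

    None stands for a diagonal slot; a permutation of the padded lists
    corresponds to a partial matching with leftovers sent to the diagonal.
    """
    a = list(d1) + [None] * len(d2)
    b = list(d2) + [None] * len(d1)

    def cost(x, y):
        if x is None and y is None:
            return 0.0  # diagonal matched to diagonal costs nothing
        if x is None:
            return dist_to_diagonal(y) ** p
        if y is None:
            return dist_to_diagonal(x) ** p
        return linf(x, y) ** p

    best = min(
        sum(cost(a[i], b[s[i]]) for i in range(len(a)))
        for s in permutations(range(len(b)))
    )
    return best ** (1.0 / p)


def empirical_wasserstein(d1, d2, p=1.0):
    """W_p between the uniform measures on two diagrams of equal size."""
    n = len(d1)
    if n == 0 or n != len(d2):
        raise ValueError("diagrams must have the same positive number of points")
    best = min(
        sum(linf(d1[i], d2[s[i]]) ** p for i in range(n))
        for s in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


D1 = [(0.0, 4.0), (1.0, 1.2)]
D2 = [(0.0, 4.5), (2.0, 2.2)]
diagram_distance = diagram_wasserstein(D1, D2, p=1.0)      # about 0.7
empirical_distance = empirical_wasserstein(D1, D2, p=1.0)  # about 0.75
print(diagram_distance, empirical_distance)
```

In this example the two short-lived points are cheaper to send to the diagonal than to match with each other, which is exactly the option the empirical distributions do not have.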

2.3. Embeddings.
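A metric embedding is a map between metric spaces which preserves distances in some sense. We will consider a number of different types of embeddings.
Definition 7. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces and let $f : X \to Y$ be a map. We say that $f$ is
• an $\epsilon$-quasi-isometric embedding if there exists an $\epsilon > 0$ such that $d_X(x, y) - \epsilon \le d_Y(f(x), f(y)) \le d_X(x, y) + \epsilon$ for all $x, y \in X$;
• a coarse embedding if there exist non-decreasing functions $\rho_-, \rho_+ : [0, \infty) \to [0, \infty)$ with $\lim_{t \to \infty} \rho_-(t) = \infty$ such that $\rho_-(d_X(x, y)) \le d_Y(f(x), f(y)) \le \rho_+(d_X(x, y))$ for all $x, y \in X$;
• a bi-Lipschitz embedding if there exist constants $s > 0$ and $D \ge 1$ such that $s \, d_X(x, y) \le d_Y(f(x), f(y)) \le D s \, d_X(x, y)$ for all $x, y \in X$.
Moreover, the distortion of $f$, denoted $\mathrm{dist}(f)$, is the infimum over all $D$ such that the inequality above holds. If a map $f : X \to Y$ with distortion $D$ exists, then $X$ is said to embed into $Y$ with distortion $D$. We introduce the notation $c_Y(X) = \inf_{f : X \to Y} \mathrm{dist}(f)$.
If $\theta \in (0, 1]$, then the $\theta$-snowflake of a metric space $(X, d)$ is the metric space $(X, d^\theta)$, where the metric is obtained by raising $d$ to the power $\theta$. Following [2], a metric space $X$ is said to be $\theta$-snowflake universal if for every finite metric space $(W, d_W)$ we have $c_X(W, d_W^\theta) = 1$.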
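For maps on finite sets the distortion can be computed directly. The sketch below assumes the usual bi-Lipschitz reading of distortion, namely the largest expansion ratio times the largest contraction ratio over all pairs, which is the smallest $D$ for which a two-sided bound with some scaling factor holds. The names `distortion` and `snowflake` are hypothetical helpers.

```python
from itertools import combinations


def distortion(points, d_x, d_y, f):
    """Distortion of f on a finite set: (max expansion) * (max contraction).

    Assumes distinct points are at positive distance in both spaces.
    """
    expansion = 0.0
    contraction = 0.0
    for x, y in combinations(points, 2):
        dx = d_x(x, y)
        dy = d_y(f(x), f(y))
        if dx == 0 or dy == 0:
            raise ValueError("f must be injective on distinct points")
        expansion = max(expansion, dy / dx)
        contraction = max(contraction, dx / dy)
    return expansion * contraction


def snowflake(d, theta):
    """Return the theta-snowflake metric d**theta, for theta in (0, 1]."""
    return lambda x, y: d(x, y) ** theta


# Three equally spaced points on the line, with the 1/2-snowflake metric,
# mapped back into the line by the identity.
path = [0.0, 1.0, 2.0]
line = lambda x, y: abs(x - y)
half_snowflake = snowflake(line, 0.5)
print(distortion(path, half_snowflake, line, lambda x: x))  # sqrt(2), about 1.414
```

The identity has distortion $\sqrt{2}$ here because the snowflaked distance between the endpoints is $\sqrt{2}$ rather than $2$. Snowflake universality asks that such finite snowflakes embed with distortion arbitrarily close to 1 once the target space is rich enough.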

PAST RESULTS ON EMBEDDINGS
In this section we will give an overview of some embedding results for $p$-Wasserstein space and the space of persistence diagrams. In general, most of these spaces do not embed into Hilbert or Euclidean spaces except under severe restrictions.
Theorem 8 (Turner et al., 2014 [16]). $(\mathcal{D}, W_p)$ does not admit an isometric embedding into Hilbert space for any $1 \le p \le \infty$.
Theorem 9 (Carrière and Bauer, 2018 [6]). Let $n \in \mathbb{N}$. Then for any $N \in \mathbb{N}$ and
Here $\mathcal{D}^L_N$ denotes a restricted space of persistence diagrams, namely the space of all diagrams which have at most $N$ points and whose points all lie in the region $[-L, L]^2$. To obtain a positive result, we need not only a cardinality restriction, but a relaxation of the embedding type.
Theorem 10 (Mitra and Virk, 2018 [11]). $(\mathcal{D}_N, W_p)$ coarsely embeds into Hilbert space for $1 \le p \le \infty$, where $\mathcal{D}_N$ denotes the space of persistence diagrams with at most $N$ points.
Without the cardinality restriction, even coarse embeddability is not possible in many natural cases. A first result in this direction showed that the space of all persistence diagrams fails to have Yu's Property A [19], a sufficient but not necessary condition for embeddability in Hilbert space originally introduced in Yu's work on the coarse Baum-Connes and Novikov Conjectures.
Later it was shown for all p > 2 that the space of persistence diagrams fails to coarsely embed into Hilbert space.
Turning to Wasserstein space, Andoni, Naor, and Neiman show that $p$-Wasserstein space on $\mathbb{R}^3$ is $\frac{1}{p}$-snowflake universal. The proof of this result relies on an explicit embedding of any snowflake of a finite metric space into $p$-Wasserstein space on $\mathbb{R}^3$ as a uniform measure.
Theorem 14 (Andoni-Naor-Neiman, 2018 [2]). If $p \in (1, \infty)$ then for every finite metric space $(X, d_X)$ we have $c_{P_p(\mathbb{R}^3)}(X, d_X^{1/p}) = 1$.
As a corollary, Andoni-Naor-Neiman prove that $p$-Wasserstein space on $\mathbb{R}^3$ fails to coarsely embed into Hilbert space for $p > 1$. The case for $\mathbb{R}^2$ and $p = 1$ remains open.
Theorem 15 (Andoni-Naor-Neiman, 2018 [2]). If $p > 1$ then $P_p(\mathbb{R}^3)$ does not admit a coarse embedding into any Banach space of nontrivial type. In particular, for $p > 1$, $P_p(\mathbb{R}^3)$ does not admit a coarse embedding into Hilbert space.
To summarize, coarse embeddability in Hilbert space remains an open question for persistence diagrams when $1 \le p \le 2$ and for Wasserstein space when the underlying space is the plane (the most closely related context to that of persistence diagrams). Our main results connect these two open questions.

MAIN RESULTS
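The following characterization, by Nowak, of coarse embeddability into Hilbert space by way of finite subsets will prove quite useful.
Theorem 16 (Nowak [12]). A metric space $X$ admits a coarse embedding into a Hilbert space if and only if there exist non-decreasing functions $\rho_-, \rho_+ : [0, \infty) \to [0, \infty)$ with $\lim_{t \to \infty} \rho_-(t) = \infty$ such that for every finite subset $A \subset X$ there exists a map $f_A : A \to \ell_2$ satisfying $\rho_-(d_X(x, y)) \le \|f_A(x) - f_A(y)\| \le \rho_+(d_X(x, y))$ for all $x, y \in A$.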
Nowak's characterization of coarse embeddings into Hilbert space allows one to restrict one's attention to maps on finite subsets. In light of this, the proofs of Theorems 23 and 24 rely on finding low distortion maps on finite subsets of the relevant spaces.
We now proceed to show the existence of a suitable map from any finite subset of measures in $P_p(\mathbb{R}^2)$, the $p$-Wasserstein space over $\mathbb{R}^2$, into the space of persistence diagrams. For convenience, we will use the notation $d(\cdot, \cdot)$ for distances in both $(\mathcal{D}, W_p)$ and $P_p(\mathbb{R}^2)$ for the remainder of this section. By a discrete rational measure we mean a discrete measure $\alpha = \sum_{i=1}^{n} a_i \delta_{x_i}$ where all the $a_i$ are rational numbers.
Lemma 17. Suppose $A = \{\alpha_1, \dots, \alpha_n\}$ is a finite set of discrete rational measures in $P_p(\mathbb{R}^2)$. Then there exists an isometry $f : A \to (\mathcal{D}, W_p)$.
Proof. Let $N$ denote the common denominator of all coefficients in the measures $\alpha_1, \dots, \alpha_n$. We write each $\alpha_i$ as a sum of uniformly weighted Dirac measures (possibly with duplication):
$$\alpha_i = \frac{1}{N} \sum_{j=1}^{N} \delta_{x^i_j}.$$
Let $D$ denote the diameter of the set $\{\frac{1}{N} x^i_j\}_{i,j}$ in $\mathbb{R}^2$. Note that there exists an $x \in \mathbb{R}^2$ such that
From (1) we have that
Now, since the diagrams are sufficiently far from the diagonal, we must have that the distance between the diagrams $f(\alpha)$ and $f(\beta)$ is achieved by a perfect matching, which proves the result.
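The mechanism behind Lemma 17 (place the atoms so far from the diagonal that an optimal matching never uses it) can be checked numerically on a small example. The sketch below is a reconstruction of the idea under stated assumptions, not the exact map from the proof: the hypothetical helper `to_diagram` rescales the $N$ atoms by $N^{-1/p}$, which absorbs the weight $1/N$, and then shifts them vertically by an arbitrary large amount `shift`. All distances are computed by brute force with the $\ell_\infty$ ground metric.

```python
from itertools import permutations


def linf(x, y):
    """l-infinity distance between two points of the plane."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def wasserstein_uniform(xs, ys, p):
    """W_p between uniform measures on equal-length atom lists (brute force)."""
    n = len(xs)
    best = min(
        sum(linf(xs[j], ys[s[j]]) ** p for j in range(n))
        for s in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


def diagram_wasserstein(d1, d2, p):
    """W_p between small persistence diagrams; None is a diagonal slot."""
    a = list(d1) + [None] * len(d2)
    b = list(d2) + [None] * len(d1)

    def cost(x, y):
        if x is None and y is None:
            return 0.0
        if x is None:
            return (abs(y[1] - y[0]) / 2.0) ** p  # l-infinity distance to the diagonal
        if y is None:
            return (abs(x[1] - x[0]) / 2.0) ** p
        return linf(x, y) ** p

    best = min(
        sum(cost(a[i], b[s[i]]) for i in range(len(a)))
        for s in permutations(range(len(b)))
    )
    return best ** (1.0 / p)


def to_diagram(atoms, p, shift):
    """Hypothetical map: scale atoms by N**(-1/p), then move them up by `shift`."""
    c = len(atoms) ** (-1.0 / p)
    return [(c * x, c * y + shift) for (x, y) in atoms]


p = 2.0
shift = 100.0  # assumption: much larger than the diameter of the scaled atoms
alpha = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
beta = [(1.0, 1.0), (0.5, 0.0), (2.0, 1.0)]

measure_distance = wasserstein_uniform(alpha, beta, p)
diagram_distance = diagram_wasserstein(
    to_diagram(alpha, p, shift), to_diagram(beta, p, shift), p
)
print(measure_distance, diagram_distance)  # the two values agree up to rounding
```

Because every point of either diagram lies much farther from the diagonal than from any point of the other diagram, replacing a pair of diagonal matches by a direct match always lowers the cost, so the optimum is a perfect matching and the two distances coincide.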
) and $\epsilon > 0$ be given. It follows from standard results about distributions [17] that there exists a collection
By Lemma 17 there exists an isometry $f : B \to \mathcal{D}$. Define $\tilde{f} : A \to \mathcal{D}$ by $\alpha_i \mapsto f(\beta_i)$. It is easy to check that this defines an $\epsilon$-quasi-isometry.
We now consider the other direction, namely embedding persistence diagrams into Wasserstein space. This direction requires more care, with the main obstacle being the existence of matchings to the diagonal. Inspired by the explicit construction in [2], we now show how to construct an embedding of a finite subset of persistence diagrams into the space of discrete distributions by adding extra points along the diagonal. See Figure 1 for an illustration of our construction.
is a finite subset of diagrams whose points all have multiplicity one. Let $p > 1$, then for all $\epsilon > 0$ there exists an $N \in \mathbb{N}$ and map $f : D \to P_p(\mathbb{R}^2)$ satisfying the following for sufficiently large $s$:
Proof. Let $D$ be as above and let $\epsilon > 0$ be given. We will map each diagram to a subset of the plane, such that these subsets all have the same cardinality. We begin by fixing some notation. Let $\pi_{ij}$ denote the optimal partial matching between $D_i$ and $D_j$, let $U_{ij}$ denote those points in $D_i$ that are unmatched under this partial matching, and let $N_i = |D_i|$ and $N = \max_i N_i$. Furthermore, define the following:
Here $s > N$ is taken so large that $2N \cdot \frac{M - m}{s} \le 2\epsilon/3$. Finally, define $f : D \to P_p(\mathbb{R}^2)$ by sending each $D_i$ to the uniform measure on $\tilde{D}_i$.
We first check that $f$ has the desired upper bound. Let $\sigma_{ij}$ be an optimal matching from $U_{ij}$ to a subset of $I$ and let $\tau_{ji}$ denote an optimal matching from $U_{ji}$ to a (possibly different) subset of $I$. Now, note that,
Assume that the sets $I \cup I_i \setminus \tau_{ji}(U_{ji})$ and $I \cup I_j \setminus \sigma_{ij}(U_{ij})$ have been ordered as $\{x_i\}$ and $\{y_i\}$ respectively, with ascending coordinates, and let $\omega$ denote the bijection which sends $x_i$ to $y_i$. We define a coupling, $\pi$, between the uniform measures $f(D_i)$ and $f(D_j)$ by,
Thus,
Putting it together, since $\pi$ is a coupling between $f(D_i)$ and $f(D_j)$,
$\int \|x - y\|_\infty^p \, d\pi(x, y)$
where the last step follows from $2N \cdot \frac{M - m}{s} \le 2\epsilon/3$ from our choice of $s$. This completes the proof of the upper bound.
For the lower bound, note that any bijective coupling of uniform measures $f(D_i)$ and $f(D_j)$ induces a partial matching on the diagrams $D_i$ and $D_j$. Let $\pi$ denote the optimal coupling and $\pi_{ij}$ the induced partial matching between diagrams $D_i$ and $D_j$. Further let $U_{ij}$ denote those points unmatched under $\pi_{ij}$, which must therefore be matched by $\pi$ to something on the diagonal. Then we have,
For a discrete measure $\alpha = \sum_i a_i \delta_{x_i}$ and a real number $r$, denote by $\lambda_r(\alpha)$ the dilated measure $\lambda_r(\alpha) = \sum_i a_i \delta_{r x_i}$. Note that by defining $\tilde{f} = \lambda_{(N+s+1)^{1/p}} \circ f$ we may obtain a map on finite subsets of diagrams where the coefficients in the inequality in Lemma 19 become 1. Moreover, for any diagram there exists a diagram whose points have multiplicity one and which is arbitrarily close to the original diagram. Thus, one may relax the restriction that each diagram has multiplicity one. This results in the following form of Lemma 19 which will be more useful.
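The normalisation step relies on the fact that dilating both measures by a factor $r > 0$ multiplies their Wasserstein distance by $r$, since every ground distance scales by $r$ while the weights are unchanged. A small numerical check of this scaling, with hypothetical helper names, uniform weights, the $\ell_\infty$ ground metric, and an arbitrary illustrative factor $r = 3$ (not the constant $(N+s+1)^{1/p}$ of the proof):

```python
from itertools import permutations


def linf(x, y):
    """l-infinity distance between two points of the plane."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def wasserstein_uniform(xs, ys, p):
    """W_p between uniform measures on equal-length atom lists (brute force)."""
    n = len(xs)
    best = min(
        sum(linf(xs[j], ys[s[j]]) ** p for j in range(n))
        for s in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


def dilate(atoms, r):
    """Atoms of the dilated measure lambda_r: each atom x is moved to r * x."""
    return [(r * x, r * y) for (x, y) in atoms]


p = 1.5
r = 3.0  # illustrative dilation factor
alpha = [(0.0, 1.0), (2.0, 3.0), (1.0, 4.0)]
beta = [(0.5, 1.0), (2.0, 2.5), (0.0, 5.0)]

base = wasserstein_uniform(alpha, beta, p)
scaled = wasserstein_uniform(dilate(alpha, r), dilate(beta, r), p)
print(scaled, r * base)  # equal up to rounding
```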
Proposition 21. Let $X$ and $Y$ be metric spaces and suppose for all $\epsilon > 0$ and all finite subsets $A \subset X$ there exists a map $f : A \to Y$ satisfying
for all $x_i, x_j \in A$. If $Y$ coarsely embeds into a Hilbert space, then $X$ coarsely embeds into a Hilbert space.
Proof. We proceed by applying Theorem 16. Let $A$ be a finite subset of $X$. Then there exist $\rho_-$, $\rho_+$
Lemma 22. Let $X$ and $Y$ be metric spaces and suppose for all $\epsilon > 0$ and all finite subsets $A \subset X$ there exists a map $f : A \to Y$ satisfying,
for all $x_i, x_j \in A$. Then $X$ being $\frac{1}{p}$-snowflake universal implies $Y$ is $\frac{1}{p}$-snowflake universal.
Proof. Let $\theta \in (0, 1)$ and $(W, d_W^\theta)$ be the $\theta$-snowflake of a finite metric space $(W, d_W)$. Let $\epsilon > 0$ be given and take $\delta < 1$ so small that $\frac{1 + \delta}{1 - \delta} < 1 + \epsilon$. Take $\epsilon_1 < \frac{\delta}{2}$. Then by assumption there exists a map $g : W \to X$ and a constant $k_1$ such that,
Let $M = \min_{x_i \ne x_j} d(x_i, x_j)$ and take $\epsilon_2 < \frac{\delta k_1 M}{2}$. Let $f : g(W) \to Y$ be a map satisfying (4) with respect to $\epsilon_2$. We claim that $f \circ g$ is the desired map. Indeed,
Together, Proposition 18, Theorem 20 and Proposition 21 complete the proof of our main result, which connects the embeddability questions for persistence diagrams and Wasserstein space.
Since snowflake universality depends only on finite subsets, Lemma 22 and Propositions 18, 20 yield the following.
We conclude this section with a corollary which follows from Theorems 12, 13, and 23.We note that a direct proof is also possible adapting techniques from Wagner [18].
Corollary 25. The space $P_p(\mathbb{R}^2)$ does not admit a coarse embedding into a Hilbert space if $p > 2$.

CONCLUDING REMARKS
Wasserstein space and the space of persistence diagrams have many similarities, especially when viewing persistence diagrams as discrete distributions; this similarity has been explored elsewhere using partial optimal transport [9]. The main difference between the spaces is the presence of the diagonal as a sink for unmatched points. Our results suggest that, for $p > 1$, this difference does not affect the coarse embeddability of these spaces. Obstructions to embeddability developed in either case will therefore work just as well for the other. On the other hand, for $p = 1$, our construction degenerates in a similar way to that in [2] and so we do not obtain an equivalence. The $p = 1$ case is important since it appears in many stability results for vectorizations [1,8]. Note that stability is one half of coarse (or bi-Lipschitz) embeddability, as it bounds the distortion in Hilbert space in terms of the distance between diagrams. Our hope is that our results motivate the use of techniques for Wasserstein space to resolve the question of embeddability of persistence diagrams for $1 \le p \le 2$.
FIGURE 1. Sketch of the construction in Lemma 19 for $n = 2$. The two persistence diagrams on the left (blue triangles and orange circles respectively) are sent by $f$ to the uniform distributions shown on the right, which have the same cardinality.