Bottleneck Profiles and Discrete Prokhorov Metrics for Persistence Diagrams

In topological data analysis (TDA), persistence diagrams have been a successful tool. To compare them, the Wasserstein and Bottleneck distances are commonly used. We address the shortcomings of these metrics and show a way to investigate them systematically by introducing bottleneck profiles. This leads to a notion of discrete Prokhorov metrics for persistence diagrams as a generalization of the Bottleneck distance. These metrics satisfy a stability result and bounds with respect to Wasserstein metrics. We provide algorithms to compute the newly introduced quantities and end with a discussion of experiments.


Introduction
The field of topological data analysis (TDA) is becoming a popular tool to study the structure of complex data. One of the major tools of TDA is persistent homology (PH) [EH10]. Its pipeline takes a (often highly complex) point cloud in Euclidean space as input and produces a point cloud in the plane, the persistence diagram (PD), as output. Intuitively, persistence diagrams serve as a summary of the shape of the input data. As a consequence, one can compare different shapes indirectly, by comparing their PDs. The need for a robust and computationally efficient notion of distance for PDs arises. Classically, one uses the Bottleneck and Wasserstein distances to this end [KMN17]. However, the Bottleneck distance only picks up the single biggest difference, and the Wasserstein distance is prone to noise, as it picks up every difference no matter how small. This motivates our search for new metrics. Starting from the investigation of bottlenecks, we introduce the notion of the bottleneck profile of two PDs, which is a map R≥0 → N ∪ {∞} (Definition 3.1). This tool summarizes metric information at varying scales and generalizes the Bottleneck distance. The Wasserstein distance can also, in special cases, be computed from the bottleneck profile; in general, it can be bounded given a bottleneck profile. The bottleneck profile arises naturally in a discrete version of the Prokhorov distance, a classical tool in probability theory. It turns out that the Bottleneck and the Prokhorov distance are just two instances of a whole family of Prokhorov-style metrics discussed in this paper (Definition 4.1). This family is parameterized by a subclass of functions f : [0, ∞) → [0, ∞). Not every function f in fact gives rise to a genuine metric; we examine under which conditions on f it does (Definition 4.2; such f are called admissible).
In addition to theoretical development, we discuss algorithms to compute the bottleneck profile and various Prokhorov-type distances.
We provide a run-time analysis and experiments on a number of data sets. The algorithms are provided as an open-source implementation.

Measure Theory
Let (X, d) be a metric space. It is complete if every Cauchy sequence has a limit in X. It is separable if it has a countable dense subset. A complete separable metrizable topological space is called a Polish space. For example, all Euclidean spaces R^n are Polish. Polish spaces are a convenient setting for measure and probability theory. In general, we endow X with the Borel σ-algebra B(X) and denote the set of probability measures by P(X). Let us recall an important inequality [Gra14, p. 6]: Lemma 2.1 (Chebychev's inequality). Let (X, Σ, µ) be a measure space and let f : X → R be a measurable function. Then for any p > 0 and t > 0, µ({x ∈ X : |f(x)| ≥ t}) ≤ t^{-p} ∫_X |f|^p dµ.
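For counting measures on a finite set, the inequality is elementary to verify numerically; the following is a minimal sketch (the helper name `chebyshev_bound` is ours, not from the paper):

```python
def chebyshev_bound(values, p, t):
    """Return (lhs, rhs) of Chebychev's inequality for the counting
    measure on a finite set: mu({|f| >= t}) <= t^(-p) * sum(|f|^p)."""
    lhs = sum(1 for v in values if abs(v) >= t)
    rhs = t ** (-p) * sum(abs(v) ** p for v in values)
    return lhs, rhs

# two of the four values are >= 1.5, and the bound (17.5 / 2.25) dominates
lhs, rhs = chebyshev_bound([0.5, 1.0, 2.0, 3.5], p=2, t=1.5)
assert lhs <= rhs
```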

Metrics for Probability Measures
There are various ways to compare different probability measures.
Definition 2.2. Let p ≥ 1. The p-Wasserstein distance between µ, ν ∈ P(X) is W_p(µ, ν) = inf_γ (∫_{X×X} d(x, y)^p dγ(x, y))^{1/p}, where γ ranges over all couplings of µ and ν, i.e., measures on X × X with marginals µ and ν.
The 1-Wasserstein metric is also known as the Kantorovich metric or earth mover's distance. The latter name is motivated by the idea of thinking of γ as a transport plan for moving a pile of earth µ into the pile ν. The cost of transportation equals the distance by which the earth is moved.
Intuitively, there are two different ways to "slightly change" a measure. The first one is to move all the mass by a tiny distance. The second one is to move a tiny part of the mass arbitrarily, possibly very far away. While the Wasserstein metric is stable under perturbations of the first kind, small changes of the second kind can result in large differences in the metric. The Prokhorov metric [Pro56] seeks to resolve this problem. It is constructed in such a way that an ε-neighborhood of a measure is characterized as follows: one may move ε of the mass arbitrarily and the rest by at most ε, see Figure 1 for an illustration. We now formalize this idea. For a Borel set A ⊂ X, the (open) ε-ball around A is A^ε = {x ∈ X : d(x, a) < ε for some a ∈ A}. The Prokhorov metric π for two probability measures µ, ν is defined as π(µ, ν) = inf{ε > 0 : µ(A) ≤ ν(A^ε) + ε and ν(A) ≤ µ(A^ε) + ε for all Borel sets A}. By Strassen's Theorem (cf. Remark 1.29 in [Vil03], Appendix 1.4), an alternative characterization of the Prokhorov metric is given in terms of couplings γ which marginalize to µ and ν (compare Figure 2): π(µ, ν) = inf{inf{ε > 0 : γ({(x, y) : d(x, y) > ε}) < ε} : γ has marginals µ and ν}. (1) This allows for a discretization suitable for persistence diagrams, see Section 4.
Example 2.3. Let µ = δ_{x1} and ν = δ_{x2} be Dirac measures at two points x1, x2 ∈ X. The only coupling with the correct marginals is δ_{(x1,x2)}. Then we have γ({(x, y) : d(x, y) > ε}) = 1_{ε<d(x1,x2)}, where we write 1_{ε<d(x1,x2)} as a shorthand notation for the right-hand side. Consequently, we have π(δ_{x1}, δ_{x2}) = min(d(x1, x2), 1). In their survey [GS02], Gibbs and Su show that the 1-Wasserstein metric can be related to the Prokhorov metric via π(µ, ν)² ≤ W1(µ, ν) ≤ (diam(X) + 1) · π(µ, ν), where diam(X) is the diameter of the underlying space. We provide discrete analogues of this estimate in Propositions 4.11 and 4.16.
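The computation in this example can be replayed numerically: scanning for the smallest ε with γ({d > ε}) < ε recovers π(δ_{x1}, δ_{x2}) = min(d(x1, x2), 1). A small sketch (grid scan; the function name is ours):

```python
def prokhorov_dirac(d, step=1e-4):
    """Prokhorov distance between two Dirac measures at distance d.
    The only coupling moves all mass over distance d, so
    gamma({dist > eps}) is 1 if eps < d and 0 otherwise; we scan a grid
    for the smallest eps with gamma({dist > eps}) < eps."""
    for k in range(1, 200000):
        eps = k * step
        mass_moved_far = 1.0 if eps < d else 0.0
        if mass_moved_far < eps:
            return eps
    return None

# close points: pi equals the distance; far points: pi saturates at 1
assert abs(prokhorov_dirac(0.3) - 0.3) < 1e-3
assert abs(prokhorov_dirac(5.0) - 1.0) < 1e-3
```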
For more on metrics of probability measures, see the book [Rac91]; references for optimal transport include [PC+19] which takes on a computational perspective.

Persistent Homology
Definition 2.4. The category PersMod is the functor category from the reals as a poset category to finite-dimensional vector spaces. Its objects are called pointwise finite-dimensional (p.f.d.) persistence modules. A p.f.d. persistence module A = (A_t)_{t∈R} comes with transition maps a_{s,t} : A_s → A_t for s ≤ t.
Definition 2.5. An interval module for an interval J ⊂ R is a p.f.d. persistence module with A_t = k for t ∈ J, A_t = 0 otherwise, and identity transition maps a_{s,t} for s, t ∈ J. Note that we do not specify whether the endpoints are contained in the interval; they may be ±∞. Interval modules are of special interest because p.f.d. persistence modules admit an interval decomposition.
Theorem 2.6 ([Cra15], Theorem 1.1). Let A ∈ PersMod. Then there exists a collection of intervals J such that A is isomorphic to the direct sum of the corresponding interval modules. Such an interval decomposition (sometimes called barcode) can be visualized via a persistence diagram.
Definition 2.7. A persistence diagram (PD) is a multiset of points in R², consisting of • points above the diagonal (b, d), b < d, each with finite multiplicity, and • each point on the diagonal ∆ = {(s, s) ∈ R²} with countable multiplicity.
The convention to include diagonal points with infinite multiplicity will be useful for the construction of distances between persistence diagrams.
To obtain a PD from the above interval decomposition, collect the birth and death times ⟨b, d⟩ of the intervals (where the angled brackets indicate that the endpoints may or may not be included); add all the points on the diagonal with countable multiplicities. Off-diagonal points have finite multiplicities since the persistence module is pointwise finite-dimensional. We will freely identify off-diagonal points in the diagram with the corresponding interval. Points close to the diagonal have a short lifetime and are often regarded as noise. To compare persistence diagrams, we consider one-to-one correspondences between them. To take care of different cardinalities of off-diagonal points and to get rid of noisy, short-lifetime points, we allow them to be mapped to the diagonal. This explains the inclusion of the diagonal with infinite multiplicity in the above definition.
Definition 2.8. A matching η between persistence diagrams X and Y is a bijection which fixes all but finitely many diagonal points. The cardinality or size of a matching η, denoted by |η|, is the number of points which are not fixed.
Definition 2.9. The bottleneck distance between two persistence diagrams X, Y is W∞(X, Y) = inf_η sup_{x∈X} d(x, η(x)), where η ranges over all matchings.
Definition 2.10. Let p ≥ 1. The p-Wasserstein distance between two persistence diagrams X, Y is W_p(X, Y) = inf_η (Σ_{x∈X} d(x, η(x))^p)^{1/p}, where η again ranges over all matchings.
The notation of Definition 2.10 has the advantage of being compact, but note that we have uncountably many summands. Usually, however, only finitely many of them, namely |η| for the optimal matching η, will be non-zero. Similarly, only finitely many elements of the uncountable set over which we take the supremum in Definition 2.9 are non-zero.
Definition 2.11. Let p ≥ 1. We say a persistence diagram X has finite pth moment if the p-Wasserstein distance to the empty diagram is finite: W_p(X, ∅) < ∞.
Except in Section 4.2, the persistence diagrams in this paper are assumed to have finitely many off-diagonal points. Therefore, the infima in Definitions 2.9 and 2.10 are actually minima. Notice the analogy between Definitions 2.2 and 2.10. We replace probability measures by counting measures and hence turn the integral into a sum. The infimum is taken over all matchings instead of all couplings. This observation will serve as a blueprint for the construction of the discrete Prokhorov metric for persistence diagrams in Section 4. The motivation to compare persistence diagrams comes from topological data analysis, where they serve as a summary statistic of topological information.
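To make Definitions 2.9 and 2.10 concrete, here is a brute-force sketch for very small diagrams (exponential in the number of points). The ℓ∞ ground distance and all function names are our own choices for illustration, not the paper's implementation:

```python
import itertools

def _diag(p):
    # orthogonal projection of a point onto the diagonal
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def _dist(p, q):
    # l-infinity distance in the plane
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def _matching_costs(X, Y):
    # Augment each diagram with the diagonal projections of the other's
    # points; every matching is then a bijection of two equal-size sets.
    # Pairs of two diagonal points cost 0 (they are fixed by the matching).
    Xa = list(X) + [_diag(q) for q in Y]
    Ya = list(Y) + [_diag(p) for p in X]
    for perm in itertools.permutations(range(len(Ya))):
        yield [0.0 if i >= len(X) and j >= len(Y) else _dist(Xa[i], Ya[j])
               for i, j in enumerate(perm)]

def bottleneck(X, Y):
    if not X and not Y:
        return 0.0
    return min(max(costs) for costs in _matching_costs(X, Y))

def wasserstein(X, Y, p=1):
    if not X and not Y:
        return 0.0
    return min(sum(c ** p for c in costs)
               for costs in _matching_costs(X, Y)) ** (1.0 / p)

# One point each: matching x -> y costs 2, while sending both points to
# the diagonal costs max(2, 3) = 3, so the bottleneck distance is 2.
assert bottleneck([(0, 4)], [(0, 6)]) == 2.0
assert wasserstein([(0, 4)], [(0, 6)], p=1) == 2.0
```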
Example 2.12. Given a finite subset X of some metric space, we can consider the Vietoris-Rips complex, cf. [EH10], III.2. This is a filtered simplicial complex; for filtration value r > 0 it is given by VR(X)[r] = {S ⊂ X : diam(S) < 2r}.
Applying the homology functor gives rise to a persistence module: for r ≤ s, the inclusion VR(X)[r] ⊂ VR(X)[s] induces a map H_*(VR(X)[r]) → H_*(VR(X)[s]). This is called the (Vietoris-Rips) persistent homology of X, denoted PH_*(X). Summands in its interval decomposition are interpreted as topological features which are "born" at a certain point in the filtration and "persist" for some time. They are regarded as more significant the longer the corresponding intervals are. The following theorem ascertains that this is a useful tool.
In other words, if we change the input point cloud by ε in the Gromov-Hausdorff metric, the resulting PDs differ by at most ε in the Bottleneck distance.

Bottleneck Profiles
The bottleneck distance W∞ has a major drawback: it only captures the single most extreme difference between two persistence diagrams. This implies that the same bottleneck distance can be realized by quite different pairs of persistence diagrams, cf. Figure 3. We introduce the notion of the bottleneck profile to address the topic of secondary, tertiary, ... bottlenecks and their multiplicities.
Definition 3.1. The bottleneck profile of two persistence diagrams X, Y is the function D_{X,Y} : R≥0 → N ∪ {∞}, D_{X,Y}(t) = inf_η |{x ∈ X : d(x, η(x)) > t}|, where η ranges over all matchings and | · | denotes the cardinality of the set.
For d : R² × R² → R≥0 we take an ℓ_p-metric d(x, y) = ‖x − y‖_p, where the choice of p might depend on the setting. For example, when comparing with the p-Wasserstein distance, one might like to choose this same p. Since the infimum is taken over a subset of the natural numbers, it is actually a minimum. To be consistent with the notation in Definitions 2.9 and 2.10, we choose to adhere to the use of infimum.
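A brute-force sketch of Definition 3.1, again only feasible for tiny diagrams (ℓ∞ ground distance and helper names are our own choices):

```python
import itertools

def _diag(p):
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def _dist(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # l-infinity

def bottleneck_profile(X, Y, t):
    """D_{X,Y}(t): the minimum, over all matchings, of the number of
    pairs matched over a distance strictly greater than t."""
    Xa = list(X) + [_diag(q) for q in Y]
    Ya = list(Y) + [_diag(p) for p in X]
    best = len(Xa)
    for perm in itertools.permutations(range(len(Ya))):
        far = sum(1 for i, j in enumerate(perm)
                  if not (i >= len(X) and j >= len(Y))
                  and _dist(Xa[i], Ya[j]) > t)
        best = min(best, far)
    return best

# Two points against the empty diagram: both must travel to the diagonal,
# over distances 2 and 5, so the profile steps down at t = 2 and t = 5.
X = [(0, 4), (0, 10)]
assert bottleneck_profile(X, [], 1.0) == 2
assert bottleneck_profile(X, [], 3.0) == 1
assert bottleneck_profile(X, [], 6.0) == 0
```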
The following observation is immediate: Lemma 3.2. The bottleneck profile D_{X,Y} is monotonically decreasing.
Proof. Let η : X → Y be any matching realizing D_{X,Y}(s) for some s. Let now t > s; then every distance longer than t is in particular longer than s, and consequently |{x : d(x, η(x)) > t}| ≤ |{x : d(x, η(x)) > s}| = D_{X,Y}(s). Taking the infimum over all matchings decreases the left-hand side further and yields D_{X,Y}(t) ≤ D_{X,Y}(s).
Knowing this, it is natural to ask when the bottleneck profile becomes zero. The answer recovers the bottleneck distance: Lemma 3.3. For any two persistence diagrams X, Y, we have W∞(X, Y) = inf{t > 0 : D_{X,Y}(t) = 0}.
Proof. By definition, the bottleneck distance is the smallest t > 0 such that there is a matching mapping all points within distance t. In formulas, W∞(X, Y) = inf{t > 0 : D_{X,Y}(t) = 0}. Thus we recover the bottleneck distance from the bottleneck profile. The bottleneck cost of a matching is the longest distance over which two points are matched. Minimizing the bottleneck cost over all matchings yields the bottleneck distance, which we can think of as the primary bottleneck. Similarly, the secondary bottleneck cost of a matching is the second-longest distance over which two points are matched. Taking the minimum over all matchings here gives a notion of a secondary bottleneck, which equals inf{t > 0 : D_{X,Y}(t) ≤ 1} by an argument analogous to the previous proof. This motivates the name bottleneck profile.
Example 3.4. Let X = {x} and Y = {y} both consist of one point each and assume that d(x, y) < d(x, x′) + d(y, y′), where the prime denotes the projection to the diagonal. That means that x → y is an optimal matching. Consequently, the bottleneck profile looks as follows: D_{X,Y}(t) = 1 for t < W∞(X, Y) and D_{X,Y}(t) = 0 otherwise.
Example 3.5. If we take one of the persistence diagrams to be the empty one, there is only one choice of matching: everything is paired with the diagonal. As a consequence, D_{X,∅}(t) counts the bars of X of length > 2t. This is also known as the stable rank function corresponding to the contour C(a, ε) = a + 2ε, introduced in [CR20].
Lemma 3.7. For all persistence diagrams X, Y, Z and all real numbers s, t ≥ 0, D_{X,Z}(s + t) ≤ D_{X,Y}(s) + D_{Y,Z}(t).
Proof. This follows from the triangle inequality on R². Fix s, t ≥ 0, and let η_{X,Y} : X → Y and η_{Y,Z} : Y → Z denote optimal matchings realizing D_{X,Y}(s) and D_{Y,Z}(t), respectively. Let η = η_{Y,Z} ∘ η_{X,Y} : X → Z be the matching obtained by composition. It suffices to show that |{x : d(x, η(x)) > s + t}| ≤ D_{X,Y}(s) + D_{Y,Z}(t), because the left-hand side only decreases if we take the infimum over all matchings. Hence we have to investigate what happens when a point x is matched to a point η(x) farther apart than s + t. We compare the distances of the matched points using the triangle inequality, d(x, η(x)) ≤ d(x, η_{X,Y}(x)) + d(η_{X,Y}(x), η(x)). Therefore, it cannot be that both d(x, η_{X,Y}(x)) ≤ s and d(η_{X,Y}(x), η(x)) ≤ t (compare Figure 5). That means we have d(x, η_{X,Y}(x)) > s or d(η_{X,Y}(x), η(x)) > t or both. Using the principle of inclusion-exclusion, conclude |{x : d(x, η(x)) > s + t}| ≤ |{x : d(x, η_{X,Y}(x)) > s}| + |{x : d(η_{X,Y}(x), η(x)) > t}| = D_{X,Y}(s) + D_{Y,Z}(t).
Note that D_{X,Y}(t) = 0 for all t > 0 implies X = Y only under some finiteness assumptions. For example, consider a converging sequence (a_n)_{n∈N} ⊂ R² above the diagonal whose limit a ∉ {a_n : n ∈ N} also lies above the diagonal. Set X to consist of all elements of the sequence {a_n : n ∈ N} and set Y to be X ∪ {a}. Then for every ε > 0 there exists a matching η : X → Y moving every point by at most ε, so D_{X,Y}(ε) = 0 although X ≠ Y. Following [Blu+14], we denote by B the set of persistence diagrams such that for each ε > 0 there are finitely many points of persistence > ε. The next lemma is an immediate consequence of [Blu+14, Lemma 3.4].
Lemma 3.8. Let X, Y ∈ B. If D_{X,Y}(t) = 0 for all t > 0, then X = Y.

Relation to Wasserstein distances
We have already seen how the bottleneck profile is related to the bottleneck distance. This is actually part of a more general result comparing it to the p-Wasserstein metrics.
Lemma 3.9. Let X, Y be two persistence diagrams, and let p > 0. Then D_{X,Y}(t) ≤ t^{-p} · W_p(X, Y)^p for all t > 0.
Proof. This follows from the Chebychev inequality (Lemma 2.1) for counting measures. To spell out the details, estimate that for every matching η, |{x : d(x, η(x)) > t}| ≤ t^{-p} Σ_{x∈X} d(x, η(x))^p. Now choosing η to minimize the right-hand side, we have by definition of the Wasserstein distance an estimate for D_{X,Y}: D_{X,Y}(t) ≤ t^{-p} W_p(X, Y)^p. This is illustrated by Figure 6. Note that we recover Lemma 3.3 in the limit p → ∞: the right-hand side tends to 0 for t > W∞(X, Y) and to ∞ for t < W∞(X, Y).
For the 1-Wasserstein distance, we have a further estimate:
Lemma 3.10. For any two persistence diagrams X, Y, we have ∫_0^∞ D_{X,Y}(t) dt ≤ W1(X, Y). (*)
Proof. Let η : X → Y be the matching realizing W1(X, Y). We compute the area under the graph of the function t ↦ |{x : d(x, η(x)) > t}|, which is piecewise constant. Decomposing it into rectangles of height one yields a width of inf{t > 0 : |{x : d(x, η(x)) > t}| < i} for the ith rectangle, i ≥ 1, cf. Figure 7. The width of the ith rectangle is the length of the ith-longest edge in the matching. Summing over all i is therefore the same as summing the distances over which points are matched. In formulas: ∫_0^∞ |{x : d(x, η(x)) > t}| dt = Σ_{x∈X} d(x, η(x)) = W1(X, Y). Since D_{X,Y}(t) ≤ |{x : d(x, η(x)) > t}| for every t, the claim follows.
Proposition 3.11. If the bottleneck profile D_{X,Y}(t) can be realized by the same matching η for all t > 0, then η realizes W1(X, Y).
Proof. If η realizes D_{X,Y}(t) for all t > 0, then the inequality D_{X,Y}(t) ≤ |{x : d(x, η(x)) > t}| in the proof of the previous lemma becomes an equality. Combining this with Lemma 3.10, we obtain Σ_{x∈X} d(x, η(x)) = ∫_0^∞ D_{X,Y}(t) dt ≤ W1(X, Y) ≤ Σ_{x∈X} d(x, η(x)). Consequently, the inequality (*) is actually an equality and η realizes W1(X, Y), which is what we wanted to prove.
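Both bounds are easy to check numerically on small diagrams. The following brute-force sketch (our own helper names, ℓ∞ ground distance) verifies the Chebychev-type bound of Lemma 3.9 and the area bound of Lemma 3.10 on a toy example:

```python
import itertools

def _diag(p):
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def _dist(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # l-infinity

def _matching_costs(X, Y):
    # all matchings of the diagonally augmented diagrams
    Xa = list(X) + [_diag(q) for q in Y]
    Ya = list(Y) + [_diag(p) for p in X]
    for perm in itertools.permutations(range(len(Ya))):
        yield [0.0 if i >= len(X) and j >= len(Y) else _dist(Xa[i], Ya[j])
               for i, j in enumerate(perm)]

def profile(X, Y, t):
    return min(sum(1 for c in costs if c > t)
               for costs in _matching_costs(X, Y))

def wasserstein(X, Y, p=1):
    return min(sum(c ** p for c in costs)
               for costs in _matching_costs(X, Y)) ** (1.0 / p)

X, Y = [(0, 4), (1, 7)], [(0, 5)]
w1 = wasserstein(X, Y, p=1)

# Lemma 3.9 with p = 1: D_{X,Y}(t) <= W_1(X, Y) / t
for t in (0.5, 1.0, 2.0, 3.0):
    assert profile(X, Y, t) <= w1 / t

# Lemma 3.10: the area under the bottleneck profile is at most W_1(X, Y)
area = sum(profile(X, Y, k * 0.01) * 0.01 for k in range(1, 1000))
assert area <= w1 + 1e-9
```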

Algorithms
Recall Definition 3.1 and let η be the matching realizing the infimum. Then η also realizes the supremum sup_η |{x : d(x, η(x)) ≤ t}|. As in the computation of the bottleneck distance, augment X and Y by the diagonal projections of each other's points and consider the bipartite graph G = (X̃ ⊔ Ỹ, E) with e = {u, v} ∈ E if either of the following holds: d(u, v) ≤ t, or u and v are both diagonal points. Let M ⊂ E be a matching of maximal cardinality. Observe that such a matching corresponds to a bijection η : X → Y maximizing |{x : d(x, η(x)) ≤ t}|.
To estimate the run-time of this algorithm, let n = |X| + |Y|. We solve the unweighted maximum-cardinality bipartite matching problem using the Hopcroft-Karp algorithm [HK73]. Let us briefly recall this classical algorithm. The algorithm extends a partial matching M until it reaches a maximum one. It achieves this by augmenting paths: a path p that starts at an unmatched vertex in U and ends at an unmatched vertex in V such that edges from U to V are not in M but edges from V to U are. Removing the edges of p ∩ M from the matching and instead inserting the edges of p ∩ (E \ M) increases the size of M by one. The Hopcroft-Karp algorithm finds vertex-disjoint augmenting paths in O(n²) via the so-called layer subgraph, which is constructed via a breadth-first search in O(n²). After extending the matching using all these augmenting paths, the algorithm starts over. The algorithm terminates after O(√n) of these iterations. While this consequently takes O(n^2.5) in the worst case, we perform a variant which exploits the geometric nature of the setting, as suggested in [EIK01]. Instead of building the layer graph explicitly, one can use a geometric data structure that allows for querying neighbors within a given distance, as well as removing points. Following [KMN17], k-d trees achieve this, requiring O(√n) for either of the two operations. Consequently, as noted by [KMN17] and [EIK01], our variant of the Hopcroft-Karp algorithm runs in O(n²). Summarizing, we find the following: Proposition 3.12. Let X, Y be finite persistence diagrams and denote n = |X| + |Y|. The value of the bottleneck profile at t, D_{X,Y}(t), can be computed in O(n²).
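The reduction above can be sketched in code. The following uses a plain augmenting-path matcher (Kuhn's algorithm, which is simpler but slower than Hopcroft-Karp) in place of the paper's variant; it is enough to illustrate that D_{X,Y}(t) equals n minus the size of a maximum matching. All names are ours:

```python
def profile_via_matching(X, Y, t):
    """D_{X,Y}(t) = n - |maximum matching| in the bipartite graph whose
    edges join u, v with d(u, v) <= t, or u, v both diagonal points."""
    diag = lambda p: ((p[0] + p[1]) / 2.0, (p[0] + p[1]) / 2.0)
    dist = lambda p, q: max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    Xa = list(X) + [diag(q) for q in Y]
    Ya = list(Y) + [diag(p) for p in X]
    n = len(Xa)
    adj = [[j for j in range(n)
            if (i >= len(X) and j >= len(Y)) or dist(Xa[i], Ya[j]) <= t]
           for i in range(n)]
    match_r = [-1] * n  # match_r[v] = left vertex matched to v, or -1

    def augment(u, seen):
        # depth-first search for an augmenting path starting at u
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match_r[v] == -1 or augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False

    matched = sum(1 for u in range(n) if augment(u, set()))
    return n - matched

X = [(0, 4), (0, 10)]  # distances to the diagonal: 2 and 5 (l-infinity)
assert profile_via_matching(X, [], 1.0) == 2
assert profile_via_matching(X, [], 3.0) == 1
assert profile_via_matching(X, [], 6.0) == 0
```

This agrees with the brute-force computation of the profile, but its run-time is polynomial in n rather than factorial.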
Remark 3.13. Using k-d trees is useful in practice, but does not yield optimal theoretical run-times. Indeed, the more sophisticated data structure from [EIK01], Section 5.1, can be constructed in O(n log(n)). The two relevant operations on it require O(log(n)), so that the bottleneck profile could be evaluated in O(n^1.5 log(n)) using this method.
Remark 3.14. Instead of using Hopcroft-Karp, one can regard the matching problem as a linear program. For each x ∈ X and y ∈ Y, we have a binary variable f_xy indicating whether the edge from x to y is in the matching. The coefficients (the cost of an edge) are given by c_xy = 0 if d(x, y) ≤ t and c_xy = 1 otherwise.

Discrete Prokhorov Metrics for Persistence Diagrams
A straightforward discretization of the coupling characterization (1) of the probabilistic Prokhorov metric gives the main notion of this section.
Definition 4.1. Given two persistence diagrams X, Y, consider matchings η : X → Y and define their Prokhorov distance as π(X, Y) = inf{ε > 0 : there is a matching η with |{x : d(x, η(x)) > ε}| < ε} = inf{ε > 0 : D_{X,Y}(ε) < ε}. Informally, we look at the intersection of the bottleneck profile with the diagonal. Similarly, we have already seen that the bottleneck distance arises as the intersection of D_{X,Y} with the horizontal axis. This motivates the question which functions we can intersect the bottleneck profile with to obtain a sensible notion of distance.
Definition 4.2. A function f : R≥0 → R≥0 is called admissible if it is superadditive, that is, f(s + t) ≥ f(s) + f(t) for all s, t ≥ 0. Furthermore, the function f ≡ 1 is also said to be admissible.
Notice that such superadditive functions are monotonically non-decreasing. For example, any linear function with non-negative slope is admissible. Moreover, increasing convex functions f with f(0) = 0 are admissible. For instance, polynomials with non-negative coefficients and absolute term zero fulfill this criterion.
Definition 4.3. Given a fixed admissible function f : R≥0 → R≥0, define for any two PDs X, Y their f-Prokhorov distance to be π_f(X, Y) = inf{t > 0 : D_{X,Y}(t) < f(t)}. Plugging in f = id gives the Prokhorov distance; plugging in f ≡ 1 recovers the bottleneck distance (this is why this function is admissible even though it is not superadditive). Intuitively, for n ∈ N, plugging in f ≡ n (although this is not an admissible function) gives the nth bottleneck. For two Prokhorov-close PDs, we require the number (= counting measure) of unmatched points to be small. Points with small persistence get matched to the diagonal and thus do not blow up the Prokhorov distance. Hence it is robust with respect to noise.
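For small diagrams, π_f can be computed exactly by scanning a finite candidate set, anticipating the algorithm of Section 4.3: the infimum lies among the pairwise distances of the augmented diagrams and the preimages of the integers under f. A brute-force sketch (our own naming; for strictly increasing continuous f, the non-strict comparison below yields the same infimum as the strict inequality in the definition):

```python
import itertools

def _diag(p):
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def _dist(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # l-infinity

def _profile(X, Y, t):
    # brute-force bottleneck profile D_{X,Y}(t)
    Xa = list(X) + [_diag(q) for q in Y]
    Ya = list(Y) + [_diag(p) for p in X]
    return min(sum(1 for i, j in enumerate(perm)
                   if not (i >= len(X) and j >= len(Y))
                   and _dist(Xa[i], Ya[j]) > t)
               for perm in itertools.permutations(range(len(Ya))))

def prokhorov(X, Y, f, f_inv):
    """pi_f(X, Y) for a strictly increasing f with inverse f_inv,
    found by scanning distances and the preimages f^{-1}(0), ..., f^{-1}(n)."""
    Xa = list(X) + [_diag(q) for q in Y]
    Ya = list(Y) + [_diag(p) for p in X]
    candidates = {_dist(u, v) for u in Xa for v in Ya}
    candidates |= {f_inv(k) for k in range(len(Xa) + 1)}
    return min(t for t in sorted(candidates)
               if t > 0 and _profile(X, Y, t) <= f(t))

# f = id gives the discrete Prokhorov distance: here D(t) is 1 for t < 2
# and 0 afterwards, so the profile crosses the diagonal at t = 1.
X, Y = [(0, 4)], [(0, 6)]
assert prokhorov(X, Y, lambda t: t, lambda k: float(k)) == 1.0
```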
Example 4.4. Assume f is invertible. Recall the situation of Example 3.4: X = {x} and Y = {y} both consist of one point each, and we assume that d(x, y) < d(x, x′) + d(y, y′), where the prime denotes the projection to the diagonal. We saw that the bottleneck profile is 1 for t < W∞(X, Y) and 0 afterwards; consequently, π_f(X, Y) = min(W∞(X, Y), f⁻¹(1)).
Lemma 4.5. The infimum in the definition of π_f is attained in the sense that D_{X,Y}(π_f(X, Y)) ≤ f(π_f(X, Y)).
Proof. Note that D_{X,Y} is right-continuous by construction.
The triangle inequality follows from Lemma 3.7.
Lemma 4.6. Fix an admissible function f : R≥0 → R≥0. For any three persistence diagrams X, Y, Z, we have π_f(X, Z) ≤ π_f(X, Y) + π_f(Y, Z).
Proof. Write s = π_f(X, Y) and t = π_f(Y, Z). We make the following estimates: D_{X,Z}(s + t) ≤ D_{X,Y}(s) + D_{Y,Z}(t) ≤ f(s) + f(t) ≤ f(s + t). Here we used Lemma 3.7 for the first inequality, Lemma 4.5 for the second and superadditivity of f for the final one. Therefore, inf{t > 0 : D_{X,Z}(t) < f(t)} ≤ s + t; the left-hand side is the definition of π_f(X, Z), as desired.
As the symmetry is clear, we have shown: Theorem 4.7. Fix an admissible function f : R≥0 → R≥0. The discrete f-Prokhorov metric is an extended pseudometric.
Just like for the bottleneck distance, we need some finiteness property for π_f to be a genuine metric. Let B denote the set of persistence diagrams which for every ε > 0 have only finitely many points of persistence > ε. Then Lemma 3.8 implies:
Lemma 4.8. Let f : R≥0 → R≥0 be admissible. For X, Y ∈ B, we have π_f(X, Y) = 0 if and only if X = Y.
Proof. Suppose π_f(X, Y) = 0, i.e., D_{X,Y}(t) < f(t) for all t > 0. As the bottleneck profile is monotonically decreasing and lim_{t↘0} f(t) = 0, this implies D_{X,Y}(t) = 0 for all t > 0. By Lemma 3.8, this happens only if X = Y.
Our next task is to investigate how π f depends on the function f .While from a metric point of view, we need to fix f , the context of data science suggests a different perspective: For given training data (a fixed set of persistence diagrams) adjust f to obtain a metric that performs well on it (e.g. in a classification problem, cf.section 5).
Lemma 4.9. Let f, g : R≥0 → R≥0 be such that f(t) ≤ g(t) for all t ≥ 0. Then for any two persistence diagrams X, Y, we have {t > 0 : D_{X,Y}(t) < f(t)} ⊂ {t > 0 : D_{X,Y}(t) < g(t)}, and by definition π_g(X, Y) ≤ π_f(X, Y).
For fixed persistence diagrams, the Prokhorov metric is continuous with respect to the parameter function in the supremum metric; the proof uses the monotonicity of f. From a data science perspective, this allows us to tune the parameter function f on a fixed training set of persistence diagrams.

Comparison with Wasserstein
Fix a persistence diagram X and consider Wasserstein metrics and Prokhorov distances to some other diagram Y. We can perturb Y by adding more "noise". More precisely, we add k points whose distance to the diagonal is less than π_f(X, Y) and denote this diagram by Y_k. This does not affect the Prokhorov metric at all, while for all p ∈ [1, ∞), the value of W_p(X, Y_k) goes to infinity as k does. This is what we mean when we say that the Prokhorov metric is more robust with respect to noise than the Wasserstein metric. In other (more mathematical) words, the identity map id : (Dgm, π_f) → (Dgm, W_p), where Dgm is the set of all persistence diagrams, is nowhere continuous for p ∈ [1, ∞). In this section, we further explore the relation between Prokhorov and Wasserstein distances.
Similarly to the proofs in [GS02] for the measure-theoretic variants, we can bound our metric in terms of the Wasserstein distance. As we will explain, the metrics π_{t↦t^q} are of special interest.
Proposition 4.11. Let p ≥ 1, q ≥ 0, c > 0 and f(t) = c · t^q. For two persistence diagrams X, Y we have π_f(X, Y) ≤ (W_p(X, Y)^p / c)^{1/(p+q)}.
Proof. Recall from Lemma 3.9 that D_{X,Y}(t) ≤ t^{-p} W_p(X, Y)^p. We now want to find a suitable value of t such that D_{X,Y}(t) < c · t^q, to infer that π_f(X, Y) ≤ t. By the above estimate, it suffices that t^{-p} W_p(X, Y)^p < c · t^q, that is, t > (W_p(X, Y)^p / c)^{1/(p+q)}.
When comparing with the bottleneck distance, i.e. p = ∞ in the above setting, we can say even more: Proposition 4.13. For all admissible f and all persistence diagrams X, Y we have π_f(X, Y) ≤ W∞(X, Y). Proof. We recall by Lemma 3.3 that D_{X,Y}(t) = 0 for all t > W∞(X, Y), and therefore π_f(X, Y) ≤ W∞(X, Y). Specializing to c = 1 and p ∈ {1, ∞} or q ∈ {0, 1}, we obtain: Corollary 4.14. The following inequalities hold: W∞(X, Y) ≤ W1(X, Y), π(X, Y) ≤ W1(X, Y)^{1/2} and π(X, Y) ≤ W∞(X, Y). In particular, the Bottleneck Stability Theorem 2.13 implies stability for the new metrics by Proposition 4.13: Theorem 4.15. Let X, Y be finite metric spaces, fix some admissible function f and k ∈ N. Then we have π_f(PH_k(X), PH_k(Y)) ≤ 2 d_GH(X, Y), where d_GH is the Gromov-Hausdorff distance.
We can provide not only lower but also upper bounds for Wasserstein distances in terms of the Prokhorov distance.
Combining the two inequalities from Propositions 4.11 and 4.16, we obtain a comparison for different Wasserstein metrics.
Remark 4.18. Another inequality relating Wasserstein distances for different p and q originates from the Hölder inequality, given in [AGS20, Lemma 3.5]: it holds for finite persistence diagrams X, Y and real numbers 1 ≤ q < p < ∞, with η the matching realizing W_p(X, Y). Our inequality above yields a lower exponent for W_p(X, Y) at the cost of multiplying with the largest distance in the matching. In particular, for q = 1, p = 2, the two bounds can be compared directly (with η realizing π_{t^q}(X, Y) in our case). Depending on the size of W_p(X, Y) relative to the size of X and Y, our inequality can provide sharper bounds than the one of [AGS20]. To investigate the size of max_x d(x, η(x)) remains an interesting question for future work. One possible application of such inequalities is that they allow one to infer stability results for vectorizations with respect to W_p for p > 1 from stability with respect to W_1. Another use of Propositions 4.11 and 4.16 is that the bounds they provide for Wasserstein distances are easily computed, as we will see in Section 4.3 below.

Metric and Topological Properties
Using the comparison with Wasserstein (Section 4.1) and the results from [MMH11], we address questions of convergence and separability. We run into similar issues as [BV18, Theorems 4.20, 4.24, 4.25] and [Blu+14, Section 3]. In this section, we explicitly allow diagrams with a countably infinite number of off-diagonal points under certain finiteness assumptions specified below.
Theorem 4.19. Let p ≥ 1. The space of persistence diagrams with finite pth moment endowed with the c · t^q-Prokhorov metric is separable.
Proof. Let ε > 0, X a persistence diagram and p ≥ 1. Let S be a countable dense subset for the p-Wasserstein metric; this exists by [MMH11, Theorem 12]. In fact, they show that we can take S to be the set of finite diagrams whose points have rational coordinates. Let X_S ∈ S be a persistence diagram such that W_p(X, X_S) < ε^{(p+q)/p} · c^{1/p}. Then by Proposition 4.11, we have π_f(X, X_S) ≤ (W_p(X, X_S)^p / c)^{1/(p+q)} < ε. Note that the assumptions in the previous Theorem are weaker than the ones usually considered for the bottleneck distance, compare [BV18, Theorem 4.18].
Recall that B denotes the set of persistence diagrams which for all ε > 0 have finitely many points of persistence > ε. The next Theorem is a consequence of [Blu+14, Theorem 3.5], which asserts that the bottleneck distance makes B into a Polish space.
Theorem 4.20. The space B endowed with the Prokhorov metric π_f is Polish for all admissible f.
Proof. Let (X_n) ⊂ B be a Cauchy sequence with respect to the Prokhorov metric. Fix ε > 0 small enough that f(ε) < 1; this is possible since lim_{t↘0} f(t) = 0. For m, n large enough we have π_f(X_m, X_n) < ε and hence D_{X_m,X_n}(ε) ≤ f(ε) < 1. As the bottleneck profile takes values in the integers, we conclude that D_{X_m,X_n}(ε) = 0 and hence, by Lemma 3.3, we have ε ≥ W∞(X_m, X_n). In particular, (X_n) is a Cauchy sequence with respect to the bottleneck distance. By completeness of B with the bottleneck distance, there is a limit diagram X ∈ B to which the sequence converges. Finally, by Proposition 4.13, convergence in bottleneck implies convergence in Prokhorov. Now for separability, consider a subset A ⊂ B which is dense with respect to the bottleneck distance. Let X ∈ B and ε > 0. Then by assumption there is Y ∈ A with W∞(X, Y) < ε; by Proposition 4.13 we also have π_f(X, Y) < ε. Therefore, A is dense in B with respect to π_f as well.

Algorithms
In this section, all persistence diagrams are finite. We now provide an algorithm to compute π_f(X, Y) for continuous, monotonically increasing functions f. In this case, there is always a single value t_0 ∈ [0, ∞) such that D_{X,Y}(t) < f(t) for t > t_0 and D_{X,Y}(t) > f(t) for t < t_0. We can find its location by bisection; this yields an algorithm computing π_f(X, Y) in O(n² log(n)), as we now show.
Proof. First, observe that the Prokhorov distance takes its value among the pairwise distances of points in the persistence diagrams (if f crosses the bottleneck profile at one of its vertical gaps) or among preimages of integers under f (if f crosses the bottleneck profile at one of its constant pieces); in formulas, π_f(X, Y) ∈ T_1 := {d(x, y) : x ∈ X̃, y ∈ Ỹ} ∪ f⁻¹({0, 1, ..., n}). To perform a binary search, we sort the elements of T_1 as a preprocessing step, which has runtime complexity O(n² log(n)). In each iteration of the binary search we pick the median t ∈ T_i. Next we compute the value of the bottleneck profile D_{X,Y}(t) using Proposition 3.12, taking O(n²). Then we compute f(t), which by assumption takes O(1).
Hence we obtain a runtime of O(n 2 log n) for the binary search as well.
Procedure 1: the binary search to compute π_f(X, Y). In particular, if one uses a more efficient geometric data structure to improve the runtime of the matching algorithm, the sorting preprocessing dominates the runtime. Compare [EIK01], Theorem 3.2, and the preceding discussion therein for more details and possible improvements of the runtime complexity. Please refer to Section 6 for details about our implementation and its availability. There is an easy modification to the above algorithm to approximate π_f up to an additive error of ε. Instead of performing the binary search on the indicated discrete set (which needs to be sorted or otherwise pre-processed in a costly way, as noted), one can run it on an interval [0, M]. Here, M is some upper bound, for example the sum of the longest lifespans of points in X and Y, respectively (which is computed in O(n)). We bisect the interval until we arrive at one of length less than 2ε. Its midpoint is guaranteed to be less than ε away from the true value of π_f(X, Y).
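The ε-approximation variant can be sketched in a few lines. The profile evaluation below is brute force (Proposition 3.12 gives an O(n²) routine instead), and all names are our own illustrative choices:

```python
import itertools

def _diag(p):
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def _dist(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # l-infinity

def _profile(X, Y, t):
    # brute-force bottleneck profile D_{X,Y}(t)
    Xa = list(X) + [_diag(q) for q in Y]
    Ya = list(Y) + [_diag(p) for p in X]
    return min(sum(1 for i, j in enumerate(perm)
                   if not (i >= len(X) and j >= len(Y))
                   and _dist(Xa[i], Ya[j]) > t)
               for perm in itertools.permutations(range(len(Ya))))

def prokhorov_approx(X, Y, f, eps=1e-4):
    """Bisect [0, M] until the interval has length < 2*eps; the midpoint
    is then within eps of pi_f(X, Y).  As in the text, M is the sum of
    the longest lifespans in X and Y."""
    lifespan = lambda Z: max((p[1] - p[0] for p in Z), default=0.0)
    lo, hi = 0.0, lifespan(X) + lifespan(Y)
    if hi == 0.0:
        return 0.0
    while hi - lo >= 2 * eps:
        mid = (lo + hi) / 2.0
        if _profile(X, Y, mid) <= f(mid):
            hi = mid  # the crossing point t0 lies to the left of mid
        else:
            lo = mid
    return (lo + hi) / 2.0

X, Y = [(0, 4)], [(0, 6)]
approx = prokhorov_approx(X, Y, lambda t: t)  # f = id: Prokhorov distance
assert abs(approx - 1.0) < 1e-3
```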

Experiments
A first application of the bottleneck profile to simple synthetic persistence diagrams was already presented in Example 3.6.

Highlighting Geometric Intuition
This experiment is a toy example showing how the Prokhorov distance can capture our geometric intuition more accurately than the bottleneck or Wasserstein distances. Consider three different shapes in R²: a) a big circle (r = 6), b) a big (r = 6) and a medium circle (r = 4), c) a big (r = 6), a medium (r = 4) and a small circle (r = 2). We take five samples with noise from each shape according to Table 1. For each point cloud we compute the first persistent homology module of its alpha complex filtration and represent it as a PD (see Figure 8). We can look at the averaged D-function for each pair of shapes (Figure 9). After careful inspection of this figure and some trial and error, we come up with the choice of f(t) = t³ · 20^t to separate the three bottleneck profiles most efficiently: between around 0.55 and 0.65, the averaged bottleneck profiles involving shape c) with the small circle decrease, while the one comparing a) and b) stays constant.
Intersecting with a function in this interval will provide a good choice for the Prokhorov distance: it puts the two- and three-circle shapes closest to each other and the one- and three-circle shapes farthest apart. In data science tasks, we will of course need an automated way to find a good parameter function f; we discuss this in more detail below. Now we want to compare the Bottleneck, Prokhorov and Wasserstein distances. The bottleneck distance between shape a) and both b) and c) is roughly the same; it does not take into account the presence of the additional small circle in shape c). By blowing up the sample size and the noise in shape b), the Wasserstein distances from a) and c) to it are artificially blown up (Figure 10). The Prokhorov distance is built to avoid these pitfalls and nicely captures the geometry of the setting. The MDS plot for Prokhorov agrees with our intuition and places b) between a) and c) (Figure 10).

Classification Experiments
We now turn to more sophisticated data sets to illustrate the usage and advantages of the Prokhorov distance. In particular, we consider persistence diagrams that actually arise in applications of TDA. We use the library [Ped+11] for standard machine learning algorithms (in particular K-Neighbors). For the Bottleneck and Wasserstein metrics we use the Gudhi library [God21] and [DCR21]. To score the different metrics, we use K-Neighbors classification accuracy as well as classification accuracy based on K-Medoids clustering with the "build" initialization [Sch], [SR20]. In the latter case, points are assigned to the class of the medoid of their cluster. We split the data sets into training and testing sets with 50% of the points each. All computations were carried out on a laptop with an Intel i5-8265U CPU at 1.60 GHz and 8 GB memory. The code to reproduce the experiments is available online.

Parameter tuning: choosing f
One needs to specify an admissible function f as a parameter for the Prokhorov distance π_f. The set of all such functions is vast, so it is sensible to restrict to a smaller subset. In the experiments below, we choose f among linear functions with integer slope in [10, 100]. We do this by performing a grid search over the parameters and evaluating them by five-fold cross-validation. By selecting this subset of parameters, we reduce the risk of overfitting and are able to run the parameter selection in reasonable time. We leave it as a problem for further investigation to find better means of parameter selection, but note that the bottleneck profile being piecewise constant obstructs the use of gradient descent.
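The grid search can be organized as below; score_fn is a stand-in for the full pipeline (computing π_{f_a} distance matrices and evaluating a classifier on one cross-validation fold), which we do not reproduce here:

```python
import numpy as np

def select_slope(score_fn, slopes=range(10, 101), folds=5):
    """Grid search over linear parameter functions f_a(t) = a * t.

    score_fn(a, fold) should return the validation accuracy of the
    Prokhorov metric pi_{f_a} on the given cross-validation fold; the
    slope with the best average score across folds is selected.
    """
    best_a, best_score = None, -np.inf
    for a in slopes:
        score = np.mean([score_fn(a, fold) for fold in range(folds)])
        if score > best_score:
            best_a, best_score = a, score
    return best_a, best_score
```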

Prokhorov Distance for Cubical Complexes with Outlier Pixels
We generate 100 × 100 pixel greyscale images according to the following procedure, cf. Figure 12 (showing the underlying Gaussians, the superimposed noise and the resulting persistence diagram). Initializing every pixel with 0, we choose n points at random, at which we add a Gaussian with σ = 3. We normalize the values to [0, 2] and then shift them up by 64. The goal is to distinguish images with n = 15 from images with n = 20. The obstacle is that we superimpose a particular kind of noise, similar to salt-and-pepper noise: we choose k pixels at random at which we set the value to a random integer from [1, 128]; the eight surrounding pixels are set to zero. For each of the four combinations n ∈ {15, 20} and k ∈ {3, 5} we sample 50 greyscale images. We then create a cubical complex from each using the pixels as top-dimensional cells (lower-star filtration) and compute persistent homology in dimensions 0 and 1. We proceed as indicated at the beginning of this section to assess the accuracy of the different metrics. The results are summarized in Table 2. Both in dimension 0 and 1, the K-Neighbors classifier is inconclusive in the setting of Bottleneck and Wasserstein. With a suitable Prokhorov metric, we are able to achieve an accuracy of more than 80%. In the K-Medoids approach, the story is similar but less pronounced: Bottleneck and Wasserstein are inconclusive, but Prokhorov achieves around 60% accuracy.
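The generation procedure can be sketched as follows; the random positions and the exact normalization order are our reading of the description, and the subsequent cubical-complex step (e.g. via Gudhi's CubicalComplex) is omitted:

```python
import numpy as np

def make_image(n, k, rng, size=100, sigma=3.0):
    """Generate a greyscale image: n Gaussian bumps plus k outlier pixels."""
    img = np.zeros((size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    for cy, cx in rng.integers(0, size, size=(n, 2)):
        img += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    # Normalize to [0, 2], then shift up by 64.
    img = 2 * (img - img.min()) / (img.max() - img.min()) + 64
    # Salt-and-pepper-like noise: zero the eight neighbours of k random
    # pixels and set each centre to a random integer from [1, 128].
    for cy, cx in rng.integers(1, size - 1, size=(k, 2)):
        img[cy - 1 : cy + 2, cx - 1 : cx + 2] = 0
        img[cy, cx] = rng.integers(1, 129)
    return img
```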

3D Segmentation
We adapt an example from [Car] and [DCR21], which is based on the dataset [CGF09]. The task is to classify 3D meshes based on the persistence diagrams of certain functions defined on them; the shapes are, for example, airplanes, hands and chairs. The classification results are presented in Table 3. All the considered metrics yield a similar accuracy. Prokhorov is the fastest, however at the cost of first having to find a suitable parameter, which took more than ten hours in this case.

Synthetic Dataset
Finally, we consider the dataset introduced by [Ada+17, Section 6.1]. It contains six shape classes: a sphere, a torus, clusters, clusters within clusters, a circle and the unit cube. From each class we take 25 samples of 500 points, add two levels of Gaussian noise (η = 0.05, 0.1) and compute the zeroth and first persistent homology of the Vietoris-Rips filtration. We compute the distance matrices and evaluate them based on the K-Neighbors and K-Medoids classifiers. The results are displayed in Table 4. We find that Prokhorov performs better than Bottleneck and only slightly worse than Wasserstein. Prokhorov takes at most as long as 1-Wasserstein; Bottleneck is faster and 2-Wasserstein is slower.
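For instance, the torus class and the noise step might be sampled like this; the torus radii are our choice, as the text does not state them:

```python
import numpy as np

def add_noise(points, eta, rng):
    """Perturb each point by centered Gaussian noise of standard deviation eta."""
    return points + rng.normal(scale=eta, size=points.shape)

def sample_torus(n, rng, R=1.0, r=0.25):
    """Sample n points (uniformly in the two angles) from a torus in R^3."""
    u, v = rng.uniform(0.0, 2 * np.pi, (2, n))
    return np.column_stack([(R + r * np.cos(v)) * np.cos(u),
                            (R + r * np.cos(v)) * np.sin(u),
                            r * np.sin(v)])

# The Vietoris-Rips persistence of such a sample would then be computed,
# e.g. with gudhi.RipsComplex (not shown here).
```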

Discussion
First and foremost, we found that Prokhorov is able to produce good results in situations where the classical tools of Bottleneck and Wasserstein fail. In order to explain the differences in computation time, we note the size of the persistence diagrams in the various settings: inspecting Table 5, we see that the 3D segmentation dataset contains much smaller diagrams, on which the Prokhorov metric seems to perform well, both in terms of runtime and score. On the bigger diagrams from the synthetic dataset, the Wasserstein metrics yield the highest scores; Prokhorov improves on Bottleneck in the scores at the cost of higher runtimes. The difference in computation time is caused by the evaluation of f(t), which is the only difference between the Bottleneck and Prokhorov implementations. Bottleneck, and to some extent also Prokhorov, works less well on zero-dimensional PDs. There, every class is born at time zero, hence the PD is intrinsically one-dimensional and points are matched in linear order; the bottleneck distance is less meaningful in this setting. Moreover, the Prokhorov (and even more so the Bottleneck) distance does not take points matched over a small distance into account. This is a consequence of being designed to be robust against noise. However, this data can actually contain meaningful information, which is picked up by the Wasserstein distances; this is a possible explanation for the fact that Wasserstein yields better scores on the synthetic dataset. Hence, the Prokhorov metric works best on rather small diagrams and runs fastest with simple (e.g. linear) parameter functions f. Even then, one needs to take the additional time for tuning the parameter f into account.

Discussion and Outlook
Summarizing the results from the previous section, we find that the Prokhorov metric is well-suited for small persistence diagrams. Large scale computations could be improved by the technique of entropic regularization from the theory of optimal transport [LCO18]; as the classical Prokhorov metric admits an optimal transport characterization, our discrete variant might be tractable using similar techniques.

A major aspect of the importance of the Bottleneck distance is its algebraic formulation in terms of interleavings. This theory generalizes to incorporate the family of Prokhorov metrics. An algebraic formulation would also provide a perspective on generalizations to multiparameter persistence.

Our results in Section 4.2 establish that our construction yields a Polish space, which makes it suitable for statistical inference. In a similar vein, one can also investigate bottleneck profiles of persistence diagrams arising from random geometric complexes. What kind of limit objects appear in this context? Can they be used to perform statistical testing?

Morally, stability theorems should involve related metrics on the input point cloud side and on the persistence diagram side. This motivates investigating Prokhorov-type distances for point clouds in R^n; such distances might be useful throughout data science.

Our implementation asks for three inputs: diagram_1, diagram_2 and coef. The two diagrams need to be presented as 2D numpy arrays. The third parameter is a 1D numpy array representing the coefficients of a polynomial to be used as f. Note that the zeroth entry needs to be zero in order to obtain a metric, compare Lemma 4.8. However, choosing the polynomial to be a constant integer, one recovers the values of D_{X,Y}, which is a feature. In the technical details, our approach follows [God21], which in turn follows [KMN17].
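To illustrate the interface, here is a sketch of how the Prokhorov distance could be read off from a piecewise-constant bottleneck profile and the polynomial coefficients coef; the function name and the profile encoding (jump locations plus piece values) are ours, not the library's:

```python
def prokhorov_from_profile(jumps, values, coef):
    """pi_f = inf{t >= 0 : D(t) <= f(t)} for a piecewise-constant profile D.

    The profile is encoded as D(t) = values[0] on [0, jumps[0]),
    values[i] on [jumps[i-1], jumps[i]), and values[-1] (typically 0)
    after the last jump. coef encodes f(t) = sum_i coef[i] * t**i; the
    zeroth entry must be zero to obtain a metric (compare Lemma 4.8).
    """
    def f(t):
        return sum(c * t ** i for i, c in enumerate(coef))

    starts = [0.0] + list(jumps)
    for i, start in enumerate(starts):
        v = values[i]
        if f(start) >= v:          # D already dominated at the left end of the piece
            return start
        end = jumps[i] if i < len(jumps) else None
        if end is not None and f(end) >= v:
            lo, hi = start, end    # f is increasing: bisect for the crossing of level v
            for _ in range(60):
                mid = (lo + hi) / 2
                lo, hi = (mid, hi) if f(mid) < v else (lo, mid)
            return hi
    return starts[-1]              # fallback; unreachable for profiles ending at 0
```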
In addition, we also add the Prokhorov metric to [DCR21], allowing for parallel computations of distance matrices and integration with sklearn.

Figure 1 :
Figure 1: Illustration of two Prokhorov-close measures which are not Wassersteinclose.

Figure 2 :
Figure 2: Illustration of two measures µ and ν and a coupling γ of them.
The start and endpoint of J are referred to as birth time b(J) and death time d(J), respectively. Their difference d(J) − b(J) is called the persistence or lifetime of the interval.

Figure 3 :
Figure 3: Four bottlenecks on the left, a single bottleneck on the right, realizing almost the same bottleneck distance.

Example 3.6.
Consider some particularly simple persistence diagrams. The first three parts of Figure 4 each show a base diagram ("Diagram X", in blue) with four points and perturbations of it: the orange diagram ("Diagram Y") in the first image is obtained by shifting the blue one by three. The green diagram ("Diagram Z") shifts the top point of X by three, the next point by two, the third by one and leaves the lowest point unchanged. For the yellow diagram ("Diagram W") in the third image, we only shift two points from X by three and leave the other two untouched. Clearly, the bottleneck distance between the base diagram and each of the shifted versions is three. But the number of shifted points is reflected in the bottleneck profile: while D_{X,Y}(t) is four, D_{X,Z}(t) is two (i.e. the multiplicity of the bottleneck) for 0 < t < 3. And D_{X,W} displays more steps, reflecting the fact that there are secondary and tertiary bottlenecks.

Figure 4 :
Figure 4: The PD X has bottleneck distance 3 to each of the PDs Y, Z, W (first three images).However, it is attained with different multiplicities, which one can read off from the bottleneck profile (right-most image)

Figure 5 :
Figure 5: The situation in the proof of Lemma 3.7.

0 by Lemma 3.3. Now for X, Y ∈ B, this only happens if X = Y by [Blu+14, Lemma 3.4].

Figure 6 :
Figure 6: An example for the relation between D X,Y and the Wasserstein distance.

Figure 7 :
Figure 7: Illustrating the proof of Lemma 3.10: Decomposing the area under the graph into rectangles.

sup_η |{x : d(x, η(x)) ≤ t}|, and consequently D_{X,Y}(t) = |η| − sup_η |{x : d(x, η(x)) ≤ t}|. Here, |η| denotes the number of matched pairs which involve at least one off-diagonal point. The computation of sup_η |{x : d(x, η(x)) ≤ t}| is a version of the unweighted maximum-cardinality bipartite matching problem. First, set up the following notation (following [EH10, Chapter VIII.4]). Denote by X_0 the off-diagonal points of X and by X̄_0 their projections to the diagonal (and analogously for Y). Set U = X_0 ∪ Ȳ_0 and V = Y_0 ∪ X̄_0 and consider the bipartite graph G = (U ∪ V, E).
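A direct, if naive, implementation of this matching construction is sketched below; Kuhn's augmenting-path algorithm stands in for Hopcroft-Karp, and the normalization of D is our reading of the text, not the paper's code:

```python
import numpy as np

def profile_value(X, Y, t):
    """D_{X,Y}(t) via unweighted maximum-cardinality bipartite matching.

    U consists of the off-diagonal points of X plus the diagonal
    projections of Y, V symmetrically; two nodes are joined if their
    l-infinity distance is at most t, and diagonal-diagonal pairs are
    always joined (matching two diagonal points is free).
    """
    proj = lambda p: np.full(2, (p[0] + p[1]) / 2)   # nearest point on the diagonal
    U = [("off", np.asarray(p, float)) for p in X] + [("diag", proj(q)) for q in Y]
    V = [("off", np.asarray(q, float)) for q in Y] + [("diag", proj(p)) for p in X]

    def joined(u, v):
        if u[0] == "diag" and v[0] == "diag":
            return True
        return np.max(np.abs(u[1] - v[1])) <= t

    match = [-1] * len(V)                            # match[j] = index in U matched to V[j]

    def augment(i, seen):
        for j in range(len(V)):
            if j not in seen and joined(U[i], V[j]):
                seen.add(j)
                if match[j] == -1 or augment(match[j], seen):
                    match[j] = i
                    return True
        return False

    matched = sum(augment(i, set()) for i in range(len(U)))
    return len(U) - matched                          # pairs forced to exceed distance t
```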

minimize Σ_{x,y} c_{xy} f_{xy} subject to ∀x ∈ X: Σ_y f_{xy} = 1, ∀y ∈ Y: Σ_x f_{xy} = 1.
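This relaxation is the standard assignment LP. As a sketch (using scipy's linprog as an illustration, not the paper's solver), one can check that its optimum is already integral, in line with the Birkhoff-von Neumann theorem:

```python
import numpy as np
from scipy.optimize import linprog

def assignment_lp(cost):
    """Solve min sum c_xy f_xy s.t. sum_y f_xy = 1, sum_x f_xy = 1, f >= 0.

    The LP relaxation of the assignment problem has integral optimal
    vertices (permutation matrices), so a vertex solution is a matching.
    """
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for x in range(n):
        A_eq[x, x * m : (x + 1) * m] = 1   # row constraint: sum_y f_xy = 1
    for y in range(m):
        A_eq[n + y, y::m] = 1              # column constraint: sum_x f_xy = 1
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.ones(n + m), bounds=(0, 1))
    return res.x.reshape(n, m), res.fun
```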

Figure 8 :
Figure 8: One, two and three noisy circles and their PDs for the first persistent homology.

Figure 9 :
Figure 9: The averaged bottleneck profiles for each pair of shapes.

Figure 10 :
Figure 10: MDS plots of the dataset in Section 5.1.

Figure 11 :
Figure 11: Distance matrices of the dataset in Section 5.1.

Table 1 :
The three shapes: one, two and three circles.

Table 2 :
Classification scores for the synthetic dataset.

Table 3 :
Classification scores for the 3D segmentation dataset.

Table 5 :
Cardinalities of the persistence diagrams for the considered experiments.