Abstract
Given a distribution \(\rho \) on persistence diagrams and observations \(X_{1},\ldots ,X_{n} \mathop {\sim }\limits ^{iid} \rho \), we introduce an algorithm that estimates a Fréchet mean from the set of diagrams \(X_{1},\ldots ,X_{n}\). If the underlying measure \(\rho \) is a combination of Dirac masses, \(\rho = \frac{1}{m} \sum _{i=1}^{m} \delta _{Z_{i}}\), then we prove that the algorithm converges to a local minimum, and we prove a law of large numbers result for a Fréchet mean computed by the algorithm given observations drawn iid from \(\rho \). We illustrate the convergence of an empirical mean computed by the algorithm to a population mean by simulations from Gaussian random fields.
Introduction
There has been a recent effort in topological data analysis (TDA) to incorporate ideas from stochastic modeling. Much of this work involved the study of random abstract simplicial complexes generated from stochastic processes [10–12, 14, 22, 23] and nonasymptotic bounds on the convergence or consistency of topological summaries as the number of points increases [2, 4, 6, 19, 20]. The central idea in these papers has been to study statistical properties of topological summaries of point cloud data.
In [16] it was shown that a commonly used topological summary, the persistence diagram [8], admits well-defined probability distributions together with notions such as expectations, variances, percentiles and conditional probabilities. The existence of Fréchet means and variances was shown there, but no procedure to compute them was provided. The key contribution of this paper is to characterize Fréchet means and variances of finitely many persistence diagrams and to provide an algorithm for estimating them.
In this paper we state an algorithm which, given an observed set of persistence diagrams \(X_{1},\ldots ,X_{n}\), computes a new diagram that is a local minimum of the Fréchet function of the empirical measure \(\rho _{n} := n^{-1} \sum _{i=1}^{n} \delta _{X_{i}}\). When the diagrams are sampled independently and identically from a probability measure that is a finite combination of Dirac masses, we provide a (weak) law of large numbers for the local minima computed by the algorithm.
Persistence Diagrams and Alexandrov Spaces with Curvature Bounded from Below
In this section we state properties of the space of persistence diagrams that we will use in subsequent sections. We first define persistence diagrams and the \(L^{2}\)-Wasserstein metric on the set of persistence diagrams. Note that this is not the same metric as was used in [16]; we discuss the relation between the two metrics, and why we work with the \(L^{2}\)-Wasserstein metric, later in this section. We then show that the space of persistence diagrams is a geodesic space, and specifically an Alexandrov space with curvature bounded from below. We show that the Fréchet function in this space is semiconcave, which allows us to define supporting vectors that serve as an analog of the gradient. The algorithm developed in the following section is a gradient descent based method, and it uses these supporting vectors to find local minima.
Persistent Homology and Persistence Diagrams
Consider a topological space \(\mathbb { X}\) and a bounded continuous function \(f: \mathbb { X}\rightarrow \mathbb { R}\). For a threshold \(a\) we define the sublevel set \(\mathbb { X}_{a} = f^{-1}(-\infty ,a]\). For \(a \le b\) the inclusion \(\mathbb { X}_{a} \subset \mathbb { X}_{b}\) induces homomorphisms of the homology groups of sublevel sets:
$$\begin{aligned} \mathbf {f}_\ell ^{a,b}: \mathbf {H}_\ell (\mathbb { X}_{a}) \rightarrow \mathbf {H}_\ell (\mathbb { X}_{b}) \end{aligned}$$
for each dimension \(\ell \). We assume that \(f\) is tame: for every dimension \(\ell \) there are only finitely many values \(c\) at which \(\mathbf {f}_\ell ^{c-\delta ,c}\) fails to be an isomorphism for every \(\delta >0\), and \(\mathbf {H}_\ell (\mathbb { X}_{a})\) is finitely generated for all \(a \in \mathbb { R}\). We also assume that homology is taken with field coefficients, e.g. \(\mathbb {Z}_{2}\).
By the tameness assumption the image \(\mathbf {F}_\ell ^{a,b} := \text{ Im } \mathbf {f}_\ell ^{a-\delta ,b} \subset \mathbf {H}_\ell (\mathbb { X}_{b})\) is independent of \(\delta >0\) if \(\delta \) is small enough. The quotient group
$$\begin{aligned} \mathbf {B}_\ell ^{a} = \mathbf {H}_\ell (\mathbb { X}_{a})/\mathbf {F}_\ell ^{a,a} \end{aligned}$$
is the cokernel of \(\mathbf {f}_\ell ^{a-\delta ,a}\) and captures homology classes which did not exist in sublevel sets preceding \(\mathbb { X}_{a}\). This group is called the \(\ell \)-th birth group at \(\mathbb { X}_{a}\), and we say that a homology class \(\alpha \in \mathbf {H}_\ell (\mathbb { X}_{a})\) is born at \(\mathbb { X}_{a}\) if its projection onto \(\mathbf {B}_\ell ^{a}\) is nontrivial.
Consider the map
$$\begin{aligned} \mathbf {g}_\ell ^{a,b}: \mathbf {B}_\ell ^{a} \rightarrow \mathbf {H}_\ell (\mathbb { X}_{b})/\mathbf {F}_\ell ^{a,b} \end{aligned}$$
and denote its kernel by \(\mathbf {D}_\ell ^{a,b}\). The kernel captures homology classes that were born at \(\mathbb { X}_{a}\) but at \(\mathbb { X}_{b}\) are homologous to homology classes born before \(\mathbb { X}_{a}\). We say that a homology class \(\alpha \in \mathbf {H}_\ell (\mathbb { X}_{a})\) that was born at \(\mathbb { X}_{a}\) dies entering \(\mathbb { X}_{b}\) if its projection onto \(\mathbf {D}_\ell ^{a,b}\) is \(0\) but its projection to \(\mathbf {D}_{\ell }^{a,b-\delta }\) is nontrivial for all sufficiently small \(\delta >0\). We also call \(b\) a degree-\(r\) death value of \(\mathbf {B}_\ell ^a\) if \(\mathrm {rank}\,\mathbf {D}_\ell ^{a,b}-\mathrm {rank}\,\mathbf {D}_\ell ^{a,b-\delta }=r>0\) for all sufficiently small \(\delta >0\).
If a homology class \(\alpha \) is born at \(\mathbb { X}_{a}\) and dies entering \(\mathbb { X}_{b}\), we set \(\mathrm {b}(\alpha ) = a\) and \(\mathrm {d}(\alpha ) = b\) and represent the births and deaths of \(\ell \)-dimensional homology classes by a multiset of points in \(\mathbb { R}^{2}\), with the horizontal axis corresponding to the birth of a class, the vertical axis corresponding to the death of a class, and the multiplicity of a point being the degree of the death value. The idea of a persistence diagram is to consider a basis of persistent homology classes \(\{\alpha \}\) and to represent each persistent homology class \(\alpha \) by a point \((\mathrm {b}(\alpha ), \mathrm {d}(\alpha ))\).
The persistence of \(\alpha \) is the difference \(\text{ pers }(\alpha ) = \mathrm {d}(\alpha ) - \mathrm {b}(\alpha )\). In the general setting there can be points with infinite persistence, that is, points of the form \((-\infty , y)\) or \((x,\infty )\). These points are infinitely far from all points of finite persistence and would have to be treated separately: the space of persistence diagrams would be disconnected, with one component for each number of points at infinity. For clarity we restrict ourselves to the case where all classes have finite persistence. This can be achieved by considering extended persistence, but for simplicity we kill every class by setting \(\mathbf {g}_\ell ^{a,b} = 0\) if \(b \ge \sup _{x \in \mathbb { X}} f(x)\).
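To make births and deaths concrete, here is a minimal Python sketch for dimension \(\ell = 0\), under an assumption that is ours and not the paper's: \(\mathbb { X}\) is a path graph and \(f\) is given by its vertex values. Components are tracked with union-find; when two components merge, the younger one (larger birth value) dies, and, following the convention above, the surviving class is killed at \(\sup f = \max f\). The function name `path_persistence_h0` is hypothetical.

```python
def path_persistence_h0(f):
    """0-dimensional sublevel-set persistence diagram of vertex values f on a path.

    Vertices enter in increasing order of f. When a new vertex joins two
    existing components, the younger one (larger birth value) dies at the
    current value. The oldest class is killed at max(f), mirroring the
    convention of setting g = 0 at sup f.
    """
    n = len(f)
    parent = list(range(n))
    birth = [0.0] * n  # birth value of the component rooted at each index
    seen = [False] * n
    diagram = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for v in sorted(range(n), key=lambda i: f[i]):
        seen[v] = True
        birth[v] = f[v]
        for u in (v - 1, v + 1):
            if 0 <= u < n and seen[u]:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                if birth[young] < f[v]:  # drop zero-persistence pairs
                    diagram.append((birth[young], f[v]))
                parent[young] = old
    if n:
        diagram.append((min(f), max(f)))
    return sorted(diagram)


print(path_persistence_h0([0, 3, 1, 4, 2]))  # [(0, 4), (1, 3), (2, 4)]
```

For `f = [0, 3, 1, 4, 2]` the local minima at 1 and 2 die when they merge into the older component at 3 and 4, respectively, and the global minimum is killed at \(\max f = 4\).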
After establishing some notation we can define persistence diagrams and the distance between two diagrams. Let \(\Delta =\{(x,y)\in \mathbb { R}^{2} \mid x=y\}\) be the diagonal in \(\mathbb { R}^{2}\). Let \(\Vert x-y \Vert \) be the usual Euclidean distance if \(x\) and \(y\) are off-diagonal points. With a slight abuse of notation, let \(\Vert x-\Delta \Vert \) denote the perpendicular distance between \(x\) and the diagonal, and let \(\Vert \Delta -\Delta \Vert =0\).
Definition 2.1
A persistence diagram is a countable multiset of points in \(\mathbb { R}^{2}\) together with infinitely many copies of the diagonal \(\Delta =\{(x,y)\in \mathbb { R}^{2} \mid x=y\}\). We also require that the countably many points \(x_{j}\in \mathbb { R}^{2}\) not lying on the diagonal satisfy \(\sum _{j} \Vert x_{j}-\Delta \Vert <\infty \).
Each point \(p=(a,b)\) in a persistence diagram corresponds to some homology class \(\alpha \) with \(\mathrm {b}(\alpha )=a\) and \(\mathrm {d}(\alpha )=b\). As a slight abuse of notation we say that \(p\) is born at \(\mathrm {b}(p):=\mathrm {b}(\alpha )\) and dies at \(\mathrm {d}(p):=\mathrm {d}(\alpha )\).
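We denote the set of all persistence diagrams by \(\mathcal {D}\). One metric on \(\mathcal {D}\) is the \(L^{2}\)-Wasserstein metric
$$\begin{aligned} d_{L^{2}}(X,Y) = \left( \inf _{\phi : X \rightarrow Y} \sum _{x \in X} \Vert x-\phi (x)\Vert ^{2}\right) ^{1/2}. \qquad (1) \end{aligned}$$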
Here we consider all possible bijections \(\phi \) between the off-diagonal points and copies of the diagonal in \(X\) and the off-diagonal points and copies of the diagonal in \(Y\). Bijections always exist, since any point can be paired with the diagonal. We call a bijection optimal if it achieves the infimum.
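For diagrams with finitely many off-diagonal points, the distance \(d_{L^{2}}\) can be evaluated directly from its definition. The Python sketch below (helper names are ours) augments \(X\) with one copy of the diagonal per point of \(Y\) and vice versa, so that every point may be matched to the diagonal at cost equal to its squared perpendicular distance \((d-b)^{2}/2\), and then brute-forces all bijections. It only illustrates the definition; the factorial cost limits it to a handful of points.

```python
import itertools
import math

DIAG = None  # stands for a copy of the diagonal


def sq_cost(a, b):
    """Squared cost of pairing a with b, where either may be DIAG."""
    if a is DIAG and b is DIAG:
        return 0.0
    if a is DIAG:
        a, b = b, a
    if b is DIAG:
        return (a[1] - a[0]) ** 2 / 2.0  # squared distance to the diagonal
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def d_l2(X, Y):
    """L2-Wasserstein distance between finite diagrams (lists of (b, d)).

    Brute force over all bijections of the augmented sets; tiny inputs only.
    """
    rows = list(X) + [DIAG] * len(Y)
    cols = list(Y) + [DIAG] * len(X)
    best = min(
        sum(sq_cost(rows[i], cols[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(len(cols)))
    )
    return math.sqrt(best)


print(d_l2([(0, 2)], [(0, 3)]))  # 1.0: moving the point is cheaper
print(d_l2([(0, 2)], []))        # sqrt(2): distance to the diagonal
```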
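In much of the computational topology literature the following \(p\)-th Wasserstein distance between two persistence diagrams \(X\) and \(Y\) is used:
$$\begin{aligned} d_{W_p}(X,Y) = \left( \inf _{\phi : X \rightarrow Y} \sum _{x \in X} \Vert x-\phi (x)\Vert _{\infty }^{p}\right) ^{1/p}. \qquad (2) \end{aligned}$$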
In [16] this metric was used to define the following space of persistence diagrams
$$\begin{aligned} \mathcal {D}_{p} = \{X \in \mathcal {D} \mid d_{W_p}(X,\emptyset ) < \infty \}, \end{aligned}$$
with \(p \ge 1\), where \(\emptyset \) is the diagram with just the diagonal. It was shown in [16, Theorems 6 and 10] that \(\mathcal {D}_{p}\) is a complete separable metric space on which probability measures can be defined. Given a probability measure \(\rho \) on \(\mathcal {D}_{p}\), the existence of a Fréchet mean was proven under restrictions on the space of persistence diagrams \(\mathcal {D}_{p}\) [16, Theorem 21 and Lemma 27]. The basic requirement is that \(\rho \) has a finite second moment and either has compact support or is concentrated on a set with compact support.
In this paper we focus on the \(L^{2}\)-Wasserstein metric since it leads to a geodesic space with some known structure. Thus we consider the space of persistence diagrams
$$\begin{aligned} \mathcal {D}_{L^{2}} = \{X \in \mathcal {D} \mid d_{L^{2}}(X,\emptyset ) < \infty \}. \end{aligned}$$
The results stated in the previous paragraph also hold for \(\mathcal {D}_{L^{2}}\) with metric \(d_{L^{2}}\), including existence of Fréchet means. This follows from the fact that for any \(x,y \in \mathbb { R}^{2}\)
$$\begin{aligned} \Vert x-y\Vert _{\infty } \le \Vert x-y\Vert \le \sqrt{2}\,\Vert x-y\Vert _{\infty }, \end{aligned}$$
so \(d_{W_2}(X,Y)\le d_{L^{2}}(X,Y) \le \sqrt{2}d_{W_2}(X,Y)\). This inequality, coupled with the results in [7], implies the following stability result for the \(L^{2}\)-Wasserstein distance.
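The comparison \(d_{W_2}\le d_{L^{2}} \le \sqrt{2}d_{W_2}\) can be checked on a small example by computing both distances with the same brute-force matching and changing only the ground norm. In the Python sketch below (helper names are ours), the distance from \((b,d)\) to the diagonal is \(|d-b|/\sqrt{2}\) in the Euclidean norm and \(|d-b|/2\) in the max norm, so the pointwise inequality also holds for diagonal matches.

```python
import itertools
import math

DIAG = None  # stands for a copy of the diagonal


def cost(a, b, norm):
    """Unsquared pairing cost under the 'l2' or 'linf' ground norm."""
    if a is DIAG and b is DIAG:
        return 0.0
    if a is DIAG:
        a, b = b, a
    if b is DIAG:
        gap = abs(a[1] - a[0])
        # distance to the diagonal: gap/sqrt(2) (Euclidean) or gap/2 (max norm)
        return gap / math.sqrt(2) if norm == "l2" else gap / 2.0
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return math.hypot(dx, dy) if norm == "l2" else max(dx, dy)


def dist2(X, Y, norm):
    """(inf over bijections of sum cost^2)^(1/2), by brute force."""
    rows = list(X) + [DIAG] * len(Y)
    cols = list(Y) + [DIAG] * len(X)
    best = min(
        sum(cost(rows[i], cols[j], norm) ** 2 for i, j in enumerate(p))
        for p in itertools.permutations(range(len(cols)))
    )
    return math.sqrt(best)


X = [(0.0, 3.0), (1.0, 2.5)]
Y = [(0.2, 3.5)]
w2 = dist2(X, Y, "linf")  # d_{W_2}
l2 = dist2(X, Y, "l2")    # d_{L^2}
print(w2 <= l2 <= math.sqrt(2) * w2 + 1e-12)  # True
```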
Theorem 2.2
Let \(\mathbb { X}\) be a triangulable, compact metric space such that \(d_{W_k}(\text{ Diag }(h), \emptyset )^k\le C_{\mathbb { X}}\) for any tame Lipschitz function \(h:\mathbb { X}\rightarrow \mathbb { R}\) with Lipschitz constant \(1\), where \(\text{ Diag }(h)\) denotes the persistence diagram of \(h\), \(k\in [1,2)\), and \(C_{\mathbb { X}}\) is a constant depending only on the space \(\mathbb { X}\). Then for two tame Lipschitz functions \(f, g : \mathbb {X} \rightarrow \mathbb {R}\) we have
$$\begin{aligned} d_{L^{2}}(\text{ Diag }(f),\text{ Diag }(g))^{2} \le 2C\,\Vert f-g\Vert _{\infty }^{2-k}, \end{aligned}$$
where \(C = C_{\mathbb { X}} \max \{\text{ Lip }(f)^k,\text{ Lip }(g)^k\}\).
For ease of notation in the rest of the paper we denote \(d_{L^{2}}(X,Y)^{2}\) as \(d(X,Y)^{2}\).
Proposition 2.3
For any diagrams \(X, Y\in \mathcal {D}_{L^{2}}\) the infimum in (1) is always achieved.
We prove this proposition in the Appendix.
We now show that the space of persistence diagrams with the above metric is a geodesic space. A rectifiable curve \(\gamma : [ 0, l] \rightarrow X\) is called a geodesic if it is locally minimizing and parametrized proportionally to arc length. If \(\gamma \) is also globally minimizing, then it is said to be minimal. \(\mathcal {D}_{L^{2}}\) is a geodesic space if every pair of points is connected by a minimal geodesic. Now consider diagrams \(X=\{x\}\) and \(Y=\{y\}\) and an optimal pairing \(\phi \) between the points of \(X\) and \(Y\). Let \(\gamma :[0,1]\rightarrow \mathcal {D}_{L^{2}}\) be the path from \(X\) to \(Y\) where \(\gamma (t)\) is the diagram whose points have travelled in a straight line from the point \(x\) (which can be a copy of the diagonal) towards the point \(\phi (x)\) (which can be a copy of the diagonal) for a distance of \(t\Vert x-\phi (x)\Vert \). In other words, \(\gamma (t)\) is the diagram with points \(\{(1-t)x+t\phi (x) \mid x\in X\}\), where a copy of the diagonal is replaced by the point on the diagonal closest to its partner. Then \(\gamma \) is a geodesic from \(X\) to \(Y\). The proof rests on the observation that the bijection \(\phi _t^{X}:X\rightarrow \gamma (t)\) given by
$$\begin{aligned} \phi _t^{X}(x) = (1-t)x+t\phi (x) \qquad (3) \end{aligned}$$
is optimal.
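Once an optimal pairing is known, the straight-line interpolation is easy to compute. The Python sketch below uses hypothetical helper names; a pairing is a list of `(x, y)` tuples with `None` standing for a copy of the diagonal, and a diagonal partner is replaced by the closest point on the diagonal, so that point moves perpendicularly to or from the diagonal.

```python
DIAG = None  # stands for a copy of the diagonal


def proj(p):
    """Closest point on the diagonal to p = (birth, death)."""
    m = (p[0] + p[1]) / 2.0
    return (m, m)


def geodesic_point(pairing, t):
    """Points of gamma(t) for the straight-line path induced by a pairing.

    pairing: list of (x, y) with x in X and y in Y; either entry may be DIAG,
    in which case it is replaced by the closest diagonal point to its partner.
    Each point moves along the segment (1 - t) * start + t * end; points that
    reach the diagonal at t = 1 can be dropped.
    """
    points = []
    for x, y in pairing:
        if x is DIAG and y is DIAG:
            continue  # diagonal-to-diagonal pairs contribute nothing
        start = proj(y) if x is DIAG else x
        end = proj(x) if y is DIAG else y
        points.append(((1 - t) * start[0] + t * end[0],
                       (1 - t) * start[1] + t * end[1]))
    return points


pairing = [((0.0, 2.0), (1.0, 3.0)), ((4.0, 6.0), DIAG)]
print(geodesic_point(pairing, 0.5))  # [(0.5, 2.5), (4.5, 5.5)]
```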
Gradients and Supporting Vectors on \(\mathcal {D}_{L^{2}}\)
We will propose a gradient descent based algorithm to compute Fréchet means. To analyze the algorithm we need to understand the structure of \(\mathcal {D}_{L^{2}}\). We will show that \(\mathcal {D}_{L^{2}}\) is an Alexandrov space with curvature bounded from below (see [5] for more information on these spaces). This result is not surprising, since there are known relations between \(L^{2}\)-Wasserstein spaces and Alexandrov spaces with curvature bounded from below [13, 21]. The motivating idea behind these spaces was to generalize results of Riemannian geometry to metric spaces without Riemannian structure.
The properties and behavior of Fréchet means are closely related to the curvature of the space. For metric spaces with curvature bounded from above, called \(CAT\)-spaces, properties of Fréchet means have been investigated and algorithms exist to compute them [25]. \(\mathcal {D}_{L^{2}}\) is not a \(CAT\)-space (see Proposition 2.4); it is, however, an Alexandrov space with curvature bounded from below. Less is known about properties of Fréchet means in these spaces, or about algorithms to compute them. We use the structure of Alexandrov spaces with curvature bounded from below to compute estimates of Fréchet means and provide some analysis of these estimates. Note that Fréchet means are the same as barycenters, the term used in much of the literature.
We first confirm that \(\mathcal {D}_{L^{2}}\) is not a \(CAT\)-space.
Proposition 2.4
\(\mathcal {D}_{L^{2}}\) is not in \(\text{ CAT }(k)\) for any \(k>0\).
Proof
If \(\mathcal {D}_{L^{2}} \in \text{ CAT }(k)\), then any \(X,Y\in \mathcal {D}_{L^{2}}\) with \(d(X,Y)^{2}<\pi ^{2}/k\) are joined by a unique geodesic [3, Proposition 2.11]. However, we can find \(X,Y\) arbitrarily close with two distinct geodesics. For example, let \(X\) be the diagram whose off-diagonal points are two diagonally opposite corners of a square lying far from the diagonal, and let \(Y\) be the diagram whose off-diagonal points are the other two corners. Pairing each corner of \(X\) with its horizontal neighbor in \(Y\) and pairing it with its vertical neighbor are equally optimal, and the two pairings induce distinct geodesics (moving the points horizontally or vertically). Since we may choose the square to be as small as we wish, no \(k>0\) works. \(\square \)
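The counterexample can be checked with concrete numbers. The Python sketch below takes a square of side 0.5 with lower-left corner \((0,10)\), an arbitrary choice that keeps it far from the diagonal, computes the costs of the horizontal and vertical pairings, and compares the midpoint diagrams of the two induced geodesics. Helper names are ours.

```python
def sq(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def midpoints(pairs):
    """Diagram at t = 1/2 on the straight-line path induced by pairs."""
    return sorted(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2) for a, b in pairs)


eps = 0.5            # side length of the square
b0, d0 = 0.0, 10.0   # lower-left corner, far from the diagonal
X = [(b0, d0), (b0 + eps, d0 + eps)]   # two opposite corners
Y = [(b0 + eps, d0), (b0, d0 + eps)]   # the other two corners

horizontal = [(X[0], Y[0]), (X[1], Y[1])]  # points move horizontally
vertical = [(X[0], Y[1]), (X[1], Y[0])]    # points move vertically
cost_h = sum(sq(a, b) for a, b in horizontal)
cost_v = sum(sq(a, b) for a, b in vertical)

# Any pairing that uses the diagonal sends some point of X and some point of Y
# to it, costing at least twice the smallest squared diagonal distance.
min_diag = min((p[1] - p[0]) ** 2 / 2 for p in X + Y)

print(cost_h, cost_v)          # 0.5 0.5
print(midpoints(horizontal))   # [(0.25, 10.0), (0.25, 10.5)]
print(midpoints(vertical))     # [(0.0, 10.25), (0.5, 10.25)]
```

Both pairings cost 0.5 while any use of the diagonal costs more than 90, so both are optimal, yet their midpoints differ.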
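The following inequality characterizes Alexandrov spaces with curvature bounded from below by zero [21]. Given a geodesic space \(\mathbb { X}\) with metric \(d'\), for any geodesic \(\gamma :[0,1] \rightarrow \mathbb { X}\) from \(X\) to \(Y\), any \(Z \in \mathbb { X}\) and all \(t\in [0,1]\),
$$\begin{aligned} d'(\gamma (t),Z)^{2} \ge (1-t)\,d'(X,Z)^{2} + t\,d'(Y,Z)^{2} - t(1-t)\,d'(X,Y)^{2}. \qquad (4) \end{aligned}$$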
We now show that \(\mathcal {D}_{L^{2}}\) is a nonnegatively curved Alexandrov space.
Theorem 2.5
The space of persistence diagrams \(\mathcal {D}_{L^{2}}\) with metric \(d\) given in (1) is a nonnegatively curved Alexandrov space.
Proof
First observe that \(\mathcal {D}_{L^{2}}\) is a geodesic space. Let \(\gamma :[0,1] \rightarrow \mathcal {D}_{L^{2}}\) be a geodesic from \(X\) to \(Y\) and let \(Z \in \mathcal {D}_{L^{2}}\) be any diagram. We want to show that the inequality (4) holds.
Let \(\phi \) be an optimal bijection between \(X\) and \(Y\) which induces the geodesic \(\gamma \), that is, \(\gamma (t)=\{(1-t)x+t\phi (x) \mid x\in X\}\), and define \(\phi _{t}(x) =(1-t)x+t\phi (x)\) as in (3). Let \(\phi _{Z}^{t}: Z \rightarrow \gamma (t)\) be optimal. Construct bijections \(\phi _{Z}^{X}:Z\rightarrow X\) and \(\phi _{Z}^{Y}: Z\rightarrow Y\) by \(\phi _{Z}^{X}= (\phi _t)^{-1}\circ \phi _{Z}^{t}\) and \(\phi _{Z}^{Y}=\phi \circ \phi _{Z}^{X}\). There is no reason to suppose that either of the bijections \(\phi _{Z}^{X}\) or \(\phi _{Z}^{Y}\) is optimal. Note that if \(\phi _{Z}^{t}(z)=\Delta \) then \(\phi _{Z}^{X}(z)=\Delta \) and \(\phi _{Z}^{Y}(z)=\Delta \).
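From the formula for the distance in \(\mathcal {D}_{L^{2}}\) we observe
$$\begin{aligned} d(Z,X)^{2}&\le \sum _{z\in Z} \Vert z-\phi _{Z}^{X}(z)\Vert ^{2}, \qquad d(Z,Y)^{2}\le \sum _{z\in Z} \Vert z-\phi _{Z}^{Y}(z)\Vert ^{2},\\ d(Z,\gamma (t))^{2}&= \sum _{z\in Z} \Vert z-\phi _{Z}^{t}(z)\Vert ^{2}, \qquad d(X,Y)^{2}= \sum _{z\in Z} \Vert \phi _{Z}^{X}(z)-\phi _{Z}^{Y}(z)\Vert ^{2}. \qquad (5) \end{aligned}$$
Euclidean space has curvature zero everywhere, so for each \(z\) in the diagram \(Z\) and all \(t\in [0,1]\) we have
$$\begin{aligned} \Vert z-\phi _{Z}^{t}(z)\Vert ^{2} = (1-t)\Vert z-\phi _{Z}^{X}(z)\Vert ^{2} + t\Vert z-\phi _{Z}^{Y}(z)\Vert ^{2} - t(1-t)\Vert \phi _{Z}^{X}(z)-\phi _{Z}^{Y}(z)\Vert ^{2}. \end{aligned}$$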
Combining these equalities with inequalities (5) gives us the desired result. \(\square \)
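Theorem 2.5 can be sanity-checked numerically on small random diagrams: build the geodesic from an optimal pairing and test inequality (4). The Python sketch below is a check, not a proof. It uses brute-force matching (so each random diagram has at most two points), hypothetical helper names, and reports the smallest observed value of the left side minus the right side of (4), which should be nonnegative up to rounding.

```python
import itertools
import math
import random

DIAG = None  # stands for a copy of the diagonal


def sq_cost(a, b):
    if a is DIAG and b is DIAG:
        return 0.0
    if a is DIAG:
        a, b = b, a
    if b is DIAG:
        return (a[1] - a[0]) ** 2 / 2.0
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def optimal_pairing(X, Y):
    """Brute-force optimal pairing; returns (squared distance, pairs)."""
    rows = list(X) + [DIAG] * len(Y)
    cols = list(Y) + [DIAG] * len(X)
    best, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(cols))):
        c = sum(sq_cost(rows[i], cols[j]) for i, j in enumerate(perm))
        if c < best:
            best, best_perm = c, perm
    return best, [(rows[i], cols[j]) for i, j in enumerate(best_perm)]


def geodesic_point(pairs, t):
    """gamma(t) = {(1 - t) x + t phi(x)}; DIAG becomes the partner's projection."""
    out = []
    for x, y in pairs:
        if x is DIAG and y is DIAG:
            continue
        if x is DIAG:
            x = ((y[0] + y[1]) / 2,) * 2
        if y is DIAG:
            y = ((x[0] + x[1]) / 2,) * 2
        out.append(((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1]))
    return out


def worst_gap(trials=100, seed=0):
    """Minimum over random X, Y, Z, t of LHS - RHS in inequality (4)."""
    rng = random.Random(seed)

    def random_diagram():
        pts = []
        for _ in range(rng.randint(0, 2)):
            b = rng.uniform(0.0, 5.0)
            pts.append((b, b + rng.uniform(0.1, 3.0)))
        return pts

    worst = math.inf
    for _ in range(trials):
        X, Y, Z = random_diagram(), random_diagram(), random_diagram()
        dXY2, pairs = optimal_pairing(X, Y)
        t = rng.random()
        lhs = optimal_pairing(geodesic_point(pairs, t), Z)[0]
        rhs = ((1 - t) * optimal_pairing(X, Z)[0]
               + t * optimal_pairing(Y, Z)[0] - t * (1 - t) * dXY2)
        worst = min(worst, lhs - rhs)
    return worst


print(worst_gap() >= -1e-9)  # True if (4) held in every trial
```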
Properties of the Fréchet Function
Given a probability distribution \(\rho \) on \(\mathcal {D}_{L^{2}}\) we define the corresponding Fréchet function to be
$$\begin{aligned} F(Y) = \int _{\mathcal {D}_{L^{2}}} d(X,Y)^{2}\, d\rho (X). \end{aligned}$$
The Fréchet mean set of \(\rho \) is the set of all the minimizers of the map \(F\) on \(\mathcal {D}_{L^{2}}\). If there is a unique minimizer then this is called the Fréchet mean of \(\rho \). The variance is then defined to be the infimum of the above functional.
We will show that the Fréchet function has the useful property of being semiconcave. For an Alexandrov space \(\Omega \), a locally Lipschitz function \(f : \Omega \rightarrow \mathbb { R}\) is called \(\lambda \)-concave if for any unit-speed geodesic \(\gamma \) in \(\Omega \) the function
$$\begin{aligned} t \mapsto f(\gamma (t)) - \frac{\lambda }{2}t^{2} \end{aligned}$$
is concave. A function \(f : \Omega \rightarrow \mathbb { R}\) is called semiconcave if for any point \(x\in \Omega \) there is a neighborhood \(\Omega _x\) of \(x\) and \(\lambda \in \mathbb { R}\) such that the restriction \(f|_{\Omega _x}\) is \(\lambda \)-concave.
Proposition 2.6
If the support of \(\rho \) is bounded (that is, has bounded diameter), then the corresponding Fréchet function is semiconcave.
Proof
We first show that if the support of a probability distribution \(\rho \) is bounded, then the corresponding Fréchet function is Lipschitz on any set with bounded diameter. We then show that for any unit-speed geodesic \(\gamma \) and any \(X \in \mathcal {D}_{L^{2}}\) the function
$$\begin{aligned} g_{X}(s) = d(\gamma (s),X)^{2} - s^{2} \end{aligned}$$
is concave. We complete the proof by showing that the Fréchet function \(F\) is 2-concave at every point (and hence semiconcave), writing \(F(\gamma (s))-s^{2}\) as \(\int g_{X}(s) d\rho (X)\).
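Let \(U\) be a subset of \(\mathcal {D}_{L^{2}}\) with bounded diameter. Since the support of \(\rho \) is also bounded, there is some \(K\) such that \(\int d(X,Y) d\rho (X) \le K\) for all \(Y \in U\). Let \(Y,Z \in U\). Then, by the triangle inequality,
$$\begin{aligned} |F(Y)-F(Z)|&= \left| \int \big (d(X,Y)-d(X,Z)\big )\big (d(X,Y)+d(X,Z)\big )\, d\rho (X)\right| \\ &\le d(Y,Z)\int \big (d(X,Y)+d(X,Z)\big )\, d\rho (X) \le 2K\, d(Y,Z). \end{aligned}$$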
Let \(\gamma \) be a unit-speed geodesic and \(X\in \mathcal {D}_{L^{2}}\). Consider the function \(g_{X}(s) = d(\gamma (s),X)^{2} - s^{2}\).
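We want to show that \(g_{X}\) is concave, which means that \(g_{X}(tx+(1-t)y)\ge tg_{X}(x) + (1-t)g_{X}(y)\). Let \(\tilde{\gamma }(t)\) be the geodesic from \(\gamma (x)\) to \(\gamma (y)\) traveling along \(\gamma \), so that \(\gamma ((1-t)x+ ty) = \tilde{\gamma }(t)\) for \(t \in [0,1]\). Since \(\gamma \) has unit speed, \(d(\gamma (x),\gamma (y))=|x-y|\), and
$$\begin{aligned} g_{X}((1-t)x+ty)&= d(\tilde{\gamma }(t),X)^{2} - ((1-t)x+ty)^{2}\\ &\ge (1-t)\,d(\gamma (x),X)^{2} + t\,d(\gamma (y),X)^{2} - t(1-t)(x-y)^{2} - ((1-t)x+ty)^{2}\\ &= (1-t)\,g_{X}(x) + t\,g_{X}(y), \end{aligned}$$
where the last step uses \((1-t)x^{2}+ty^{2}-((1-t)x+ty)^{2}=t(1-t)(x-y)^{2}\).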
The inequality comes from the defining inequality (4) that makes \(\mathcal {D}_{L^{2}}\) a nonnegatively curved Alexandrov space.
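By the construction of \(g_{X}\) we can write \(F(\gamma (s))-s^{2}\) as \(\int g_{X}(s) d\rho (X)\). This means that
$$\begin{aligned} F(\gamma (tx+(1-t)y)) - (tx+(1-t)y)^{2} = \int g_{X}(tx+(1-t)y)\, d\rho (X). \end{aligned}$$
The concavity of \(g_{X}\) ensures that \( tg_{X}(x)+(1-t)g_{X}(y)\le g_{X}(tx+(1-t)y)\), and hence
$$\begin{aligned} F(\gamma (tx+(1-t)y)) - (tx+(1-t)y)^{2} \ge t\big (F(\gamma (x))-x^{2}\big ) + (1-t)\big (F(\gamma (y))-y^{2}\big ), \end{aligned}$$
so \(s\mapsto F(\gamma (s))-s^{2}\) is concave, that is, \(F\) is 2-concave.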
\(\square \)
We now define the additional structure on Alexandrov spaces with curvature bounded from below that we will need to define gradients and supporting vectors. This exposition is a summary of the content in [21, 24].
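Given a point \(Y\) in an Alexandrov space \(\mathcal {A}\) with nonnegative curvature, we first define the tangent cone \(T_{Y}\). Let \(\widehat{\Sigma }_{Y}\) be the set of all nontrivial unit-speed geodesics emanating from \(Y\). For \(\gamma , \eta \in \widehat{\Sigma }_{Y}\) the angle between them is defined by
$$\begin{aligned} \angle _{Y}(\gamma ,\eta ) = \arccos \left( \lim _{s,t\rightarrow 0^{+}} \frac{s^{2}+t^{2}-d(\gamma (s),\eta (t))^{2}}{2st}\right) \end{aligned}$$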
when the limit exists. We define the space of directions \((\Sigma _{Y}, \angle _{Y})\) at \(Y\) as the completion of \(\widehat{\Sigma }_{Y}/ \sim \) with respect to \(\angle _{Y}\), where \(\gamma \sim \eta \) if \(\angle _{Y}(\gamma ,\eta )=0\). The tangent cone \(T_{Y}\) is the Euclidean cone over \(\Sigma _{Y}\):
$$\begin{aligned} T_{Y} = \Sigma _{Y}\times [0,\infty ) \,/\, \big (\Sigma _{Y}\times \{0\}\big ). \end{aligned}$$
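The inner product of \(\mathbf {u} = (\gamma ,s), \mathbf {v} = (\eta ,t) \in T_{Y}\) is defined as
$$\begin{aligned} \langle \mathbf {u},\mathbf {v}\rangle = st\cos \angle _{Y}(\gamma ,\eta ). \end{aligned}$$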
A geometric description of the tangent cone \(T_{Y}\) is as follows. \(Y \in \mathcal {D}_{L^{2}}\) has countably many points \(\{y_{i}\}\) off the diagonal. A tangent vector is a set of vectors \(\{v_{i} \in \mathbb { R}^{2}\}\), one assigned to each \(y_{i}\), together with countably many vectors at points along the diagonal pointing perpendicular to it, such that the sum of the squares of the lengths of all these vectors is finite. Observe that there can exist tangent vectors whose corresponding geodesic does not exist for any positive amount of time. The angle between two tangent vectors is effectively a weighted average of the angles between the corresponding pairs of vectors.
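The weighted-average description can be illustrated in a simplified setting. Assume (our assumption) that \(Y\) has finitely many distinct off-diagonal points and ignore directions along the diagonal; for small times the identity matching between the moved points then stays optimal, so the inner product reduces to the sum of planar dot products, and the cosine of the angle is a weighted average of the per-point cosines with weights \(|u_i||v_i|/(|u||v|)\). The helper names below are hypothetical.

```python
import math


def inner(u, v):
    """<u, v> for tangent vectors stored as one planar vector per point of Y.

    Assumes Y has finitely many distinct off-diagonal points (listed in a
    fixed order) and ignores directions along the diagonal.
    """
    return sum(a[0] * b[0] + a[1] * b[1] for a, b in zip(u, v))


def angle(u, v):
    """Angle between nonzero tangent vectors: arccos(<u, v> / (|u| |v|))."""
    c = inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp rounding error


# Per-point cosines +1 and -1 with equal weights average to 0: a right angle.
u = [(1.0, 0.0), (0.0, 1.0)]
v = [(1.0, 0.0), (0.0, -1.0)]
print(angle(u, v))  # pi / 2
```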
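We now define differential structure as a limit of rescalings. For \(s > 0\) denote the space \((\mathcal {A},s\cdot d)\) by \(s \mathcal {A}\) and let \(i_{s}: s \mathcal {A} \rightarrow \mathcal {A}\) be the identity map on the underlying set. For an open set \(\Omega \subset \mathcal {A}\) and any function \(f : \Omega \rightarrow \mathbb { R}\), the differential of \(f\) at a point \(p \in \Omega \) is the map \(d_{p}f: T_{p} \rightarrow \mathbb { R}\) defined by
$$\begin{aligned} d_{p}f = \lim _{s\rightarrow \infty } s\,\big (f\circ i_{s} - f(p)\big ), \end{aligned}$$
where the limit is understood via the convergence of the rescaled spaces \(s\mathcal {A}\), pointed at \(p\), to \(T_{p}\).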
For semiconcave functions the above differential is well defined and we can study gradients and supporting vectors.
Definition 2.7
(Gradients and supporting vectors) Given an open set \(\Omega \subset \mathcal {A}\) and a function \(f: \Omega \rightarrow \mathbb { R}\) we denote by \(\nabla _{p} f\) the gradient of a function \(f\) at a point \(p \in \Omega \). \(\nabla _{p} f\) is the vector \(v \in T_{p}\) such that

(i)
\(d_{p} f(x) \le \langle v, x\rangle \) for all \(x\in T_{p}\)

(ii)
\(d_{p} f(v)=\langle v,v\rangle \).
For a semiconcave \(f\) the gradient exists and is unique (Theorem 1.7 in [15]). We say \(s\in T_{p}\) is a supporting vector of \(f\) at \(p\) if \(d_{p} f(x) \le -\langle s, x\rangle \) for all \(x\in T_{p}\). Note that \(-\nabla _{p} f\) is a supporting vector if it exists in the tangent cone at \(p\).
Lemma 2.8

(i)
If \(s\) is a supporting vector then \(\Vert s\Vert \ge \Vert \nabla _{p} f\Vert \).

(ii)
If \(p\) is local minimum of \(f\) and \(s\) is a supporting vector of \(f\) at \(p\) then \(s=0\).
Proof

(i)
First observe that from the definitions of \(\nabla _{p} f\) and supporting vectors we have
$$\begin{aligned} \langle \nabla _{p} f, \nabla _{p} f \rangle = d_{p} f(\nabla _{p} f)\le -\langle s, \nabla _{p} f\rangle . \end{aligned}$$We also know that
$$\begin{aligned} 0\le \langle \nabla _{p} f +s, \nabla _{p} f+s\rangle = \langle \nabla _{p} f, \nabla _{p} f\rangle + 2\langle \nabla _{p} f,s\rangle + \langle s, s\rangle . \end{aligned}$$Combining these inequalities gives \(0\le -\langle \nabla _{p} f, \nabla _{p} f\rangle + \langle s, s\rangle \), that is, \(\Vert s\Vert \ge \Vert \nabla _{p} f\Vert \).

(ii)
If \(p\) is a local minimum of \(f\) then \(d_{p}f(x)\ge 0\) for all \(x\in T_{p}\); in particular \(d_{p}f(s)\ge 0\). Since \(s\) is a supporting vector, \(-\langle s, s \rangle \ge d_{p} f(s) \ge 0\). This implies \(\langle s,s\rangle =0\) and hence \(s=0\). \(\square \)
We care about gradients and supporting vectors because they help us find local minima of the Fréchet function. Indeed, by Lemma 2.8(ii), a necessary condition for \(F\) to have a local minimum at \(Y\) is that \(s=0\) for every supporting vector \(s\) of \(F\) at \(Y\). Since the tangent cone at \(Y\) is a convex subset of a Hilbert space, we can integrate \(T_{Y}\)-valued functions against probability measures. This allows us to find a formula for a supporting vector of the Fréchet function \(F\).
Proposition 2.9
Let \(Y\in \mathcal {D}_{L^{2}}\). For each \(X\in \mathcal {D}_{L^{2}}\) let \(F_{X}:Z \mapsto d(X, Z)^{2}\).

(i)
If \(\gamma \) is a distance achieving geodesic from \(Y\) to \(X\), then the tangent vector to \(\gamma \) at \(Y\) of length \(2d(X,Y)\) is a supporting vector at \(Y\) for \(F_{X}\).

(ii)
If \(s_{X}\) is a supporting vector at \(Y\) for the function \(F_{X}\) for each \(X\in \text {supp}(\rho )\) then \(s=\int s_{X}d\rho (X)\) is a supporting vector at Y of the Fréchet function \(F\) corresponding to the distribution \(\rho \).
Proof
(i) Let \(\gamma \) be a unit speed geodesic from \(Y\) to \(X\). Consider the tangent vector \(s_{X}=(\gamma , 2d(X,Y))\). Let \(\gamma (t)_{i}\) denote the point in \(\gamma (t)\) that is sent to \(x_{i} \in X\). Since \(\gamma \) is a distance achieving geodesic we know that
To show \(d_{Y}F_{X}(v)\le -\langle s_{X}, v\rangle \) for all \(v\in T_{Y}\) it is sufficient to consider vectors of the form \((\tilde{\gamma }, 1)\), where \(\tilde{\gamma }\) is a unit-speed geodesic starting at \(Y\). Let \(\tilde{\gamma }(t)_{i}\) denote the point in \(\tilde{\gamma }(t)\) which started at \(\gamma (0)_{i}\). This means that \(x_{i} \mapsto \tilde{\gamma }(t)_{i}\) is a bijection from \(X\) to \(\tilde{\gamma }(t)\) and
where \(\theta _{i}\) is the angle between the paths \(s\mapsto \gamma (s)_{i}\) and \(t\mapsto \tilde{\gamma }(t)_{i}\) in the plane. Now
for all \(s>0\), and \(\Vert \tilde{\gamma }(0)_{i}-\tilde{\gamma }(t)_{i}\Vert ^{2}=t^{2}\Vert \tilde{\gamma }(0)_{i}-\tilde{\gamma }(1)_{i}\Vert ^{2}\) for all \(t\). This implies that
Recall from our construction of the tangent cone that
By comparing these equations we get \(d_{Y} F_{X}(v) \le -\langle v, s_{X}\rangle \), and thus we can conclude that \(s_{X}\) is a supporting vector.
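(ii) Now let \(s_{X}\) be any supporting vector of \(F_{X}\). By definition \(d_{Y} F_{X}(v) \le -\langle s_{X}, v\rangle \) for all \(v\in T_{Y}\), and hence
$$\begin{aligned} d_{Y}F(v) = \int d_{Y}F_{X}(v)\, d\rho (X) \le -\int \langle s_{X}, v\rangle \, d\rho (X) = -\langle s, v\rangle . \end{aligned}$$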
\(\square \)
In the following section we provide an algorithm that computes a local minimum of a Fréchet function using a gradient descent procedure. We use the above results because computing a supporting vector of \(Z \mapsto d(X,Z)^{2}\) can be significantly easier and faster than computing a supporting vector of \(F\) itself.
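For finite diagrams, Proposition 2.9 gives a supporting vector of \(F\) at \(W\) that is easy to evaluate once optimal pairings are known: at each point \(w\) it is the average over the diagrams of \(2(\phi _j(w)-w)\). By Lemma 2.8(ii) this vector must vanish at a local minimum, which forces each \(w\) to be the mean of its partners. The Python sketch below uses hypothetical names and, for brevity, omits the directions at the diagonal contributed by points of \(X_j\) paired with copies of the diagonal in \(W\).

```python
def supporting_vector(W, partners):
    """Supporting vector of F = (1/m) * sum_j d(., X_j)^2 at a finite W.

    Follows Proposition 2.9: average over j of 2 * (partner - w), where
    partners[j][i] is the point of X_j optimally paired with W[i], or None if
    W[i] is paired with the diagonal (then the partner is the closest diagonal
    point). Directions at the diagonal are omitted in this sketch.
    """
    m = len(partners)
    vectors = []
    for i, w in enumerate(W):
        sx = sy = 0.0
        for p_j in partners:
            p = p_j[i]
            if p is None:
                mid = (w[0] + w[1]) / 2.0
                p = (mid, mid)
            sx += p[0] - w[0]
            sy += p[1] - w[1]
        vectors.append((2.0 * sx / m, 2.0 * sy / m))
    return vectors


# Partners (0, 4) and (2, 6): the vector vanishes exactly at their mean (1, 5).
print(supporting_vector([(1.0, 5.0)], [[(0.0, 4.0)], [(2.0, 6.0)]]))  # [(0.0, 0.0)]
print(supporting_vector([(0.0, 4.0)], [[(0.0, 4.0)], [(2.0, 6.0)]]))  # [(2.0, 2.0)]
```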
Finding Local Minima of the Fréchet Function
In this section we state an algorithm that computes a local minimum of the Fréchet function of a finite set of persistence diagrams, and examine its convergence properties. Throughout, we restrict our attention to diagrams with only finitely many off-diagonal points, with multiplicities allowed.
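Given a set of persistence diagrams \(\{X_{i}\}_{i=1}^{m}\), a Fréchet mean \(Y\) is a diagram that satisfies
$$\begin{aligned} Y \in \mathop {\text {argmin}}\limits _{Z\in \mathcal {D}_{L^{2}}} F_{m}(Z), \qquad F_{m}(Z) = \int d(X,Z)^{2}\, d\rho _{m}(X) = \frac{1}{m}\sum _{i=1}^{m} d(X_{i},Z)^{2}, \end{aligned}$$
with the empirical measure \(\rho _{m} := m^{-1} \sum _{i=1}^{m} \delta _{X_{i}}\).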
We employ a greedy search algorithm based on gradient descent to find a local minimum. A key component of this greedy algorithm (see Algorithm 1) is a variant of the Kuhn–Munkres (Hungarian) algorithm [18].
The Hungarian algorithm finds the least-cost assignment of tasks to people under the assumption that the numbers of tasks and people are equal; the input is the cost for each person to do each task. Suppose we have two diagrams \(X\) and \(Y\), each with only finitely many off-diagonal points. Add enough copies of the diagonal to \(X\) and \(Y\) to allow every off-diagonal point to be matched with the diagonal. We think of the points and copies of the diagonal in \(X\) as the people, and the points and copies of the diagonal in \(Y\) as the tasks. The cost of \(x\in X\) doing task \(y\in Y\) is \(\Vert x-y\Vert ^{2}\), and the total cost of an assignment (in other words, a bijection) \(\phi \) of tasks to people is \(\sum _{x\in X} \Vert x-\phi (x)\Vert ^{2}\). The Hungarian algorithm returns a bijection \(\phi \) that minimizes this cost, that is, an optimal pairing between \(X\) and \(Y\).
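As a concrete reference, the Python sketch below implements the classic \(O(n^3)\) potential-based form of the Kuhn–Munkres algorithm (not necessarily the variant used by the authors) and applies it to the augmented cost matrix just described, with `None` standing for a copy of the diagonal. In practice a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` could replace `hungarian`; the helper names here are hypothetical.

```python
import math


def hungarian(cost):
    """Kuhn-Munkres with potentials, O(n^3); cost is a square list of lists.

    Returns assign with assign[row] = column of a minimum-cost perfect matching.
    """
    n = len(cost)
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j] = row matched to column j, 0 if none
    way = [0] * (n + 1)   # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (n + 1)
        used = [False] * (n + 1)
        while True:  # grow a shortest alternating path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def sq_cost(a, b):
    """Squared cost; None stands for a copy of the diagonal."""
    if a is None and b is None:
        return 0.0
    if a is None:
        a, b = b, a
    if b is None:
        return (a[1] - a[0]) ** 2 / 2.0
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def optimal_pairing(X, Y):
    """Optimal pairing of finite diagrams via the augmented cost matrix."""
    rows = list(X) + [None] * len(Y)   # people: points of X, diagonal copies
    cols = list(Y) + [None] * len(X)   # tasks: points of Y, diagonal copies
    C = [[sq_cost(r, c) for c in cols] for r in rows]
    assign = hungarian(C)
    total = sum(C[i][j] for i, j in enumerate(assign))
    pairs = [(rows[i], cols[j]) for i, j in enumerate(assign)
             if rows[i] is not None or cols[j] is not None]
    return total, pairs


total, pairs = optimal_pairing([(0.0, 10.0), (5.0, 20.0)],
                               [(5.0, 21.0), (0.0, 9.0)])
print(total, pairs)  # 2.0 [((0.0, 10.0), (0.0, 9.0)), ((5.0, 20.0), (5.0, 21.0))]
```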
We also need the arithmetic mean of a collection of points in the plane together with some number of copies of the diagonal. If \(x_{1}, \ldots , x_{m}\) are points in \(\mathbb { R}^{2}\), their arithmetic mean \(w=\frac{1}{m}\sum _{i=1}^{m} x_{i}\) is the choice of \(z\) that minimizes the sum \(\sum _{i=1}^{m} \Vert z-x_{i}\Vert ^{2}\). If \(x_{i}=\Delta \) for all \(i\), the arithmetic mean is set to be \(\Delta \). In the remaining case we may assume, without loss of generality, that \(x_{1}, \ldots , x_k\) are off-diagonal points and \(x_{k+1}, \ldots , x_{m}\) are all copies of the diagonal. Let \(w\) be the ordinary arithmetic mean of \(x_{1}, \ldots , x_k\) and let \(w_\Delta \) be the point on the diagonal closest to \(w\). We set
$$\begin{aligned} \text{ mean }(x_{1},\ldots ,x_{m}) = \frac{k\,w + (m-k)\,w_\Delta }{m} \end{aligned}$$
to be the arithmetic mean of \(x_{1}, \ldots , x_{m}\); this is the choice of \(z\) that minimizes \(\sum _{i=1}^{m} \Vert z-x_{i}\Vert ^{2}\). We use the operation \(\text{ mean }_{i=1,\ldots ,m}(x_{i}^{j})\) to compute this arithmetic mean for each pairing across the diagrams.
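The diagonal-aware mean translates directly into code. In the Python sketch below, `None` stands for a copy of the diagonal (our convention) and the function name is hypothetical; the result is \((k\,w + (m-k)\,w_\Delta )/m\), or the diagonal when every entry is the diagonal.

```python
def diagonal_aware_mean(points):
    """Arithmetic mean of m entries, each a point (b, d) or None (diagonal).

    Returns None if every entry is the diagonal. Otherwise returns
    (k * w + (m - k) * w_diag) / m, where w is the ordinary mean of the
    k off-diagonal entries and w_diag is the closest diagonal point to w.
    """
    m = len(points)
    off = [p for p in points if p is not None]
    k = len(off)
    if k == 0:
        return None
    w = (sum(p[0] for p in off) / k, sum(p[1] for p in off) / k)
    c = (w[0] + w[1]) / 2.0  # w_diag = (c, c)
    return ((k * w[0] + (m - k) * c) / m, (k * w[1] + (m - k) * c) / m)


print(diagonal_aware_mean([(0.0, 4.0), (2.0, 6.0)]))  # (1.0, 5.0)
print(diagonal_aware_mean([(0.0, 2.0), None]))        # (0.5, 1.5)
```

With one point and one copy of the diagonal, the result lies halfway between the point and its projection onto the diagonal.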
Suppose \(Y\) is our current estimate of the Fréchet mean. Using the Hungarian algorithm we compute optimal pairings between \(Y\) and each of the \(X_{i}\). We denote these pairings by \(\{(y^{j}, x_{i}^{j})\}_{j=1}^{J_{i}}\), where \(J_{i}\) is the combined number of off-diagonal points in \(X_{i}\) and \(Y\). For each \(y^{j}\ne \Delta \) we consider all the \(x_{i}^{j}\) and let \(\tilde{y}^{j}\) be their arithmetic mean. Whenever a pair \((\Delta , x_{i}^{j})\) appears in our pairings, we treat that \(\Delta \) as a copy of the diagonal distinct from those in any pairing between \(Y\) and \(X_k\) with \(k\ne i\); the corresponding new point is then the arithmetic mean of \(x_{i}^{j}\) and \(m-1\) copies of the diagonal. Let \(Y'\) be the diagram with points \(\tilde{y}^{j}\). We will show later that if \(Y=Y'\) then \(Y\) is a local minimum of the Fréchet function. Otherwise we choose \(Y'\) as our new estimate.
The basic steps of Algorithm 1 are to:

(a)
randomly initialize the mean diagram, for example at one of the \(m\) persistence diagrams or at the midpoint of two of the \(m\) diagrams;

(b)
use the Hungarian algorithm to compute optimal pairings between the estimate of the mean diagram and each of the persistence diagrams;

(c)
update each point in the mean diagram estimate with the arithmetic mean over all diagrams (each point in the mean diagram is paired with a point, possibly on the diagonal, in each diagram);

(d)
if the updated estimate locally minimizes \(F_{m}\), return the estimate; otherwise return to step (b).
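Steps (a) to (d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a brute-force `match` stands in for the Hungarian algorithm so that the block is self-contained (tiny diagrams only), `None` stands for a copy of the diagonal, points that end up on the diagonal are dropped, and the stopping rule is the fixed-point test \(Y'=Y\) described earlier. All names are hypothetical.

```python
import itertools
import math


def sq_cost(a, b):
    """Squared cost; None stands for a copy of the diagonal."""
    if a is None and b is None:
        return 0.0
    if a is None:
        a, b = b, a
    if b is None:
        return (a[1] - a[0]) ** 2 / 2.0
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def match(Y, X):
    """Brute-force optimal pairing (stand-in for the Hungarian algorithm).

    Returns (cost, partner) where partner[i] is the entry of X matched to
    row i; rows 0..len(Y)-1 are points of Y, the rest are diagonal copies.
    """
    rows = list(Y) + [None] * len(X)
    cols = list(X) + [None] * len(Y)
    best, best_perm = math.inf, ()
    for perm in itertools.permutations(range(len(cols))):
        c = sum(sq_cost(rows[i], cols[j]) for i, j in enumerate(perm))
        if c < best:
            best, best_perm = c, perm
    return best, [cols[j] for j in best_perm]


def mean(points):
    """Diagonal-aware arithmetic mean; None entries are the diagonal."""
    off = [p for p in points if p is not None]
    if not off:
        return None
    m, k = len(points), len(off)
    w = (sum(p[0] for p in off) / k, sum(p[1] for p in off) / k)
    c = (w[0] + w[1]) / 2.0
    return ((k * w[0] + (m - k) * c) / m, (k * w[1] + (m - k) * c) / m)


def greedy_mean(diagrams, start, max_iter=100, tol=1e-9):
    """Iterate pairings and pointwise means until Y' = Y; returns (Y, F_m(Y))."""
    m, Y = len(diagrams), list(start)                  # (a) initial estimate
    for _ in range(max_iter):
        partners = [match(Y, X)[1] for X in diagrams]  # (b) optimal pairings
        new_Y = [mean([p[i] for p in partners]) for i in range(len(Y))]  # (c)
        for p in partners:  # points of X_i matched to the diagonal of Y
            new_Y += [mean([x] + [None] * (m - 1)) for x in p[len(Y):]
                      if x is not None]
        new_Y = sorted(q for q in new_Y if q is not None and q[1] - q[0] > tol)
        converged = len(new_Y) == len(Y) and all(
            math.dist(a, b) <= tol for a, b in zip(new_Y, sorted(Y)))
        Y = new_Y
        if converged:                                  # (d) fixed point
            break
    F = sum(match(Y, X)[0] for X in diagrams) / m
    return Y, F


print(greedy_mean([[(0.0, 4.0)], [(2.0, 6.0)]], start=[(0.0, 4.0)]))
# ([(1.0, 5.0)], 2.0)
```

With one diagram empty, the single point is pulled halfway towards the diagonal, matching the geodesic midpoint between the two diagrams.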
An alternative to the above greedy approach is a brute force search over point configurations to find a Fréchet mean: list all possible pairings between points in each pair of diagrams, then compute the arithmetic mean for every such pairing. One of these means will be a Fréchet mean. While this approach finds the complete mean set, its combinatorial complexity is prohibitive.
Convergence of the Greedy Algorithm
The remainder of this section establishes convergence properties of Algorithm 1. By convergence we mean that the algorithm terminates after finitely many iterations, having found a local minimum. This holds because the cost function \(F_{m}\) decreases at each iteration, each iteration uses a new set of pairings, and there are only finitely many combinations of pairings between points in the diagrams.
We first develop necessary and sufficient conditions for a diagram \(Y\) to be a local minimum of the Fréchet function of a set of persistence diagrams. We define \(F_{i} (Z):= d(Z,X_{i})^{2}\), the Fréchet function corresponding to \(\delta _{X_{i}}\). The Fréchet function corresponding to the distribution \(\frac{1}{m}\sum _{i=1}^{m} \delta _{X_{i}}\) is then \(F= \frac{1}{m} \sum _{i=1}^{m} F_{i}\).
The following lemma provides a necessary condition for a diagram to be a local minimum of \(F\). This condition is the stopping criterion in Algorithm 1.
Lemma 3.1
If \(W = \{w_{i}\}\) is a local minimum of the Fréchet function \(F = \frac{1}{m} \sum _{j=1}^{m} F_{j}\), then there is a unique optimal pairing from \(W\) to each of the \(X_{j}\), which we denote by \(\phi _{j}\), and each \(w_{i}\) is the arithmetic mean of the points \(\{\phi _{j}(w_{i})\}_{j=1, 2, \ldots , m}\). Furthermore, if \(w_k\) and \(w_l\) are off-diagonal points such that \(\Vert w_k-w_l\Vert =0\), then \(\Vert \phi _{j}(w_k)-\phi _{j}(w_l)\Vert =0\) for each \(j\).
Proof
Let \(\phi _{j}\) be optimal pairings (not yet assumed to be unique) between \(W\) and \(X_{j}\), and let \(s_{j}\) be the corresponding vectors in the tangent cone at \(W\) that are tangent to the geodesics induced by \(\phi _{j}\) and have length \(d(X_{j},W)\). By Proposition 2.9 the \(2s_{j}\) are supporting vectors of the functions \(F_{j}(W)= d(W, X_{j})^{2}\), so \(\frac{2}{m}\sum _{j=1}^{m} s_{j}\) is a supporting vector of \(F\).
From Lemma 2.8 we know that \(\frac{2}{m}\sum _{j=1}^{m} s_{j}=0\). Since at each \(w_{i}\) the vector \(s_{j}\) points from \(w_{i}\) to \(\phi _{j}(w_{i})\), the identity \(\sum _{j=1}^{m} s_{j}=0\) implies that \(w_{i}\) is the arithmetic mean of the points \(\{\phi _{j}(w_{i})\}_{j=1, 2,\ldots ,m}\).
Now suppose that \(\phi _j\) and \(\tilde{\phi }_j\) are both optimal pairings from \(W\) to \(X_j\), with corresponding tangent vectors \(s_j\) and \(\tilde{s}_j\). By the above reasoning we have \(\frac{1}{m}\big(\tilde{s}_j + \sum _{q=1, q\ne j}^{m} s_{q}\big)=0 =\frac{1}{m}\sum _{q=1}^{m} s_{q}\), and hence \(\tilde{s}_j = s_j\). This implies that \(\Vert \tilde{\phi }_j(w_{i}) - \phi _j(w_{i})\Vert =0\) for all \(w_{i}\in W\). In particular, let \(w_k\) and \(w_l\) be off-diagonal points with \(\Vert w_k - w_l\Vert =0\), let \(\phi _j\) be an optimal pairing, and let \(\tilde{\phi }_j\) be the pairing obtained from \(\phi _j\) by swapping the images of \(w_k\) and \(w_l\); since \(w_k\) and \(w_l\) coincide, \(\tilde{\phi }_j\) has the same cost and is also optimal. Since \(\Vert \tilde{\phi }_j(w_{i}) - \phi _j(w_{i})\Vert =0\) for all \(w_{i}\in W\), we conclude that \(\Vert \phi _{j}(w_k) - \phi _{j}(w_l)\Vert =0\). \(\square \)
We now prove that the above is also a sufficient condition for \(W\) to be a local minimum of \(F\) when \(F\) is the Fréchet function for the measure \(\frac{1}{m}\sum _{i}\delta _{X_{i}}\) and each diagram \(X_{i}\) has finitely many off-diagonal points. This requires a result about the local extension of optimal pairings.
Proposition 3.2
Let \(X\) and \(Y\) be diagrams, each with only finitely many off-diagonal points, such that there is a unique optimal pairing \(\phi _{X}^{Y}\) between them and no off-diagonal point in \(X\) is matched to the diagonal in \(Y\). We further stipulate that if \(y_k\) and \(y_l\) are off-diagonal points with \(\Vert y_k - y_l\Vert =0\) then \(\Vert (\phi _{X}^{Y})^{-1}(y_k) - (\phi _{X}^{Y})^{-1}(y_l)\Vert =0\). Then there is some \(r>0\) such that for every \(Z \in B(Y,r)\) there is a unique optimal pairing between \(X\) and \(Z\), and this optimal pairing is induced from the one from \(X\) to \(Y\). By this we mean that there is a unique optimal pairing \(\phi _{Y}^Z\) from \(Y\) to \(Z\) and that the unique optimal pairing from \(X\) to \(Z\) is \(\phi _{Y}^Z \circ \phi _{X}^{Y}\).
Furthermore, if \(X_{1}, X_{2}, \ldots , X_{m}\) and \(Y\) are diagrams with finitely many off-diagonal points such that there is a unique optimal pairing \(\phi _{X_{i}}^{Y}\) between \(X_{i}\) and \(Y\) for each \(i\), satisfying the same conditions as above, then there is some \(r>0\) such that for every \(Z \in B(Y,r)\) there is a unique optimal pairing between each \(X_{i}\) and \(Z\), and this optimal pairing is induced by the one from \(X_{i}\) to \(Y\).
Proof
Since \(Y\) has only finitely many off-diagonal points, there is some \(\varepsilon >0\) such that for every diagram \(Z\) with \(d(Y,Z)<\varepsilon \) there is a unique geodesic from \(Y\) to \(Z\).
For each bijection \(\phi \) of points in \(X\) to points in \(Y\), define the function \(g_\phi \) between \(X\) and points in \(B(Y,\varepsilon )\) by setting
where \(\phi _{Y}^Z\) is the optimal pairing that comes from the unique geodesic from \(Y\) to \(Z\). First note that \(g_\phi (X,Z) \le \sum _{x\in X} \Vert x - \phi _{Y}^Z(\phi (x))\Vert ^{2} + d(Y,Z)^{2}\). Since there are only finitely many off-diagonal points in \(X\) and \(Y\), there is a bound \(M\) on \( \Vert x - \phi (x)\Vert + \varepsilon \) over all \(x\in X\) and all bijections \(\phi \); \(M\) is then a bound on \(\Vert x - \phi _{Y}^Z(\phi (x))\Vert \) for all \(x\) and all \(\phi \). We also know that \(\Vert \phi _{Y}^Z(\phi (x)) - \phi (x)\Vert \le d(Y,Z)\) for all \(x \in X\). Let \(K\) be the combined number of off-diagonal points in the diagrams \(X\) and \(Y\).
Similarly
Let \(\phi _{X}^{Y}\) be the optimal pairing from \(X\) to \(Y\), which is assumed to be unique in the statement of the proposition. Let \(\hat{\phi }\) be another bijection of points in \(X\) to points in \(Y\). Since there are only finitely many off-diagonal points in \(X\) and \(Y\), there are only finitely many possible \(\hat{\phi }\). Set
which must be positive as \(\phi _{X}^{Y}\) is uniquely optimal by assumption.
Choose \(r>0\) such that \(4r^{2}+4MKr<\beta \). Now suppose that \(g_\phi (Z,X)\le g_{\phi _{X}^{Y}}(Z,X)\) for some \(Z \in B(Y,r)\). This will imply that
which contradicts our choice of \(\beta \).
Now suppose \(X_{1}, X_2, \ldots , X_{m}\) and \(Y\) are diagrams with finitely many off-diagonal points such that there is a unique optimal pairing \(\phi _{X_{i}}^{Y}\) between \(X_{i}\) and \(Y\) for each \(i\). By the above argument there are some \(r_{1}, r_2,\ldots ,r_{m}>0\) such that for each \(i\) and every \(Z \in B(Y,r_{i})\) there is a unique optimal pairing between \(X_{i}\) and \(Z\), and this optimal pairing is induced by the one from \(X_{i}\) to \(Y\). Take \(r=\min _{i} r_{i}\), which is positive since there are only finitely many \(r_{i}\). \(\square \)
The following theorem states that Algorithm 1 will find a local minimum on termination.
Theorem 3.3
Given diagrams \(\{X_{1},\ldots ,X_{m}\}\) and the corresponding Fréchet function \(F\), then \(W = \{w_{i}\}\) is a local minimum of \(F\) if and only if there is a unique optimal pairing from \(W\) to each of the \(X_{j}\) denoted as \(\phi _{j}\) and each \(w_{i}\) is the arithmetic mean of the points \(\{\phi _{j}(w_{i})\}_{j=1,2,\ldots ,m}\).
Proof
In Lemma 3.1 we showed that the condition is necessary.
Given \(m\) points in the plane or copies of the diagonal, \(\{x_{1}, x_{2}, \ldots , x_{m}\}\), the choice of \(y\) that minimizes \(\sum _{i=1}^{m} \Vert x_{i} - y\Vert ^{2}\) is the arithmetic mean of \(\{x_{1}, \ldots , x_{m}\}\). As a result, \(F(Z)>F(W)\) for all \(Z \ne W\) whose optimal pairings to \(X_{1}, X_{2}, \ldots , X_{m}\) are induced by those of \(W\). By Proposition 3.2 there is a ball \(B(W,r)\) such that every \(Z\in B(W,r)\) has optimal pairings induced by those of \(W\), so \(F(Z)>F(W)\) for all \(Z \ne W\) in \(B(W,r)\). Thus \(W\) is a local minimum. \(\square \)
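The first step of this argument is the standard fact that the arithmetic mean minimizes a sum of squared distances. For genuine points \(x_1,\ldots,x_m\) in the plane it follows from expanding around \(\bar{x}\); the cross term vanishes because \(\sum_{i}(x_i-\bar{x})=0\):

```latex
\[
\sum_{i=1}^{m}\Vert x_i - y\Vert^{2}
 = \sum_{i=1}^{m}\Vert x_i - \bar{x}\Vert^{2}
 + 2\Big\langle \sum_{i=1}^{m}(x_i-\bar{x}),\, \bar{x}-y\Big\rangle
 + m\Vert \bar{x}-y\Vert^{2}
 = \sum_{i=1}^{m}\Vert x_i - \bar{x}\Vert^{2} + m\Vert y-\bar{x}\Vert^{2},
 \qquad \bar{x} = \frac{1}{m}\sum_{i=1}^{m}x_i .
\]
```

The sum is therefore minimized exactly at \(y=\bar{x}\) and is strictly larger at every other \(y\), which is the source of the strict inequality \(F(Z)>F(W)\).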
Law of Large Numbers for the Empirical Fréchet Mean
In this section we study the convergence of Fréchet means computed from samples to the set of means of a measure. Consider a measure \(\rho \) on the space of persistence diagrams \(\mathcal {D}_{L^{2}}\). Given a set of persistence diagrams \(\{X_{i}\}_{i=1}^{n} \mathop {\sim }\limits ^{iid} \rho \), one can define an empirical measure \(\rho _{n}=\frac{1}{n}\sum _{k=1}^n \delta _{X_k}\). We will examine the relation between the two sets
\[ \mathbf {Y} = \arg \min _{Z\in \mathcal {D}_{L^{2}}} F(Z), \qquad \mathbf {Y}_{n} = \arg \min _{Z\in \mathcal {D}_{L^{2}}} F_{n}(Z), \]
where \(\mathbf {Y}\) and \(\mathbf {Y}_{n}\) are the Fréchet mean sets of the measures \(\rho \) and \(\rho _{n}\) respectively, and \(F\) and \(F_{n}\) are the corresponding Fréchet functions. We would like to prove convergence of \(\mathbf {Y}_{n}\) to \(\mathbf {Y}\) as \(n\rightarrow \infty \), a law of large numbers result.
There exist weak and strong laws of large numbers for general metric spaces (for example see [17, Theorem 3.4]). These results hold for global minima of the Fréchet and empirical Fréchet functions \(F\) and \(F_{n}\), respectively. It is not clear to us how to adapt these results to the case of Algorithm 1 where we can only ensure convergence to a local minimum. It is also not clear how we can adapt these theorems to get rates of convergence of the sample Fréchet mean set to the population quantity.
In this section we provide a law of large numbers result for the restricted case where \(\rho \) is a combination of Dirac masses,
\[ \rho = \frac{1}{m}\sum _{i=1}^{m} \delta _{Z_{i}}, \]
where the \(Z_{i}\) are diagrams with only finitely many off-diagonal points and we allow for multiplicity in these points. The proof is constructive and we provide rates of convergence.
The main results of this section, Theorem 4.1 and Lemma 4.2, provide a probabilistic justification for Algorithm 1. Theorem 4.1 states that with high probability local minima of the empirical Fréchet function \(F_{n}\) will be close to local minima of the Fréchet function \(F\). Ideally we would like this convergence to hold for global minima, that is, for the Fréchet mean set. Lemma 4.2 states that the number of local minima of \(F_{n}\) is finite and bounded independently of \(n\). This suggests that running Algorithm 1 from a random set of initial diagrams can be used to explore the finite set of local minima.
Theorem 4.1
Set \(\rho =\frac{1}{m}\sum _{i=1}^{m} \delta _{Z_{i}}\), where the \(Z_{i}\) are diagrams with finitely many off-diagonal points, with multiplicity allowed. Let \(F\) be the Fréchet function corresponding to \(\rho \) and let \(Y\) be a local minimum of \(F\). Let \(\{X_{i}\}_{i=1}^n \mathop {\sim }\limits ^{iid} \rho \), and denote the corresponding empirical measure by \(\rho _{n}=\frac{1}{n}\sum _{k=1}^n \delta _{X_k}\) and its Fréchet function by \(F_{n}\). There exists a local minimum \(Y_{n}\) of \(F_{n}\) such that, with probability greater than \(1-\delta \),
\[ d(Y,Y_{n})^{2} \le \frac{m^{2} F(Y)}{n} \ln \left( \frac{m}{\delta }\right) \]
for \(n \ge 8m \ln \frac{m}{\delta }\) and \(\frac{m^{2} F(Y)}{n} \ln \left( \frac{m}{\delta }\right) < r^{2}\), where \(r\) characterizes the separation between the local minima of \(F\); in the proof, \(r\) is the radius of the ball \(B(Y,r)\) given by Proposition 3.2.
Proof
The empirical distribution is
\[ \rho _{n} = \frac{1}{n}\sum _{i=1}^{m} \xi _{i}\, \delta _{Z_{i}}, \]
where \(\xi _{i} = |\{k:X_k=Z_{i}\}|\) is the random variable giving the multiplicity with which \(Z_{i}\) appears in the empirical measure. Observe that \((\xi _{1}, \xi _2, \ldots , \xi _{m})\) follows a multinomial distribution with parameters \(n\) and \(p=\left( \frac{1}{m}, \frac{1}{m}, \ldots , \frac{1}{m}\right) \).
We will bound the probability that \(|\xi _{i} - \frac{n}{m}|>\varepsilon \frac{n}{m}\) for some \(i=1,2,\ldots ,m\). We will then show that if \(|\xi _{i} - \frac{n}{m}|\le \varepsilon \frac{n}{m}\) for all \(i=1,2,\ldots ,m\), then for sufficiently small \(\varepsilon >0\) there is a local minimum \(Y_{n}\) with \(d(Y,Y_{n})^{2}<\frac{\varepsilon ^{2} m F(Y)}{(1-\varepsilon )^{2}}\).
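For each \(i\), \(\xi _{i} \sim \text{ Bin }(n,\frac{1}{m})\) and \(n-\xi _{i} \sim \text{ Bin }(n, 1-\frac{1}{m})\). Using Hoeffding’s inequality we obtain \(\Pr \left[ \xi _{i}-\frac{n}{m}\le -\varepsilon \frac{n}{m}\right] \le \frac{1}{2}\exp \left( -\frac{2\varepsilon ^{2} n}{m}\right) \) and, applying the same bound to \(n-\xi _{i}\), \(\Pr \left[ \xi _{i}-\frac{n}{m}\ge \varepsilon \frac{n}{m}\right] \le \frac{1}{2}\exp \left( -\frac{2\varepsilon ^{2} n}{m}\right) \).
Together they show that \(\Pr \left[ |\xi _{i}-\frac{n}{m}|\ge \varepsilon \frac{n}{m}\right] \le \exp \left( -\frac{2\varepsilon ^{2} n}{m}\right) \), and a union bound over \(i\) gives
\[ \Pr \left[ \exists \, i: |\xi _{i}-\tfrac{n}{m}|\ge \varepsilon \tfrac{n}{m}\right] \le m\exp \left( -\frac{2\varepsilon ^{2} n}{m}\right) . \]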
From now on we assume that \(|\xi _{i}-\frac{n}{m}|<\varepsilon \frac{n}{m}\) for all \(i=1, 2, \ldots , m\). Consider our algorithm for finding a local minimum of \(F_{n}\), started from the diagram \(Y\). We first fix some notation. We denote the points in \(Y\) by \(\{y_{j}\}_{j=1}^J\), and by \(z_{i}^{j} := \phi _{Y}^{Z_{i}}(y_{j})\) the point in \(Z_{i}\) to which \(y_{j}\) is paired in the (unique) optimal bijection between \(Y\) and \(Z_{i}\). Recall that \(z_{i}^{j}\) could be the diagonal, but from our assumption that \(Y\) is a local minimum no off-diagonal point in any \(Z_{i}\) is paired with the diagonal in \(Y\).
Let \((a_{i}^{j}, b_{i}^{j})\) be the coefficients of the vector from \(y_{j}\) to \(z_{i}^{j}\) in the basis of \(\mathbb { R}^{2}\) given by \((\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})\) and \((\frac{1}{\sqrt{2}},-\frac{1}{\sqrt{2}})\). This basis has the advantage that when \(z_{i}^{j}\) is the diagonal then \(a_{i}^{j}=0\) and \(b_{i}^{j}=d(y_{j},\Delta )\). From our assumption that \(Y\) is a local minimum we know that \(\sum _{i=1}^{m} a_{i}^{j}=0\) and \(\sum _{i=1}^{m} b_{i}^{j}=0\) for all \(j\), and
\[ F(Y) = \frac{1}{m}\sum _{i=1}^{m}\sum _{j=1}^{J}\left( (a_{i}^{j})^{2} + (b_{i}^{j})^{2}\right) . \]
For the moment fix \(j\). Without loss of generality reorder the \(Z_{i}\) so that the first \(k\) (with \(1\le k \le m\)) of the \(z_{i}^{j}\) are off the diagonal and the remaining ones are copies of the diagonal. Let \(y_{j}^n\) be the point in \(\mathbb { R}^{2}\) given by
By construction this \(y_{j}^n\) is the weighted arithmetic mean of the \(z_{i}^{j}\) where we have weighted by the \(\xi _{i}\) taking into account that when \(i>k\) then \(z_{i}^{j}\) is the diagonal.
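Numerically, the weighted mean in this construction can be sketched as below. The reading of "taking into account that \(z_i^j\) is the diagonal" is our assumption, following Note 1: each diagonal copy is replaced by the diagonal point closest to the mean itself. In the basis of the previous paragraph the fixed-point equation then splits: the coordinate along the diagonal is the \(\xi\)-weighted average of the off-diagonal \(z_i^j\) alone, while the distance to the diagonal is the weighted sum of their distances to the diagonal divided by the total weight. Function names are hypothetical; the test checks the weighted analogues \(\sum_i \xi_i a_i = 0\) and \(\sum_i \xi_i b_i = 0\) of the identities stated above, with coefficients measured from the computed mean.

```python
import math

SQRT2 = math.sqrt(2.0)

def frame_coefficients(y, z):
    """Coefficients (a, b) of the vector from y to z in the basis
    (1, 1)/sqrt(2), (1, -1)/sqrt(2).  z = None stands for the diagonal,
    i.e. the diagonal point closest to y, which gives a = 0, b = d(y, diagonal)."""
    if z is None:
        c = (y[0] + y[1]) / 2.0
        z = (c, c)
    vx, vy = z[0] - y[0], z[1] - y[1]
    return (vx + vy) / SQRT2, (vx - vy) / SQRT2

def weighted_mean_with_diagonal(points, weights):
    """Weighted mean where each diagonal copy (None) is replaced by the diagonal
    point closest to the mean itself (assumed reading of the construction).
    Assumes at least one point is off the diagonal."""
    off = [(p, w) for p, w in zip(points, weights) if p is not None]
    total = sum(weights)                       # plays the role of n = sum of the xi_i
    # Coordinate along the diagonal: average over off-diagonal points only,
    # because a diagonal copy sits at the mean's own projection.
    u = sum(w * (p[0] + p[1]) for p, w in off) / (SQRT2 * sum(w for _, w in off))
    # Distance to the diagonal: diagonal copies contribute 0 but keep their weight.
    h = sum(w * (p[1] - p[0]) for p, w in off) / (SQRT2 * total)
    return (u - h) / SQRT2, (u + h) / SQRT2
```

With points (0, 2) and (0, 4) of weight 1 and a diagonal copy of weight 2, the mean is (0.75, 2.25); its projection onto the diagonal is (1.5, 1.5), and the ordinary weighted average of (0, 2), (0, 4) and twice (1.5, 1.5) is indeed (0.75, 2.25).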
Under our assumption that \(|\xi _{i}-\frac{n}{m}|<\varepsilon \frac{n}{m}\) for all \(i=1, 2, \ldots , m\), and using \(\sum _{i=1}^k a_{i}^{j}=0=\sum _{i=1}^{m} b_{i}^{j}\), we know that
Set \(Y_{n}\) to be the diagram with off-diagonal points \(\{y_{j}^n\}_{j=1}^J\). Using the pairing between \(Y\) and \(Y_{n}\) that pairs \(y_{j}\) with \(y_{j}^n\), we conclude that
Set \(\delta = m\exp \big (-\frac{2\varepsilon ^{2} n}{m}\big )\) and solve for \(\varepsilon \). This provides the bound that, with probability greater than \(1-\delta \),
For \(\varepsilon \in [0,0.25]\) it holds that \((1-\varepsilon )^{-2} <2\), and \(n \ge 8m \ln \frac{m}{\delta }\) implies \(\varepsilon \le 0.25\).
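For reference, the algebra behind these two steps, using only the quantities defined above:

```latex
\[
\delta = m\exp\Big(-\frac{2\varepsilon^{2}n}{m}\Big)
\;\Longleftrightarrow\;
\varepsilon^{2} = \frac{m}{2n}\ln\frac{m}{\delta},
\qquad
n \ge 8m\ln\frac{m}{\delta}
\;\Longrightarrow\;
\varepsilon^{2} \le \frac{1}{16},
\]
\[
0 \le \varepsilon \le \tfrac{1}{4}
\;\Longrightarrow\;
(1-\varepsilon)^{-2} \le \tfrac{16}{9} < 2
\;\Longrightarrow\;
\frac{\varepsilon^{2} m F(Y)}{(1-\varepsilon)^{2}}
\le 2\varepsilon^{2} m F(Y)
= \frac{m^{2}F(Y)}{n}\ln\frac{m}{\delta}.
\]
```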
We want to show that \(Y_{n}\) is a local minimum for sufficiently small \(\varepsilon \); indeed it will be the output of Algorithm 1 started from the diagram \(Y\). Since \(Y\) is a local minimum, Proposition 3.2 implies that there is a ball \(B(Y,r)\) around \(Y\) such that every diagram in \(B(Y,r)\) has a unique optimal pairing with each \(Z_{i}\), induced by the unique optimal pairing between \(Y\) and \(Z_{i}\). That is, \(\phi _{W}^{Z_{i}} = \phi _{Y}^{Z_{i}}\circ \phi _{W}^{Y}\) for all \(W\in B(Y,r)\). For \(\varepsilon >0\) such that \(\frac{\varepsilon ^{2} m F(Y)}{(1-\varepsilon )^{2}}<r^{2}\) we have \(Y_{n}\in B(Y,r)\). Substituting the value of \(\varepsilon \) and using \((1-\varepsilon )^{-2}<2\), this condition holds whenever \(\frac{m^{2} F(Y)}{n} \ln \left( \frac{m}{\delta }\right) < r^{2}\).
This implies that \(\phi _{Y_{n}}^{Z_{i}} = \phi _{Y}^{Z_{i}}\circ \phi _{Y_{n}}^{Y}\) is the unique optimal pairing between \(Y_{n}\) and \(Z_{i}\) for all \(i\), and hence \(\phi _{Y_{n}}^{X_k} = \phi _{Y}^{X_k}\circ \phi _{Y_{n}}^{Y}\) for each of the sample diagrams \(X_k\). If \(X_k=Z_{i}\) then
By construction \(y_{j}^n\) is the weighted arithmetic mean of the \(z_{i}^{j}\) (weighted by the \(\xi _{i}\)), and hence \(y_{j}^n\) is the arithmetic mean of the \(x_k^{j}\). By Theorem 3.3, \(Y_{n}\) is a local minimum of \(F_{n}\). \(\square \)
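The two conditions of Theorem 4.1 are easy to evaluate once \(m\), \(\delta\), \(F(Y)\) and \(r\) are fixed. The sketch below (hypothetical helper names) does exactly that; \(F(Y)\) and the radius \(r\) from Proposition 3.2 are generally unknown in practice, so the values passed in are assumptions supplied by the user.

```python
import math

def theorem_4_1_conditions(m, delta, n, frechet_at_Y, r):
    """Check both conditions of Theorem 4.1 and return (ok, bound on d(Y, Y_n)^2).

    frechet_at_Y stands for F(Y) and r for the radius from Proposition 3.2;
    both are assumed inputs, not quantities the algorithm computes.
    """
    log_term = math.log(m / delta)
    enough_samples = n >= 8 * m * log_term
    bound = m ** 2 * frechet_at_Y * log_term / n
    return enough_samples and bound < r ** 2, bound

def minimal_n(m, delta, frechet_at_Y, r):
    """Smallest integer n satisfying both conditions."""
    log_term = math.log(m / delta)
    n_first = math.ceil(8 * m * log_term)                           # n >= 8 m ln(m/delta)
    n_second = math.floor(m ** 2 * frechet_at_Y * log_term / r ** 2) + 1  # strict inequality
    return max(n_first, n_second)
```

With m = 2, δ = 0.1, F(Y) = 1 and r = 1 the first condition dominates and n = 48 suffices; shrinking r to 0.1 raises the requirement to n = 1199, reflecting the \(1/r^2\) dependence.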
The above theorem provides a (weak) law of large numbers result for the local minima computed from \(n\) persistence diagrams, but it does not ensure that the number of local minima stays bounded as \(n\) goes to infinity. The utility of such a convergence result would be limited if the number of local minima could not be bounded. The following lemma states that it is bounded.
Lemma 4.2
Let \(\rho =\frac{1}{m}\sum _{i=1}^{m} \delta _{Z_{i}}\) as before. Let \(\rho _{n}=\frac{1}{n}\sum _{k=1}^n \delta _{X_k}\) be the empirical measure of \(n\) points drawn iid from \(\rho \), and let \(F_{n}\) be the corresponding Fréchet function. The number of local minima of \(F_{n}\) is bounded by \(\prod _{i=1}^{m}(k_{i}+1)^{(k_{1}+k_2+\cdots +k_{m})}\), where \(k_{i}\) is the number of off-diagonal points in the \(i\)-th diagram \(Z_{i}\). This bound is independent of \(n\).
Proof
Let \(Y_{n}\) be a local minimum of \(F_{n}\). This implies that there are unique optimal pairings \(\phi _{i}\) between \(Y_{n}\) and \(X_{i}\) for each \(i\), and that any point \(y\) in \(Y_{n}\) is the arithmetic mean of \(\{\phi _{i}(y)\}\). Since the optimal pairing is unique, if \(X_{i}=X_{j}\) then \(\phi _{i}=\phi _{j}\). This in turn means that the \(\phi _{i}\) are determined by which of the \(Z_{i}\) appear among the \(X_{j}\) (with multiplicity). Hence the number of local minima is bounded by the number of different partitions of the points in \(\cup _{j} X_{j}\) into subsets such that each subset has exactly one point from each of the \(X_{j}\). The number of subsets is bounded by \(k_{1}+k_2+\cdots +k_{m}\), and for each subset there is a bound of \(\prod _{i=1}^{m}(k_{i}+1)\) on the choices of which element to take from each of the \(X_{i}\). Thus the number of different partitions is bounded by \(\prod _{i=1}^{m}(k_{i}+1)^{(k_{1}+k_2+\cdots +k_{m})}\). \(\square \)
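The bound of Lemma 4.2 depends only on the numbers \(k_i\) of off-diagonal points. A one-line sketch (hypothetical function name) makes its size concrete:

```python
import math

def local_minima_bound(k):
    """Upper bound prod_i (k_i + 1) ** (k_1 + ... + k_m) from Lemma 4.2.

    k[i] is the number of off-diagonal points of Z_i; the sample size n does not appear.
    """
    return math.prod(ki + 1 for ki in k) ** sum(k)
```

For \(k=(1,2)\) it gives \(6^3=216\), and for \(k=(3,3,3)\) already \(64^9=2^{54}\approx 1.8\times 10^{16}\), so the lemma guarantees finiteness independent of \(n\) rather than a small count.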
We would like to discuss not only the convergence of local minima but also the convergence of the Fréchet means. We can do this in the case when there is a unique Fréchet mean.
Lemma 4.3
Let \(\rho =\frac{1}{m}\sum _{i=1}^{m} \delta _{Z_{i}}\) as before. Suppose further that the corresponding Fréchet function \(F\) has a unique minimum. Let \(\rho _{n}=\frac{1}{n}\sum _{k=1}^n \delta _{X_k}\) be the empirical measure of \(n\) points drawn iid from \(\rho \), and let \(F_{n}\) be the corresponding Fréchet function. Let \(\mathbf {Y}\) be the Fréchet mean of \(F\) and \(\mathbf {Y}_{n}\) the set of Fréchet means of \(F_{n}\). With probability \(1\) the Hausdorff distance between \(\mathbf {Y}_{n}\) and \(\mathbf {Y}\) goes to zero as \(n\) goes to infinity.
Proof
It is sufficient for us to show for each \(r>0\) that with probability \(1\) there is some \(N_{r}\) such that \(\mathbf {Y}_{n}\subset B( \mathbf {Y},r)\) for all \(n>N_{r}\).
Fix \(r>0\). Suppose there does not exist some \(N_{r}\) such that \(\mathbf {Y}_{n}\subset B( \mathbf {Y},r)\) for all \(n>N_{r}\). Then there is some sequence of \(W_{n_k}\in \mathbf {Y}_{n_k}\) such that \(d(W_{n_k}, \mathbf {Y})\ge r\). The set \(\{W_{n_k}\}\) is clearly bounded, off-diagonally birth–death bounded and uniform, and hence precompact. This implies that \((W_{n_k})\) has a convergent subsequence \((W_{{n_k}_{j}})\). Let \(W\) denote the limit of this sequence. Since \(d(W_{{n_k}_{j}}, \mathbf {Y})\ge r\) for all \(j\), we have \(d(W, \mathbf {Y})\ge r\).
By the arguments in Proposition 2.6 there is some \(K\), independent of \(n\), such that \(F_{n}\) is \(K\)-Lipschitz in \(B(W,1)\), and hence \(|F_{{n_k}_{j}}(W_{{n_k}_{j}}) - F_{{n_k}_{j}}(W)|\le K d(W_{{n_k}_{j}},W)\) for large \(j\). Hence, for every \(\varepsilon >0\), \( F_{{n_k}_{j}}(W)\le F_{{n_k}_{j}}(W_{{n_k}_{j}}) + \varepsilon \) for sufficiently large \(j\).
The law of large numbers tells us that \(F_{n}(W)\rightarrow F(W)\) and \(F_{n}(\mathbf {Y})\rightarrow F(\mathbf {Y})\) as \(n \rightarrow \infty \) with probability \(1\). Hence for all \(\varepsilon >0\) we know that with probability \(1\) both \(F(W)\le F_{n}(W) +\varepsilon \) and \(F_{n}(\mathbf {Y})\le F(\mathbf {Y}) +\varepsilon \) for sufficiently large \(n\).
From our assumption that \(W_{{n_k}_{j}}\) is a Fréchet mean of \(F_{{n_k}_{j}}\) we know that \(F_{{n_k}_{j}}(W_{{n_k}_{j}})\le F_{{n_k}_{j}}(\mathbf {Y})\) for all \(j\).
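Let \(\varepsilon >0\). Combining the inequalities above, we conclude that with probability \(1\)
\[ F(W) \le F_{{n_k}_{j}}(W) + \varepsilon \le F_{{n_k}_{j}}(W_{{n_k}_{j}}) + 2\varepsilon \le F_{{n_k}_{j}}(\mathbf {Y}) + 2\varepsilon \le F(\mathbf {Y}) + 3\varepsilon \]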
for \(j\) sufficiently large. Since \(\varepsilon >0\) was arbitrary we obtain \(F(W)\le F(\mathbf {Y})\) which contradicts the uniqueness assumption about the Fréchet mean. \(\square \)
Persistence Diagrams of Random Gaussian Fields
We illustrate the utility of our algorithm in computing means and variances of persistence diagrams in this section via simulation. The idea will be to show that persistence diagrams generated from a random Gaussian field will concentrate around the diagonal with the mean diagram moving closer to the diagonal as the number of diagrams averaged increases.
The persistence diagrams were computed from a random Gaussian field over the unit square using the procedure outlined in Sect. 3 of [1]. The field generated is a stationary, isotropic and infinitely differentiable random field. The Gaussian field was set to have mean zero and covariance function \(R(p) = \exp (-\alpha \Vert p \Vert ^{2})\) with \(\alpha = 100\). A few hundred levels in the range of each realization of the field were taken, and for each level a simplicial complex was constructed. This was done by taking a fine grid on the unit square and including a vertex, edge or square in the complex if and only if the values of the field at the vertex or set of vertices (for the edge and square cases) were higher than the level. The complex grows as the level decreases, which provides the filtration from which the birth–death values of the diagram were computed. We obtained 10,000 such random persistence diagrams from E. Subag. These diagrams contain points with infinite persistence, which we ignore; using extended persistence to compute the diagrams would address this issue.
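To make the filtration concrete, here is a minimal sketch of its zero-dimensional part only, under our own assumptions: the field has already been sampled on a grid (generating the Gaussian field as in [1] is not shown), vertices are joined to their four grid neighbours, and squares are ignored because they do not change zero-dimensional homology. Components are tracked with a union-find structure and merged by the elder rule. The function name is hypothetical, and pairs are reported as (birth level, death level) with birth above death, so a change of sign or coordinates may be needed to match the convention of the figures. As in the text, the class of the global maximum, which never dies, is omitted.

```python
def superlevel_h0_diagram(field):
    """0-dimensional persistence of the superlevel-set filtration of a grid of values.

    Vertices enter in decreasing order of value; a grid edge enters once both ends
    are present. Returns (birth, death) levels with birth > death.
    """
    rows, cols = len(field), len(field[0])
    order = sorted(((field[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    parent, birth = {}, {}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    diagram = []
    for value, r, c in order:
        v = (r, c)
        parent[v], birth[v] = v, value
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            w = (r + dr, c + dc)
            if w not in parent:          # outside the grid or not yet above the level
                continue
            rv, rw = find(v), find(w)
            if rv == rw:
                continue
            # Elder rule: the component with the lower maximum dies at this level.
            young, old = (rv, rw) if birth[rv] < birth[rw] else (rw, rv)
            if birth[young] > value:     # skip zero-persistence pairs
                diagram.append((birth[young], value))
            parent[young] = old
    return diagram
```

For the one-row field `[[3, 1, 2]]` the local maximum 2 is born at level 2 and merges into the component of the global maximum at level 1, giving the single point (2, 1).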
In Fig. 1 we display the mean diagram of sets of \(2, 4, 8, 16, 32, 64, 128\) diagrams randomly drawn from the 10,000 diagrams, for both dimension zero and dimension one. We wanted to check that the Fréchet means converge as the number of diagrams being averaged increases. To quantify this concentration we took ten draws of \(2, 4, 8, 16, 32, 64, 128\) diagrams from the 10,000 diagrams and considered the distribution \(\frac{1}{10}\sum _{i=1}^{10}\delta _{X_{i}}\), where the \(X_{i}\) were the Fréchet means of each of the sets of samples. We then computed the variance of these distributions, as documented in Table 1.
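Assuming, as is standard for Fréchet means, that the variance of \(\frac{1}{10}\sum_{i}\delta_{X_i}\) is the minimum value of its Fréchet function, it can be estimated by evaluating the Fréchet function at the mean returned by the algorithm; since the algorithm only guarantees a local minimum, this value is an upper bound on the variance. The sketch below (hypothetical names, brute-force distances suitable only for tiny diagrams) shows the evaluation step.

```python
import itertools

# Assumption: a point is a (birth, death) tuple; None stands for a copy of the diagonal.

def pair_cost(p, q):
    """Squared cost of pairing p with q; pairing with None costs the squared distance to the diagonal."""
    if p is None and q is None:
        return 0.0
    if p is None or q is None:
        b, d = q if p is None else p
        return (d - b) ** 2 / 2.0
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def w2_sq(X, Y):
    """Squared L2-Wasserstein distance by brute force over bijections (tiny diagrams only)."""
    A = list(X) + [None] * len(Y)
    B = list(Y) + [None] * len(X)
    return min(sum(pair_cost(a, B[s]) for a, s in zip(A, perm))
               for perm in itertools.permutations(range(len(B))))

def frechet_value(Y, diagrams):
    """F(Y) = (1/N) * sum_i d(Y, X_i)^2 for the uniform measure on the N diagrams."""
    return sum(w2_sq(Y, X) for X in diagrams) / len(diagrams)
```

For the one-point diagrams {(0, 2)} and {(0, 4)} and their mean {(0, 3)} it returns 1. For diagrams of realistic size the brute-force distance would have to be replaced by an assignment solver such as the Hungarian algorithm [18].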
Discussion
In this paper we introduce an algorithm for computing estimates of Fréchet means of a set of persistence diagrams. We demonstrate local convergence of this algorithm and provide a law of large numbers for the Fréchet mean computed on this set when the underlying measure has the form \(\rho = m^{-1} \sum _{i=1}^{m} \delta _{X_{i}}\), where the \(X_{i}\) are persistence diagrams. We believe that generically there is a unique global minimum of the Fréchet function, and hence a unique Fréchet mean, but this remains to be shown.
The work in this paper is a first step, and several obvious extensions are needed. A law of large numbers result when the underlying measure is not restricted to a combination of Dirac masses is clearly important. The results in our paper depend strongly on the \(L^{2}\)-Wasserstein metric; generalizing them to the Wasserstein metrics used in computational topology is of central interest. The proofs and problem formulation in this paper are very constructive: the proofs and algorithms are developed for the specific examples and constructions we propose and are not meant to generalize to other metrics or variants of the algorithm. It would be of great interest to present the core ideas of the algorithm and theory in a more general framework, using properties of abstract metric spaces and probability theory on these spaces.
Notes
 1.
If both \(x\) and \(\phi (x)\) are the diagonal then this is the diagonal. If exactly one of \(x\) or \(\phi (x)\) is the diagonal then we replace it in this sum by the closest point in the diagonal to \(\phi (x)\) or \(x\) respectively.
 2.
Terminology given by Gromov [9] that stands for Cartan, Alexandrov, and Toponogov.
References
 1.
Adler, R.J., Bobrowski, O., Borman, M.S., Subag, E., Weinberger, S.: Persistent homology for random fields and complexes. In: Berger, J.O., Tony Cai, T., Johnstone, I.M. (eds.) Borrowing Strength: Theory Powering Applications—A Festschrift for Lawrence D. Brown, vol. 6. Institute of Mathematical Statistics, Beachwood (2010)
 2.
Bendich, P., Mukherjee, S., Wang, B.: Local homology transfer and stratification learning. In: ACM-SIAM Symposium on Discrete Algorithms (2012)
 3.
Bridson, M.R., Haefliger, A.: Metric Spaces of Non-positive Curvature. Springer-Verlag, Berlin (1999)
 4.
Bubenik, P., Carlsson, G., Kim, P.T., Luo, Z.M.: Statistical topology via Morse theory, persistence, and nonparametric estimation. In: Viana, M.A.G., Wynn, H.P. (eds.) Algebraic Methods in Statistics and Probability II. Contemporary Mathematics, vol. 516, pp. 75–92. American Mathematical Society, Providence (2010)
 5.
Burago, Y., Gromov, M., Perel’man, G.: A.D. Alexandrov spaces with curvature bounded below. Russ. Math. Surv. 47(2), 1–58 (1992)
 6.
Chazal, F., Cohen-Steiner, D., Lieutier, A.: A sampling theory for compact sets in Euclidean space. Discrete Comput. Geom. 41, 461–479 (2009)
 7.
Cohen-Steiner, D., Edelsbrunner, H., Harer, J., Mileyko, Y.: Lipschitz functions have \({L}_p\)-stable persistence. Found. Comput. Math. 10, 127–139 (2010). doi: 10.1007/s10208-010-9060-6
 8.
Edelsbrunner, H., Harer, J.: Computational Topology: An Introduction. American Mathematical Society, Providence (2010)
 9.
Gromov, M.: Hyperbolic groups. In: Gersten, S.M. (ed.) Essays in Group Theory. Mathematical Sciences Research Institute Publications, vol. 8, pp. 75–263. Springer, New York (1987)
 10.
Kahle, M.: Topology of random clique complexes. Discrete Math. 309(6), 1658–1671 (2009)
 11.
Kahle, M.: Random geometric complexes (2011). http://arxiv.org/abs/0910.1649
 12.
Kahle, M., Meckes, E.: Limit theorems for Betti numbers of random simplicial complexes (2010). http://arxiv.org/abs/1009.4130v3
 13.
Lott, J., Villani, C.: Ricci curvature for metric-measure spaces via optimal transport. Ann. Math. 169, 903–991 (2009)
 14.
Lunagómez, S., Mukherjee, S., Wolpert, R.L.: Geometric representations of hypergraphs for prior specification and posterior sampling (2009). http://arxiv.org/abs/0912.3648
 15.
Lytchak, A.: Open map theorem for metric spaces. St. Petersbg. Math. J. 17(3), 477–491 (2006)
 16.
Mileyko, Y., Mukherjee, S., Harer, J.: Probability measures on the space of persistence diagrams. Inverse Probab. 27(12), 124007 (2012)
 17.
Molchanov, I.: Theory of Random Sets. Springer, London (2005)
 18.
Munkres, J.: Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math. 5(1), 32–38 (1957)
 19.
Niyogi, P., Smale, S., Weinberger, S.: Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom. 39, 419–441 (2008)
 20.
Niyogi, P., Smale, S., Weinberger, S.: A topological view of unsupervised learning from noisy data. Manuscript (2008)
 21.
Ohta, S.: Barycenters in Alexandrov spaces with curvature bounded below. Adv. Geom. 12, 571–587 (2012)
 22.
Penrose, M.D.: Random Geometric Graphs. Oxford University Press, New York (2003)
 23.
Penrose, M.D., Yukich, J.E.: Central limit theorems for some graphs in computational geometry. Ann. Appl. Probab. 11(4), 1005–1041 (2001)
 24.
Petrunin, A.: Semiconcave functions in Alexandrov’s geometry. Surv. Differ. Geom. 11, 137–201 (2007)
 25.
Sturm, K.-T.: Probability measures on metric spaces of nonpositive curvature. In: Auscher, P., Coulhon, T., Grigoryan, A. (eds.) Heat Kernels and Analysis on Manifolds, Graphs, and Metric Spaces, vol. 338. American Mathematical Society, Providence (2002)
Acknowledgments
SM and KT would like to acknowledge Shmuel Weinberger for discussions and insight. SM and KT would like to acknowledge E. Subag for help in obtaining persistence diagrams computed from random Gaussian fields and for explaining the generative model. JH and YM are pleased to acknowledge the support from grants DTRA: HDTRA108BRCWMD, DARPA: D12AP00001On, AFOSR: FA95501010436, and NIH (Systems Biology): 5P50GM081883. SM is pleased to acknowledge support from grants NIH (Systems Biology): 5P50GM081883, AFOSR: FA95501010436, and NSF CCF1049290.
Appendix
In order to prove Proposition 2.3 we need to give some conditions for a subset of \(\mathcal {D}_{L^{2}}\) to be relatively compact. We will use Theorem 21 in [16] which requires a few definitions.
Definition 6.1
(Birth–death bounded) A set \(S \subset \mathcal {D}_{L^{2}}\) is called birth–death bounded if there is a constant \(C>0\) such that for all \(Z \in S\) and for all \(\Delta \ne x\in Z\), \(\max \{\mathrm {b}(x), \mathrm {d}(x)\}\le C\), where \(\mathrm {b}(x)\) and \(\mathrm {d}(x)\) are the birth and death of \(x\), respectively.
For \(\alpha >0 \) and diagram \(Z \in \mathcal {D}_{L^{2}}\) we define the maps
where \(u_\alpha (Z)\) is the \(\alpha \)-upper part of \(Z\) (the points in \(Z\) with persistence at least \(\alpha \)) and \(l_\alpha (Z)\) is the \(\alpha \)-lower part of \(Z\) (the points in \(Z\) with persistence less than \(\alpha \)).
Definition 6.2
(Off-diagonally birth–death bounded) A set \(S\subset \mathcal {D}_{L^{2}}\) is called off-diagonally birth–death bounded if for all \(\varepsilon > 0\), \(u_\varepsilon (S)\) is birth–death bounded.
Definition 6.3
(Uniform) A set \(S\subset \mathcal {D}_{L^{2}}\) is called uniform if for all \(\varepsilon >0\) there exists \(\alpha >0\) such that \(d(l_\alpha (Z), \Delta )\le \varepsilon \) for all \(Z \in S\).
Theorem 21 in [16] states that a subset of \(\mathcal {D}_{W_{p}}\) is relatively compact if and only if it is bounded, off-diagonally birth–death bounded and uniform. This also holds for \(\mathcal {D}_{L^{2}}\) due to the equivalence of norms stated in (2). We are finally ready to prove Proposition 2.3.
Proof of Proposition 2.3
Fix two diagrams \(X\) and \(Y\). Let \(\Phi \) be the set of bijections \(\phi \) between points in \(X\) and points in \(Y\) with the further condition that
\[ \Vert x-\phi (x)\Vert ^{2} \le \Vert x-\Delta \Vert ^{2} + \Vert \phi (x)-\Delta \Vert ^{2} \]
for all \(x\in X\). Recall that by \(\Vert x -\Delta \Vert \) we mean the perpendicular distance from \(x\) to the diagonal, which can be thought of as pairing \(x\) with the closest point to \(x\) on the diagonal. By the above condition we require that we never pair an off-diagonal point \(x\in X\) with an off-diagonal point in \(Y\) when pairing both with the diagonal would be more efficient.
By considering only the bijections in \(\Phi \) we are only removing bijections \(\tilde{\phi }\) for which there exists some \(\phi \in \Phi \) such that \(\sum _{x\in X} \Vert x\phi (x)\Vert ^{2} <\sum _{x\in X}\Vert x\tilde{\phi }(x)\Vert ^{2}\). This means that (1) is equal to \(\inf \{\sum _{x\in X} \Vert x\phi (x)\Vert ^{2}:\phi \in \Phi \}\). We will show this infimum is a minimum.
For each bijection \(\phi \in \Phi \) we can construct a path \(\gamma _\phi :[0,1] \rightarrow \mathcal {D}_{L^{2}}\) by setting \(\gamma _\phi (t)\) to be the diagram with points \(\{(1-t)x_{i}+t\phi (x_{i}) : x_{i}\in X\}\). Let \(S = \{\gamma _\phi (t) : t\in [0,1], \phi \in \Phi \}\), which contains the images of all the paths \(\gamma _\phi \). We want to show that \(S\) is relatively compact. To do this we will show that \(S\) is bounded, off-diagonally birth–death bounded and uniform, which are sufficient conditions for relative compactness by Theorem 21 in [16].
Firstly observe that for any bijection \(\phi \) and any \(t\in [0,1]\) we know
which is finite and independent of \(\phi \) and \(t\). This implies that the set \(S\) is bounded.
We now wish to show that \(S\) is off-diagonally birth–death bounded. For each \(\varepsilon >0\) there can only be finitely many points in \(X\) and \(Y\) whose distance from the diagonal is at least \(\varepsilon \). This implies that there is some \(\tilde{C}_\varepsilon \) such that all \(x\in u_\varepsilon (X)\) and \(x\in u_\varepsilon (Y)\) satisfy \(\max \{\mathrm {b}(x), \mathrm {d}(x)\}<\tilde{C}_\varepsilon \). Let \(M:=\max \{d(x, \Delta ): x\in X \text { or } x\in Y\}\). We will show that if \(p \in u_\varepsilon (Z)\) for some \(Z\in S\) then \(\max \{ \mathrm {b}(p), \mathrm {d}(p)\}<\tilde{C}_\varepsilon + \sqrt{2}M\).
Consider \(p\in Z\) for some \(Z\in S\). This means \(p\in \gamma _\phi (t)\) with \(\phi \in \Phi \) and \(t\in [0,1]\), and hence \(p=(1-t)x+t\phi (x)\) for some \(x\in X\). We have
In order for \(d(p,\Delta )\ge \varepsilon \), either \(d(x,\Delta )\ge \varepsilon \) or \(d(\phi (x),\Delta )\ge \varepsilon \), and hence \(\min \{\mathrm {b}(x), \mathrm {b}(\phi (x))\}<\tilde{C}_\varepsilon \) and \(\min \{\mathrm {d}(x), \mathrm {d}(\phi (x))\}<\tilde{C}_\varepsilon \).
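The condition for \(\phi \) to be in \(\Phi \) is that \(\Vert x - \Delta \Vert ^{2} + \Vert \phi (x) - \Delta \Vert ^{2} \ge \Vert x-\phi (x)\Vert ^{2}\), and hence \(\Vert x - \phi (x)\Vert \le \sqrt{2}M\). Since \(|\mathrm {b}(x)-\mathrm {b}(\phi (x))| \le \Vert x-\phi (x)\Vert \), we can conclude that
\[ \max \{\mathrm {b}(x), \mathrm {b}(\phi (x))\} \le \min \{\mathrm {b}(x), \mathrm {b}(\phi (x))\} + \sqrt{2}M < \tilde{C}_\varepsilon + \sqrt{2} M. \]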
Similarly we get \(\max \{\mathrm {d}(x), \mathrm {d}(\phi (x))\} <\tilde{C}_\varepsilon + \sqrt{2} M.\)
We now show that \(S\) is uniform. Recall that \(S\) is uniform if for all \(\varepsilon >0\) there exists an \(\alpha >0\) such that \(d(l_\alpha (Z),\Delta )<\varepsilon \) for all \(Z\in S\). For any diagram \(Z\in \mathcal {D}_{L^{2}}\) denote by \(M_k(Z)\) the number of points in \(Z\) whose distance to the diagonal is in \([2^{-k}, 2^{-k+1} )\) for \(k\ge 1\), and let \(M_0(Z)\) be the number of points with distance in \([1, \infty )\). Let \(N_k(Z)\) denote the number of points in \(Z\) whose distance from the diagonal is at least \(2^{-k}\) (in other words, the number of off-diagonal points in \(u_{2^{-k}}(Z)\)).
Let \(X\cup Y\) be the diagram whose off diagonal points are the union of the off diagonal points in \(X\) and \(Y\). Consider the following sum
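Let \(\varepsilon >0\). Since \(\sum _{j=0}^\infty N_{j}(X\cup Y)\, 2^{-2j}\) converges, there is some \(L\) such that
\[\sum _{j=L+1}^\infty N_{j}(X\cup Y)\, 2^{-2j} < \frac{\varepsilon ^{2}}{4}.\]
Let \(\phi \in \Phi \) be a bijection between \(X\) and \(Y\). Consider the path \(\gamma _\phi :[0,1]\rightarrow \mathcal {D}_{L^{2}}\), where \(\gamma _\phi (t)\) is the diagram with points \(\{(1-t)x+t\phi (x): x\in X\}\). For the point \((1-t)x+t\phi (x)\) to lie at distance at least \(2^{-k}\) from the diagonal, at least one of \(x\) or \(\phi (x)\) must lie at least \(2^{-k}\) from the diagonal. Since \(\phi \) is a bijection, distinct points of \(\gamma _\phi (t)\) come from distinct pairs \((x,\phi (x))\), so \(N_k(\gamma _\phi (t)) \le N_k(X\cup Y)\) for all bijections \(\phi \) and all \(t\in [0,1]\). In other words, \(N_k(Z)\le N_k(X\cup Y)\) for all \(Z\in S\).
Now set \(\alpha =2^{-L}\) and let \(Z\in S\). Every off-diagonal point of \(l_\alpha (Z)\) lies at distance less than \(2^{-L}\) from the diagonal, so it is counted by \(M_k(Z)\) for some \(k\ge L+1\) and lies at distance less than \(2^{-k+1}\). Since \(M_k(Z)\le N_k(Z)\le N_k(X\cup Y)\), we have
\[d(l_\alpha (Z),\Delta )^{2} \le \sum _{k=L+1}^\infty M_k(Z)\, 2^{-2k+2} \le 4\sum _{k=L+1}^\infty N_k(X\cup Y)\, 2^{-2k} < \varepsilon ^{2}.\]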
Since the choice of \(\alpha =2^{-L}\) was made independently of \(Z\in S\), we conclude that \(S\) is uniform.
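The sketch below checks, on synthetic data, the two facts used for uniformity: \(N_k(\gamma _\phi (t))\le N_k(X\cup Y)\) along straight-line paths, and the tail bound \(d(l_\alpha (Z),\Delta )^{2}\le 4\sum _{k>L}N_k(X\cup Y)\,2^{-2k}\) with \(\alpha =2^{-L}\). It assumes finite diagrams given as lists of (birth, death) tuples, random bijections in which unmatched points go to their nearest diagonal point, and hypothetical helper names. The bijections are not required to lie in \(\Phi \), since the counting argument holds for any bijection.

```python
import math
import random


def dist_to_diagonal(p):
    """Euclidean distance from p = (birth, death) to the diagonal."""
    return abs(p[1] - p[0]) / math.sqrt(2)


def project(p):
    """Closest point of the diagonal to p."""
    m = (p[0] + p[1]) / 2
    return (m, m)


def random_matching(X, Y, rng):
    """Hypothetical helper: a random bijection matching k points of X to k points
    of Y and sending every other point to (or from) its diagonal projection."""
    k = rng.randint(0, min(len(X), len(Y)))
    xs, ys = rng.sample(X, len(X)), rng.sample(Y, len(Y))
    pairs = list(zip(xs[:k], ys[:k]))
    pairs += [(x, project(x)) for x in xs[k:]]
    pairs += [(project(y), y) for y in ys[k:]]
    return pairs


def path_diagram(pairs, t):
    """Points of gamma_phi(t) = {(1 - t) x + t phi(x)}."""
    return [((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1]) for x, y in pairs]


def N(points, k):
    """N_k: number of points at distance at least 2**-k from the diagonal."""
    return sum(1 for p in points if dist_to_diagonal(p) >= 2.0 ** -k)


def tail_bound(XY, L):
    """4 * sum_{k > L} N_k(XY) 4**-k, using N_k = |XY| once 2**-k <= every distance."""
    K = L + 1
    while any(dist_to_diagonal(p) < 2.0 ** -K for p in XY):
        K += 1
    partial = sum(N(XY, k) * 4.0 ** -k for k in range(L + 1, K + 1))
    return 4 * (partial + len(XY) * (4 / 3) * 4.0 ** -(K + 1))


rng = random.Random(1)
X = [tuple(sorted(rng.uniform(0, 4) for _ in range(2))) for _ in range(6)]
Y = [tuple(sorted(rng.uniform(0, 4) for _ in range(2))) for _ in range(5)]
XY = X + Y
ok = True
for _ in range(50):
    pairs = random_matching(X, Y, rng)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        Z = path_diagram(pairs, t)
        ok = ok and all(N(Z, k) <= N(XY, k) for k in range(12))
        for L in range(6):  # alpha = 2**-L
            low = sum(dist_to_diagonal(p) ** 2 for p in Z if dist_to_diagonal(p) < 2.0 ** -L)
            ok = ok and low <= tail_bound(XY, L) + 1e-12
print(ok)
```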
We now know that \(\overline{S}\) (the closure of \(S\)) is compact. Every path \(t\mapsto \gamma _\phi (t)\) is a \(K_\phi \)-Lipschitz map from \([0,1]\) into \(\overline{S}\) with \(K_\phi ^{2} = \sum _{x\in X}\Vert x-\phi (x)\Vert ^{2}\), since \(\phi \) induces a matching of \(\gamma _\phi (s)\) with \(\gamma _\phi (t)\) whose cost is \(|s-t|K_\phi \).
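A minimal Python sketch of the Lipschitz computation, assuming a bijection is stored as a list of matched (birth, death) pairs (the function names and the example bijection are hypothetical): it evaluates \(K_\phi \) and the cost of the point-by-point matching between \(\gamma _\phi (s)\) and \(\gamma _\phi (t)\), which equals \(|s-t|K_\phi \) and bounds \(d(\gamma _\phi (s),\gamma _\phi (t))\) from above.

```python
import math


def interpolate(x, y, t):
    """The point (1 - t) x + t y."""
    return ((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1])


def K_phi(pairs):
    """K_phi = sqrt(sum ||x - phi(x)||^2) for a bijection given as matched pairs."""
    return math.sqrt(sum(math.dist(x, y) ** 2 for x, y in pairs))


def induced_cost(pairs, s, t):
    """L2 cost of matching gamma_phi(s) to gamma_phi(t) point by point along phi.
    The diagram distance is an infimum over matchings, so this bounds it above."""
    return math.sqrt(sum(math.dist(interpolate(x, y, s), interpolate(x, y, t)) ** 2
                         for x, y in pairs))


# Usage: the last pair sends (1, 2) to its diagonal projection (1.5, 1.5).
pairs = [((0.0, 4.0), (0.5, 4.5)), ((2.0, 3.0), (1.0, 3.5)), ((1.0, 2.0), (1.5, 1.5))]
K = K_phi(pairs)
for s, t in [(0.0, 1.0), (0.2, 0.7), (0.9, 0.1)]:
    print(s, t, induced_cost(pairs, s, t), K * abs(s - t))
```

The last two printed columns agree up to rounding, because every point moves by \(|s-t|\,\Vert x-\phi (x)\Vert \).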
Set \(K= d(X,Y) +1\) and let \(A\) be the set of \(K\)-Lipschitz maps from \([0,1]\) into \(\overline{S}\). Since \(\overline{S}\) is compact, the Arzelà–Ascoli theorem implies that \(A\) is compact. By the definition of the infimum, there exists a sequence of bijections \(\{\phi _{j}\}\) such that \(K_{\phi _{j}}<K\) for all \(j\) and \(K_{\phi _{j}}\rightarrow d(X,Y)\). The corresponding paths \(\gamma _{j}:=\gamma _{\phi _{j}}\) are \(K\)-Lipschitz maps from \([0,1]\) to \(\overline{S}\) and hence lie in the compact set \(A\). Therefore there is a subsequence \(\{\gamma _{n_{j}}\}\) converging to some limit \(\gamma \in A\).
Since \(\gamma _{n_{j}}(0)=X\) and \(\gamma _{n_{j}}(1)=Y\) for all \(j\) (they are all paths from \(X\) to \(Y\)), we know that \(\gamma (0)=X\) and \(\gamma (1)=Y\). From \(d(\gamma _{n_{j}}(t),\gamma _{n_{j}}(s)) \le K_{\phi _{n_{j}}} |s-t|\) for all \(s,t\in [0,1]\) and all \(j\), together with \(K_{\phi _{n_{j}}} \rightarrow d(X,Y)\) as \(j\rightarrow \infty \), we can infer
\[d(\gamma (t),\gamma (s)) \le d(X,Y)\,|s-t|\]
for all \(s,t\in [0,1]\). Following the path \(\gamma \) and recording where each point \(x \in X\) ends up in \(Y\), we can construct a bijection \(\phi \) from the points of \(X\) to the points of \(Y\). This bijection achieves the infimum in (1).
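For finite diagrams the infimum in (1) is a minimum over finitely many partial matchings, so it can be found by brute force. The sketch below is illustrative only and exponential in the number of points: it augments each diagram with diagonal slots (a common device for diagram distances; the helper names are hypothetical), recovers an optimal bijection, and checks on a small example that the straight-line path along it satisfies \(d(\gamma (s),\gamma (t)) = |s-t|\,d(X,Y)\). Practical implementations replace the permutation search with an assignment solver such as the Hungarian algorithm.

```python
import itertools
import math


def dist_to_diagonal(p):
    return abs(p[1] - p[0]) / math.sqrt(2)


def project(p):
    m = (p[0] + p[1]) / 2
    return (m, m)


def optimal_pairs(X, Y):
    """Brute-force optimal L2 matching between small finite diagrams X and Y
    (lists of off-diagonal points). Rows are X plus len(Y) diagonal slots,
    columns are Y plus len(X) diagonal slots. Returns (squared cost, pairs)."""
    n, m = len(X), len(Y)

    def cost(i, j):
        if i < n and j < m:
            return math.dist(X[i], Y[j]) ** 2
        if i < n:
            return dist_to_diagonal(X[i]) ** 2  # x_i is matched to the diagonal
        if j < m:
            return dist_to_diagonal(Y[j]) ** 2  # y_j comes from the diagonal
        return 0.0  # diagonal slot to diagonal slot

    best_cost, best_perm = min(
        (sum(cost(i, j) for i, j in enumerate(perm)), perm)
        for perm in itertools.permutations(range(n + m)))
    pairs = []
    for i, j in enumerate(best_perm):
        if i < n and j < m:
            pairs.append((X[i], Y[j]))
        elif i < n:
            pairs.append((X[i], project(X[i])))
        elif j < m:
            pairs.append((project(Y[j]), Y[j]))
    return best_cost, pairs


def path_diagram(pairs, t):
    """gamma(t) along the bijection, keeping only points strictly off the diagonal."""
    pts = [((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1]) for x, y in pairs]
    return [p for p in pts if dist_to_diagonal(p) > 0]


def path_distance(pairs, s, t):
    """Brute-force d(gamma(s), gamma(t))."""
    return math.sqrt(optimal_pairs(path_diagram(pairs, s), path_diagram(pairs, t))[0])


X = [(0.0, 4.0), (1.0, 2.0)]
Y = [(0.5, 4.5), (1.0, 2.5)]
cost_sq, pairs = optimal_pairs(X, Y)
D = math.sqrt(cost_sq)
for s, t in [(0.0, 0.5), (0.25, 1.0), (0.3, 0.6)]:
    print(s, t, path_distance(pairs, s, t), abs(s - t) * D)
```

The upper bound in the check is the Lipschitz estimate from the proof; the matching lower bound follows from the triangle inequality through \(X\), \(\gamma (s)\), \(\gamma (t)\) and \(Y\).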
About this article
Cite this article
Turner, K., Mileyko, Y., Mukherjee, S. et al. Fréchet Means for Distributions of Persistence Diagrams. Discrete Comput Geom 52, 44–70 (2014). https://doi.org/10.1007/s00454-014-9604-7
Keywords
 Persistence diagram
 Fréchet mean
 Topological data analysis
 Alexandrov space
 Persistent homology