# Fréchet Means for Distributions of Persistence Diagrams


## Abstract

Given a distribution \(\rho \) on persistence diagrams and observations \(X_{1},\ldots ,X_{n} \mathop {\sim }\limits ^{iid} \rho \) we introduce an algorithm in this paper that estimates a Fréchet mean from the set of diagrams \(X_{1},\ldots ,X_{n}\). If the underlying measure \(\rho \) is a combination of Dirac masses \(\rho = \frac{1}{m} \sum _{i=1}^{m} \delta _{Z_{i}}\) then we prove the algorithm converges to a local minimum and a law of large numbers result for a Fréchet mean computed by the algorithm given observations drawn iid from \(\rho \). We illustrate the convergence of an empirical mean computed by the algorithm to a population mean by simulations from Gaussian random fields.

## Keywords

Persistence diagram, Fréchet mean, Topological data analysis, Alexandrov space, Persistent homology

## 1 Introduction

There has been a recent effort in topological data analysis (TDA) to incorporate ideas from stochastic modeling. Much of this work has involved the study of random abstract simplicial complexes generated from stochastic processes [10, 11, 12, 14, 22, 23] and non-asymptotic bounds on the convergence or consistency of topological summaries as the number of points increases [2, 4, 6, 19, 20]. The central idea in these papers has been to study statistical properties of topological summaries of point cloud data.

In [16] it was shown that a commonly used topological summary, the persistence diagram [8], admits a well defined notion of probability distributions and notions such as expectations, variances, percentiles and conditional probabilities. The key contribution of this paper is characterizing Fréchet means and variances of finitely many persistence diagrams and providing an algorithm for estimating them. Existence of these means and variances was previously shown. However, a procedure to compute means and variances was not provided.

In this paper we state an algorithm which when given an observed set of persistence diagrams \(X_{1},\ldots ,X_{n}\) computes a new diagram which is a local minimum of the Fréchet function of the empirical measure corresponding to the empirical distribution \(\rho _{n} := n^{-1} \sum _{i=1}^{n} \delta _{X_{i}}\). In the case where the diagrams are sampled independently and identically from a probability measure that is a finite combination of Dirac masses we provide a (weak) law of large numbers for the local minima computed by the algorithm we propose.

## 2 Persistence Diagrams and Alexandrov Spaces with Curvature Bounded from Below

In this section we state properties of the space of persistence diagrams that we will use in the subsequent sections. We first define persistence diagrams and the \(L^{2}\)-Wasserstein metric on the set of persistence diagrams. Note that this is not the same metric as was used in [16]. We discuss the relation between the two metrics and why we work with the \(L^{2}\)-Wasserstein metric later in this section. We then show that the space of persistence diagrams is a geodesic space and specifically an Alexandrov space with curvature bounded from below. We show that the Fréchet function in this space is semiconcave which allows us to define supporting vectors which will serve as an analog of the gradient. The supporting vectors will be used in the algorithm developed in the following section to find local minima—the algorithm is a gradient descent based method.

### 2.1 Persistent Homology and Persistence Diagrams

If a homology class \(\alpha \) is born at \(\mathbb { X}_{a}\) and dies entering \(\mathbb { X}_{b}\) we set \(\mathrm {b}(\alpha ) = a\) and \(\mathrm {d}(\alpha ) = b\) and represent the births and deaths of \(\ell \)-dimensional homology classes by a multiset of points in \(\mathbb { R}^{2}\) with the horizontal axis corresponding to the birth of a class, the vertical axis corresponding to the death of a class, and the multiplicity of a point being the degree of the death value. The idea of a persistence diagram is to consider a basis of persistent homology classes \(\{\alpha \}\) and to represent each persistent homology class \(\alpha \) by a point \((b(\alpha ), d(\alpha ))\).

The persistence of \(\alpha \) is the difference \(\text{ pers }(\alpha ) = \mathrm {d}(\alpha ) - \mathrm {b}(\alpha )\). In the general setting we could have points with infinite persistence, which correspond to points of the form \((-\infty , y)\) or \((x,\infty )\). These points are infinitely far from all points of finite persistence and hence would have to be treated separately. The space of persistence diagrams would be forced to be disconnected with each component corresponding to the number of points at infinity. For the sake of clarity we will restrict ourselves to the case where all classes have finite persistence. This can be achieved by considering extended persistence but for simplicity we can simply kill everything by setting \(\mathbf {g}_\ell ^{a,b} = 0\) if \(b \ge \sup _{x \in \mathbb { X}} f(x)\).

After establishing some notation we can define persistence diagrams and the distance between two diagrams. Let \(\Delta =\{(x,y)\in \mathbb { R}^{2} \mid x=y\}\) be the diagonal in \(\mathbb { R}^{2}\). Let \(\Vert x-y \Vert \) be the usual Euclidean distance if \(x\) and \(y\) are off diagonal points. With a slight abuse of notation let \(\Vert x-\Delta \Vert \) denote the perpendicular distance between \(x\) and the diagonal and \(\Vert \Delta -\Delta \Vert =0\).
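The distance conventions above can be illustrated with a minimal Python sketch. The function names `dist_to_diagonal` and `closest_diagonal_point` are hypothetical helpers introduced here, and a point is assumed to be a `(birth, death)` tuple.

```python
import math

def dist_to_diagonal(p):
    """Perpendicular distance from p = (birth, death) to the diagonal x = y."""
    birth, death = p
    return abs(death - birth) / math.sqrt(2)

def closest_diagonal_point(p):
    """Orthogonal projection of p onto the diagonal."""
    mid = (p[0] + p[1]) / 2
    return (mid, mid)

p = (1.0, 3.0)
q = closest_diagonal_point(p)  # (2.0, 2.0)
# The projection realises the perpendicular distance.
assert math.isclose(dist_to_diagonal(p), math.dist(p, q))
```

Points on the diagonal have distance zero to it, which matches the convention \(\Vert \Delta -\Delta \Vert =0\).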

### **Definition 2.1**

A persistence diagram is a countable multiset of points in \(\mathbb { R}^{2}\) along with the infinitely many copies of the diagonal \(\Delta =\{(x,y)\in \mathbb { R}^{2} \mid x=y\}\). We also require for the countably many points \(x_{j}\in \mathbb { R}^{2}\) not lying on the diagonal that \(\sum _{j} \Vert x_{j}-\Delta \Vert <\infty \).

Each point \(p=(a,b)\) in a persistence diagram corresponds to some homology class \(\alpha \) with \(\mathrm {b}(\alpha )=a\) and \(\mathrm {d}(\alpha )=b\). As a slight abuse of notation we say that \(p\) is born at \(\mathrm {b}(p):=\mathrm {b}(\alpha )\) and dies at \(\mathrm {d}(p):=\mathrm {d}(\alpha )\).

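The \(L^{2}\)-Wasserstein distance between two persistence diagrams \(X\) and \(Y\) is defined by $$\begin{aligned} d_{L^{2}}(X,Y)^{2} = \inf _{\phi : X\rightarrow Y} \sum _{x\in X} \Vert x-\phi (x)\Vert ^{2}, \end{aligned}\tag{1}$$ where the infimum is taken over all bijections \(\phi \) between the points of \(X\) and the points of \(Y\), copies of the diagonal included. A bijection is *optimal* if it achieves this infimum.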

### **Theorem 2.2**

For ease of notation in the rest of the paper we denote \(d_{L^{2}}(X,Y)^{2}\) as \(d(X,Y)^{2}\).

### **Proposition 2.3**

For any diagrams \(X, Y\in \mathcal {D}_{L^{2}}\) the infimum in (1) is always achieved.

We prove this proposition in the Appendix.
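For diagrams with finitely many off-diagonal points the infimum is a minimum over finitely many bijections, so it can be found by exhaustive search. The sketch below assumes a diagram is a list of `(birth, death)` tuples and uses `None` as a stand-in for a copy of the diagonal. The names `DIAG`, `sq_cost` and `sq_wasserstein` are introduced here for illustration. The search is factorial in the number of points, so it is only meant for very small examples.

```python
from itertools import permutations

DIAG = None  # stands for one copy of the diagonal

def sq_cost(p, q):
    """Squared distance between two entries, either of which may be DIAG."""
    if p is DIAG and q is DIAG:
        return 0.0
    if p is DIAG or q is DIAG:
        b, d = q if p is DIAG else p
        return (d - b) ** 2 / 2.0  # squared perpendicular distance
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def sq_wasserstein(X, Y):
    """Squared L2-Wasserstein distance between two finite diagrams."""
    # Pad both sides so every off-diagonal point may be sent to the diagonal.
    A = list(X) + [DIAG] * len(Y)
    B = list(Y) + [DIAG] * len(X)
    return min(sum(sq_cost(a, b) for a, b in zip(A, perm))
               for perm in permutations(B))
```

For example, `sq_wasserstein([(0.0, 2.0)], [])` matches the single point to the diagonal and returns its squared perpendicular distance.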

^{1}\(\gamma \) is a geodesic from \(X\) to \(Y\). The proof of this is the observation that \(\phi _t^{X}:X\rightarrow \gamma (t)\) where

### 2.2 Gradients and Supporting Vectors on \(\mathcal {D}_{L^{2}}\)

We will propose a gradient descent based algorithm to compute Fréchet means. To analyze and understand the algorithm we will need to understand the structure of \(\mathcal {D}_{L^{2}}\). We will show that \(\mathcal {D}_{L^{2}}\) is an Alexandrov space with curvature bounded from below (see [5] for more information on these spaces). This result is not so surprising since there are known relations between \(L^{2}\)-Wasserstein spaces and Alexandrov spaces with curvature bounded from below [13, 21]. The motivating idea behind these spaces was to generalize the results of Riemannian geometry to metric spaces without Riemannian structure.

The property and behavior of Fréchet means is closely related to the curvature of the space. For metric spaces with curvature bounded from above, called \(CAT\)-spaces,^{2} properties of Fréchet means have been investigated and there exist algorithms to compute Fréchet means [25]. \(\mathcal {D}_{L^{2}}\) is not a \(CAT\)-space, see Proposition 2.4. \(\mathcal {D}_{L^{2}}\) is however an Alexandrov space with curvature bounded from below. Less is known about properties of Fréchet means in these spaces as well as algorithms to compute Fréchet means. We use the structure of Alexandrov spaces with curvature bounded from below to compute estimates of Fréchet means and provide some analysis of these estimates. Note that Fréchet means are the same as barycenters which is what is referred to in much of the literature.

We first confirm that \(\mathcal {D}_{L^{2}}\) is not a \(CAT\)-space.

### **Proposition 2.4**

\(\mathcal {D}_{L^{2}}\) is not in \(\text{ CAT }(k)\) for any \(k>0\).

### *Proof*

If \(\mathcal {D}_{L^{2}} \in \text{ CAT }(k)\) then for all \(X,Y\in \mathcal {D}_{L^{2}}\) with \(d(X,Y)^{2}<\pi ^{2}/k\) there is a unique geodesic between them [3, Proposition 2.11]. However, we can find \(X,Y\) arbitrarily close with two distinct geodesics. One example is taking \(X\) to be a diagram with two diagonally opposite corners of a square and \(Y\) a diagram with the other two corners. The horizontal and vertical paths are equally optimal and we may choose the square to be as small as we wish. \(\square \)
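The square example in the proof can be checked numerically. This is a small sketch under the assumption that the square sits far enough from the diagonal that matching any corner to the diagonal is more expensive than either pairing shown. The helper names and the default `corner` are hypothetical choices.

```python
def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def square_pairings(side, corner=(0.0, 10.0)):
    """Two bijections between opposite corner pairs of an axis-parallel square."""
    bx, by = corner
    X = [(bx, by), (bx + side, by + side)]
    Y = [(bx + side, by), (bx, by + side)]
    horizontal = [(X[0], Y[0]), (X[1], Y[1])]  # each point moves horizontally
    vertical = [(X[0], Y[1]), (X[1], Y[0])]    # each point moves vertically
    return horizontal, vertical

def cost(pairing):
    return sum(sq_dist(x, y) for x, y in pairing)

def midpoint_diagram(pairing):
    """The diagram at t = 1/2 on the path induced by the pairing."""
    return sorted(((x[0] + y[0]) / 2, (x[1] + y[1]) / 2) for x, y in pairing)

for side in (1.0, 0.5, 0.125):
    h, v = square_pairings(side)
    assert cost(h) == cost(v) == 2 * side ** 2          # equally good pairings
    assert midpoint_diagram(h) != midpoint_diagram(v)   # but distinct paths
```

Both pairings have cost \(2s^{2}\) for a square of side \(s\), so the two diagrams can be made arbitrarily close while the induced paths stay distinct.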

### **Theorem 2.5**

The space of persistence diagrams \(\mathcal {D}_{L^{2}}\) with metric \(d\) given in (1) is a non-negatively curved Alexandrov space.

### *Proof*

First observe that \(\mathcal {D}_{L^{2}}\) is a geodesic space. Let \(\gamma :[0,1] \rightarrow \mathcal {D}_{L^{2}}\) be a geodesic from \(X\) to \(Y\) and let \(Z \in \mathcal {D}_{L^{2}}\) be any diagram. We want to show that the inequality (4) holds.

Let \(\phi \) be an optimal bijection between \(X\) and \(Y\) which induces the geodesic \(\gamma \). That is, \(\gamma (t)=\{(1-t)x+t\phi (x)\,|\,x\in X\}\), and define \(\phi _{t}(x) =(1-t)x+t\phi (x)\) as done in (3). Let \(\phi _{Z}^{t}: Z \rightarrow \gamma (t)\) be optimal. Construct bijections \(\phi _{Z}^{X}:Z\rightarrow X\) and \(\phi _{Z}^{Y}: Z\rightarrow Y\) by \(\phi _{Z}^{X}= (\phi _t)^{-1}\circ \phi _{Z}^{t}\) and \(\phi _{Z}^{Y}=\phi \circ \phi _{Z}^{X}\). There is no reason to suppose that either of the bijections \(\phi _{Z}^{X}\) or \(\phi _{Z}^{Y}\) is optimal. Note that if \(\phi _{Z}^{t}(z)=\Delta \) then \(\phi _{Z}^{X}(z)=\Delta \) and \(\phi _{Z}^{Y}(z)=\Delta \).

### 2.3 Properties of the Fréchet Function

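Given a probability distribution \(\rho \) on \(\mathcal {D}_{L^{2}}\) we define the corresponding *Fréchet function* to be $$\begin{aligned} F:\mathcal {D}_{L^{2}}\rightarrow \mathbb { R},\qquad Y\mapsto \int d(X,Y)^{2}\,d\rho (X). \end{aligned}$$

The *Fréchet mean set* of \(\rho \) is the set of all the minimizers of the map \(F\) on \(\mathcal {D}_{L^{2}}\). If there is a unique minimizer then this is called the *Fréchet mean* of \(\rho \). The *variance* is then defined to be the infimum of the above functional.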
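For an empirical measure \(\rho _{n} = n^{-1}\sum _{i=1}^{n}\delta _{X_{i}}\) the Fréchet function is a finite average of squared distances. The following sketch takes the metric as a parameter `dist`, a hypothetical argument name. It is illustrated on the real line, where the minimizer is the ordinary arithmetic mean, because evaluating \(d\) on diagrams requires an optimal matching.

```python
def frechet_function(Y, sample, dist):
    """F_n(Y) = (1/n) * sum_i dist(Y, X_i)**2 for the empirical measure of `sample`."""
    return sum(dist(Y, X) ** 2 for X in sample) / len(sample)

# Toy check on the real line, where the minimiser is the arithmetic mean.
sample = [1.0, 2.0, 6.0]
dist = lambda a, b: abs(a - b)
mean = sum(sample) / len(sample)                 # 3.0
variance = frechet_function(mean, sample, dist)  # value of F_n at its minimiser
```

Passing a diagram metric as `dist` gives the empirical Fréchet function on diagrams with no other change.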

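A function \(f\) defined on an open set \(\Omega \) is \(\lambda \)*-concave* if for any unit speed geodesic \(\gamma \) in \(\Omega \), the function $$\begin{aligned} f\circ \gamma (t) - \lambda t^{2}/2 \end{aligned}$$ is concave. A function \(f\) is *semiconcave* if for any point \(x\in \Omega \) there is a neighborhood \(\Omega _x\) of \(x\) and \(\lambda \in \mathbb { R}\) such that the restriction \(f|_{\Omega _x}\) is \(\lambda \)-concave.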

### **Proposition 2.6**

If the support of \(\rho \) is bounded \((\)as in has bounded diameter\()\) then the corresponding Fréchet function is semiconcave.

### *Proof*

We now define the additional structure on Alexandrov spaces with curvature bounded from below that we will need to define gradients and supporting vectors. This exposition is a summary of the content in [21, 24].

The *tangent cone* \(T_{Y}\) is the Euclidean cone over \(\Sigma _{Y}\):

### **Definition 2.7**

(*Gradients and supporting vectors*) Given an open set \(\Omega \subset \mathcal {A}\) and a function \(f: \Omega \rightarrow \mathbb { R}\) we denote by \(\nabla _{p} f\) the *gradient* of a function \(f\) at a point \(p \in \Omega \). \(\nabla _{p} f\) is the vector \(v \in T_{p}\) such that

- (i)
\(d_{p} f(x) \le \langle v, x\rangle \) for all \(x\in T_{p}\)

- (ii)
\(d_{p} f(v)=\langle v,v\rangle \).

A vector \(s\in T_{p}\) is a *supporting vector* of \(f\) at \(p\) if \(d_{p} f(x) \le - \langle s, x\rangle \) for all \(x\in T_{p}\). Note that \(-\nabla _{p} f\) is a supporting vector if it exists in the tangent cone at \(p\).

### **Lemma 2.8**

- (i)
If \(s\) is a supporting vector then \(\Vert s\Vert \ge \Vert \nabla _{p} f\Vert \).

- (ii)
If \(p\) is a local minimum of \(f\) and \(s\) is a supporting vector of \(f\) at \(p\) then \(s=0\).

### *Proof*

- (i)
First observe that from the definitions of \(\nabla _{p} f\) and supporting vectors we have $$\begin{aligned} \langle \nabla _{p} f, \nabla _{p} f \rangle = d_{p} f(\nabla _{p} f)\le -\langle s, \nabla _{p} f\rangle . \end{aligned}$$ We also know that $$\begin{aligned} 0\le \langle \nabla _{p} f +s, \nabla _{p} f+s\rangle = \langle \nabla _{p} f, \nabla _{p} f\rangle + 2\langle \nabla _{p} f,s\rangle + \langle s, s\rangle . \end{aligned}$$ These inequalities combined tell us that \(0\le -\langle \nabla _{p} f, \nabla _{p} f\rangle + \langle s, s\rangle .\)

- (ii)
If \(p\) is a local minimum of \(f\) then \(d_{p}f(x)\ge 0\) for all \(x\in T_{p}\). In particular \(d_{p}f(s)\ge 0\). Since \(s\) is a supporting vector \(-\langle s, s \rangle \ge d_{p} f(s) \ge 0\). This implies \(\langle s,s\rangle =0\) and hence \(s=0\).\(\square \)

We care about gradients and supporting vectors because they can help us find local minima of the Fréchet function. Indeed a necessary condition for \(F\) to have a local minimum at \(Y\) is \(s=0\) for any supporting vector \(s\) of \(F\) at \(Y\). Since the tangent cone at \(Y\) is a convex subset of a Hilbert space we can take integrals over probability measures with values in \(T_{Y}\). This allows us to find a formula for a supporting vector of the Fréchet function \(F\).

### **Proposition 2.9**

- (i)
If \(\gamma \) is a distance achieving geodesic from \(Y\) to \(X\), then the tangent vector to \(\gamma \) at \(Y\) of length \(2d(X,Y)\) is a supporting vector at \(Y\) for \(F_{X}\).

- (ii)
If \(s_{X}\) is a supporting vector at \(Y\) for the function \(F_{X}\) for each \(X\in \text {supp}(\rho )\) then \(s=\int s_{X}d\rho (X)\) is a supporting vector at Y of the Fréchet function \(F\) corresponding to the distribution \(\rho \).

### *Proof*

In the following section we provide an algorithm that computes a local minimum of a Fréchet function using a gradient descent procedure. The above results will be used since computing a supporting vector of \(Z \mapsto d(X,Z)^{2}\) can be significantly easier and faster than computing a supporting vector of \(F\) itself.

## 3 Finding Local Minima of the Fréchet Function

In this section we state an algorithm that computes a Fréchet mean of a finite set of persistence diagrams with finitely many off diagonal points, and examine convergence properties of this algorithm. We will restrict our attention to diagrams with only finitely many off-diagonal points with multiplicity of the points allowed.

We employ a greedy search algorithm based on gradient descent to find a local minimum. A key component of this greedy algorithm (see Algorithm 1) consists of a variant of the Kuhn–Munkres (Hungarian) algorithm [18].

The Hungarian algorithm finds the least cost assignment of tasks to people under the assumption that the number of tasks and people are the same. The input is the cost for each person to do each of the tasks. Suppose we have two diagrams \(X\) and \(Y\) each with only finitely many off diagonal points. Consider as many copies of the diagonal in \(X\) and \(Y\) to allow the option of matching every off diagonal point with the diagonal. We can think of the points and copies of the diagonal in \(X\) as the people and the points and copies of the diagonal in \(Y\) as tasks. The cost of \(x\in X\) doing task \(y\in Y\) is \(\Vert x-y\Vert ^{2}\). The total cost of an assignment (or in other words bijection) \(\phi \) of tasks to people is \(\sum _{x\in X} \Vert x-\phi (x)\Vert ^{2}\). The Hungarian algorithm gives us a bijection \(\phi \) that minimizes this cost. This means it gives an optimal pairing between \(X\) and \(Y.\)
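The padding with diagonal copies and the assignment step can be sketched in plain Python. The code below is an illustrative implementation of the standard \(O(n^{3})\) potentials form of the Hungarian method, not the variant used by the authors. The names `DIAG`, `sq_cost`, `hungarian` and `optimal_pairing` are introduced here, and diagrams are assumed to be lists of `(birth, death)` tuples.

```python
DIAG = None  # stands for one copy of the diagonal

def sq_cost(p, q):
    """Squared distance between two entries, either of which may be DIAG."""
    if p is DIAG and q is DIAG:
        return 0.0
    if p is DIAG or q is DIAG:
        b, d = q if p is DIAG else p
        return (d - b) ** 2 / 2.0
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def hungarian(cost):
    """Least-cost assignment for a square matrix; returns the column of each row."""
    n = len(cost)
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[j] = row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

def optimal_pairing(X, Y):
    """Optimal pairing between finite diagrams X and Y and its total cost."""
    A = list(X) + [DIAG] * len(Y)  # the "people"
    B = list(Y) + [DIAG] * len(X)  # the "tasks"
    C = [[sq_cost(a, b) for b in B] for a in A]
    assign = hungarian(C)
    pairs = [(A[i], B[j]) for i, j in enumerate(assign)
             if not (A[i] is DIAG and B[j] is DIAG)]
    return pairs, sum(C[i][j] for i, j in enumerate(assign))
```

Padding \(X\) with \(|Y|\) diagonal copies and \(Y\) with \(|X|\) copies makes the cost matrix square and lets every off-diagonal point be matched to the diagonal, as described above.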

Suppose \(Y\) is our current estimate for the Fréchet mean. Using the Hungarian algorithm we compute optimal pairings between \(Y\) and each of the \(X_{i}\). We denote these pairings as \(\{(y^{j}, x_{i}^{j})\}_{j=1}^{J_{i}}\) where \(J_{i}\) is the number of off-diagonal points in \(X_{i}\) and \(Y\) combined. For each \(y^{j}\ne \Delta \) we then consider all the \(x_{i}^{j}\), \(i=1,\ldots ,m\), and let \(\tilde{y^{j}}\) be their arithmetic mean. Whenever in our pairings \(\{(y^{j}, x_{i}^{j})\}_{j=1}^{J_{i}}\) we see a pair \((\Delta , x_{i}^{j})\) we think of this \(\Delta \) as a different copy of the diagonal from the one in any pairing between \(Y\) and \(X_k\) with \(k\ne i\), so the corresponding new point is the arithmetic mean of \(m-1\) copies of the diagonal and \(x_{i}^{j}\). Let \(Y'\) be the diagram with points \(\tilde{y^{j}}\). We will show later that if \(Y=Y'\) then \(Y\) is a local minimum of the Fréchet function. Otherwise we choose \(Y'\) to be our current estimate.

- (a)
randomly initialize the mean diagram. For example we can start at one of the \(m\) persistence diagrams or the midway point of two of the \(m\) diagrams;

- (b)
use the Hungarian algorithm to compute optimal pairings between the estimate of the mean diagram and each of the persistence diagrams;

- (c)
update each point in the mean diagram estimate with the arithmetic mean over all diagrams—each point in the mean diagram is paired with a point (possibly on the diagonal) in each diagram;

- (d)
if the updated estimate locally minimizes \(F_{m}\) then return the estimate otherwise return to step (b).
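Steps (a) to (d) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It starts from the first diagram, replaces the Hungarian step with a brute-force search over permutations (so it is only suitable for very small diagrams), stops when the update leaves the estimate unchanged, and assumes that the "arithmetic mean" of \(k\) off-diagonal points and \(m-k\) diagonal copies is the minimizer of the summed squared distances. All function names are hypothetical.

```python
from itertools import permutations

DIAG = None  # one copy of the diagonal

def sq_cost(p, q):
    """Squared distance between two entries; either may be a diagonal copy."""
    if p is DIAG and q is DIAG:
        return 0.0
    if p is DIAG or q is DIAG:
        b, d = q if p is DIAG else p
        return (d - b) ** 2 / 2.0
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def optimal_pairing(Y, X):
    """Brute-force stand-in for the Hungarian step: entry i of the result is
    the element of X (or DIAG) matched to entry i of Y padded with DIAG."""
    A = list(Y) + [DIAG] * len(X)
    B = list(X) + [DIAG] * len(Y)
    best = min(permutations(B),
               key=lambda perm: sum(sq_cost(a, b) for a, b in zip(A, perm)))
    return list(best)

def mean_with_diagonal(points, m):
    """Minimiser of the summed squared distances to `points` and to
    m - len(points) copies of the diagonal (None if there are no points)."""
    k = len(points)
    if k == 0:
        return None
    mx = sum(p[0] for p in points) / k
    my = sum(p[1] for p in points) / k
    mid = (mx + my) / 2.0  # projection of the mean onto the diagonal
    return ((k * mx + (m - k) * mid) / m, (k * my + (m - k) * mid) / m)

def update(Y, diagrams, tol=1e-12):
    """Steps (b) and (c): pair Y with every diagram, then average."""
    m = len(diagrams)
    groups = [[] for _ in Y]
    for X in diagrams:
        for i, x in enumerate(optimal_pairing(Y, X)):
            if x is DIAG:
                continue
            if i < len(Y):
                groups[i].append(x)  # x is paired with the point Y[i]
            else:
                groups.append([x])   # x is paired with a fresh diagonal copy
    means = [mean_with_diagonal(g, m) for g in groups]
    return sorted(p for p in means if p is not None and abs(p[1] - p[0]) > tol)

def greedy_mean(diagrams, max_iter=100, tol=1e-12):
    """Steps (a) and (d): start at the first diagram, iterate until unchanged."""
    Y = sorted(diagrams[0])
    for _ in range(max_iter):
        Y_new = update(Y, diagrams, tol)
        if len(Y_new) == len(Y) and all(
                sq_cost(p, q) <= tol for p, q in zip(Y, Y_new)):
            break
        Y = Y_new
    return Y
```

For two one-point diagrams `[(0.0, 4.0)]` and `[(2.0, 6.0)]` the sketch pairs the two points and returns their midpoint. Different starting diagrams may lead to different local minima, which is why step (a) allows random initialization.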

An alternative to the above greedy approach would be a brute force search over point configurations to find a Fréchet mean. One way to do this is to list all possible pairings between points in each pair of diagrams. Then compute the arithmetic mean for all such pairings. One of these means will be a Fréchet mean. While this approach will find the complete mean set its combinatorial complexity is prohibitive.

### 3.1 Convergence of the Greedy Algorithm

The remainder of this section provides convergence properties for Algorithm 1. By convergence we mean that the algorithm will terminate at some point having found a local minimum. The reason for this is that at each iteration the cost function \(F_{m}\) decreases, at each iteration the algorithm uses a new set of pairings, and there are only finitely many combinations of pairings between points in the diagrams.

We first develop necessary and sufficient conditions for a diagram \(Y\) to be a local minimum of a set of persistence diagrams. We define \(F_{i} (Z):= d(Z,X_{i})^{2}\), the Fréchet function corresponding to \(\delta _{X_{i}}\). This allows us to define the Fréchet function as \(F= \frac{1}{m} \sum _{i=1}^{m} F_{i}\) corresponding to the distribution \(\frac{1}{m}\sum _{i=1}^{m} \delta _{X_{i}}\).

The following lemma provides a necessary condition for a diagram to be a local minimum of \(F\). This condition is the stopping criterion in Algorithm 1.

### **Lemma 3.1**

If \(W = \{w_{i}\}\) is a local minimum of the Fréchet function \(F = \frac{1}{m} \sum _{j=1}^{m} F_{j}\) then there is a unique optimal pairing from \(W\) to each of the \(X_{j}\) which we denote as \(\phi _{j}\) and each \(w_{i}\) is the arithmetic mean of the points \(\{\phi _{j}(w_{i})\}_{j=1, 2, \ldots , m}\). Furthermore if \(w_k\) and \(w_l\) are off-diagonal points such that \(\Vert w_k-w_l\Vert =0\) then \(\Vert \phi _{j}(w_k)-\phi _{j}(w_l)\Vert =0\) for each \(j\).

### *Proof*

Let \(\phi _{j}\) be some optimal pairings (not yet assumed to be unique) between \(W\) and \(X_{j}\) and let \(s_{j}\) be the corresponding vectors in the tangent cone at \(W\) that are tangent to the geodesics induced by \(\phi _{j}\) and are of length \(d(X_{j},W)\). The \(2s_{j}\) are supporting vectors at \(W\) for the functions \(F_{j}(Z)= d(Z, X_{j})^{2}\) by Proposition 2.9, so \(\frac{2}{m}\sum _{j=1}^{m} s_{j}\) is a supporting vector of \(F\) at \(W\).

From Lemma 2.8 we know that \(\frac{2}{m}\sum _{j=1}^{m} s_{j}=0\). Since at each \(w_{i}\) the \(s_{j}\) gives the vector from \(w_{i}\) to \(\phi _{j}(w_{i})\), \(\sum _{j=1}^{m} s_{j}=0\) implies that \(w_{i}\) is the arithmetic mean of the points \(\{\phi _{j}(w_{i})\}_{j=1, 2,\ldots ,m}\).

Now suppose that \(\phi _k\) and \(\tilde{\phi _k}\) are both optimal pairings. By the above reasoning we have \(\frac{1}{m}(\tilde{s_k} + \sum _{j=1, j\ne k}^{m} s_{j})=0 =\frac{1}{m}\sum _{j=1}^{m} s_{j}\) and hence \(\tilde{s_k} = s_k\). This implies that \(\Vert \tilde{\phi _k}(w_{i}) -\phi _k(w_{i})\Vert =0\) for all \(w_{i}\in W\). In particular, for off-diagonal points \(w_k\) and \(w_l\) with \(\Vert w_k-w_l\Vert =0\) and \(\phi _k\) an optimal pairing, we can consider the pairing \(\tilde{\phi }_k\) with \(w_k\) and \(w_l\) swapped. Since \(\Vert \tilde{\phi _k}(w_{i}) -\phi _k(w_{i})\Vert =0\) for all \(w_{i}\in W\) we can conclude that \(\Vert \phi _{j}(w_k)-\phi _{j}(w_l)\Vert =0\). \(\square \)

We now prove that the above is also a sufficient condition for \(W\) to be a local minimum of \(F\) when \(F\) is the Fréchet function for the measure \(\frac{1}{m}\sum _{i}\delta _{X_{i}}\) with the diagrams \(X_{i}\) each having finitely many off-diagonal points. This requires a result about a local extension of optimal pairings.

### **Proposition 3.2**

Let \(X\) and \(Y\) be diagrams, each with only finitely many off diagonal points, such that there is a unique optimal pairing \(\phi _{X}^{Y}\) between them and no off diagonal point in \(X\) matches the diagonal in \(Y\). We further stipulate that if \(y_k\) and \(y_l\) are off-diagonal points with \(\Vert y_k-y_l\Vert =0\) then \(\Vert (\phi _{X}^{Y})^{-1}(y_k)-(\phi _{X}^{Y})^{-1}(y_l)\Vert =0\). There is some \(r>0\) such that for every \(Z \in B(Y,r)\) there is a unique optimal pairing between \(X\) and \(Z\) and this optimal pairing is induced from the one from \(X\) to \(Y\). By this we mean there is a unique optimal pairing \(\phi _{Y}^Z\) from \(Y\) to \(Z\) and that the unique optimal pairing from \(X\) to \(Z\) is \(\phi _{Y}^Z \circ \phi _{X}^{Y}\).

Furthermore, if \(X_{1}, X_{2}, \ldots , X_{m}\) and \(Y\) are diagrams with finitely many off-diagonal points such that there is a unique optimal pairing \(\phi _{X_{i}}^{Y}\) between \(X_{i}\) and \(Y\) for each \(i\) with the same conditions as above, then there is some \(r>0\) such that for every \(Z \in B(Y,r)\) there is a unique optimal pairing between each \(X_{i}\) and \(Z\) and this optimal pairing is induced by the one from \(X_{i}\) to \(Y\).

### *Proof*

Since \(Y\) has only finitely many off-diagonal points there is some \(\varepsilon >0\) such that for every diagram \(Z\) with \(d(Y,Z)<\varepsilon \) there is a unique geodesic from \(Y\) to \(Z\).

Now suppose \(X_{1}, X_2, \ldots , X_{m}\) and \(Y\) are diagrams with finitely many off diagonal points such that there is a unique optimal pairing \(\phi _{X_{i}}^{Y}\) between \(X_{i}\) and \(Y\) for each \(i\). By the above argument there are some \(r_{1}, r_2,\ldots ,r_{m}>0\) such that for each \(i\) and for every \(Z \in B(Y,r_{i})\) there is a unique optimal pairing between each \(X_{i}\) and \(Z\) and this optimal pairing is induced by the one from \(X_{i}\) to \(Y\). Take \(r=\min \{r_{i}\}\) which is positive. \(\square \)

The following theorem states that Algorithm 1 will find a local minimum on termination.

### **Theorem 3.3**

Given diagrams \(\{X_{1},\ldots ,X_{m}\}\) and the corresponding Fréchet function \(F\), then \(W = \{w_{i}\}\) is a local minimum of \(F\) if and only if there is a unique optimal pairing from \(W\) to each of the \(X_{j}\) denoted as \(\phi _{j}\) and each \(w_{i}\) is the arithmetic mean of the points \(\{\phi _{j}(w_{i})\}_{j=1,2,\ldots ,m}\).

### *Proof*

In Lemma 3.1 we showed that it is a necessary condition.

Given \(m\) points in the plane or copies of the diagonal, \(\{x_{1}, x_{2}, \ldots , x_{m}\}\), the choice of \(y\) which minimizes \(\sum _{i=1}^{m} \Vert x_{i}-y\Vert ^{2}\) is the arithmetic mean of \(\{x_{1}, \ldots , x_{m}\}\). As a result we know that \(F(Z)>F(W)\) for all \(Z\) with the same optimal pairings as \(W\) to \(X_{1}, X_{2}, \ldots , X_{m}\). Since there is some ball \(B(W,r)\) such that every \(Z\in B(W,r)\) has the same optimal pairings as \(W\), by Proposition 3.2, we know that \(F(Z)>F(W)\) for all \(Z\) in \(B(W,r)\). Thus we can conclude that \(W\) is a local minimum. \(\square \)
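A short derivation may help make precise what the arithmetic mean is when some of the \(x_{i}\) are copies of the diagonal. This is a sketch under the assumption that \(x_{1},\ldots ,x_{k}\) are off-diagonal points and \(x_{k+1},\ldots ,x_{m}\) are copies of \(\Delta \); the symbols \(\bar{x}\), \(\bar{x}_{\Delta }\), \(t\) and \(\bar{t}\) are introduced here for illustration. Writing \(\bar{x}=\frac{1}{k}\sum _{i=1}^{k}x_{i}\) we have $$\begin{aligned} \sum _{i=1}^{m} \Vert x_{i}-y\Vert ^{2} = k\Vert y-\bar{x}\Vert ^{2} + (m-k)\Vert y-\Delta \Vert ^{2} + \sum _{i=1}^{k} \Vert x_{i}-\bar{x}\Vert ^{2}. \end{aligned}$$ The last term does not depend on \(y\), and the middle term depends only on the signed perpendicular distance \(t\) from \(y\) to the diagonal. The component of \(y\) along the diagonal should therefore agree with that of \(\bar{x}\), and if \(\bar{t}\) denotes the signed perpendicular distance of \(\bar{x}\) then \(t\) minimizes $$\begin{aligned} k(t-\bar{t})^{2} + (m-k)t^{2}, \end{aligned}$$ which gives \(t=k\bar{t}/m\). With \(\bar{x}_{\Delta }\) the closest point on the diagonal to \(\bar{x}\), the minimizer is $$\begin{aligned} y = \frac{k\bar{x} + (m-k)\bar{x}_{\Delta }}{m}. \end{aligned}$$ For \(k=m\) this reduces to the ordinary arithmetic mean of the \(x_{i}\).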

## 4 Law of Large Numbers for the Empirical Fréchet Mean

There exist weak and strong laws of large numbers for general metric spaces (for example see [17, Theorem 3.4]). These results hold for global minima of the Fréchet and empirical Fréchet functions \(F\) and \(F_{n}\), respectively. It is not clear to us how to adapt these results to the case of Algorithm 1 where we can only ensure convergence to a local minimum. It is also not clear how we can adapt these theorems to get rates of convergence of the sample Fréchet mean set to the population quantity.

The main results of this section, Theorem 4.1 and Lemma 4.2, provide a probabilistic justification for Algorithm 1. Theorem 4.1 states that with high probability local minima of the empirical Fréchet function \(F_{n}\) will be close to local minima of the Fréchet function \(F\). Ideally we would like the above convergence to hold for global minima, the Fréchet mean set. The condition of Lemma 4.2 states that the number of local minima of \(F_{n}\) is finite and not a function of \(n\). This suggests that applying Algorithm 1 to a random set of start conditions can be used to explore the finite set of local minima.

### **Theorem 4.1**

### *Proof*

We will bound the probability that \(|\xi _{i}-\frac{n}{m}|>\varepsilon \frac{n}{m}\) for any \(i=1,2,\ldots ,m\). We then will show that under the assumption that \(|\xi _{i}-\frac{n}{m}|\le \varepsilon \frac{n}{m}\) for all \(i=1,2,\ldots ,m\) for sufficiently small \(\varepsilon >0\) there is a local minimum \(Y_{n}\) with \(d(Y,Y_{n})^{2}<\frac{\varepsilon ^{2} m F(Y)}{(1-\varepsilon )^{2}}\).

We want to show that \(Y_{n}\) is a local minimum for sufficiently small \(\varepsilon \). Indeed it will be the output of Algorithm 1 given the initializing diagram of \(Y\). Since \(Y\) is a local minimum, Proposition 3.2 implies that there is a ball around \(Y\), \(B(Y,r)\), such that for every diagram in \(B(Y,r)\) there is a unique optimal pairing with each \(Z_{i}\) which corresponds to the unique optimal pairing between \(Y\) and \(Z_{i}\). That is \(\phi _{X}^{Z_{i}} = \phi _{X}^{Y}\circ \phi _{Y}^{Z_{i}}\) for all \(X\in B(Y,r)\). For \(\varepsilon >0\) such that \(\frac{\varepsilon ^{2} m F(Y)}{(1-\varepsilon )^{2}}<r^{2}\) we have \(Y_{n}\in B(Y,r)\). Plugging in for \(\varepsilon \) results in \(\frac{m^{2} F(Y)}{n} \ln \left( \frac{m}{\delta }\right) < r^{2}\).

The above theorem provides a (weak) law of large numbers result for the local minima computed from \(n\) persistence diagrams but it does not ensure that the number of local minima is bounded as \(n\) goes to infinity. The utility of such a convergence result would be limited if the number of local minima could not be bounded. The following lemma states that the number of local minima is bounded.

### **Lemma 4.2**

Let \(\rho =\frac{1}{m}\sum _{i=1}^{m} \delta _{Z_{i}}\) as before. Let \(\rho _{n}=\frac{1}{n}\sum _{k=1}^n \delta _{X_k}\) be the empirical measure of \(n\) points drawn iid from \(\rho \) and let \(F_{n}\) be the corresponding Fréchet function. The number of local minima of \(F_{n}\) is bounded by \(\prod _{i=1}^{m}(k_{i}+1)^{(k_{1}+k_2+\cdots +k_{m})}\). Here \(k_{i}\) is the number of off-diagonal points in the \(i\)-th diagram \(Z_{i}\). This bound is independent of \(n\).

### *Proof*

Set \(Y_{n}\) as a local minimum of \(F_{n}\). This implies there are unique optimal pairings \(\phi _{i}\) between \(Y_{n}\) and \(X_{i}\) for each \(i\) and that any point \(y\) in \(Y_{n}\) is the arithmetic mean of \(\{\phi _{i}(y)\}\). Since the optimal pairing is unique, if \(X_{i}=X_{j}\) then \(\phi _{i}=\phi _{j}\). This in turn means that the \(\phi _{i}\) are determined by which of \(Z_{i}\) are in the set \(X_{j}\) (with multiplicity). This implies that the number of local minima is bounded by the number of different partitions into subsets of the points in the \(\cup X_{j}\) so that each subset has exactly one point from each of the \(X_{j}\). The number of subsets is bounded by \(k_{1}+k_2+\cdots +k_{m}\) and for each subset there is a bound of \(\prod _{i=1}^{m}(k_{i}+1)\) on the choices of which element to take from each of the \(X_{i}\). Thus the number of different partitions is bounded by \(\prod _{i=1}^{m}(k_{i}+1)^{(k_{1}+k_2+\cdots +k_{m})}\). \(\square \)
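To get a feel for the size of this bound, it can be evaluated directly. The helper name `local_minima_bound` is hypothetical, and its argument is assumed to be the list of counts \(k_{1},\ldots ,k_{m}\).

```python
from math import prod

def local_minima_bound(k):
    """Evaluate prod_i (k_i + 1) ** (k_1 + ... + k_m) for a list of counts k."""
    total = sum(k)
    return prod(k_i + 1 for k_i in k) ** total

print(local_minima_bound([1, 2]))     # (2 * 3) ** 3 = 216
print(local_minima_bound([2, 2, 2]))  # (3 * 3 * 3) ** 6 = 387420489
```

The bound grows very quickly with the number of off-diagonal points, but it does not depend on the sample size \(n\).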

We would like to discuss not only the convergence of local minima but also the convergence of the Fréchet means. We can do this in the case when there is a unique Fréchet mean.

### **Lemma 4.3**

Let \(\rho =\frac{1}{m}\sum _{i=1}^{m} \delta _{Z_{i}}\) as before. Suppose further that the corresponding Fréchet function \(F\) has a unique minimum. Let \(\rho _{n}=\frac{1}{n}\sum _{k=1}^n \delta _{X_k}\) be the empirical measure of \(n\) points drawn iid from \(\rho \) and \(F_{n}\) is the corresponding Fréchet function. Let \(\mathbf {Y}\) be the Fréchet mean of \(F\) and \(\mathbf {Y}_{n}\) the set of Fréchet means of \(F_{n}\). With probability \(1\) the Hausdorff distance between \(\mathbf {Y}_{n}\) and \(\mathbf {Y}\) goes to zero as \(n\) goes to infinity.

### *Proof*

It is sufficient for us to show for each \(r>0\) that with probability \(1\) there is some \(N_{r}\) such that \(\mathbf {Y}_{n}\subset B( \mathbf {Y},r)\) for all \(n>N_{r}\).

Fix \(r>0\). Suppose there does not exist some \(N_{r}\) such that \(\mathbf {Y}_{n}\subset B( \mathbf {Y},r)\) for all \(n>N_{r}\). Then there is some sequence of \(W_{n_k}\in \mathbf {Y}_{n_k}\) such that \(d(W_{n_k}, \mathbf {Y})\ge r\). The set \(\{W_{n_k}\}\) is clearly bounded, off-diagonally birth–death bounded and uniform and hence precompact. This implies that \((W_{n_k})\) has a convergent subsequence \((W_{{n_k}_{j}})\). Let \(W\) denote the limit of this sequence. Since \(d(W_{{n_k}_{j}}, \mathbf {Y})\ge r\) for all \(j\) we have \(d(W, \mathbf {Y})\ge r\).

By the arguments in Proposition 2.6 there is some \(K\) independent of \(n\) such that \(F_{n}\) is \(K\)-Lipschitz in \(B(W,1)\) and hence \(|F_{{n_k}_{j}}(W_{{n_k}_{j}}) - F_{{n_k}_{j}}(W)|\le K d(W_{{n_k}_{j}},W)\) for large \(j\). Hence, for all \(\varepsilon >0\) we can say that \( F_{{n_k}_{j}}(W)\le F_{{n_k}_{j}}(W_{{n_k}_{j}}) + \varepsilon \) for sufficiently large \(j\).

The law of large numbers tells us that \(F_{n}(W)\rightarrow F(W)\) and \(F_{n}(\mathbf {Y})\rightarrow F(\mathbf {Y})\) as \(n \rightarrow \infty \) with probability \(1\). Hence for all \(\varepsilon >0\) we know that with probability \(1\) both \(F(W)\le F_{n}(W) +\varepsilon \) and \(F_{n}(\mathbf {Y})\le F(\mathbf {Y}) +\varepsilon \) for sufficiently large \(n\).

From our assumption that \(W_{{n_k}_{j}}\) is a Fréchet mean of \(F_{{n_k}_{j}}\) we know that \(F_{{n_k}_{j}}(W_{{n_k}_{j}})\le F_{{n_k}_{j}}(\mathbf {Y})\) for all \(j\).

## 5 Persistence Diagrams of Random Gaussian Fields

We illustrate the utility of our algorithm in computing means and variances of persistence diagrams in this section via simulation. The idea will be to show that persistence diagrams generated from a random Gaussian field will concentrate around the diagonal with the mean diagram moving closer to the diagonal as the number of diagrams averaged increases.

Variance of the sample Fréchet means

Number of samples | \(H_0\) | \(H_{1}\) |
---|---|---|
2 | 0.8353 | 0.9058 |
4 | 0.6295 | 0.6741 |
8 | 0.4429 | 0.5608 |
16 | 0.4356 | 0.4618 |
32 | 0.3165 | 0.3742 |
64 | 0.3362 | 0.2965 |
128 | 0.3127 | 0.2233 |

## 6 Discussion

In this paper we introduce an algorithm for computing estimates of Fréchet means of a set of persistence diagrams. We demonstrate local convergence of this algorithm and provide a law of large numbers for the Fréchet mean computed on this set when the underlying measure has the form \(\rho = m^{-1} \sum _{i=1}^{m} \delta _{X_{i}}\), where \(X_{i}\) are persistence diagrams. We believe that generically there is a unique global minimum to the Fréchet function and hence a unique Fréchet mean but this needs to be shown.

The work in this paper is a first step and several obvious extensions are needed. A law of large numbers result when the underlying measure is not restricted to a combination of Dirac masses is obviously important. The results in our paper are strongly dependent on the \(L^{2}\)-Wasserstein metric; generalizing these results to the Wasserstein metrics used in computational topology is of central interest. The proofs and problem formulation in this paper are very constructive: the proofs and algorithms are developed for the specific examples and constructions we propose and are not meant to generalize to other metrics or variants on the algorithm. It would be of great interest to provide a presentation of the core ideas in the algorithm and theory we developed in a more general framework using properties of abstract metric spaces and probability theory on these spaces.

## Footnotes

- 1. If both \(x\) and \(\phi (x)\) are the diagonal then this is the diagonal. If exactly one of \(x\) or \(\phi (x)\) is the diagonal then we replace it in this sum by the closest point in the diagonal to \(\phi (x)\) or \(x\) respectively.

- 2. Terminology given by Gromov [9] that stands for Cartan, Alexandrov, and Toponogov.

## Notes

### Acknowledgments

SM and KT would like to acknowledge Shmuel Weinberger for discussions and insight. SM and KT would like to acknowledge E. Subag for help in obtaining persistence diagrams computed from random Gaussian fields and for explaining the generative model. JH and YM are pleased to acknowledge the support from grants DTRA: HDTRA1-08-BRCWMD, DARPA: D12AP00001On, AFOSR: FA9550-10-1-0436, and NIH (Systems Biology): 5P50-GM081883. SM is pleased to acknowledge support from grants NIH (Systems Biology): 5P50-GM081883, AFOSR: FA9550-10-1-0436, and NSF CCF-1049290.

## References

- 1. Adler, R.J., Bobrowski, O., Borman, M.S., Subag, E., Weinberger, S.: Persistent homology for random fields and complexes. In: Berger, J.O., Tony Cai, T., Johnstone, I.M. (eds.) Borrowing Strength: Theory Powering Applications—A Festschrift for Lawrence D. Brown, vol. 6. Institute of Mathematical Statistics, Beachwood (2010)
- 2. Bendich, P., Mukherjee, S., Wang, B.: Local homology transfer and stratification learning. In: ACM-SIAM Symposium on Discrete Algorithms (2012)
- 3. Bridson, M.R., Haefliger, A.: Metric Spaces of Non-positive Curvature. Springer-Verlag, Berlin (1999)
- 4. Bubenik, P., Carlsson, G., Kim, P.T., Luo, Z.-M.: Statistical topology via Morse theory, persistence, and nonparametric estimation. In: Viana, M.A.G., Wynn, H.P. (eds.) Algebraic Methods in Statistics and Probability II. Contemporary Mathematics, vol. 516, pp. 75–92. American Mathematical Society, Providence (2010)
- 5. Burago, Y., Gromov, M., Perel’man, G.: A.D. Alexandrov spaces with curvature bounded below. Russ. Math. Surv. **47**(2), 1–58 (1992)
- 6. Chazal, F., Cohen-Steiner, D., Lieutier, A.: A sampling theory for compact sets in Euclidean space. Discrete Comput. Geom. **41**, 461–479 (2009)
- 7. Cohen-Steiner, D., Edelsbrunner, H., Harer, J., Mileyko, Y.: Lipschitz functions have \({L}_p\)-stable persistence. Found. Comput. Math. **10**, 127–139 (2010). doi: 10.1007/s10208-010-9060-6
- 8. Edelsbrunner, H., Harer, J.: Computational Topology: An Introduction. American Mathematical Society, Providence (2010)
- 9. Gromov, M.: Hyperbolic groups. In: Gersten, S.M. (ed.) Essays in Group Theory. Mathematical Sciences Research Institute Publications, vol. 8, pp. 75–263. Springer, New York (1987)
- 10. Kahle, M.: Topology of random clique complexes. Discrete Math. **309**(6), 1658–1671 (2009)
- 11. Kahle, M.: Random geometric complexes (2011). http://arxiv.org/abs/0910.1649
- 12. Kahle, M., Meckes, E.: Limit theorems for Betti numbers of random simplicial complexes (2010). http://arxiv.org/abs/1009.4130v3
- 13. Lott, J., Villani, C.: Ricci curvature for metric-measure spaces via optimal transport. Ann. Math. **169**, 903–991 (2009)
- 14. Lunagómez, S., Mukherjee, S., Wolpert, R.L.: Geometric representations of hypergraphs for prior specification and posterior sampling (2009). http://arxiv.org/abs/0912.3648
- 15. Lytchak, A.: Open map theorem for metric spaces. St. Petersbg. Math. J. **17**(3), 477–491 (2006)
- 16. Mileyko, Y., Mukherjee, S., Harer, J.: Probability measures on the space of persistence diagrams. Inverse Probl. **27**(12), 124007 (2012)
- 17. Molchanov, I.: Theory of Random Sets. Springer, London (2005)
- 18. Munkres, J.: Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math. **5**(1), 32–38 (1957)
- 19. Niyogi, P., Smale, S., Weinberger, S.: Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom. **39**, 419–441 (2008)
- 20. Niyogi, P., Smale, S., Weinberger, S.: A topological view of unsupervised learning from noisy data. Manuscript (2008)
- 21. Ohta, S.: Barycenters in Alexandrov spaces with curvature bounded below. Adv. Geom. **12**, 571–587 (2012)
- 22. Penrose, M.D.: Random Geometric Graphs. Oxford University Press, New York (2003)
- 23. Penrose, M.D., Yukich, J.E.: Central limit theorems for some graphs in computational geometry. Ann. Appl. Probab. **11**(4), 1005–1041 (2001)
- 24. Petrunin, A.: Semiconcave functions in Alexandrov’s geometry. Surv. Differ. Geom. **11**, 137–201 (2007)
- 25. Sturm, K.-T.: Probability measures on metric spaces of nonpositive curvature. In: Auscher, P., Coulhon, T., Grigoryan, A. (eds.) Heat Kernels and Analysis on Manifolds, Graphs, and Metric Spaces, vol. 338. American Mathematical Society, Providence (2002)