Morse Theory for Filtrations and Efficient Computation of Persistent Homology
Abstract
We introduce an efficient preprocessing algorithm to reduce the number of cells in a filtered cell complex while preserving its persistent homology groups. The technique is based on an extension of combinatorial Morse theory from complexes to filtrations.
Keywords: Computational topology · Discrete Morse theory · Persistent homology
1 Introduction
The use of topological methods for data analysis is rapidly growing and persistent homology is proving to be one of the more successful techniques [1, 6, 7, 10]. Three fundamental properties account for the importance of persistent homology: (i) being based on algebraic topology, it provides a well understood codification of potentially complicated and/or high dimensional geometric information, (ii) the information it provides is stable with respect to perturbations [3], and (iii) it is readily computable [4, 8, 24, 34, 35]. Our focus is on this last point.
The most common algorithm used for computing persistent homology is presented in [35] wherein it is remarked that the worst case complexity is of the same order in time and space as that of computing homology. Subsequent work [21] has reduced this complexity to \({{\mathrm{O}}}(n^\omega )\), where \(\omega \) is the matrix multiplication exponent. To the best of our knowledge, there are no known implementations of fast matrix multiplication besides Strassen’s algorithm [31], which has an exponent \(\omega _S \sim 2.8\).
While our work is strongly motivated by the usefulness of persistent homology, there are instances in which one is interested in computing general homology groups of filtered complexes. The optimal worst case analysis for current homology algorithms appears to be superquartic over general principal ideal domains [12] and roughly cubic in practice over the integers [5, 28, 30] with respect to the size of the input complex. For massive datasets, this can be a severe limitation. Since we know of no way to improve the worst case complexity of the problem, the strategy we propose in this paper is to use ideas from combinatorial Morse theory [9] to reduce the initial complex using geometric and combinatorial methods before applying the algorithms of [35]. This reduction preserves all homological information in general and persistent homology groups in particular.
For a heuristic understanding of our approach, consider a complex \(\mathcal{X }\) (a precise definition is given in Sect. 2) with a finite nested sequence of subcomplexes \(\mathcal{X }^0\subset \mathcal{X }^1 \subset \cdots \subset \mathcal{X }^K = \mathcal{X }\). We refer to this structure as a filtration \(\mathcal{F }\) of \(\mathcal{X }\). The inclusions canonically induce maps \(i^{k}_*:{{\mathrm{H}}}_*(\mathcal{X }^k)\rightarrow {{\mathrm{H}}}_*(\mathcal{X }^{k+1})\) on the homology groups. For each number \(p \ge k\), let \(i^{k,p}_*:{{\mathrm{H}}}_*(\mathcal{X }^k)\rightarrow {{\mathrm{H}}}_*(\mathcal{X }^p)\) denote the composition \(i^{p-1}_*\circ \cdots \circ i^{k+1}_* \circ i^{k}_*\).
When working over field coefficients, it is possible to simultaneously choose bases of \({{\mathrm{H}}}_*(\mathcal{X }^k)\), \(k = 0,\ldots , K\), such that for each basis element \(\alpha \in {{\mathrm{H}}}_q(\mathcal{X }^k)\) there exists a well-defined pair of integers \(b_\alpha \le k\) and \(d_\alpha \ge k+1\) satisfying the following properties: \(b_\alpha \) is the smallest integer \(\ell \) so that \(\alpha \in i^{\ell ,k}_q({{\mathrm{H}}}_q(\mathcal{X }^\ell ))\) and \(d_\alpha \) is the largest integer \(\ell \) with \(i^{k,\ell -1}_q(\alpha ) \ne 0\). The pair \((b_\alpha ,d_\alpha )\) indicates the “birth” and “death” of the topological feature identified by \(\alpha \). Observe that if \(\beta \) is the element of \({{\mathrm{H}}}_q(\mathcal{X }^p)\) satisfying \(\beta = i^{k,p}_q(\alpha )\), then \((b_\alpha ,d_\alpha ) = (b_\beta ,d_\beta )\), and hence we identify these intervals. The collection of these equivalence classes of pairs, ranging over all \(\alpha \in {{\mathrm{H}}}_q(\mathcal{X }^k)\) for \(k=0,\ldots , K\), produces the \(q\)th persistence diagram of the filtration \(\mathcal{F }\).
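The birth–death pairing described above can be computed over \(\mathbb{Z }/2\) by the standard column-reduction algorithm of [35]. The sketch below is a minimal illustration under assumed conventions (the function name and the encoding of boundary columns as sets of row indices are ours, not the paper's implementation):

```python
def persistence_pairs(boundary):
    """Standard column reduction over Z/2 (cf. [35]), sketched.

    boundary[j] is the set of indices of cells in the mod-2 boundary of
    cell j; cells are listed in filtration order, so every face precedes
    its cofaces.  Returns the (birth, death) index pairs.
    """
    reduced = []   # reduced columns, stored as sets of row indices
    low = {}       # lowest nonzero row index -> column that owns it
    pairs = []
    for j, col in enumerate(boundary):
        col = set(col)
        # Add earlier reduced columns (mod 2) until the lowest entry is new.
        while col and max(col) in low:
            col ^= reduced[low[max(col)]]
        reduced.append(col)
        if col:
            i = max(col)
            low[i] = j
            pairs.append((i, j))  # cell i is paired with (killed by) cell j
    return pairs
```

Columns that reduce to zero create cycles (births); a surviving column pairs its lowest entry with the cell whose boundary it represents.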
The complexity of computing the persistence diagram of a filtration \(\mathcal{F }\) is essentially determined by the complexes \(\{\mathcal{X }^k\}\). Thus, a natural approach for reducing the computational cost is to perform an efficient preprocessing step that constructs an alternate filtration \(\mathcal{F }^{\prime }\) consisting of significantly smaller complexes \(\{\mathcal{X }^{\prime k}\}\) which has the same persistence diagram as \(\mathcal{F }\). The same strategy has been adopted to compute homology of an unfiltered complex [14, 19]. Again, we provide a heuristic description of this technique.
In the classical setting, the Morse homology of smooth manifolds is defined in terms of a complex where the chains are generated by critical points of a smooth functional and the boundary operator is determined by heteroclinic orbits generated by the gradient flow of the functional. In the combinatorial setting, the gradient flow is replaced by a partial pairing on cells in the complex. The unpaired cells generate the chains in the Morse complex, while the boundary operator is defined via paths in the cell complex generated by the pairing. The preprocessing algorithm of [14] is based in part on the coreduction algorithm of [23] and provides an efficient means for constructing a partial pairing on a given cell complex. As is demonstrated in [13], for many complexes the resulting Morse complex is many orders of magnitude smaller than the original. A contribution of this paper is an algorithm that takes a filtration \(\mathcal{F }\) and produces a new, typically much smaller, filtration \(\mathcal{M }\) such that the persistence diagram of \(\mathcal{M }\) agrees with that of \(\mathcal{F }\). In Sect. 6 we show that for many examples this preprocessing step has the advantage of significantly reducing both the computational time and the required memory for running the persistence algorithm [35].
An outline of this paper is as follows. Section 2 recalls the fundamental ideas and constructions related to complexes, persistent homology and combinatorial Morse theory. Section 3 provides a categorical construction that allows us to relate the persistent homology groups of different filtrations. Using this language we prove Theorem 4.3 in Sect. 4, which establishes that the persistent homology of a given filtration is equivalent to the persistent homology of an associated Morse filtration. Section 5 contains the preprocessing algorithm MorseReduce along with our main result which demonstrates that the output of MorseReduce is a filtration with the same persistent homology as the input filtration. Finally, Sect. 6 presents experimental results derived from applying our preprocessing algorithm to a variety of filtrations based on different types of complexes.
2 Background
In this section we provide a brief review, primarily to establish notation, of complexes, persistent homology and combinatorial Morse theory.
2.1 Complexes
As indicated in the Introduction, our interest in computing persistent homology is motivated by data analysis. For these problems typically one does not have an a priori understanding of the structure of the underlying space and thus one works with abstract complexes which may or may not correspond to a geometric or topological realization. With this in mind, we use a rather general notion of complex that dates back to Tucker [32] and Lefschetz [18]. Throughout this paper, \(\mathbf{R}\) denotes a principal ideal domain (PID) whose invertible elements will be called units.
Definition 2.1
A complex \((\mathcal{X },\kappa )\) over \(\mathbf{R}\) consists of a finite set \(\mathcal{X }\) of cells graded by dimension, together with an incidence function \(\kappa :\mathcal{X }\times \mathcal{X }\rightarrow \mathbf{R}\) satisfying the following two conditions.
 (i)For each \(\xi \) and \(\xi ^{\prime }\) in \(\mathcal{X }\),$$\begin{aligned} \kappa (\xi ,\xi ^{\prime }) \ne 0\quad \text{ implies }\quad \dim \xi = \dim \xi ^{\prime } + 1. \end{aligned}$$(1)
 (ii)For each \(\xi \) and \(\xi ^{\prime \prime }\) in \(\mathcal{X }\),$$\begin{aligned} \sum \limits _{\xi ^{\prime }\in \mathcal{X }} \kappa (\xi ,\xi ^{\prime }) \cdot \kappa (\xi ^{\prime },\xi ^{\prime \prime }) = 0. \end{aligned}$$(2)
Whenever \(\kappa (\eta ,\xi ) \ne 0\) we write \(\xi \prec \eta \) and call \(\xi \) a face of \(\eta \). Consider \(\mathcal{X }^{\prime } \subset \mathcal{X }\) and note that the restriction of \(\kappa \) to \(\mathcal{X }^{\prime } \times \mathcal{X }^{\prime }\) satisfies (1). If for each \(\eta \in \mathcal{X }^{\prime }\) the set \(\{\xi \in \mathcal{X }\mid \xi \prec \eta \}\) is contained in \(\mathcal{X }^{\prime }\), then we say that \(\mathcal{X }^{\prime }\) satisfies the subcomplex property and call \(\mathcal{X }^{\prime }\) a subcomplex of \((\mathcal{X },\kappa )\). Note that Eq. (2) is automatically satisfied for a subcomplex \(\mathcal{X }^{\prime }\), and so \((\mathcal{X }^{\prime },\kappa )\) is a complex in its own right.
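For a finite complex stored as an incidence dictionary, conditions (1) and (2) and the subcomplex property can be verified directly. A minimal sketch (hypothetical names; coefficients over \(\mathbf{Z }\)):

```python
def is_complex(dim, kappa, cells):
    """Check conditions (1) and (2) for a finite complex.

    dim: cell -> dimension; kappa: (cell, cell) -> coefficient (0 if absent).
    """
    k = lambda a, b: kappa.get((a, b), 0)
    for x in cells:
        for y in cells:
            if k(x, y) != 0 and dim[x] != dim[y] + 1:
                return False  # violates (1): incidence across wrong dimensions
    for x in cells:
        for z in cells:
            if sum(k(x, y) * k(y, z) for y in cells) != 0:
                return False  # violates (2): "boundary of boundary" nonzero
    return True

def is_subcomplex(sub, kappa, cells):
    """Subcomplex property: every face of a cell in `sub` also lies in `sub`."""
    return all(x in sub
               for e in sub for x in cells
               if kappa.get((e, x), 0) != 0)
```

For instance, an edge with its two endpoint vertices passes both checks, while the edge alone fails the subcomplex property.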
2.2 Persistent Homology
Recall that \(\mathcal{F }= \{\mathcal{X }^k\mid k = 1,\ldots ,K\}\) is a filtration of the complex \(\mathcal{X }\) if for all \(k\), \(\mathcal{X }^k\) is a subcomplex of \(\mathcal{X }\) and \(\mathcal{X }^k\subset \mathcal{X }^{k+1}\). The individual complex \(\mathcal{X }^k\) is referred to as the \(k\)th frame of the filtration. Let \(i^{p,k}:\mathcal{X }^k\hookrightarrow \mathcal{X }^{k+p}\) denote the inclusion map. This induces a natural map on chain complexes, which we also denote by \(i^{p,k}_*:C_*(\mathcal{X }^k)\rightarrow C_*(\mathcal{X }^{k+p})\).
2.3 Combinatorial Morse Theory
Let \((\mathcal{X },\kappa )\) be a complex over the PID \(\mathbf{R}\) and denote by \(\prec \) the generating relation of the face partial order \({{\curlyeqprec }}\) on \(\mathcal{X }\).
Definition 2.2
A partial matching of \((\mathcal{X },\kappa )\) consists of a partition of \(\mathcal{X }\) into three sets \(\mathcal{A }\), \(\mathcal{K }\) and \(\mathcal{Q }\) along with a bijection \(\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K }\), such that for each \(Q \in \mathcal{Q }\) the incidence \(\kappa (\mathop {w}\nolimits (Q),Q)\) is a unit in \(\mathbf{R}\). We denote this matching by \((\mathcal{A },\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K })\).
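Definition 2.2 can be checked mechanically for a finite complex. The sketch below (hypothetical names, under the same dictionary encoding as before) verifies that \(\mathcal{A }\), \(\mathcal{K }\), \(\mathcal{Q }\) partition the cells, that \(w\) is a bijection, and that each incidence \(\kappa (w(Q),Q)\) is a unit:

```python
def is_partial_matching(cells, kappa, A, Q, w, units):
    """Check Definition 2.2: (A, K, Q) partition `cells`, w: Q -> K is a
    bijection, and each incidence kappa[(w(q), q)] is a unit of the ring."""
    K = set(w.values())
    if set(w) != set(Q) or len(K) != len(Q):
        return False  # w is not a bijection defined on all of Q
    if A | K | set(Q) != set(cells) or A & K or A & set(Q) or K & set(Q):
        return False  # A, K, Q fail to partition the cells
    return all(kappa.get((w[q], q), 0) in units for q in Q)
```

Over \(\mathbf{Z }\) the units are \(\{1,-1\}\), so an edge paired with one of its vertices is a legitimate matching.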
Remark 2.3
The definition of an acyclic matching \((\mathcal{A }, \mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K })\) is clearly related to earlier presentations of combinatorial Morse theory. See for example the work of Forman [9], Chari [2], and in particular Kozlov [16, 17]. Elements of \(\mathcal{A }\) are typically referred to as critical cells in analogy to classical Morse theory. The paired elements in \(\mathcal{K }\) and \(\mathcal{Q }\) are often not explicitly labelled since from a purely Morse theoretic perspective they are unimportant objects; it is only the pairing \(\mathop {w}\nolimits \) that plays an essential role. However, our interest is in using combinatorial Morse theory to develop algorithms that are designed to be applied to complexes arising from experimental or numerical data sets. In particular, as is explained in Sect. 5 we often iteratively apply the preprocessing algorithm of this paper to the resulting Morse complex. This has no analogue in the classical Morse theory and in particular the critical cells of one complex cease to be critical cells in the next iterate of the algorithm. Similarly, in some applications (e.g. computing induced maps on homology [14]) it is essential to be able to recover homology generators in the original complex. For this we need to keep track of the paired cells and find it useful to have different labels for the different elements of the pairing.
Theorem 2.4
The complex \((\mathcal{A },\widetilde{\kappa })\) is called the Morse complex associated to the acyclic matching \((\mathcal{A },\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K })\) of \(\mathcal{X }\) and \(\widetilde{\kappa }\) is called the associated Morse incidence function. Theorem 2.4 follows from the work of Forman [9] and has been reproven in a variety of contexts [2, 14, 16, 17]. For the purpose of obtaining Theorem 4.3 we have adopted a slightly different presentation. Thus, to introduce the necessary notation we conclude this section with a terse outline of the proof which is obtained inductively via the following reduction step.
Observe that the acyclic matching \((\mathcal{A },\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K })\) on \(\mathcal{X }\) induces an acyclic matching on \(\mathcal{X }_Q\) of the form \((\mathcal{A },\mathop {w}\nolimits _Q:\mathcal{Q }_Q\rightarrow \mathcal{K }_Q)\) where \(\mathcal{Q }_Q = \mathcal{Q }\setminus \{Q\}\), \(\mathcal{K }_Q = \mathcal{K }\setminus \{\mathop {w}\nolimits (Q)\}\), and \(\mathop {w}\nolimits _Q = \mathop {w}\nolimits \mid _{\mathcal{Q }_Q}\).
Lemma 2.5
The maps \(\psi _{Q*}\) and \(\phi _{Q*}\) are chain equivalences.
Proof
An immediate consequence of Lemma 2.5 is that \({{\mathrm{H}}}_{*}(\mathcal{X }) \cong {{\mathrm{H}}}_*(\mathcal{X }_Q)\). Before concluding the proof of Theorem 2.4 we need the following result which guarantees that the Morse incidence function remains unaffected by the reduction step.
Proposition 2.6
Let \(\widetilde{\kappa }_Q\) denote the Morse incidence function of the reduced complex \(\mathcal{X }_Q\) with the induced acyclic matching \((\mathcal{A },\mathop {w}\nolimits _Q:\mathcal{Q }_Q\rightarrow \mathcal{K }_Q)\). Then, \(\widetilde{\kappa }_Q\equiv \widetilde{\kappa }\) on \(\mathcal{A }\times \mathcal{A }\).
Proof
Fix cells \(A\) and \(A^{\prime }\) in \(\mathcal{A }\) and let \(\rho = (Q_1,\ldots ,w(Q_M))\) be a connection in \(\mathcal{X }_Q\) from \(A\) to \(A^{\prime }\). We make the simplifying assumptions that \(Q \nprec A\) and \(A^{\prime }\nprec \mathop {w}\nolimits (Q)\); the argument when one or both of these assumptions fails is very similar. Now, we have \(\kappa _Q(A,A^{\prime }) = \kappa (A,A^{\prime })\), so we only need to show that the sum-over-connections term of (7) is the same for \(\mathcal{X }\) and \(\mathcal{X }_Q\).
Finally, we provide a brief proof of the central theorem of combinatorial Morse theory.
Proof of Theorem 2.4
3 Filtered Chain Maps
As Theorem 2.4 indicates, combinatorial Morse theory provides a means by which the homology of a complex can be computed using a potentially smaller complex. The main result of this paper is a corresponding result for computing persistent homology of a filtration. There are two issues that need to be resolved: constructing the new filtration and demonstrating that the persistent homology is the same for both filtrations. To clarify the proof of the second issue, we provide a short description of an obvious categorical structure on the set of filtrations.
Let \((\mathcal{X },\kappa )\) and \((\mathcal{X }^{\prime },\kappa ^{\prime })\) be complexes and let \(\mathcal{F }\) and \(\mathcal{F }^{\prime }\) be filtrations of \(\mathcal{X }\) and \(\mathcal{X }^{\prime }\) respectively. We are interested in comparing the persistent homology between these filtrations and thus we turn our attention to chain maps.
Definition 3.1
A filtered chain map induces a family of maps on homology from \({{\mathrm{H}}}_*(\mathcal{X }^k)\) to \({{\mathrm{H}}}_*(\mathcal{X }^{\prime k})\) for each \(k\). More interesting for the purposes of this paper is the following.
Proposition 3.2
The proof follows directly from the commuting diagram of the preceding definition and the fact that every map in sight is a chain map. Next, we extend the concept of a chain homotopy to filtrations as follows.
Definition 3.3
Let \(\Phi ,\Psi :\mathcal{F }\rightarrow \mathcal{F }^{\prime }\) be filtered chain maps. A filtered chain homotopy between \(\Phi \) and \(\Psi \) consists of a collection of chain homotopies \(\{\Theta ^k \mid k = 1,\ldots ,K\}\) between each \(\phi ^k_*\) and \(\psi ^k_*\).
If \(\Phi \) and \(\Psi \) are filtered chain homotopic maps from \(\mathcal{F }\) to \(\mathcal{F }^{\prime }\) then the induced maps \(\phi ^{p,k}_*:{{\mathrm{H}}}^p_*(\mathcal{X }^k) \rightarrow {{\mathrm{H}}}^p_*(\mathcal{X }^{\prime k}) \) and \(\psi ^{p,k}_*:{{\mathrm{H}}}^p_*(\mathcal{X }^k) \rightarrow {{\mathrm{H}}}^p_*(\mathcal{X }^{\prime k})\) are identical on the persistent homology groups. This follows from the commuting diagram of Definition 3.1. Finally, filtered chain maps \(\Phi : \mathcal{F }\rightarrow \mathcal{F }^{\prime }\) and \(\Psi : \mathcal{F }^{\prime } \rightarrow \mathcal{F }\) are filtered chain equivalences if \(\Phi \circ \Psi \) and \(\Psi \circ \Phi \) are filtered chain homotopic to the identity. In particular, we have the following simple proposition.
Proposition 3.4
The proof of this proposition for a fixed \(p\) and \(k\) is a straightforward calculation which only requires the existence of a chain homotopy \(\Theta _*^{k+p}\) between \(\phi _*^{k+p}\) and \(\psi _*^{k+p}\).
4 Filtered Morse Complexes
We begin by extending an acyclic matching of a complex to a filtration. Consider a filtration \(\mathcal{F }= \{\mathcal{X }^k\mid k = 1,\ldots , K\}\) of a complex \((\mathcal{X },\kappa )\).
Definition 4.1
Assume we have a filtered acyclic matching \((\mathcal{A }^k,\mathop {w}\nolimits ^k:\mathcal{Q }^k\rightarrow \mathcal{K }^k)\) of the filtration \(\mathcal{F }\). By convention, we write \((\mathcal{A },\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K })\) for the matching \((\mathcal{A }^K,\mathop {w}\nolimits ^K:\mathcal{Q }^K\rightarrow \mathcal{K }^K)\) of the final frame \(\mathcal{X }^K = \mathcal{X }\). In particular, \((\mathcal{A },\widetilde{\kappa })\) is the Morse complex corresponding to the acyclic matching \((\mathcal{A },\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K })\) of \((\mathcal{X },\kappa )\).
Proposition 4.2
\(\mathcal{M }{:=} \{ \mathcal{A }^k\mid k = 1,\ldots , K\}\) is a filtration of the Morse complex \((\mathcal{A },\widetilde{\kappa })\).
Proof
 (1)
\(\widetilde{\kappa }^k(A,A^{\prime }) = \widetilde{\kappa }(A,A^{\prime })\) as desired, and more importantly,
 (2)
given \(A \in \mathcal{X }^k\), any connection \(\rho \) from \(A\) lies entirely in \(\mathcal{X }^k\).
We call \(\mathcal{M }\) the Morse filtration associated to the filtered acyclic matching \((\mathcal{A }^k,\mathop {w}\nolimits ^k:\mathcal{Q }^k\rightarrow \mathcal{K }^k)\). By Theorem 2.4, the associated Morse complex \((\mathcal{A },\widetilde{\kappa })\) has the same homology as the complex \((\mathcal{X },\kappa )\). We now extend this result to the level of persistent homology.
Theorem 4.3
 (1)
by Definition 4.1, \(b(Q) = b(\mathop {w}\nolimits (Q))\) for each \(Q \in \mathcal{Q }\), and
 (2)
by the subcomplex property, \(b(\xi ) \le b(\eta )\) whenever \(\xi \prec \eta \).
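Both observed properties of the birth function \(b\) are straightforward to verify for a candidate filtered matching; a minimal sketch under the same dictionary encoding as before (hypothetical names):

```python
def is_filtered_matching(w, b, kappa, cells):
    """Check the two properties above: (1) matched cells enter the
    filtration in the same frame, b(Q) = b(w(Q)); and (2) birth frames
    respect the face relation, b(xi) <= b(eta) whenever xi is a face of eta.

    w: pairing Q -> K as a dict; b: cell -> birth frame index.
    """
    if any(b[q] != b[w[q]] for q in w):
        return False  # property (1) fails for some matched pair
    return all(b[x] <= b[e]
               for e in cells for x in cells
               if kappa.get((e, x), 0) != 0)  # x is a face of e
```

Pairing an edge with a vertex born in an earlier frame, for example, is rejected by property (1).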
Proposition 4.4
The maps \(\Psi :\mathcal{F }\rightarrow \mathcal{M }\) and \(\Phi :\mathcal{M }\rightarrow \mathcal{F }\) are filtered chain equivalences.
Proof
Similarly, we show that \(\phi _{Q*}\) is the identity map on \(C(\mathcal{X }^k)\) whenever \(b(Q) \ge k+1\). From (10), \(\phi _{Q*}(\eta )\) may differ from \(\eta \) only when \(\kappa (\eta ,Q) \ne 0\), i.e., when \(Q \prec \eta \). By the second observed property of the function \(b\), we must have \(b(\eta ) \ge b(Q) = k+1\) and so \(\eta \in \mathcal{X }\setminus \mathcal{X }^k\) as desired. Thus, \(\Phi \) is a filtered chain map as well.\(\square \)
5 Algorithms
5.1 Description
Theorem 4.3 implies that it is possible to compute the persistent homology groups of \(\mathcal{F }\) by applying the algorithm of [35] to a smaller filtration \(\mathcal{M }=\{ \mathcal{A }^k\mid k = 1,\ldots , K\}\) associated with a Morse complex \((\mathcal{A },\widetilde{\kappa })\) for \(\mathcal{X }\). The usefulness of this approach depends upon having an efficient algorithm for constructing the filtration \(\mathcal{M }\) and the Morse incidence function \(\widetilde{\kappa }\), or equivalently the boundary operator \(\partial ^\mathcal{A }\) on \(\mathcal{A }\).
The filtration \(\mathcal{M }\) and incidence function \(\widetilde{\kappa }\) depend on the acyclic matching \((\mathcal{A },\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K })\). The trivial matching given by \(\mathcal{A }=\mathcal{X }\) and \(\mathcal{Q }=\mathcal{K }=\varnothing \) always exists, but results in the same filtration and thus provides no savings in computational cost. Clearly, the desired goal is to choose an acyclic matching which minimizes the number of cells in \(\mathcal{A }\), or equivalently maximizes the number of cells paired by \(\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K }\). It is known that in general the problem of constructing an optimal acyclic matching is NP-hard (see [15] and [20, Sect. 4.5]).
We differ from [13, 14] in the construction of \(\partial ^\mathcal{A }\). From (7) it is clear that \(\widetilde{\kappa }\)—and hence \(\partial ^\mathcal{A }\)—is defined by summing over all connections between cells in \(\mathcal{A }\). A naïve attempt to enumerate all the connections between two such cells can lead to a combinatorial explosion. To circumvent this summation, we make use of the observation that the coreduction-based construction of the pairing \(\mathop {w}\nolimits :\mathcal{Q }\rightarrow \mathcal{K }\) is done by building gradient paths in reverse order with respect to \(\ll \). We therefore proceed by assigning to each cell \(\zeta \in \mathcal{X }\) a chain \(g(\zeta )\in C_*(\mathcal{A })\) such that if \(A \in \mathcal{A }\), then \(g(A) = \partial ^\mathcal{A }A\).
Thus, the output \(A^{\prime }\) of MakeCritical becomes a generator of \(C_*(\mathcal{A }^k)\). The gradient chains of remaining coboundary cells \(\text{ cb }_\mathcal{N }(A^{\prime })\) are then updated to reflect their incidence with \(A^{\prime }\). In this manner, the construction of gradient chains proceeds from the “bottom-up”. Finally, the action of the Morse boundary operator \(\partial ^\mathcal{A }\) on \(A^{\prime }\) is recovered from the corresponding gradient chain \(g(A^{\prime })\).
Recall that on the theoretical level coreduction pairs are identified as \(\mathop {w}\nolimits \)paired cells and hence they define steps in gradient paths. Thus, before the coreduction pair can be removed two additional steps need to be performed involving the remaining coboundary cells \(\text{ cb }(Q)\) of \(Q\). First, we check if the removal of \(Q\) has created new coreduction pairs. For this, it suffices to check cells in the coboundary of \(Q\) and so we enqueue those cells in a queue structure. Secondly, if the pair \((K,Q)\) potentially lies on a gradient path between unpaired cells of adjacent dimension, the gradient chains of \(Q\) and hence of its remaining coboundary cells are updated by a call to UpdateGradientChain.
Note that we use a queue data structure Que which gets reinitialized once for each iteration of the outer while loop from Line 02. We keep track of which cells are in Que so that no cell is queued twice per such iteration. This can be achieved in practice either by storing an additional flag for each cell or by mirroring the queue in a separate data structure which has been optimized for search.
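One realization of this enqueue-once discipline is to mirror the queue in a set, giving O(1) membership tests; a minimal sketch (hypothetical class name, not the authors' implementation):

```python
from collections import deque

class OnceQueue:
    """FIFO queue in which each cell may be enqueued at most once per
    reinitialization, mirrored by a set for constant-time membership tests."""
    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def push(self, cell):
        if cell not in self._seen:   # skip cells already queued this round
            self._seen.add(cell)
            self._queue.append(cell)

    def pop(self):
        return self._queue.popleft()

    def __len__(self):
        return len(self._queue)
```

Reinitializing for the next iteration of the outer while loop amounts to constructing a fresh instance, which clears both the queue and the mirror set.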
5.2 Verification
We use Theorem 4.3 to confirm that the output filtration \(\mathcal{M }\) generated by MorseReduce has the same persistent homology groups as those of the input filtration \(\mathcal{F }\).
Theorem 5.1
 (1)
MorseReduce terminates when applied to \(\{\mathcal{N }^k\}_1^K\) and produces smaller collections of cells \(\{\mathcal{N }_\mathcal{A }^k\}_1^K\).
 (2)
The output \(\{\mathcal{N }^k_\mathcal{A }\}\) defines a filtration \(\mathcal{M }\) of a complex \((\mathcal{A },\widetilde{\kappa })\) where each frame \(\mathcal{A }^k\) is given by \(\bigcup _{\ell =1}^k\mathcal{N }_\mathcal{A }^\ell \) and the underlying incidence function \(\widetilde{\kappa }\) corresponds to the boundary operator \(\partial ^\mathcal{A }\).
 (3)For each \(p\), \(q\) and \(k\), we have an isomorphism of the corresponding persistent homology group$$\begin{aligned} {{\mathrm{H}}}^p_q(\mathcal{X }^k)\cong {{\mathrm{H}}}^p_q(\mathcal{A }^k). \end{aligned}$$
Proof
Each iteration of the outer while loop from Line 02 permanently excises at least one cell \(A\) via MakeCritical.^{1} The fact that no cell is queued twice during any iteration of the second while loop in Line 07 guarantees the absence of infinite loops. Moreover, it is clear that the final size of each \(\mathcal{N }_\mathcal{A }^k\) is smaller than the initial size of \(\mathcal{N }^k\) because MakeCritical is only called once per iteration of the outer while loop and each call to MakeCritical results in a single cell from \(\mathcal{N }^k\) being removed and stored in the corresponding \(\mathcal{N }_\mathcal{A }^k\). Thus, \(\mathcal{N }_\mathcal{A }^k \subset \mathcal{N }^k\) for each \(k\).
Observe from Line 10 that if \((\xi , \eta )\) is sent to RemovePair, then \(b(\xi ) = b(\eta )\) and \(\kappa (\xi ,\eta )\) equals some unit \(u\) in \(\mathbf{R}\). Let \(k_* = b(\xi )\), and note that defining \(\mathop {w}\nolimits ^{k_*}(\eta )=\xi \) for each such pair constructs \(\mathop {w}\nolimits ^{k_*}:\mathcal{Q }^{k_*}\rightarrow \mathcal{K }^{k_*}\). Combining this pairing information with the output of MakeCritical produces a filtered partial matching \((\mathcal{A }^k,\mathop {w}\nolimits ^k:\mathcal{Q }^k\rightarrow \mathcal{K }^k)\) of \(\mathcal{F }\).
To see that this partial matching is acyclic, observe from Lines 10 and 11 that a pairing \(\mathop {w}\nolimits (\eta ) = \xi \) is only made when \(\eta \) is the last remaining face of \(\xi \), i.e., the unique cell in \(\{\zeta \in \mathcal{N }^{b(\xi )} \mid \zeta \prec \xi \}\). Recall that \(Q~{\lhd ~}\eta \) for some \(Q \in \mathcal{Q }\) if and only if \(Q \prec \xi \) by (4). Thus, all elements of \(\mathcal{Q }\) satisfying \(Q~{\lhd ~}\eta \) must have already been excised before the pair \((\xi ,\eta )\) and so the order of pair excision respects the relation \({\lhd ~}\) on \(\mathcal{Q }\). Therefore, the transitive closure \(\ll \) of the generating relation \({\lhd ~}\) must be a partial order on \(\mathcal{Q }\) as desired.
By Theorem 4.3, in order to show that the output determines a filtration \(\mathcal{M }\) with isomorphic persistent homology to \(\mathcal{F }\), it suffices to establish that \(\mathcal{M }\) is the Morse filtration associated to the acyclic matching \((\mathcal{A }^k,\mathop {w}\nolimits ^k:\mathcal{Q }^k\rightarrow \mathcal{K }^k)\). Thus, we must ensure that the stored boundary \(\partial ^\mathcal{A }\) of each cell \(A \in \mathcal{A }\) built from the corresponding gradient chain \(g(A)\) equals the boundary operator corresponding to the Morse incidence function \(\widetilde{\kappa }\) from (7). This is addressed by the subsequent proposition, which concludes the proof.\(\square \)
The proof of the following proposition employs the usual inner product \(\left\langle {~,~} \right\rangle :C(\mathcal{X }) \times C(\mathcal{X }) \rightarrow \mathbf{R}\) on chains of the input complex \((\mathcal{X },\kappa )\) obtained by treating the cells in \(\mathcal{X }\) as an orthonormal basis.
Proposition 5.2
Proof
 [A] Assume that \(\zeta \) is an unremoved cell with \(A^{\prime } \prec \zeta \). Then, by Line 02 of MakeCritical and the subsequent call to UpdateGradientChain, the gradient chain \(g(\zeta )\) of \(\zeta \) is incremented as follows:$$\begin{aligned} g(\zeta ) \leftarrow g(\zeta ) + \kappa (\zeta ,A^{\prime }) \cdot A^{\prime }. \end{aligned}$$Since this is the first instance of \(A^{\prime }\) being added to gradient chains, we are guaranteed to have \(\left\langle {g(\zeta ),A^{\prime }} \right\rangle = \kappa (\zeta ,A^{\prime })\) when MakeCritical returns \(A^{\prime }\).
 [Q] Assume \(\zeta \) is an arbitrary unremoved cell. Each cell \(Q\) excised as an element of \(\mathcal{Q }\) via RemovePair inherits its gradient chain from the existing gradient chain of its paired cell \(\mathop {w}\nolimits (Q)\) by the formula$$\begin{aligned} g(Q) = \frac{g\big (\mathop {w}\nolimits (Q)\big )}{\kappa \big (\mathop {w}\nolimits (Q),Q\big )}. \end{aligned}$$This follows from Line 04 of RemovePair. As UpdateGradientChain is called in the next line, each remaining cell \(\zeta \) satisfying \(Q \prec \zeta \) has its gradient chain incremented by \(\kappa (\zeta ,Q)\cdot g(Q)\). By the preceding formula for \(g(Q)\), we have$$\begin{aligned} g(\zeta ) \leftarrow g(\zeta ) + \frac{\kappa (\zeta ,Q)}{\kappa \big (\mathop {w}\nolimits (Q),Q\big )} \cdot g\big (\mathop {w}\nolimits (Q)\big ). \end{aligned}$$
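The two update rules above amount to a few lines of chain arithmetic; a minimal sketch over a field, with gradient chains stored as cell-to-coefficient dictionaries (hypothetical names, not the paper's code):

```python
def add_scaled(chain, other, scale):
    """chain += scale * other, dropping zero entries; chains are dicts."""
    for cell, coeff in other.items():
        new = chain.get(cell, 0) + scale * coeff
        if new:
            chain[cell] = new
        else:
            chain.pop(cell, None)

def remove_pair_update(g, kappa, Q, K, coboundary):
    """Gradient-chain bookkeeping for excising the coreduction pair (K, Q):
    g(Q) inherits g(K) / kappa(K, Q), and every remaining coface zeta of Q
    has g(zeta) incremented by kappa(zeta, Q) * g(Q)."""
    u = kappa[(K, Q)]  # a unit, so division is legitimate over a field
    g[Q] = {cell: coeff / u for cell, coeff in g[K].items()}
    for zeta in coboundary(Q):
        add_scaled(g[zeta], g[Q], kappa[(zeta, Q)])
```

Here `coboundary` is an assumed callback enumerating the remaining cofaces of a cell, standing in for the \(\text{ cb }(Q)\) lookup of the algorithm.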
5.3 Complexity
5.3.1 Parameters and Assumptions
 (1)
The input size—denoted by \(n\)—is the number of cells in \(\mathcal{X }\).
 (2)
The output size is the number of cells in the filtered Morse complex \(\mathcal{A }\), which we denote by \(m\). Note that \(m\) is partitioned as \(m = m_0 + \cdots + m_D\), where \(m_d\) is the number of \(d\)-dimensional cells in \(\mathcal{A }\). As we have remarked before, constructing an optimal acyclic matching—that is, a matching which minimizes \(m\)—is NP-hard [15, 20]. Providing sharp bounds on optimal \(m\) values relative to \(n\) for arbitrary complexes would require major breakthroughs in algebraic topology as well as graph theory. Therefore, we leave \(m\) as a parameter.
 (3)The coboundary mass \(p\) of \(\mathcal{X }\) is defined as$$\begin{aligned} p = \sup _{\xi \in \mathcal{X }} ~\#\big \{\eta \in \mathcal{X }\mid \kappa (\eta ,\xi ) \ne 0\big \}, \end{aligned}$$where \(\#\) denotes cardinality. Thus, the coboundary mass bounds the number of cells \(\eta \in \mathcal{X }\) which satisfy \(\xi \prec \eta \) for a given cell \(\xi \in \mathcal{X }\). Even though \(p\) may safely be bounded by \(n\), in most situations this is a gross overestimate. For example, the coboundary mass of a \(d\)-dimensional cubical grid is bounded above by \(2d\), independent of the total number of cubes present.
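Under the same dictionary encoding used earlier, the coboundary mass is a one-line computation; the sketch below (hypothetical names) recovers the \(2d\) bound on a small one-dimensional grid:

```python
def coboundary_mass(cells, kappa):
    """p = max over cells xi of the number of cells eta with
    kappa(eta, xi) != 0, i.e., the maximal coboundary size."""
    return max(sum(1 for eta in cells if kappa.get((eta, xi), 0) != 0)
               for xi in cells)
```

On a path of four vertices and three edges, each interior vertex lies in the boundary of two edges, so the coboundary mass is \(2 = 2d\) with \(d = 1\).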
 (1)
we assume that adding, removing or locating a cell \(\xi \in \mathcal{X }\) incurs a constant cost, and
 (2)
we assume that ring operations in \(\mathbf{R}\) may be performed in constant time so that the cost of adding and scaling gradient chains is linear in the length of the chains involved.
5.3.2 Evaluating Complexity
We begin by evaluating the complexity of a single iteration of the outer while loop from Line 02 of MorseReduce. Assume that in this iteration the call to MakeCritical via Line 03 has returned a cell \(A^{\prime }\) of dimension \(d\). Since in each iteration of this while loop we add a cell to \(\mathsf{{Que}}\) at most once, the maximum size attainable by \(\mathsf{{Que}}\) is \(n\). Moreover, each \(\mathsf{{Que}}\) insertion involves testing the coboundary of a cell which requires at most \(p\) operations. In light of these bounds, we will just assume that the total cost of managing the \(\mathsf{{Que}}\) data structure within a single while iteration depends linearly on \(n\cdot p\) and we will not separately tabulate the cost of each \(\mathsf{{Que}}\) operation.
 (1)
The cost of calling UpdateGradientChain on a \(d\)-dimensional cell is \({{\mathrm{O}}}(p\cdot m_d)\). This follows from the fact that we must iterate over each cell \(\zeta \) in the remaining coboundary of \(\xi \) and update the gradient chain \(g(\zeta )\), which consists of \(d\)-dimensional cells in \(\mathcal{A }\).
 (2)
A call to MakeCritical in Line 03 also costs \({{\mathrm{O}}}(p \cdot m_d)\), since the only nontrivial operation is the call to UpdateGradientChain in Line 03.
 (3)
In the worst case, the if statement from Line 03 of RemovePair always evaluates positively and hence UpdateGradientChain is called. Thus, each call to RemovePair also incurs a worst case cost of \({{\mathrm{O}}}(p\cdot m_d)\) since all other nontrivial operations only involve \(\mathsf{{Que}}\) insertion.
Proposition 5.3
Assume that MorseReduce is executed on a filtered complex \(\mathcal{X }\) of top dimension \(D\), size \(n\) and coboundary mass \(p\). If the resulting Morse complex \(\mathcal{A }\) has size \(m = m_0 + \cdots +m_D\), then the worst-case complexity is bounded by \({{\mathrm{O}}}(n \cdot p \cdot \tilde{m})\), where \(\tilde{m} = m_0^2 + \cdots + m_D^2\).
Thus, the cost of computing the maps induced on homology by inclusions \(\mathcal{X }^k \subset \mathcal{X }^{k+1}\) in the filtered complex \(\mathcal{X }\) over an arbitrary PID \(\mathbf{R}\) reduces from \({{\mathrm{O}}}(n^4)\) [12] to \({{\mathrm{O}}}(n \cdot p \cdot \tilde{m} + m^4)\) if MorseReduce is used as a preprocessor. In the special case when \(\mathbf{R}\) is a field, computing persistence intervals has complexity \({{\mathrm{O}}}(n^\omega )\) where \(\omega \) is the matrix multiplication exponent [21]. Therefore, the cost of computing the persistence intervals of \(\mathcal{X }\) after applying MorseReduce to \(\mathcal{X }\) equals \({{\mathrm{O}}}(n\cdot p\cdot \tilde{m} + m^\omega )\). In practice, persistence intervals are typically computed using the standard algorithm from [35] which has cubic complexity in the worst case [22]. Thus, using MorseReduce as a preprocessor for the standard algorithm lowers the overall complexity from \({{\mathrm{O}}}(n^3)\) to \({{\mathrm{O}}}(n\cdot p \cdot \tilde{m} + m^3)\). If \(m\) is much smaller than \(n\), then the \(n\cdot p \cdot \tilde{m}\) term is dominant in each case and one observes essentially linear cost in terms of the input size \(n\).
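The arithmetic behind the final observation can be checked directly. All sizes below are hypothetical, chosen only to illustrate the bounds; they are not from our experiments.

```python
# Hypothetical sizes illustrating the bounds above: when the Morse complex
# is much smaller than the input, the n * p * m_tilde term dominates and
# the total cost sits far below the cubic cost of running the standard
# algorithm directly on all n cells.

n, p = 10**6, 6                    # input size and coboundary mass (made up)
m_d = [50, 120, 80]                # critical cells per dimension (made up)
m = sum(m_d)                       # size of the Morse complex
m_tilde = sum(x * x for x in m_d)

reduced_cost = n * p * m_tilde + m**3   # MorseReduce, then standard algorithm
direct_cost = n**3                      # standard algorithm alone
print(reduced_cost < direct_cost)       # -> True
```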
Remark 5.4

C1 Consider a complex \(\mathcal{X }\) in which no nonzero incidence \(\kappa (\xi ,\xi ^{\prime }) \in \mathbf{R}\) is a unit for any pair of cells \(\xi ,\xi ^{\prime } \in \mathcal{X }\). Since only cells with unit incidence may be paired, no nontrivial matching is possible in this case.

C2 Consider a complex \(\mathcal{X }\) so that whenever \(\kappa (\xi ,\xi ^{\prime }) \ne 0\) for cells \(\xi ,\xi ^{\prime }\) we have \(b(\xi ) \ne b(\xi ^{\prime })\). Since matched cells are required to have the same \(b\) values by Definition 4.1, no nontrivial matching is possible in this case.
 (1)
For both cubical and simplicial complexes all nonzero incidences are units \(\pm 1\) in any PID \(\mathbf{R}\), so C1 is avoided.
 (2)
The \(b\) values are only prescribed on top-dimensional cells (such as grayscale pixel or voxel values for image data). In these situations, each lower-dimensional cell recursively inherits its \(b\) value as the minimum \(b\) value encountered among its coboundary cells. This guarantees the existence of at least some cells \(\xi \prec \xi ^{\prime }\) with \(b(\xi ) = b(\xi ^{\prime })\) and avoids C2.
 (3)
The \(b\) values are inherited from lower-dimensional cells. A prime example is the Vietoris–Rips complex built around point cloud data. Here each simplex inherits the maximum \(b\) value encountered in its \(1\)-skeleton. Again, this process ensures the existence of dimensionally adjacent cells which share \(b\) values and hence avoids C2.
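Both inheritance schemes in (2) and (3) are simple to state in code. The sketch below is our own illustration (cell names and distances are hypothetical): top-down, a lower-dimensional cell receives the minimum \(b\) value among its coboundary; bottom-up, a Vietoris–Rips simplex receives the longest edge in its \(1\)-skeleton.

```python
# Sketches of the two inheritance schemes (all data below is made up).

from itertools import combinations

def inherit_down(cells_by_dim, coboundary, b):
    """Top-down: fill in b for lower cells, highest dimension first."""
    for d in range(len(cells_by_dim) - 2, -1, -1):
        for cell in cells_by_dim[d]:
            b[cell] = min(b[eta] for eta in coboundary[cell])
    return b

def rips_birth(simplex, dist):
    """Bottom-up: birth of a Rips simplex = longest edge in its 1-skeleton."""
    if len(simplex) == 1:
        return 0.0
    return max(dist[i][j] for i, j in combinations(simplex, 2))

# Two edges with values 3 and 7 sharing the vertex v1: v1 inherits 3.
b = inherit_down([["v0", "v1", "v2"], ["e01", "e12"]],
                 {"v0": ["e01"], "v1": ["e01", "e12"], "v2": ["e12"]},
                 {"e01": 3, "e12": 7})
print(b["v1"])  # -> 3

# A triangle on three points is born at its longest pairwise distance.
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]
print(rips_birth((0, 1, 2), dist))  # -> 2.0
```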
6 Experimental Results
 (1)
Coefficient Rings: ProcessLowerStars constructs the Morse complex over \(\mathbb{Z }/2\mathbb{Z }\) coefficients whereas MorseReduce may be applied to filtered cell complexes over an arbitrary PID.
 (2)
Complex Types: ProcessLowerStars requires a filtered cubical complex as input along with birth times provided only for top-dimensional cells. The birth time for a lower-dimensional cell is recursively inherited as the minimum birth found among all cells in its coboundary. On the other hand, MorseReduce is complex-agnostic and does not impose any such top-down inheritance requirement. Moreover, ProcessLowerStars requires perturbing the input filtration so that no two top cells have the same birth time. This is unnecessary in MorseReduce even when dealing with 3D cubical data.
 (3)
Dimensions: MorseReduce is dimensionindependent whereas ProcessLowerStars, as written, requires a top dimension of \(3\).
Note that since the output of MorseReduce is a filtration in its own right, it is possible to iterate the algorithm until the number of reductions performed becomes essentially negligible. Thus, the cells output by an iteration of MorseReduce get further partitioned by the subsequent iteration and may get paired by the associated acyclic matching. We are not aware of any existing technique which allows for such iteration on arbitrary filtrations.
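The iteration described above amounts to a simple driver loop. Everything below is our own sketch: `morse_reduce` stands in for the real routine (here, a toy that pairs off half the cells per pass), and the stopping threshold is a parameter choice of ours.

```python
# Hypothetical driver for iterated reduction: re-run the (stand-in)
# reduction until fewer than min_ratio of the surviving cells are removed
# per pass.

def iterate_reduction(filtration, morse_reduce, min_ratio=0.05):
    while True:
        reduced = morse_reduce(filtration)
        removed = len(filtration) - len(reduced)
        filtration = reduced
        if not reduced or removed < min_ratio * len(reduced):
            return filtration

# Toy stand-in: each pass pairs off (removes) half of the remaining cells.
toy_reduce = lambda cells: cells[: max(len(cells) // 2, 1)]
print(len(iterate_reduction(list(range(1024)), toy_reduce)))  # -> 1
```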
We demonstrate the results of MorseReduce on cubical grids, simplicial complexes, Vietoris–Rips complexes and movies. The cubical complexes come from sublevel sets of finite element Cahn–Hilliard simulations and the simplicial complexes arise from brain imaging data. The Vietoris–Rips complexes come from point clouds of experimental granular flow data. Our largest datasets by far, courtesy of M. Schatz, are two black and white movies obtained by segmenting Rayleigh–Bénard convection data, each successive frame consisting of about \(155,000\) three-dimensional cubes.
The implementation of MorseReduce benchmarked here was coded in C++ using the standard template library and compiled using the GNU C++ compiler with optimization level O3. All computations were performed on an Intel Core i5 machine with 32 GB of available RAM and virtual memory disabled. The source code for our implementation is available at [26].
The comparison is with our implementation of the standard algorithm for computing persistent homology as found in [35], which we will denote by SP. While this algorithm may be found in various flavors as part of the software package jPlex^{2} or the Dionysus project,^{3} the authors feel that the present comparison is fair because the same data structures are used in both cases. The SP results simply provide the time taken when no combinatorial Morse theoretic preprocessing is performed while holding all other implementation-specific factors constant. Thus, if more efficient implementations of SP exist, then we expect that preprocessing with MorseReduce will improve the performance of those implementations as well.
Experimental results
Type  Dim  # Frames  # Cells (millions)  Reduced # cells (thousands)  SP (s)  MR + SP (s)

C  2  16  0.26  2.31  2.4  0.9 
C  3  25  1.24  3.35  12.7  4.2 
C  3  2,400  5.25  9.50  195.8  46.6 
S  5  20  12.86  27.13  73.9  8.8 
S  5  5,000  0.86  99.8  1951.6  153.1 
VR  2  100  2.34  86.33  1277.0  37.7 
VR  3  50  9.50  18.31  286.5  47.2 
VR  3  250  1.29  53.95  551.1  125.2 
M  3  209  259.21  1.25  DNF  7,213 
M  3  215  266.67  2.10  DNF  7,416 
The table is arranged as follows: the first column indicates the type of complexes in the filtration (Cubical, Simplicial, Vietoris–Rips or Movie) while the second column contains the maximum dimension of the cells present in the filtration. The third column contains the number of frames \(K\) of each input filtration \(\mathcal{F }= \big \{\mathcal{X }^k\big \}_1^K\). The next two columns provide the size (in number of cells) of the filtration before and after Morse reduction. The penultimate column provides the time taken by our implementation of SP to compute persistence intervals over \(\mathbb{Z }/2\mathbb{Z }\) of the filtration, whereas the last column provides the total time taken to first apply MorseReduce and then compute the persistence intervals of the reduced filtration with SP. DNF indicates that the given algorithm failed to terminate because it ran out of memory. All times are in seconds.
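The timing columns translate directly into speedup factors; the snippet below copies three rows from the table above and divides the SP time by the MR + SP time.

```python
# Speedup factors implied by the table (values copied from three rows).

rows = {
    "C, dim 2, 16 frames":   (2.4, 0.9),
    "S, dim 5, 5000 frames": (1951.6, 153.1),
    "VR, dim 2, 100 frames": (1277.0, 37.7),
}
speedups = {name: sp / mrsp for name, (sp, mrsp) in rows.items()}
print(round(speedups["VR, dim 2, 100 frames"], 1))  # -> 33.9
```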
A final note to illustrate the power and scalability of the Morse theoretic approach: the movie datasets were far too large to be held in memory all at once. Our approach involved storing about \(30\) frames at a time and removing paired cells from all but the last frame. This freed up considerable memory which we used to input the remaining portions of the movies in pieces, each comprising \(30\) consecutive frames. At each stage we left the last frame unreduced so that the next piece of the movie could be attached to it, and so on. In this way, extremely large and complicated persistence computations may be brought within the scope of commodity hardware. To the best of our knowledge, there is no other publicly available technique which yields persistence intervals of a large filtration from such local computations without ever holding all the cells in memory at once.
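The streaming scheme just described can be sketched as a chunked driver loop. Everything below is our own illustration: `reduce_all_but_last` stands in for a MorseReduce call that pairs away cells in all but the final frame, and the toy version here removes nothing so the flow of frames through the buffer is easy to follow.

```python
# A sketch (our own names) of the chunked streaming strategy: hold about
# 30 frames, reduce all but the last, keep the last frame as the
# attachment site for the next chunk, and accumulate the surviving cells.
# Assumes at least one frame is supplied.

def stream_reduce(frames, reduce_all_but_last, chunk=30):
    buffer, surviving = [], []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == chunk:
            reduced, last = reduce_all_but_last(buffer)
            surviving.extend(reduced)  # cells kept from the first chunk-1 frames
            buffer = [last]            # next chunk attaches to this frame
    reduced, last = reduce_all_but_last(buffer)  # flush the final partial chunk
    surviving.extend(reduced)
    surviving.append(last)
    return surviving

# Toy stand-in that removes nothing: every frame passes through unchanged.
toy = lambda buf: (buf[:-1], buf[-1])
print(stream_reduce(list(range(65)), toy) == list(range(65)))  # -> True
```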
Footnotes
 1.
In practice many more cells are also removed, since if \((\xi ,\eta )\) appear as a coreduction pair then they are excised during the RemovePair subroutine. Furthermore, observe that their appearance as a coreduction pair is equivalent to their being part of a gradient path that descends to \(A\).
 2.
 3.
Notes
Acknowledgments
The authors were partially supported by NSF Grants DMS-0915019 and CBI-0835621 and by contracts from DARPA and AFOSR.
References
 1. Carlsson, G.: Topology and data. Bull. Am. Math. Soc. (N.S.) 46(2), 255–308 (2009)
 2. Chari, M.K.: On discrete Morse functions and combinatorial decompositions. In: Proc. Formal Power Series and Algebraic Combinatorics (Vienna, 1997). Discrete Math. 217(1–3), 101–113 (2000)
 3. Cohen-Steiner, D., Edelsbrunner, H., Harer, J.: Stability of persistence diagrams. Discrete Comput. Geom. 37(1), 103–120 (2007)
 4. Delfinado, C.J.A., Edelsbrunner, H.: An incremental algorithm for Betti numbers of simplicial complexes on the \(3\)-sphere. Comput. Aided Geom. Des. 12(7), 771–784 (1995) (special issue on grid generation, finite elements, and geometric design)
 5. Dumas, J.-G., Heckenbach, F., Saunders, D., Welker, V.: Computing simplicial homology based on efficient Smith normal form algorithms. In: Joswig, M., Takayama, N. (eds.) Algebra, Geometry, and Software Systems, pp. 177–206. Springer, Berlin (2003)
 6. Edelsbrunner, H., Harer, J.: Persistent homology—a survey. In: Surveys on Discrete and Computational Geometry. Contemporary Mathematics, vol. 453, pp. 257–282. American Mathematical Society, Providence (2008)
 7. Edelsbrunner, H., Harer, J.L.: Computational Topology: An Introduction. American Mathematical Society, Providence, RI (2010)
 8. Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. In: Discrete and Computational Geometry and Graph Drawing (Columbia, SC, 2001). Discrete Comput. Geom. 28(4), 511–533 (2002)
 9. Forman, R.: Morse theory for cell complexes. Adv. Math. 134, 90–145 (1998)
 10. Ghrist, R.: Barcodes: the persistent topology of data. Bull. Am. Math. Soc. (N.S.) 45(1), 61–75 (2008)
 11. Gunther, D., Reininghaus, J., Wagner, H., Hotz, I.: Memory efficient computation of persistent homology for 3d image data using discrete Morse theory. In: Proceedings of the 24th Conference on Graphics, Patterns and Images (to appear)
 12. Hafner, J., McCurley, K.: Asymptotically fast triangularization of matrices over rings. SIAM J. Comput. 20(6), 1068–1083 (1991)
 13. Harker, S., Mischaikow, K., Mrozek, M., Nanda, V., Wagner, H., Juda, M., Dlotko, P.: The efficiency of a homology algorithm based on discrete Morse theory and coreductions. In: Proceedings of the 3rd International Workshop on Computational Topology in Image Context, Image A, vol. 1, pp. 41–47 (2010)
 14. Harker, S., Mischaikow, K., Mrozek, M., Nanda, V.: Discrete Morse theoretic algorithms for computing homology of complexes and maps. Found. Comput. Math. (2013). doi:10.1007/s10208-013-9145-0
 15. Joswig, M., Pfetsch, M.: Computing optimal Morse matchings. SIAM J. Discrete Math. 20(1), 11–25 (2006)
 16. Kozlov, D.: Discrete Morse theory for free chain complexes. C. R. Math. 340, 867–872 (2005)
 17. Kozlov, D.: Combinatorial Algebraic Topology. Algorithms and Computation in Mathematics, vol. 21. Springer, Berlin (2008)
 18. Lefschetz, S.: Algebraic Topology. American Mathematical Society Colloquium Publications, vol. 27. American Mathematical Society, New York (1942)
 19. Lewiner, T.: Geometric discrete Morse complexes. Ph.D. Dissertation, Department of Mathematics, PUC-Rio. http://thomas.lewiner.org/pdfs/tomlew_phd_puc.pdf (2005)
 20. Lewiner, T., Lopes, H., Tavares, G.: Toward optimality in discrete Morse theory. Exp. Math. 12(3), 271–286 (2003)
 21. Milosavljevic, N., Morozov, D., Skraba, P.: Zigzag persistent homology in matrix multiplication time. In: Proceedings of the 27th Annual ACM Symposium on Computational Geometry (SCG'11), pp. 216–225, Paris (2011)
 22. Morozov, D.: Persistence algorithm takes cubic time in the worst case. BioGeometry News, Department of Computer Science, Duke University, Durham (2005)
 23. Mrozek, M., Batko, B.: The coreduction homology algorithm. Discrete Comput. Geom. 41, 96–118 (2009)
 24. Mrozek, M., Wanner, T.: Coreduction homology algorithm for inclusions and persistent homology. Comput. Math. Appl. 60(10), 2812–2833 (2010)
 25. Munkres, J.R.: Elements of Algebraic Topology. The Benjamin/Cummings Publishing Company, Inc., Menlo Park (1984)
 26. Perseus, the Persistent Homology Software. http://www.math.rutgers.edu/vidit/perseus
 27. Robins, V., Wood, P.J., Sheppard, A.P.: Theory and algorithms for constructing discrete Morse complexes from grayscale digital images. IEEE Trans. Pattern Anal. Mach. Intell., 1–14 (2010)
 28. Saunders, B.D., Wan, Z.: Smith normal form of dense integer matrices, fast algorithms into practice. In: International Symposium on Symbolic and Algebraic Computation, pp. 274–281 (2004)
 29. Spanier, E.H.: Algebraic Topology. McGraw-Hill Book Co., New York (1966)
 30. Storjohann, A.: Nearly optimal algorithms for computing Smith normal forms of integer matrices. In: Proceedings of ISSAC '96, pp. 267–274 (1996)
 31. Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13, 354–356 (1969)
 32. Tucker, A.W.: Cell spaces. Ann. Math. 37, 92–100 (1936)
 33. Wagner, H., Chen, C., Vucini, E.: Efficient computation of persistent homology for cubical data. In: Topological Methods in Data Analysis and Visualization II, pp. 91–106 (2012)
 34. Zomorodian, A.: Topology for Computing. Cambridge Monographs on Applied and Computational Mathematics, vol. 16. Cambridge University Press, Cambridge (2005)
 35. Zomorodian, A., Carlsson, G.: Computing persistent homology. Discrete Comput. Geom. 33(2), 249–274 (2005)