Ripser: efficient computation of Vietoris-Rips persistence barcodes

We present an algorithm for the computation of Vietoris-Rips persistence barcodes and describe its implementation in the software Ripser. The method relies on implicit representations of the coboundary operator and the filtration order of the simplices, avoiding the explicit construction and storage of the filtration coboundary matrix. Moreover, it makes use of apparent pairs, a simple but powerful method for constructing a discrete gradient field from a total order on the simplices of a simplicial complex, which is also of independent interest. Our implementation shows substantial improvements over previous software both in time and memory usage.


Introduction
Persistent homology is a central tool in computational topology and topological data analysis. It captures topological features of a filtration, a growing one-parameter family of topological spaces, and tracks the lifespan of those features throughout the parameter range in the form of a collection of intervals called the persistence barcode. One of the most common constructions for a filtration from a geometric data set is the Vietoris-Rips complex, which is constructed from a finite metric space by connecting any subset of the points with diameter bounded by a specified threshold with a simplex.
The predominant approach to persistence computation consist of two steps: the construction of a filtration boundary matrix, and the computation of persistence barcodes using a matrix reduction algorithm similar to Gaussian elimination, which provides a decomposition of the filtered chain complex into indecomposable summands [1]. Among the fastest codes for the matrix reduction step is PHAT [5], which has been created with the goal of assessing and understanding the relation and interplay of the various optimizations proposed in the previous literature on the matrix reduction algorithm. In the course of that project, it became evident that often the construction of the filtration boundary matrix becomes the bottleneck for the computation of Vietoris-Rips barcodes.
The approach followed in Ripser [2] is to avoid the construction and storage of the filtration boundary matrix as a whole, discarding and recomputing parts of it when necessary. In particular, instead of representing the coboundary map explicitly by a matrix data structure, it is given only algorithmically, recomputing the coboundary of a simplex whenever needed. The filtration itself is also not specified explicitly but only algorithmically, via a method for comparing simplices with respect to their appearance in the filtration order, together with a method for computing the cofacets of a given simplex and their

Preliminaries
Simplicial complexes and filtrations Given a finite set , an (abstract) simplex on is simply a nonempty subset ⊆ . The dimension of is one less than its cardinality, dim = | | − 1. Given two simplices ⊆ , we say that is a face of , and that is a coface of . If additionally dim +1 = dim , we say that is a facet of (a face of codimension 1), and that is a cofacet of .
A finite (abstract) simplicial complex is a collection of simplices that is closed under the face relation: if ∈ and ⊆ , then ∈ . The set is called the vertices of , and the subsets in are called simplices. A subcomplex of is a subset ⊆ that is itself a simplicial complex.
Given a finite simplicial complex , a filtration of is a collection of subcomplexes ( ) ∈ of , where is a totally ordered indexing set, such that ≤ implies ⊆ . In particular, for a finite metric space ( , ), represented by a symmetric distance matrix, the Vietoris-Rips complex at scale ∈ R is the abstract simplicial complex Vietoris-Rips complex were first introduced by Vietoris [50] as a means of defining a homology theory for general compact metric spaces, and later used by Rips in the study of hyperbolic groups (see [18]). Their usage in topological data analysis was pioneered by Silva and Carlsson [44], foreshadowed by results of Hausmann [19] and Latschev [29] on sampling conditions for recovering the homotopy type of a Riemannian manifold from a Vietoris-Rips complex. Letting the scale parameter vary, the resulting filtration, indexed by = R, is a filtration of the full simplex Δ( ), called the Vietoris-Rips filtration. For this paper, other relevant indexing sets besides the real numbers R are the set of distances { ( , ) | , ∈ } in a finite metric space ( , ), and the set of simplices of Δ( ) equipped with an appropriate total order refining the order by simplex diameter, as explained later.
We call a filtration essential if ≠ implies ≠ . A simplexwise filtration of is a filtration such that for all ∈ with ≠ ∅, there is some simplex ∈ and some index < ∈ such that \ = { }. In an essential simplexwise filtration, the index is the predecessor of in . Thus, essential simplexwise filtrations correspond bĳectively to total orders extending the face poset of , up to isomorphism of the indexing set . In particular, in this case we often identify the indexing set with the set of simplices. If a simplex appears earlier in the filtration than another simplex , i.e., ∈ whenever ∈ , we say that is younger than , and is older than .
It is often convenient to think of a simplicial filtration as a diagram • : → Simp of simplicial complexes indexed over some finite totally ordered set , such that all maps → in the diagram (with ≤ ) are inclusions. In terms of category theory, • is a functor.
Reindexing and refinement of filtrations A reindexing of a filtration • : → Simp indexed over some totally ordered set is another filtration • : → Simp such that = ( ) for some monotonic map : → , called reindexing map. If there is a complex that does not occur in the filtration • , we say that • refines • .
As an example, the filtration Rips • ( ) is indexed by the real numbers R, but can be condensed to an essential filtration • , indexed by the finite set of pairwise distances of . In order to compute persistent homology, one needs to apply one further step of reindexing, refining the filtration to an essential simplexwise one, as described in detail later.
Sublevel sets of functions A function : → R on a simplicial complex is monotonic if ⊆ ∈ implies ( ) ≤ ( ). For any ∈ R, the sublevel set −1 (−∞, ] of a monotonic function is a subcomplex. The sublevel sets form a filtration of indexed over R. Clearly, any finite filtration • : → Simp of simplicial complexes can be obtained as a reduction of some sublevel set filtration. In particular, the Vietoris-Rips filtration is simply the sublevel set filtration of the diameter function. Discrete Morse theory [17] studies the topology of sublevel sets for generic functions on simplicial complexes. A discrete vector field on a simplicial complex is a partition of into singleton sets and pairs { , } in which is a facet of . We call such a pair a facet pair. A monotonic function : → R is a discrete Morse function if the facet pairs { , } with ( ) = ( ) generate a discrete vector field , which is then called the discrete gradient of . A simplex that is not contained in any pair of is called a critical simplex, and the corresponding value is a critical value of .

Persistent homology
In this paper, we only consider simplicial homology with coefficients in a prime field F , and write * ( ) as a shortcut for * ( ; F ). Applying homology to a filtration of finite simplicial complexes • : → Simp yields another diagram * ( • ) : → Vect of finite dimensional vector spaces over F , often called a persistence module [7].
If all vector spaces have finite dimension, such diagrams have a particularly simple structure: they decompose into a direct sum of interval persistence modules, consisting of copies of the field F connected by the identity map over an interval range of indices, and the trivial vector space outside the interval [11,53]. This decomposition is unique up to isomorphism, and the collection of intervals describing the structure, the persistence barcode, is therefore a complete invariant of the isomorphism type, capturing the homology at each index of the filtration together with the maps connecting any two different indices. In fact, a corresponding decomposition exists already on the level of filtered chain complexes [1], and this decomposition is constructed by algorithms for computing persistence barcodes.
If • is an essential filtration and [ , ) ⊆ is an interval in the persistence barcode of • , then we call a birth index, a death index, and the pair ( , ) an index persistence pair. Moreover, if [ , ∞) is an interval in the persistence barcode of , we say that is an essential (birth) index. For an essential simplexwise filtration • , the indies are in bĳection with the simplices, and so in this context we also speak about birth, death, and essential simplices, and we consider pairs of simplices as persistence pairs. If • is a reindexing of a sublevel set filtration for a monotonic function , we say that the pair ( , ) has persistence ( ) − ( ).
Note that this is a direct consequence of the fact that the two filtrations • , • , the reindexing map , and homology * are functors, and composition of functors is associative.
If the reindexing map is not surjective, the persistence barcode of the reindexed filtration • may contain intervals that do not correspond to intervals in the barcode of • . The preimage −1 [ , ) ⊆ of an interval [ , ) ⊆ in the persistence barcode of • is then either empty, in which case we call ( , ) a zero persistence pair; if • is the sublevel set filtration of , this is the case if and only if ( ) = ( ).
is an interval of the persistence barcode for • , and all such intervals arise this way. We summarize: Filtration boundary matrices Given a simplicial complex with a totally ordered set of vertices , there is a canonical basis of the simplicial chain complex * ( ), consisting of the simplices oriented according to the specified total order. A simplexwise filtration turns this into an ordered basis and gives rise to a filtration boundary matrix, which is the matrix of the boundary operator of the chain complex * ( ) with respect to that ordered basis. We may consider boundary matrices both for the combined boundary map * : * → * as well as for the individual boundary maps : → −1 in each dimension . Generalizing the latter case, we say that a matrix with column indices ⊂ and row indices −1 ⊂ is a filtration -boundary matrix for a simplexwise filtration • : → Simp if for each ∈ , the columns of with indices ≤ form a generating set of the ( − 1)-boundaries −1 ( ). This allows us to remove columns from a boundary matrix that are linear combinations of the previous columns, a strategy called clearing that is discussed in Section 3.2.
Indexing simplices in the combinatorial number system We now describe the combinatorial number system [25,41], which provides a way of indexing the simplices of the full simplex Δ( ) and of the Vietoris-Rips filtration Rips • ( ) by natural numbers, and which has previously been employed for persistence computation in [4]. Again, we assume a total order on the vertices = { 0 , . . . , −1 } of the filtration. Using this order, we identify each -simplex with the sorted ( + 1)-tuple of its vertex indices ( , . . . , 0 ) in decreasing order > · · · > 0 . This induces a lexicographic order on the set of -simplices, which we refer to as the colexicographic vertex order. The combinatorial number system of order + 1 is the order-preserving bĳection  Note that for > the convention = 0 is used here. As an example, the simplex { 5 , 3 , 0 } is assigned the number (5, 3, 0) ↦ → 5 3 + 3 2 + 0 1 = 10 + 3 + 0 = 13. Conversely, if a -simplex with vertex indices ( , . . . , 0 ) has index in the combinatorial number system, the vertices of can be obtained by a binary search, as described in Section 4.

Lexicographic refinement of the Vietoris-Rips filtration
We now describe an essential simplexwise refinement of the Vietoris-Rips filtration, as required for the computation of persistent homology. To this end, we consider another lexicographic order on the simplices of the full simplex Δ( ) with vertex set , given by ordering the simplices • by diameter, • then by dimension, • then by reverse colexicographic vertex order.
We will refer to the simplexwise filtration resulting from this total order as the lexicographically refined Vietoris-Rips filtration. The choice of the reverse colexicographic vertex order has algorithmic advantages, explained in Section 4.

Computation
In this section, we explain the algorithm for computing persistent homology implemented in Ripser, and discuss the various optimization employed to achieve an efficient implementation.

Matrix reduction
The prevalent approach to computing persistent homology is by column reduction [10] of the filtration boundary matrix. We write to denote the th column of a matrix . The pivot index of , denoted by Pivot , is the largest row index of any nonzero entry, taken to be 0 if all entries of are 0. Otherwise, the corresponding nonzero entry is called the pivot entry, denoted by PivotEntry . We define Pivots = Pivot \ {0}. A column is called reduced if Pivot cannot be decreased using column additions by scalar multiples of columns with < . Equivalently, Pivot is minimal among all pivot indices of linear combinations ≤ with ≠ 0, meaning that multiplication from the right by a regular upper triangular matrix leaves the pivot index of the column unchanged: Pivot = Pivot ( ) . In particular, a column is reduced if either = 0 or all columns with < are reduced and satisfy Pivot ≠ Pivot . A matrix is called reduced if all of its columns are reduced. The following proposition forms the basis of matrix reduction algorithms for computing persistent homology. A basis for the filtered chain complex that is compatible with both the filtration and the boundary maps is given by the chains { | is a death index} ∪ { | is a death index} ∪ { | is an essential index}, determining a direct sum decomposition of * ( ) into elementary chain complexes of the form for each death index and · · · → 0 → → 0 → . . .
for each essential index . Taking intersections with the filtration * ( • ), we obtain elementary filtered chain complexes, in which is a cycle appearing in the filtration at index = Pivot and becoming a boundary when enters the filtration at index , and in which an essential cycle enters the filtration at index . The persistent homology is thus generated by the representative cycles { | is a death index} ∪ { | is an essential index}, in the sense that, for all indices ∈ , the homology * ( ) has a basis generated by the cycles and for all pairs of indices , ∈ with ≤ , the image of the map in homology * ( ) → * ( ) induced by inclusion has a basis generated by the cycles An algorithm for computing the matrix reduction = · is given below as Algorithm 1. It can be applied either to the entire filtration boundary matrix in order to compute persistence in all dimensions at once, or to the filtration -boundary matrix, resulting in the persistence pairs of dimensions ( − 1, ) and the essential indices of dimension . This algorithm appeared for the first time in [10], rephrasing the original algorithm for persistent homology [16] as a matrix algorithm. An algorithmic advantage to the decomposition algorithm described in [1] is that it does not require any row operations.
Note that the column operations involving the matrix are often omitted if the goal is to compute just the persistence pairs and representative cycles are not required. In Section 3.4, we will use the matrix nevertheless to implicitly represent the matrix = · .
Typically, reducing a column at a birth index tends to be significantly more expensive than one with a death index. This observation can be explained using the time complexity analysis for the matrix reduction algorithm given in [15,Section VII.2]: the reduction of a column for a -simplex with death index and corresponding birth index requires at most ( + 1) ( − ) 2 steps, while the reduction of a column with birth index requires at most ( + 1) ( − 1) 2 steps. Typically, the index persistence ( − ) is quite small, while the reduction of birth columns indeed becomes expensive for large birth indices . The next subsection describes a way to circumvent these birth column reductions whenever possible.

Clearing inessential birth columns
An optimization to the matrix reduction algorithm, due to Chen and Kerber [8], is based on the observation that the computation of persistence pairs according to Proposition 3.1 does not involve any column with an inessential birth index. Reducing those columns to zero is therefore unnecessary, and avoiding their reduction can lead to dramatic improvements in running time. The clearing optimization simply sets a column to 0 if is the pivot index of another column, = Pivot .
As proposed in [8], clearing yields only the reduced matrix , and the method is described in the survey by Morozov and Edelsbrunner [35] as incompatible with the computation of the reduction matrix . In fact, however, the clearing optimization can actually be extended to also obtain the reduction matrix, which plays a crucial role in our implementation. To see this, consider an inessential birth index = Pivot , corresponding to a cleared column. Obtaining the requisite full rank upper triangular

Algorithm 1 Matrix reduction and persistence pairs
Require: : × filtration boundary matrix (with row indices and column indices ) Ensure: : full rank upper triangular × matrix, = · : reduced matrix, : persistence pairs, : essential indices := ∅ for ∈ in increasing order do := := while there exist < with Pivot = Pivot do := PivotEntry /PivotEntry := − · ⊲ For implicit matrix reduction (Section 3.4): append to end if end for return , , , reduction matrix requires an appropriate column such that · = = 0, i.e., is a cycle. It suffices to simply take = ; by construction, this column is a boundary, and since Pivot = , the resulting matrix will be full rank upper triangular.
Note that when removing columns with birth indices from a filtration -boundary matrix, the remaining columns remain a generating set for the boundaries as required by the definition. The resulting method is summarized in Algorithm 2.
The birth indices of persistence pairs ( , ) are identified with the reduction of column . In order to ensure that this happens before the algorithm arrives at column , so that the column is cleared already before it would get reduced, the matrices for the boundary maps are reduced in order of decreasing dimension = ( + 1), . . . , 1. For each index persistence pair ( , ) computed in the reduction of the boundary matrix for , the corresponding column for index can now be removed from the boundary matrix for −1 . Note however that computing persistent homology in dimensions 0 ≤ ≤ still requires the reduction of the full boundary matrix +1 . This can become very expensive, especially if there are many ( +1)-simplices, as in the case of a Vietoris-Rips filtration. In this setting, the complex is the ( + 1)-skeleton of the full simplex on vertices. The standard matrix reduction algorithm for persistent homology requires the reduction of one column per simplex of dimension 1 ≤ ≤ + 1, amounting to columns in total. Note that dim −1 ( ) equals the number of death columns and dim ( ) equals the number of birth columns in the -boundary matrix. As an example, for = 2, = 192 we obtain 56 050 096 columns, of which 1 161 471 are death columns and 54 888 625 are birth columns. Using the

Algorithm 2 Matrix reduction and persistence pairs with clearing
Require: : × filtration -boundary matrix (with row indices and column indices ), : reduced filtration ( + 1)-boundary matrix, : persistence pairs of dimensions ( , + 1) Ensure: : full rank upper triangular × matrix, = · : reduced filtration -boundary matrix, : persistence pairs of dimensions ( − 1, ), : essential indices of dimension := \ Pivots := × submatrix of Apply Algorithm 1 to reduce to = · and obtain the persistence pairs and the essential indices Extend to a × matrix by filling in zeros Extend to a × matrix by filling in zeros for ( , ) ∈ do := end for return , , , clearing optimization, this number is lowered to

Persistent Cohomology
The clearing optimization can be used to a much greater effect by computing persistence barcodes using cohomology instead of homology of Vietoris-Rips filtrations. As noted by de Silva et al. [13], for a filtration • of a simplicial complex the persistence barcodes for homology * ( • ) and cohomology * ( • ) coincide, since for coefficients in a field, cohomology is a vector space dual to homology [37], and the barcode of persistent homology (and more generally, of any pointwise finite-dimensional persistence module) is uniquely determined by the ranks of the internal linear maps in the persistence module, which are preserved by vector space duality.
The filtration of chain complexes * ( • ) gives rise to a diagram of cochain complexes ( • ), with reversed order on the indexing set. Since cohomology is a contravariant functor, the morphisms in this diagrams are however surjections instead of injections. To obtain a setting that is suitable for our reduction algorithms, we instead consider the filtration of relative cochain complexes ( , • ). The filtration coboundary matrix for : ( , • ) → +1 ( , • ) is given as the transpose of the filtration boundary matrix with rows and columns ordered in reverse filtration order [13]. The persistence barcodes for relative cohomology * ( , • ) uniquely determine those for absolute cohomology * ( • ) (and coincide with the respective homology barcodes by duality). This correspondence can be seen as a consequence of the fact that the short exact sequence of cochain complexes of persistence modules 0 → * ( , • ) → * ( ) → * ( • ) → 0 (where * ( ) is interpreted as a complex of constant persistence modules) gives rise to a long exact sequence in cohomology, which can be seen to split at ( , • ) and at +1 ( • ), with im * as the summand corresponding to the bounded intervals in either barcode [3]. Now the persistence pairs ( , ) of dimensions ( , − 1) for relative cohomology ( , ) correspond to persistence pairs ( , ) of dimensions ( − 1, ) for (absolute) homology −1 ( ) in one dimension below, i.e., a death index becomes a dual non-essential birth index and vice versa, while the essential birth indices for ( , ) remain essential birth indices for ( ) in the same dimension. Thus, the persistence barcode can also be computed by matrix reduction of the filtration coboundary matrix. Note that this is equivalent to row reduction of the filtration boundary matrix, reducing the rows from bottom to top.
Since the coboundary map increases the degree, in order to apply the clearing optimization described in Section 3.2, the filtration -coboundary matrices are now reduced in order of increasing dimension using Algorithm 2. This yields the relative persistence pairs of dimensions ( + 1, ) for +1 ( , ), corresponding to the absolute persistence pairs of dimensions ( , + 1) for ( ), and the essential indices of dimension . This is the approach used in Ripser.
The crucial advantage of using cohomology to compute the Vietoris-Rips persistence barcodes in dimensions 0 ≤ ≤ lies in avoiding the expensive reduction of columns whose birth indices correspond to ( + 1)-simplices, as discussed in Section 3.2. To illustrate the difference, we first consider cohomology without clearing. Note that for persistent cohomology, the number of column reductions performed by the standard matrix reduction (Algorithm 1) is again, for the ( + 1)-skeleton of the full simplex on vertices with = 2, = 192, this amounts to 1 179 808 columns, of which 1 161 471 are death columns and 18 337 are birth columns. While this number is significantly smaller than for homology, for small values of the number of rows of the coboundary matrix, +1 , is much larger than that of the boundary matrix, , and thus the reduction of birth columns becomes prohibitively expensive in practice. Consequently, reducing the coboundary matrix without clearing has not been observed as more efficient in practice than reducing the boundary matrix [5]. However, in conjunction with the clearing optimization, only columns remain to be reduced; for = 2, = 192 we get 1 161 472 = 1 161 471 + 1 columns, of which 1 161 471 are death columns and only one is a birth column, corresponding to the single essential class in dimension 0. In addition, typically a large fraction of the death columns will be reduced already from the beginning, as observed in Section 5. Thus, in practice, the combination of clearing and cohomology drastically reduces the number of columns operations in comparison to Algorithm 1.

Implicit matrix reduction
The matrix reduction algorithm can be modified slightly to yield a variant in which only the reduction matrix is represented explicitly in memory. The columns of the coboundary matrix are computed on the fly instead, by a method that enumerates the nonzero entries in a given column of in reverse colexicographic vertex order of the corresponding rows. Specifically, using the combinatorial number system to index the simplices on the vertex set {0, . . . , − 1}, the cofacets of a simplex can be enumerated efficiently, as described in more detail in Section 4. The matrix = · is now determined implicitly by and . During the execution of the algorithm, only the current column on which additions are performed is kept in memory. For all previously reduced columns , with < , only the pivot index and its entry, Pivot and PivotEntry , are stored in memory. Whenever needed in the algorithm, those columns are recomputed on the fly as = · (see Algorithm 1). Note that the extension of clearing to the reduction matrix described in Section 3.2 is crucial for an efficient implementation of implicit matrix reduction.
To further decrease the memory usage, we apply another minor optimization. Note that in the matrix reduction algorithm Algorithm 1, only a death index ( = · ≠ 0) may satisfy the condition Pivot = Pivot in Algorithm 1. Hence, only columns with a death index are used later in the computation to eliminate pivots. Consequently, our implementation does not actually maintain the entire matrix , but only stores the columns of with a death index. In other words, it does not store explicit generating cocycles for persistent cohomology, but only their cobounding cochains.

Apparent pairs
We now discuss a class of persistence pairs that can be identified in a Vietoris-Rips filtration directly from the boundary matrix without reduction, and often even without actually enumerating all cofacets, i.e., without entirely constructing the corresponding columns of the coboundary matrix. The columns in question are already reduced in the coboundary matrix, and hence remain unaffected by the reduction algorithm. Most relevant to us are persistence pairs that have persistence zero with respect to the original filtration parameter, meaning that they arise only as an artifact of the lexicographic refinement and do not contribute to the Vietoris-Rips barcode itself. In practice, most of the pairs arising in the computation of Vietoris-Rips persistence are of this kind (see Section 5, and also [51, Section 4.2]). Equivalently, all entries in the filtration boundary matrix of • below or to the left of ( , ) are 0.
The notion applies to any simplexwise filtration • , which may arise as a simplexwise refinement of some coarser filtration • , such as our lexicographic refinement of the Vietoris-Rips filtration. As an example, consider the Vietoris-Rips filtration on the vertices of a rectangle from Section 2. The filtration boundary matrices are    Note that in this example, every zero persistence pair is an apparent pair. Apparent pairs provide a connection between persistence and discrete Morse theory [17]. The collection of all apparent pairs pairs of a simplexwise filtration • will simultaneously constitute a subset of the persistence pairs (Lemma 3.3) and form a discrete gradient in the sense of discrete Morse theory (Lemma 3.5).

Apparent pairs as persistence pairs
As an immediate consequence of the definition of an apparent pair, we get the following lemma. Proof. Since the entries in the filtration boundary matrix of • to the left of an apparent pair ( , ) are 0, the index of is the pivot of the column of in the filtration boundary matrix. Thus, the column of in the boundary matrix is already reduced from the beginning, and Proposition 3.1 yields the claim.

Remark 3.4.
Note that the property of being an apparent pair does not depend on the choice of the coefficient field. Indeed, the above statement holds for any choice of coefficient field for (co)homology. In that sense, the apparent pairs are universal persistence pairs.
Specifically, given an apparent pair ( , ), the chain complex · · · → 0 → → → 0 → . . . can easily be seen to yield an indecomposable summand of the filtered chain complex * ( • ; Z) with integer coefficients (by intersecting the chain complexes), and to generate an interval summand of persistent homology (as a diagram of Abelian groups indexed by a totally ordered set), and thus any other coefficients, by the universal coefficient theorem. The boundary enters the filtration simultaneously with , and the appearance of and in the filtration determine the endpoints of the resulting interval.

Apparent pairs as gradient pairs
The next two lemmas relate apparent pairs to discrete Morse theory. First, we show that apparent pairs are gradient pairs.

Lemma 3.5. The apparent pairs of a simplexwise filtration form a discrete gradient.
Proof. Let ( , ) be an apparent pair, with dim = . By definition, is uniquely determined by , and so cannot appear in another apparent pair ( , ) for any ( + 1)-simplex ≠ . We show that also does not appear in another apparent pair ( , ) for any ( − 1)-simplex . To see this, note that there is another -simplex ≠ that is also a facet of and a cofacet of . Since is assumed to be the youngest facet of , the simplex is older than . In particular, is not the oldest cofacet of , and so ( , ) is not an apparent pair. We conclude that no simplex appears in more than one apparent pair, i.e., the apparent pairs define a discrete vector field.
To show that this discrete vector field is a gradient, let 1 , . . . , be the simplices of in filtration order, and consider the function : ↦ → if there is an apparent pair ( , ), otherwise.
To verify that is a discrete Morse function, first note that ( ) ≤ . Now let be a facet of . Then < . If ( , ) is not an apparent pair, we have ( ) ≤ < = ( ). On the other hand, if ( , ) is an apparent pair, then is the youngest facet of , i.e., ≤ for every facet of , and thus ( ) ≤ ≤ = ( ), with equality holding if and only if = . We conclude that is a discrete Morse function whose sublevel set filtration is refined by and whose gradient pairs are exactly the apparent pairs of the filtration.
For the previous example of a simplexwise filtration obtained fron the vertices of a rectangle, the resulting discrete Morse function as constructed in the above proof is shown in the following table.
Note that there might be apparent pairs of nonzero persistence. In particular, starting with a discrete Morse function and constructing another discrete Morse function˜ from as in the proof of Lemma 3.5, the gradient pairs of form a subset of the gradient pairs of˜ . In particular, the nonzero persistence apparent pairs of will be gradient pairs of˜ , but not of .
Remark 3.7. The notion of apparent pairs generalize to the setting of algebraic Morse theory [23,26,45] in a straightforward way [28]. In this setting, one considers a finitely generated free chain complex of free modules over some ring , equipped with a fixed ordered basis Σ , which is assumed to induce a filtration by subcomplexes. In this setting, the facet and cofacet relations are defined by the more general condition that two basis elements ∈ Σ and ∈ Σ +1 have a nonzero coefficient in the boundary matrix for +1 . In addition, an algebraic apparent pair ( , ) is required to satisfy the additional condition that this boundary coefficient is a unit in . Similarly to the simplicial setting, in this case cannot appear in another apparent pair ( , ) for any facet of . To see this, note that again there has to be another basis element ≠ ∈ Σ that is also a facet of and a cofacet of ; if were the only such basis element, then the coefficient of ( , ) in the matrix of • +1 would be nonzero, which is excluded by the chain complex property. The above Lemmas 3.3, 3.5 and 3.6 therefore hold in this generalized setting as well. Note however that in this case, apparent pairs are no longer guaranteed to be universal persistence pairs, as the condition defining the apparent pairs now depends of the choice of coefficients.

Lexicographic discrete gradients
The construction proposed by Kahle [24] for a discrete gradient on a simplicial complex from a total vertex order can be understood as a special case of the apparent pairs gradient. The definition of the gradient is as follows. Consider the vertices 1 , . . . of the simplicial complex in some fixed total order. Whenever possible, pair a simplex = { , . . . 1 }, > · · · > 1 , with the simplex = { , . . . 1 , 0 } for which 0 < 1 is minimal. These pairs ( , ) form a discrete gradient [24], which we call the lexicographic gradient.
We illustrate how this gradient can be considered as a special case of our definition of apparent pairs for the lexicographic filtration of , where the simplices are ordered by dimension, and simplices of the same dimension are ordered lexicographically according to the chosen vertex order.

Lemma 3.8. The lexicographic gradient is the apparent pairs gradient of the lexicographic filtration.
Proof. To see that any pair ( , ) in is an apparent pair, observe that 0 is chosen such that is the lexicographically smallest cofacet of . Moreover, is clearly the lexicographically largest facet of .

Vietoris-Rips filtrations
Having discussed the Morse-theoretic interpretation of apparent pairs, we now illustrate their relevance for the computation of persistence. Focusing on the lexicographically refined Rips filtration, we first describe the apparent pair of persistence zero in a way that is suitable for computation. Proof. The two conditions hold for any zero persistence apparent pair by definition. It remains to show that the two conditions also imply that ( , ) is an apparent pair. Recall that in the lexicographically refined Rips filtration, simplices are sorted by diameter, then by dimension, and then in (reverse) lexicographic order. The simplex must be the oldest cofacet of in the filtration order: no cofacet of can have a diameter less than diam = diam , and among the cofacets of with the same diameter as , the lexicographically maximal cofacet is the oldest one by construction of the lexicographic filtration order. Similarly, must be the oldest cofacet of in the filtration order.
As it turns out, a large portion of all persistence pairs arising in the persistence computation for Rips filtrations can be found this way, and the savings from not having to enumerate all cofacets are substantial. The following theorem gives a partial explanation of this observation, for generic finite metric spaces and persistence in dimension 1.
To set the stage, note that in a a simplexwise refinement of a Vietoris-Rips filtration, any apparent pair, and more generally, any emergent facet pair ( , ) of dimensions ( , + 1) with ≥ 1 necessarily has persistence zero, as the diameter of equals the maximal diameter of its facets, which by definition is the diameter of . The following theorem establishes a partial converse to this fact in dimension 1 and under a certain genericity assumption on the metric space. Theorem 3.10. Let ( , ) be a finite metric space with distinct pairwise distances, and let • be a simplexwise refinement of the Vietoris-Rips filtration for . Among the persistence pairs of • in dimension 1, the zero persistence pairs are precisely the apparent pairs.
Proof. As noted above, every apparent pair is a zero persistence pair. Conversely, let ( , ) be a zero persistence pair of dimensions (1, 2). Since the edge diameters are assumed to be distinct, the edge must be the youngest facet of in • . Now let be the oldest cofacet of . We then have diam( ) ≤ diam( ) ≤ diam( ), and since ( , ) is a zero persistence pair, this implies diam( ) = diam( ) = diam( ). Since pairwise distances are assumed to be distinct, any edge ⊂ , ≠ must satisfy diam( ) < diam( ). This implies that has to be the youngest facet of . Hence, ( , ) is an apparent pair. But since ( , ) is assumed to be a persistence pair, we conclude with Lemma 3.3 that = , and so ( , ) is an apparent pair.
This result implies that every column in the filtration 1-coboundary matrix that is not amenable to the emergent shortcut described in Section 3.5 actually corresponds to a proper interval in the Vietoris-Rips barcode. In other words, the zero persistence pairs of the simplexwise refinement do not incur any computational cost in the matrix reduction.

Emergent persistence pairs
The definition of apparent pairs generalizes to the persistence pairs in a simplexwise filtration that become apparent in the reduced matrix during the matrix reduction, and are therefore called emergent pairs. They come in two flavors. Definition 3.11. Consider a simplexwise filtration ( ) ∈ of a finite simplicial complex . A persistence pair ( , ) of this filtration is an emergent facet pair if is the youngest facet of , and an emergent cofacet pair if is the oldest cofacet of .
In other words, ( , ) is an emergent facet pair if the column of in the filtration boundary matrix is reduced, and an emergent cofacet pair if the column of in the coboundary matrix is reduced. Thus, the columns corresponding to emergent pairs are precisely the ones that are unmodified by the matrix reduction algorithm.
Note that ( , ) is an apparent pair if and only if it is both an emergent facet pair and an emergent cofacet pair. In contrast to the notion of an apparent pair, however, the property of being an emergent pair does depend on the choice of the coefficient field. Proof. Similar to Proposition 3.9, both conditions hold for any zero persistence emergent cofacet pair, and it remains to show that the conditions imply that ( , ) is an emergent cofacet pair. Again, the first condition implies that is the oldest cofacet of in the filtration order. Now if is not paired with some younger simplex , we conclude from Proposition 3.1 that ( , ) forms a persistence pair, which is then an emergent cofacet pair.

Shortcuts for apparent and emergent pairs
In practice, zero persistence apparent and emergent pairs provide a way to identify persistence pairs ( , ) without even enumerating all cofacets of the simplex , and thus without fully constructing the coboundary matrix for the reduction algorithm.
We first describe how to determine whether a simplex appears in an apparent pair ( , ). Following Proposition 3.9, enumerate the cofacets of in reverse lexicographic order until encountering the first cofacet with the same diameter as . Subsequently, enumerate the facets of in forward lexicographic order, until encountering the first facet of with the same diameter as . Now ( , ) is an apparent pair if and only if that facet equals . No further cofacets of need to be enumerated; this is the mentioned shortcut. This way, we can determine for a given simplex whether it appears in an apparent pair ( , ), and identify the corresponding cofacet . By an analogous strategy, we can also determine whether a simplex appears in an apparent pair ( , ), and identify the corresponding facet .
For the emergent pairs, recall that the columns of the coboundary filtration matrix are processed in reverse filtration order, from youngest to oldest simplex. Assume that is the filtration -coboundary matrix, corresponding to the coboundary operator : ( ) → +1 ( ). Let be the current simplex whose column is to be reduced, i.e., the columns of the matrix = · corresponding to younger simplices are already reduced, while the current column is still unmodified, containing the coboundary of . Following Proposition 3.12, enumerate the cofacets of the simplex in reverse lexicographic order. When we encounter the first cofacet with the same diameter as , we know that is the youngest cofacet of . Equivalently, the index of is the pivot index of the column for in the coboundary matrix. Up to this point, all persistence pairs ( , ) with simplices younger than have been identified already, since the algorithm processes simplices from youngest to oldest. Thus, if has not previously been paired with any simplex, we conclude from Proposition 3.1 that ( , ) is an emergent cofacet pair. Again, no further cofacets of need to be enumerated, shortcutting the construction of the coboundary column for .

Implementation
We now discuss the main data structures and the relevant implementation details of Ripser, the C++ implementation of the algorithms discussed in this paper. The code is licensed under the MIT license and available at ripser.org. The development of Ripser started in October 2015, with support for the emergent pairs shortcut added in March 2016, and support for sparse distance matrices added in September 2016. The first version of Ripser has been released in July 2016. The version discussed in the present article (v1.2) has been released in February 2020.
Input The input to Ripser is a finite metric space ( , ), encoded in a comma (or whitespace, or other non-numerical character) separated list as either a distance matrix (full, lower, or upper triangular part), or as a list of points in some Euclidean space (euclidean_distance_matrix), from which a distance matrix is constructed. The data type for distance values and coordinates is a 32 bit floating point number (value_t). There are two data structures for storing distance matrices: compressed_distance_matrix is used for dense distance matrices, storing the entries of the lower (or upper) triangular part of the distance matrix in a std::vector, sorted lexicographically by row index, then column index. The adjacency list data structure sparse_distance_matrix is used when the persistence barcode is computed only up to a specified threshold, storing only the distances below that threshold. If no threshold is specified, the minimum enclosing radius min of the input is used as a threshold, as suggested by Henselman-Petrusek [21] and implemented in Eirene. Above that threshold, for any point ∈ minimizing the above formula for the enclosing radius, the Vietoris-Rips complex is a simplicial cone with apex , and so the homology remains trivial afterwards.

Vertices and simplices
Vertices are identified with natural numbers {0, . . . , − 1}, where is the cardinality of the input space. Simplices are indexed by natural numbers according to the combinatorial number system. The data type for both is index_t, which is defined as a 64 bit signed integer (int64_t). The dimension of a simplex is not encoded explicitly, but passed to methods as an extra parameter. The enumeration of the vertices of a simplex encoded in the combinatorial number system can be performed efficiently in decreasing order of the vertices. It is based on the following simple observation. Proof. First note that +1 is a summand of = =0 +1 , and so we have For the maximality of , note that is a -simplex on the vertex set {0, . . . , }. In total, there are +1 +1 -simplices on that vertex set, indexed in the combinatorial number system with the numbers 0, . . . , +1 +1 − 1. Thus, we have < + 1 + 1 .
Using this lemma, the largest vertex index of a simplex can be found by performing a binary search, implemented in the method get_max_vertex. Moreover, the second largest vertex index, −1 , equals the largest vertex index of the simplex with vertex indices ( , . . . , 0 ), which has number − +1 . This way, the numbers of any simplex can be computed by iteratively applying the above lemma. This is implemented in the method get_simplex_vertices.
The binary search for the maximal vertex of a simplex is implemented in get_max_vertex. The requisite computation of binomial coefficients is done in advance and stored in a lookup table (binomial_coeff_table).

Enumerating cofacets and facets of a simplex
The columns of the coboundary matrix are computed by enumerating the cofacets (implemented in the class simplex_coboundary_enumerator). There are two implementations, one for sparse and one for dense distance matrices. For the enumeration of facets there is only one implementation (implemented in the class simplex_coboundary_enumerator), as every facet of a simplex has to be contained in the filtration by the property that a simplicial complex is closed under the fact relation.
For sparse distance matrices with a distance threshold (sparse_distance_matrix), the cofacets of a simplex are obtained by taking the intersection of the neighbor sets for the vertices of the simplex, More specifically, the distances are stored in an adjacency list data structure, maintaining for every vertex an ordered list of its neighbors together with the corresponding distance, sorted by the indices of the neighbors (std::vector<std::vector<index_diameter_t>> neighbors). The enumeration of cofacets of a given simplex (ripser<sparse_distance_matrix>::simplex_coboundary_enumerator) searches through the adjacency lists for the vertices of the simplex for common neighbors (in the method (has_next). Any common neighbor then gives rise to a cofacet of the simplex.
For dense distance matrices, the following straightforward method is used to enumerate the cofacets Thus, the cofacets of can easily be enumerated in reverse colexicographic vertex order by maintaining and updating the values of the two partial sums appearing in the above equation, starting from the number for the simplex , starting from the value 0 for the left partial sum (idx_above) and the number for the simplex for the right partial sum (idx_below). The coboundary coefficient of the corresponding cofacet is (−1) . Returning to the example from Section 2, the cofacets of the simplex { 5 , 3 , 0 } in the full simplex on seven vertices { 0 , . . . , 6 } are, in reverse colexicographic vertex order, the simplices with numbers The facets of can easily be enumerated in (forward) colexicographic vertex order by maintaining and updating the values of the two partial sums appearing in the above equation, starting from the value 0 for the left partial sum (idx_above) and the number for the simplex for the right partial sum (idx_below).
Assembling column indices of the filtration coboundary matrix The above methods for enumerating cofacets of a simplex are also used to enumerate the -simplices corresponding to the columns in the filtration coboundary matrix, i.e., the essential simplices and the death simplices for relative cohomology (in the method assemble_columns_to_reduce). is enumerated exactly once, as a cofacet of ( −1 , . . . , 0 ). Since version 1.2, Ripser only assembles column indices that do not correspond to simplices appearing in a zero persistence apparent pair. This strategy, adopted from the parallel GPU implementation of Ripser by Zhang et al. [52], leads to another significant improvement both in memory usage and in computation time. The method is_in_zero_apparent_pair) checks whether a simplex has a zero persistence apparent cofacet or facet. Since low-dimensional simplices in Vietoris-Rips filtration more frequently have apparent cofacets than apparent facets, the check for cofacets is carried out first. To check for a zero persistence apparent cofacet of a simplex (get_zero_apparent_cofacet), following Proposition 3.9, the method get_zero_pivot_cofacet searches for the lexicographically maximal cofacet with the same diameter as the simplex, and then in turn get_zero_pivot_facet searches for its lexicographically minimal facet with the same diameter. If that facet is the initial simplex, a zero apparent pair is found. Checking for an apparent facet (get_apparent_facet) works analogously.
Coefficients Ripser supports the computation of persistent homology with coefficients in a prime field F , for any prime number < 2 16 . The support for coefficients in as prime field can be enabled or disabled by setting a compiler flag (USE_COEFFICIENTS). The data type for coefficients is coeff_t, which is defined as a 16 bit unsigned integer (uint16_t), admitting fast multiplication without overflow on 64 bit architectures. Fast division in modular arithmetic is obtained by precomputing the multiplicative inverses of nonzero elements of the field (in the method multiplicative_inverse_vector).

Column and matrix data sutrctures
The basic data type for entries in a (diameter_entry_t) boundary or coefficient matrix is a tuple consisting of a simplex index (index_t), a floating point value (value_t) caching the diameter of the simplex with that index, and a coefficient (coeff_t) if coefficients are enabled. The type diameter_entry_t thus represents a scalar multiple of an oriented -simplex, seen as basis element of the cochain vector space ( ). If support for coefficients is enabled, the index (48 bit) and the coefficient (16 bit) are packed into a single 64 bit word (using __attribute__((packed))). The actual number of bits used for the coefficients can be adjusted by changing num_coefficient_bits, in order to accommodate a larger number of possible simplex indices.
The reduction matrix used in the persistence computation is represented as a list of columns in a sparse matrix format (compressed_sparse_matrix), with each column storing only a collection of nonzero entries, encoding a linear combination of the basis elements for the row space. The diagonal entries of are always 1 and are therefore not stored explicitly in the data structure. Note that the rows and columns of are not indexed by the combinatorial number system, but by a prefix of the natural numbers corresponding to a filtration-ordered list of boundary columns (columns_to_reduce).
Pivot extraction During the matrix reduction, the current working columns and on which column operations are performed (working_reduction_column and working_coboundary) are maintained as binary heaps (std::priority_queue) with value type diameter_entry_t, using a comparison function object to specify the ordering of the heap elements (greater_diameter_or_smaller_index) in reverse filtration order (of the simplices in the lexicographically refined Rips filtration), thus providing fast access to the pivot entry of a column.
A heap encodes a column vector as a sum of scalar multiples of the row basis elements, the summands being encoded in the data type diameter_entry_t. The heap may actually contain several entries with the same row index, and should thus be considered as a lazily evaluated representation of a formal linear combination. In particular, the pivot entry of the column is obtained (in the method pop_pivot) by iteratively extracting the top entries of the heap and summing up their coefficients as long as their row index does not change. At any point, the coefficient sum might become zero, in which case the procedure continues with the next row index regardless of its row index. A similar lazy heap data structure has been used already in PHAT [5] and DIPHA [4].

Coboundary matrix
The method init_coboundary_and_get_pivot initializes both working columns as = and = and returns the pivot of the column . During the construction of , the method checks for a possible emergent pair ( , ) while enumerating the cofacets of the simplex with index . If the pivot index of is found to form an emergent pair with , the method immediately returns the pivot, without completing the construction of . Since the implicit matrix reduction variant of Algorithm 1 discards this column afterwards, retaining only the pivot index ( = Pivot ) and the pivot entry (PivotEntry ), this shortcut does not affect the correctness of the computation.

Persistence pairs
The computation of persistence barcodes proceeds by applying Algorithm 2 to the filtration coboundary matrix, as described in Section 3.3. First, the persistent cohomology in degree 0 is computed (in compute_dim_0_pairs) using Kruskal's minimum spanning tree algorithm [27] with a union-find [46] data structure (union_find). After that, the remaining barcodes are computed in increasing dimension (compute_dim_0_pairs).
Apparent pairs are not stored in the hash table. Consequently, the check for a column with a given pivot index proceeds in two steps: first querying the hash table, and if no pair is found, checking for an apparent pair (get_apparent_facet). The method add_coboundary performs the columns additions = − · · and = − · in Algorithm 1. If the reduction at column index finishes with a nonzero working coboundary and thus a persistence pair ( , ) is found, the index in the vector columns_to_reduce corresponding to column is stored at key in the hash table pivot_column_index (std::unordered_map), providing fast queries for the column with a given pivot index = Pivot , as required in Algorithm 1. The working reduction column is written into the compressed reduction matrix, while the working coboundary is discarded.
Since the keys of the hash table are precisely the birth indices of persistence pairs, by the clearing optimization these indices are excluded when assembling the column indices for the coboundary matrix in the next dimension (in the method assemble_columns_to_reduce). The key type for the hash table is entry_t, and the key in the hash table contains Pivot (in the index_t member of the type entry_t) as well as PivotEntry (in the coefficient_t member). Only the index_t field is used for hashing and comparing keys.  Table 1: Running times and memory usage of different software packages for various data sets. The number of points is denoted by , the maximal degree of homology to be computed is denoted by , and the diameter threshold is denoted by .

Experiments
We compare Ripser (v1.2) to the four most efficient publicly available implementations for the computation of Vietoris-Rips persistence: Dionysus 2 (v2.0.8) [33], DIPHA (v2.1.0) [4], Gudhi (v3.2.0) [49], and Eirene (v1.3.5) [20]. All results were obtained on a desktop computer with 4 GHz Intel Core i7 processor and 32 GB 1600 MHz DDR3 RAM. The benchmark is implemented in Docker and can be reproduced using the command docker build github.com/Ripser/ripser-benchmark on any machine with sufficient memory. The software DIPHA, written to support parallel and distributed persistence computation, was configured to run on all 4 physical cores. The data set sphere3 consists of 192 random points on the unit sphere in R 3 . It is taken from the benchmark for PHAT [5]. The data set o3 consists of 4096 random orthogonal 3 × 3 matrices. For this data set, we computed cohomology up to degree 3 and up to a diameter threshold of 1.4. We also used a prefix of the o3 dataset consisting of 1024 matrices, for which we used the threshold 1.8. The data set torus4 consists of 50000 random points from the Clifford torus 1 × 1 ⊂ R 4 , for which we used a diameter threshold of 0.15. The data sets dragon, fractal-r, and random16 are taken from the extensive benchmark [40]. Our results are shown in Table 1.

Apparent and emergent pairs
For typical data sets, a large portion of the persistence pairs are apparent or emergent pairs of persistence 0 and can thus be identified using the shortcuts described in Section 3.5, as shown in Table 2. This table shows the counts of various pairs for the data sets in each dimension, starting from 1. As predicted by Theorem 3.10, in dimension 1 every zero pair is an apparent pair and hence also an emergent pair. However, some non-zero emergent pairs appear as well. In higher dimensions, there are non-emergent zero persistence pairs. The speedup obtained by the emergent and apparent pair optimizations is shown in Table 3.
Implicit reduction matrix Implicit matrix reduction (Section 3.4) is a prerequisite for discarding the columns of the reduced matrix = · instead of storing them in memory. This, in turn, is a prerequisite for the emergent pairs shortcut, which does not even fully construct all columns of the boundary matrix. Finally, the apparent pairs shortcut further avoids storing apparent pairs in the pivot hash table and the list of columns to be reduced. Table 3 exemplifies the speedup obtained by these optimizations. The running times shown in this table are obtained by making small modification to Ripser to disable the optimizations in the order of their dependencies. Given the similar timings of implicit reduction (with storing the reduced matrix) and of explicit reduction (also using the reduced matrix for columns additions), we observe that the extra cost of recreating the reduced columns, as required for implicit reduction, is actually negligible in practice.  Table 3: Comparison of running times for different optimization enabled in Ripser: explicit matrix reduction, implicit matrix reduction storing the reduced matrix , implicit matrix reduction discarding the reduced matrix , and implicit matrix reduction discarding the reduced matrix, employing the emergent pairs shortcut during reduction, and skipping all apparent pairs. All variants also employ clearing and compute cohomology.