The Extended Persistent Homology Transform of manifolds with boundary

The Extended Persistent Homology Transform (XPHT) is a topological transform which takes as input a shape embedded in Euclidean space, and to each unit vector assigns the extended persistence module of the height function over that shape with respect to that direction. We can define a distance between two shapes by integrating over the sphere the distance between their respective extended persistence modules. By using extended persistence we get finite distances between shapes even when they have different Betti numbers. We use Morse theory to show that the extended persistence of a height function over a manifold with boundary can be deduced from the extended persistence for that height function restricted to the boundary, alongside labels on the critical points as positive or negative critical. We study the application of the XPHT to binary images, outlining an algorithm for efficient calculation of the XPHT that exploits relationships between the PHT of the boundary curves and the extended persistence of the foreground.


Introduction
The fundamental goal in statistical shape analysis is to define and compute meaningful distances between different subsets of Euclidean space. A recent landmark-free approach to quantify both the geometry and topology of a shape is to use a topological transform such as the Persistent Homology Transform (PHT) or the Euler Characteristic Transform (ECT). Both of these transforms take a shape M, viewed as a subset of R^n, and associate to each direction v ∈ S^{n−1} a shape summary obtained by scanning M in the direction v, calculating the persistent homology PH(M, v) and the Euler curve respectively.
Different formulations of the PHT and ECT have been demonstrably useful in diverse applications including prediction of disease progression from the shapes of tumours [8,20], identification of different cultivars from the shapes of leaves [24], quantification of morphological variation of barley seeds [1], and identification of structural differences among proteins [22]. This paper introduces an improved variant of this topological transform called the Extended Persistent Homology Transform (XPHT) and establishes properties that significantly reduce the time required to compute it.
A limitation of the PHT is that it does not work well with shapes that have different Betti numbers (the ranks of the homology groups). For M_1, M_2 ⊂ R^n, the (p-)distance between their persistent homology transforms is defined as
\[ \operatorname{dist}_p(\mathrm{PHT}(M_1), \mathrm{PHT}(M_2)) = \left( \int_{S^{n-1}} W_p\big(\mathrm{PH}(M_1, v), \mathrm{PH}(M_2, v)\big)^p \, dv \right)^{1/p}, \]
where W_p(•, •) is the p-Wasserstein distance. If M_1 and M_2 have different Betti numbers, then W_p(PH(M_1, v), PH(M_2, v)) = ∞ for all v, and thus dist_p(PHT(M_1), PHT(M_2)) = ∞.
One potential work-around would be to replace the Wasserstein distance with a different metric on the space of persistence modules, one where having different Betti numbers does not enforce infinite distance. A more satisfying approach is to replace persistent homology with extended persistent homology.
The theory of extended persistence for functions over a manifold X was developed in [7] to quantify the support of the essential homology classes of X (these essential classes are the elements of H_*(X)). Even when the domains have different Betti numbers we still have a finite Wasserstein distance between their extended persistence modules. This motivates the Extended Persistent Homology Transform (XPHT) as a topological transform, which is defined in exactly the same manner as the PHT but replacing regular persistent homology with extended persistent homology. By quantifying the size of essential classes it is possible for XPHT to be stable with respect to the addition to, or removal of, "small" essential classes in the different domains. For example, if we add an isolated noisy pixel to a binary image then the change in the XPHT will be commensurate with the size of a pixel. This extra stability can provide greater power and robustness to statistical methods that use distances between shapes derived from the XPHT. As this paper is focused on computational aspects of the XPHT, comprehensive stability results are left as a future research direction.
We believe that extended persistence is currently under-utilised within applied topology and this paper addresses three potential obstacles. Firstly, we make extended persistence modules more theoretically accessible by placing them within a generalised framework that includes both regular persistence as well as extended persistence. Secondly, we provide motivation with an important example (in the form of the XPHT) where using extended persistence provides a qualitative improvement in usefulness. Lastly, we provide insights on how to ease the computation of extended persistence in the important case of height functions, with implemented code for binary images.

Outline of paper
The mathematical treatment of the XPHT and the algorithms to compute it require the adaptation and extension of many standard definitions within applied topology. We cover this material in some detail to make the paper more self-contained and to provide a cohesive perspective on results from different areas of the literature.
The original definition of extended persistence in [7] is made for functions defined on a smooth or piecewise-linear (PL) manifold and concatenates two homology sequences: the standard inclusion-induced persistent homology sequence for the sublevel set filtration, followed by a descending relative homology sequence for superlevel sets. In section 2, we reformulate this as a persistence module over a totally ordered set, with all transition maps defined as those induced on relative homology by inclusions of a pair of spaces. These spaces are defined by a real-valued function on a triangulated manifold with boundary, f : M → R. We then establish a relationship between the intervals of extended persistence modules of f and (−f), which is one of the results required to reduce computation time for the XPHT.
In section 3 we generalise the definition of Wasserstein and bottleneck distances between persistence diagrams to apply to persistence modules over a totally ordered metric space, with a defined set of ephemeral (zero-length) intervals. The Wasserstein and bottleneck distances are optimal transport metrics with transport plans that include a bijection between chosen subsets of intervals and then subsets of unmatched intervals. To define the cost of a transportation plan we need a distance between intervals and a cost of having an interval unmatched. We show our definition agrees with the existing definitions of bottleneck distance between extended persistence diagrams.
A key theoretical insight of our work, and one which makes the XPHT feasible to compute, is that for manifolds with boundary embedded in R^n the extended persistent homology of a height function over M can be deduced from the persistent homology of the same height function restricted to ∂M. This is the topic of section 4. The proof of this insight requires ideas from Morse theory for manifolds with boundary, in both the smooth and piecewise-linear settings. This background material is covered in section 4.1. We also precisely state the relationship between birth and death parameters of extended persistence in terms of the different kinds of critical points of a smooth or PL Morse function on a manifold with boundary. Section 4.2 then develops results specifically for the case of a directional height function. It is worth noting that any subset of R^n with positive weak feature size is arbitrarily close to an n-manifold with boundary by taking an expansion. This means the restriction to n-manifolds with boundary is reasonable from an application standpoint.
Adapting the definition of the persistent homology transform (PHT) to extended persistence is straightforward. We cover this material in section 5.
Shape analysis of objects in digital images is an application domain with wide interest. Objects in binary images can be modelled as two dimensional manifolds with boundary lying in the plane, so our XPHT results apply. In section 6 we define boundary curves that separate foreground and background connected components consistent with a chosen digital adjacency, and show that these boundary curves are disjoint simple closed PL 1-manifolds. Digital grids create degeneracy in the height function critical values, so we derive additional results that establish the correctness of our implemented algorithms. Finally, in section 7 we illustrate our R-package implementation by comparing the XPHT of the letters 'A' and 'g' rendered in a variety of standard fonts. We find the XPHT of the upper case 'A' naturally separates the serif and sans-serif fonts, and that the XPHT of the lower case 'g' naturally separates the single-storey and the double-storey fonts.

Relation to the Alexander Duality for Extended Persistence
A form of Alexander Duality for extended persistence was proved in [12]. That paper considers the decomposition of the sphere into two sets U, V with U ∪ V = S^n and U ∩ V an (n − 1)-manifold, and proves results about the extended persistence of a perfect Morse function f over these sets. A perfect Morse function over S^n is a smooth function with exactly two critical points, one minimum and one maximum. Edelsbrunner and Kerber prove that the extended persistence module of U ∩ V is the direct sum of those for U and V (with minor adjustments for homology dimension zero). The statement of our Theorem 4.17 is effectively a special case of their result. However, our proof is very different as it is based on Morse theory instead of Alexander Duality. Another key difference in our results is that we show how the extended persistence module for U ∩ V splits into the two different parts (Theorem 4.18); this is not established in [12].
Since our ultimate goal is to calculate the extended persistence of U from that of U ∩ V, this splitting criterion is pivotal.
2 Extended persistence modules

Persistence modules over totally ordered sets
Persistence modules are commonly defined with a subset of R as the underlying parameter space, but they can equally be defined over any totally ordered set. This approach will make working with extended persistence substantially cleaner and more intuitive, as we will want to split our parameter space into ordinary and relative homology parameter types.
Definition 2.1. A totally ordered set (Θ, ≤) is a set Θ with a relation ≤ which is
• Antisymmetric: that is, α ≤ β and β ≤ α implies α = β,
• Transitive: that is, α ≤ β and β ≤ γ implies α ≤ γ, and
• Total: that is, for all α, β ∈ Θ either α ≤ β or β ≤ α.
Definition 2.2. Fix a field F and a totally ordered set Θ. A persistence module P over Θ is a family {V_α}_{α∈Θ} of F-vector spaces indexed by elements of Θ, together with a family of homomorphisms {ϕ_α^β : V_α → V_β}_{α≤β} such that ϕ_β^γ ∘ ϕ_α^β = ϕ_α^γ for all α ≤ β ≤ γ, and ϕ_α^α = id_{V_α}. We call the ϕ_α^β the transition maps. We say P is pointwise finite dimensional if the V_α are finite dimensional for all α ∈ Θ.
In the algebraic theory of persistence modules there are often technical requirements about tameness, and being pointwise finite dimensional will generally be a sufficient condition. This is a very reasonable assumption in almost any application. The most important algebraic result is the decomposition theorem. This gives a complete yet discrete description of a persistence module up to isomorphism. We will decompose persistence modules into sums of interval modules, but first we must define interval persistence modules.
We are all familiar with intervals that are subsets of the real line. We generalise this notion to any totally ordered set as follows.
Definition 2.3. An interval in a totally ordered set (Θ, ≤) is a subset I ⊂ Θ such that for all α ∈ Θ either α ∈ I, or α ≤ θ for all θ ∈ I, or θ ≤ α for all θ ∈ I. An interval module over an interval I is a persistence module I_I with attached vector spaces
\[ V_\alpha = \begin{cases} F & \text{if } \alpha \in I, \\ 0 & \text{otherwise,} \end{cases} \]
and transition maps that are the identity, id_F, when both domain and codomain are F, and 0 otherwise. For each interval module I_I we call b(I_I) = inf I the birth parameter and d(I_I) = sup I the death parameter.
The nomenclature of "interval" was introduced for persistence modules with parameter space R but it is still reasonable even in the generalised setting of totally ordered sets. If we can map the totally ordered set to a subset of the real line, say f : Θ → R, in a way that respects the order relation, then we can view each interval module as having support f^{−1}(I) where I ⊂ R is some interval.
Theorem 2.4 ([9] Theorem 1.1). A pointwise finite dimensional persistence module over any subset of R admits an interval decomposition. That is, there is a multiset of intervals S such that the module is isomorphic to a direct sum of interval modules ⊕_{I∈S} I_I, where each I_I is an interval module. This decomposition is unique up to isomorphism.
For the rest of the paper we will be assuming all persistence modules are pointwise finite dimensional and that the underlying parameter space is equivalent to a subset of R (with respect to the order relation), and thus we can always assume an interval decomposition exists. Readers may be familiar with persistence diagrams. Persistence diagrams are a graphical representation of a persistence barcode. If we take our ordered set to be R then the parameters are real numbers. We can represent each interval module in the persistence module decomposition by a point in R^2 with the x-coordinate the birth parameter, and its y-coordinate the death parameter. We then construct the persistence diagram as the resulting multiset of points in R^2 together with all the points along the diagonal in the plane.

Extended persistence
Extended persistence combines the regular filtration of sublevel sets for f : M → R with a filtration of relative homology groups of M relative to superlevel sets of f. This provides a wealth of extra information about the structure of M, especially in the case that M is a manifold with boundary.
We first recall the definition of relative homology, and the maps induced by the inclusion of a pair. Given a subcomplex X ⊂ Y we observe that the boundary map on C_*(Y) leaves C_*(X) invariant. This means we can define a boundary map on the quotient chain groups,
\[ \partial_k : C_k(Y)/C_k(X) \to C_{k-1}(Y)/C_{k-1}(X). \]
We can then define the relative homology groups by
\[ H_k(Y, X) = \ker \partial_k / \operatorname{im} \partial_{k+1}. \]
Relative homology is a generalisation of normal homology as H_k(Y, ∅) = H_k(Y). If Y ⊂ B and X ⊂ A ⊂ B we have an inclusion of pairs (Y, X) ⊂ (B, A). This inclusion of pairs induces a map between their relative homology groups,
\[ H_k(Y, X) \to H_k(B, A). \]
We are now ready to define the extended persistence module as a form of persistence module. The parameter space over which the persistence module is constructed will be the union of two sets, one corresponding to ordinary homology and the other corresponding to relative homology. Set O = {(t, Ord) : t ∈ R}, R = {(t, Rel) : t ∈ R} and Θ = O ∪ R. We define a total order over Θ by
(s, Ord) < (t, Ord) when s < t,
(s, Rel) < (t, Rel) when s > t,
(s, Ord) < (t, Rel) for all s, t.

Figure 1: An illustration of extended persistence intervals for a rather abstract snail, M. The function f : M → R is simply the x-coordinate and the function value is denoted by the blue-green colour gradient. We have drawn a copy of M with its x-coordinate reflected to illustrate the superlevel sets used in the relative part of the sequence.
We then assign vector spaces to each θ ∈ Θ defined in terms of sublevel and superlevel sets. As input we have a topological space M with a bounded function f : M → R. Let M_s = f^{−1}(−∞, s] and M^s = f^{−1}[s, ∞) denote the sublevel and superlevel sets of f : M → R. We assign the vector spaces as V_{(t,Ord)} = H_k(M_t, ∅) and V_{(t,Rel)} = H_k(M, M^t). The transition maps are the natural ones induced by inclusions of a pair.
The composition of maps induced by inclusions of pairs is the map induced by the composite inclusion. This means that the transition maps commute as needed and we have constructed a persistence module.
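The total order on Θ = O ∪ R can be made concrete with a sort key. The following is a minimal sketch, assuming a parameter is encoded as a pair `(t, kind)` with `kind` either `"Ord"` or `"Rel"`; the names `order_key` and `precedes` are hypothetical and chosen only for this illustration.

```python
def order_key(theta):
    """Sort key realising the total order on Theta = O ∪ R.

    Assumed encoding (not from the paper): theta = (t, kind),
    with kind equal to "Ord" or "Rel".
    """
    t, kind = theta
    if kind == "Ord":
        return (0, t)    # ordinary parameters come first, increasing in t
    if kind == "Rel":
        return (1, -t)   # relative parameters come last, decreasing in t
    raise ValueError(f"unknown parameter type: {kind!r}")


def precedes(a, b):
    """True when a < b in the total order on Theta."""
    return order_key(a) < order_key(b)


params = [(2.0, "Rel"), (1.0, "Ord"), (-1.0, "Rel"), (3.0, "Ord"), (0.5, "Rel")]
ordered = sorted(params, key=order_key)
print(ordered)
```

Mapping `(t, Rel)` to `-t` in the second slot of the key is what reverses the real order on the relative part, so every ordinary parameter precedes every relative one.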
Each interval in the interval decomposition will be supported over some interval of Θ, which will be one of three types. If the support contains only parameters in O we call it ordinary; if the support contains only parameters in R we call it relative. Finally, the persistent homology class might exist for parameters spanning both O and R, in which case we call it essential. Essential persistent homology classes exist in the vector space H_k(M, ∅) = H_k(M) and in classical persistent homology are assigned a death parameter of infinity. The object in Fig. 1 illustrates the parameter space Θ and has one class of each type.
Remark 2.5. To preempt any confusion, we note a difference in our nomenclature from some papers, including [7]. What we call essential classes above are instead called "extended". We prefer the term "essential" as these classes do indeed correspond to the essential classes of M. Furthermore it means we can use "extended" to refer to any class in the extended persistence module.
We can partition the elements of the interval decomposition of extended persistent homology into three sets depending on whether they are ordinary, relative or essential. Following [5] we can further split the essential classes into positive and negative types. For an essential class with birth time (s, Ord) and death time (t, Rel), we say it is positive if s < t and negative if s > t.
We can express the extended persistence module as a direct sum of ordinary, relative and essential persistence modules. For an extended persistence module constructed from sublevel and superlevel set filtrations of f : M → R denote these submodules by Ord_k(M, f), Rel_k(M, f), Ess^+_k(M, f) and Ess^−_k(M, f), which are each persistence modules over R. For Rel_k(M, f) and Ess^−_k(M, f) the order of parameters in R is reversed, that is, the real value associated with the birth time is larger than the real value associated with the death time. In the case of subsets of R^2 (cf. the example in Fig. 1) we will show that Ess_0 = Ess^+_0 and Ess_1 = Ess^−_1 and thus we do not need to indicate the sign of the essential classes.
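The partition into ordinary, relative and positive or negative essential classes depends only on the birth and death parameters of each interval. A minimal sketch follows, assuming each parameter is encoded as a pair `(t, kind)`; the function name `classify` and the label `"Ess0"` for the case s = t (which the text above does not assign a sign) are hypothetical.

```python
def classify(birth, death):
    """Return the type of an interval of an extended persistence module.

    Assumed encoding (not from the paper): birth and death are pairs
    (t, kind) with kind equal to "Ord" or "Rel".
    """
    (s, birth_kind), (t, death_kind) = birth, death
    if birth_kind == "Ord" and death_kind == "Ord":
        return "Ord"                      # supported only on ordinary parameters
    if birth_kind == "Rel" and death_kind == "Rel":
        return "Rel"                      # supported only on relative parameters
    if birth_kind == "Ord" and death_kind == "Rel":
        if s < t:
            return "Ess+"                 # positive essential class
        if s > t:
            return "Ess-"                 # negative essential class
        return "Ess0"                     # s == t: sign left unspecified (hypothetical label)
    raise ValueError("an interval cannot be born in Rel and die in Ord")


intervals = [
    ((0.0, "Ord"), (1.0, "Ord")),
    ((2.0, "Rel"), (1.0, "Rel")),
    ((0.0, "Ord"), (3.0, "Rel")),
    ((3.0, "Ord"), (0.0, "Rel")),
]
labels = [classify(b, d) for b, d in intervals]
print(labels)
```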

Duality
There is a form of duality between the ordinary persistent homology of f : M → R and the relative persistent homology of (−f) : M → R. This follows from results in [11] but that paper uses substantially different notation to us. Furthermore, that paper considers filtrations of simplicial complexes, a context where we cannot naively switch between sublevel and superlevel sets. For these reasons, we rewrite their proposition to suit the requirements of our setting.
Proposition 2.6 (Proposition 2.4 in [11]). Let M = {M_t} be a filtration of simplicial complexes. Let PH_k(M) be the persistence module of k-dimensional persistent homology of the filtration M. Let PH^0_k(M) be the restriction of PH_k(M) to persistence classes with finite lifetimes. Let PH_{k+1}(M_∞, M) be the persistence module of relative homology classes H_{k+1}(M_∞, M_t) and let PH^0_{k+1}(M_∞, M) be the restriction of PH_{k+1}(M_∞, M) to persistence classes with finite lifetimes. Then PH^0_k(M) and PH^0_{k+1}(M_∞, M) are isomorphic.
Corollary 2.7. Let M be a finite simplicial complex, with vertex set V, and geometric realisation |M|. Let f : |M| → R be a continuous map such that on each cell f is the linear interpolation of the values on its vertices. We have a bijection ρ from the interval modules in the interval decomposition of Ord_k(M, f) to those of Rel_{k+1}(M, (−f)) with b(ρ(I)) = (−s, Rel) and d(ρ(I)) = (−t, Rel) whenever b(I) = (s, Ord) and d(I) = (t, Ord).
Proof. The PH^0_{k+1}(M_∞, M_t) of [11] is the relative homology of M with respect to the (increasing t) sequence of sublevel sets M_t = f^{−1}(−∞, t]. We have f^{−1}(−∞, t] = (−f)^{−1}[−t, ∞), so the sequence M_t of sublevel sets of f is identical to a sequence of superlevel sets, M^s, of (−f), with s = −t. Note that when the filtration is expressed as superlevel sets of (−f), the parameter s is a decreasing one, as used in the relative part of an extended persistence module.
We note that this duality result is quite different from the duality theorem of [7], which is proved in the case that M is a triangulated d-manifold. That paper goes on to also establish a symmetry theorem for extended persistence for functions over manifolds without boundary, which we discuss in our notation and context below.

Symmetry
In the case that M is a manifold we find that the information content in extended persistence modules is greatly reduced by the isomorphisms established in the following result.
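Proposition 2.8 (Symmetry theorem of [7]). Let M be a triangulated d-manifold and f : M → R be a piecewise-linear function interpolating the values on the vertices of M. There are bijections, ψ_•, between submodules of extended persistence for f and (−f) as follows:
\[ \psi_{\mathrm{Ord}} : \mathrm{Ord}_k(M, f) \to \mathrm{Ord}_{d-k-1}(M, -f), \qquad \psi_{\mathrm{Rel}} : \mathrm{Rel}_k(M, f) \to \mathrm{Rel}_{d-k+1}(M, -f), \qquad \psi_{\mathrm{Ess}} : \mathrm{Ess}_k(M, f) \to \mathrm{Ess}_{d-k}(M, -f). \]
Remark 2.9. We note that [7] has a typographical error in the dimensions for the relative homology classes.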
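Proof. As in [7], first use Lefschetz duality H_k(X, ∂X) ↔ H_{d−k}(X, ∅) with X = M_t and the excision theorem to see that
\[ H_k(M, M^t) \cong H_k(M_t, \partial M_t) \leftrightarrow H_{d-k}(M_t, \emptyset). \]
Combined with the inclusion-induced maps on homology, this gives a bijection between the finite intervals of ordinary and relative homology in complementary dimensions:
\[ \mathrm{Ord}_k(M, f) \leftrightarrow \mathrm{Rel}_{d-k}(M, f). \]
The same relationship holds for the essential homology classes:
\[ \mathrm{Ess}_k(M, f) \leftrightarrow \mathrm{Ess}_{d-k}(M, f). \]
Note these bijections are those established by the duality theorem of [7]. Combined with the duality result 2.7 above, we now see that
\[ \mathrm{Ord}_k(M, f) \leftrightarrow \mathrm{Rel}_{d-k}(M, f) \leftrightarrow \mathrm{Ord}_{d-k-1}(M, -f), \qquad \mathrm{Rel}_k(M, f) \leftrightarrow \mathrm{Ord}_{d-k}(M, f) \leftrightarrow \mathrm{Rel}_{d-k+1}(M, -f). \]
Composing the two bijections establishes the maps ψ_• in each case.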
Remark 2.10. Our application to binary images has data M that are manifolds with boundary, so the duality and symmetry theorems of [7] do not apply directly. We will use the duality result of [11] to reduce the number of directions required when computing the extended persistent homology transform, since it gives a bijection between the intervals for height filtrations in opposite directions. Since the boundary ∂M of a manifold with boundary (M, ∂M) is a manifold we will be able to use the symmetry result to characterise the essential classes in Ess_0(∂M, f) and Ess_{n−1}(∂M, f).
3 Wasserstein distance between extended persistence modules

Wasserstein distances between persistence modules
There are many possible metrics between persistence modules, and various representations of them. In this paper we restrict our attention to Wasserstein distances. Wasserstein distances between persistence modules are usually defined in terms of the points in their corresponding persistence diagrams. However, given our desire to study extended persistence, we rephrase the definitions here in terms of persistence modules over a totally ordered set. Wasserstein distances are a form of optimal transport metric. A transportation plan between two persistence modules matches subsets of intervals from each, with the remaining unmatched intervals paired with an ephemeral interval. Since every persistence module considered in this paper is isomorphic to a direct sum of interval modules it is sufficient to define our transportation plans between persistence modules written in this form.
Definition 3.1. Let Θ be a totally ordered set and P = ⊕_{I_i ∈ S_P} I_{I_i} and Q = ⊕_{I_j ∈ S_Q} I_{I_j} persistence modules over Θ. A transportation plan between P and Q is a triple T = (Ŝ_P, Ŝ_Q, ρ) where Ŝ_P ⊂ S_P, Ŝ_Q ⊂ S_Q and ρ : Ŝ_P → Ŝ_Q is a bijection. We call the intervals in Ŝ_P and in Ŝ_Q matched intervals in T, and we call the intervals in S_P \ Ŝ_P and in S_Q \ Ŝ_Q unmatched intervals in T.
Each transportation plan has an associated cost, constructed analogously to an L^p function metric. This in turn depends on the metric used to measure distance between points in Θ, which we define below.
Definition 3.2. We call (Θ, ≤, dist) a totally ordered metric space if (Θ, ≤) is a totally ordered set, and dist is an extended metric over Θ such that dist(β, γ) ≤ dist(α, γ) and dist(α, β) ≤ dist(α, γ) whenever α ≤ β ≤ γ.
From the metric on Θ we obtain a p-distance between intervals over Θ, analogous to the l^p distance between points in R^2. Given two intervals I and I′, the p-distance is
\[ \operatorname{dist}_p(I, I') = \big( \operatorname{dist}(b(I), b(I'))^p + \operatorname{dist}(d(I), d(I'))^p \big)^{1/p} \quad \text{for } p \in [1, \infty), \]
\[ \operatorname{dist}_\infty(I, I') = \max\{ \operatorname{dist}(b(I), b(I')), \ \operatorname{dist}(d(I), d(I')) \}. \]
Note that for general interval modules this is actually a pseudo-distance as it cannot distinguish between intervals with open or closed endpoints. However, if the persistence modules are constructed from filtrations involving closed sublevel and superlevel sets then the intervals are always half-open, including the birth parameter and not including the death parameter. When restricted to such half-open interval modules the above definition of dist_p will satisfy the identity of indiscernibles, making it an actual distance. Throughout this paper we will work exclusively with persistence modules that have these half-open intervals.
The final ingredient we need before defining the transportation plans and their costs is the notion of an "empty interval". For persistence diagrams these are points on the diagonal, corresponding to intervals of zero length in the usual setting of persistence modules over R. In the general definition of Wasserstein distance we are allowed to fix any subset of interval modules to perform this role. We call this set the ephemeral intervals, denoted Eph. This name is inspired by the definition of an ephemeral persistence module as one with distance zero to the trivial persistence module (see [6]).
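We now define the cost of a transportation plan using p-distances between intervals where the unmatched intervals of a plan are costed by their distance to the set of ephemeral intervals.
Definition 3.3. Let P = ⊕_{a∈S_P} I_a and Q = ⊕_{b∈S_Q} I_b be persistence modules over the totally ordered metric space (Θ, ≤, dist). Let Eph denote the set of ephemeral intervals over Θ and write dist_p(I, Eph) = inf_{E∈Eph} dist_p(I, E). Let T = (Ŝ_P, Ŝ_Q, ρ) be a transportation plan between P and Q. For p ∈ [1, ∞) we define the p-cost of T by
\[ c_p(T) = \Big( \sum_{I \in \hat{S}_P} \operatorname{dist}_p(I, \rho(I))^p + \sum_{I \in S_P \setminus \hat{S}_P} \operatorname{dist}_p(I, \mathrm{Eph})^p + \sum_{I \in S_Q \setminus \hat{S}_Q} \operatorname{dist}_p(I, \mathrm{Eph})^p \Big)^{1/p}, \]
and for p = ∞ by
\[ c_\infty(T) = \max\Big\{ \sup_{I \in \hat{S}_P} \operatorname{dist}_\infty(I, \rho(I)), \ \sup_{I \in S_P \setminus \hat{S}_P} \operatorname{dist}_\infty(I, \mathrm{Eph}), \ \sup_{I \in S_Q \setminus \hat{S}_Q} \operatorname{dist}_\infty(I, \mathrm{Eph}) \Big\}. \]
Observe that c_∞(T) is the limit of c_p(T) as p goes to infinity. The Wasserstein distance is defined as the infimum of the costs of all transportation plans. Note that there is always at least one possible transportation plan as we can choose Ŝ_P and Ŝ_Q to be empty.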
Definition 3.4. Let P = ⊕_{a∈S_P} I_a and Q = ⊕_{b∈S_Q} I_b be persistence modules over the totally ordered metric space (Θ, ≤, dist). The p-Wasserstein distance between P and Q is W_p(P, Q) = inf{c_p(T) | T a transportation plan between P and Q}.
The bottleneck distance between P and Q is W_∞(P, Q) = inf{c_∞(T) | T a transportation plan between P and Q}.
This definition agrees with the standard definitions of Wasserstein and bottleneck distances between persistence diagrams when Θ is the real line with its standard order, dist(s, t) = |s − t|, and Eph = {[t, t] : t ∈ R}. More generally, for any totally ordered metric space and any choice for the set of ephemeral intervals, the Wasserstein distance defined above will determine an extended metric. Again, for general persistence modules this will be, strictly speaking, a pseudo-distance. But, as discussed earlier, in this paper the persistence modules will only contain appropriate half-open intervals and W_p(P, Q) satisfies the identity of indiscernibles.
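For the classical case where Θ is the real line, dist(s, t) = |s − t| and the ephemeral intervals are the zero-length ones, the infimum over transportation plans can be evaluated by brute force on very small inputs. The sketch below is illustrative only and is not the implementation accompanying this paper. It assumes an interval is encoded as a pair `(birth, death)` of finite floats, that the interval distance is the l^p combination of the endpoint distances, and that the nearest ephemeral interval sits at the midpoint; the names `interval_dist`, `eph_dist` and `wasserstein` are hypothetical.

```python
import math


def interval_dist(a, b, p):
    """l^p distance between intervals a = (birth, death) and b = (birth, death)."""
    db, dd = abs(a[0] - b[0]), abs(a[1] - b[1])
    if math.isinf(p):
        return max(db, dd)
    return (db ** p + dd ** p) ** (1.0 / p)


def eph_dist(a, p):
    """Distance from a to the nearest zero-length interval [t, t] (t = midpoint)."""
    half = abs(a[1] - a[0]) / 2.0
    if math.isinf(p):
        return half
    return 2.0 ** (1.0 / p) * half


def wasserstein(P, Q, p=2.0):
    """Brute-force W_p over all transportation plans; exponential time, tiny inputs only."""
    def combine(x, y):
        # costs add for finite p and are maximised for p = infinity
        return max(x, y) if math.isinf(p) else x + y

    def power(x):
        return x if math.isinf(p) else x ** p

    def best(i, unused):
        # minimal accumulated cost for P[i:], given the unused indices of Q
        if i == len(P):
            acc = 0.0
            for j in unused:              # leftover intervals of Q stay unmatched
                acc = combine(acc, power(eph_dist(Q[j], p)))
            return acc
        # option 1: leave P[i] unmatched
        result = combine(power(eph_dist(P[i], p)), best(i + 1, unused))
        # option 2: match P[i] with some unused interval of Q
        for j in unused:
            cand = combine(power(interval_dist(P[i], Q[j], p)),
                           best(i + 1, unused - {j}))
            result = min(result, cand)
        return result

    total = best(0, frozenset(range(len(Q))))
    return total if math.isinf(p) else total ** (1.0 / p)


w_match = wasserstein([(0.0, 4.0)], [(1.0, 4.0)], p=2.0)
w_far = wasserstein([(0.0, 1.0)], [(10.0, 11.0)], p=1.0)
w_empty = wasserstein([(0.0, 4.0)], [], p=math.inf)
print(w_match, w_far, w_empty)
```

In the second call the two short intervals are far apart, so the cheapest plan leaves both unmatched rather than pairing them. For anything beyond a handful of intervals an assignment solver would be the sensible replacement for the recursion.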

Wasserstein distance for extended persistence
The Wasserstein distance between persistence modules is specified by the ordered metric space and set of ephemeral interval modules. Recall from Section 2.2 that extended persistence modules have parameter set Θ = O ∪ R, with O = {(t, Ord) : t ∈ R} and R = {(t, Rel) : t ∈ R}, and the total order over Θ is
(s, Ord) < (t, Ord) when s < t,
(s, Rel) < (t, Rel) when s > t,
(s, Ord) < (t, Rel) for all s, t.
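We also need to define the set of ephemeral interval modules; there are three different types: ordinary, relative and essential. We set
\[ \mathrm{Eph} = \{ [(t, \mathrm{Ord}), (t, \mathrm{Ord})] : t \in \mathbb{R} \} \cup \{ [(t, \mathrm{Rel}), (t, \mathrm{Rel})] : t \in \mathbb{R} \} \cup \{ [(t, \mathrm{Ord}), (t, \mathrm{Rel})] : t \in \mathbb{R} \}. \]
For computational purposes it is much easier to split the calculation of distances between extended persistence modules into separate calculations for the submodules of the types Ord, Rel, Ess^+ and Ess^−. This is justified by the following proposition.
Proposition 3.5. Let P and Q be extended persistence modules in a single homology dimension and let P = Ord(P) ⊕ Rel(P) ⊕ Ess^+(P) ⊕ Ess^−(P) and Q = Ord(Q) ⊕ Rel(Q) ⊕ Ess^+(Q) ⊕ Ess^−(Q). Then for p ∈ [1, ∞)
\[ W_p(P, Q)^p = W_p(\mathrm{Ord}(P), \mathrm{Ord}(Q))^p + W_p(\mathrm{Rel}(P), \mathrm{Rel}(Q))^p + W_p(\mathrm{Ess}^+(P), \mathrm{Ess}^+(Q))^p + W_p(\mathrm{Ess}^-(P), \mathrm{Ess}^-(Q))^p, \]
and
\[ W_\infty(P, Q) = \max\{ W_\infty(\mathrm{Ord}(P), \mathrm{Ord}(Q)), \ W_\infty(\mathrm{Rel}(P), \mathrm{Rel}(Q)), \ W_\infty(\mathrm{Ess}^+(P), \mathrm{Ess}^+(Q)), \ W_\infty(\mathrm{Ess}^-(P), \mathrm{Ess}^-(Q)) \}. \]
Proof. The right hand side of both equations is the infimum of transportation costs over the set of transportation plans which never match any intervals of different types. It is thus sufficient to show that for any transportation plan between P and Q there is another transportation plan T with the same or lesser cost such that any matched pair within T keeps to the same type. Any two intervals of different types of Rel, Ord or Ess are an infinite distance apart. Since every interval module has finite distance to some ephemeral interval it will always be more efficient to change any interval that is matched to a different type to instead be unmatched. Similarly, matching a positive essential class with a negative one never costs less than leaving both unmatched.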
It is worth observing that in previous work, such as [5,2], the extended persistent homology modules are represented by multiple persistence diagrams, separating the different types into their own persistence diagrams. The ordinary persistence diagram has points above the diagonal, the relative persistence diagram has points only below the diagonal, and the essential persistence diagram has points on both sides: positive above and negative below. The bottleneck distance in [2] is then defined as the formula within Proposition 3.5.
Remark 3.6. We believe that the Wasserstein distance could also be defined analogously to the algebraic Wasserstein distance in [21] but adapted to extended persistence, and that these two versions of Wasserstein distances would be equivalent. Given the enormous homological algebra set up required to prove such a result it is beyond the scope of this paper and left as a future direction of research.

4 Morse theory for manifolds with boundary and extended persistence
This section contains the main theoretical results relating extended persistence of a height function over a manifold with boundary to that of the same function restricted to the boundary. We establish these results using Morse theory, a standard technique when working with persistence modules built from sublevel set filtrations. Previous results, however, apply only to functions on manifolds without boundary. The presence of a boundary requires extra analysis to characterise critical points located on this boundary. We start by summarising the necessary definitions and results from Morse theory covering both the smooth and piecewise-linear settings.

Background: Smooth and PL Morse theory
We need our results about extended persistent homology to hold for both the smooth (theoretical) case, and the piecewise-linear setting relevant to numerical computations. Most of the theorems and their proofs are effectively the same but we must first set up the definitions and relevant lemmas about critical points. The background theory is covered for the smooth case in [4,16], and the piecewise linear case in [15]. We direct readers interested in more details to these references. Although regular and critical points and their indices in Morse theory are more commonly defined in terms of the derivatives and Hessian of a function, this approach does not translate well to the PL setting. There is, however, an equivalent approach to defining critical points and indices that uses polynomial functions over charts, and this can be easily adapted to the PL setting. To make this paper self-contained we start by recalling the definitions of smooth and PL manifold (with or without boundary) in terms of charts.
Definition 4.1. For a topological space, M, and an open subset U ⊂ M, a chart is a homeomorphism φ : U → φ(U) where φ(U) is a subset of Euclidean space. An atlas for M is an indexed family of charts {(U_α, φ_α)} that cover M, i.e., ∪U_α = M. A topological n-manifold is a second countable, Hausdorff space equipped with an atlas where the codomain of each φ_α is an open subset of R^n. A topological n-manifold with boundary is a second countable, Hausdorff space equipped with an atlas where the codomain of each φ_α is an open subset of the closed half-space {x ∈ R^n : x_n ≥ 0}.
To introduce the adjectives smooth and piecewise linear (PL) we need to discuss the compatibility of φ_α and φ_β on the intersections of their domains. Given two charts (U_α, φ_α) and (U_β, φ_β) where U_α ∩ U_β is non-empty we can define two different maps by restricting the domains of φ_α and φ_β to U_α ∩ U_β. The new homeomorphisms are
\[ \phi_\beta \circ \phi_\alpha^{-1} : \phi_\alpha(U_\alpha \cap U_\beta) \to \phi_\beta(U_\alpha \cap U_\beta) \quad \text{and} \quad \phi_\alpha \circ \phi_\beta^{-1} : \phi_\beta(U_\alpha \cap U_\beta) \to \phi_\alpha(U_\alpha \cap U_\beta). \]
These are called the transition maps between charts.
Definition 4.2. A topological n-manifold, with or without boundary, is called smooth if its transition maps are smooth. It is called piecewise-linear (PL for short) if its transition maps are piecewise-linear.
We say that {(U α , φ α )} is maximal if there does not exist another atlas containing it with more charts.A maximal atlas is often referred to as the smooth structure, or respectively, the PL structure of a manifold.Once we have a smooth (or PL) structure we can define what it means for a function f : M → R to be smooth or piecewise linear.Definition 4.3.Let M be a smooth n-manifold, with or without boundary, with smooth (respectively PL) structure An example to keep in mind is M being a smooth or piecewise linear ndimensional subset of R d with its structure inherited from the embedding.A simple function on such a manifold is the height function with respect to some The classical approach to defining critical points in Morse theory is as follows.For a manifold M without boundary and a smooth function We then say the Morse index of f at p is the number of negative eigenvalues of the Hessian, counting multiplicity.A point is regular if it is not critical.These definitions are well defined as they do not depend on the choice of chart (see [18]).
Instead of using definitions for critical and regular points in terms of the derivative, we need an alternative that will be more adaptable to the PL setting.By using the implicit function theorem we can redefine regular points by the existence of a linear function over some chart.We can also remove the need to reference the Hessian for defining the index of a critical point by using the Morse Lemma.
The point $p \in M$ is a non-degenerate critical point of $f$ with Morse index $k$ if and only if there is a chart $(U, \phi)$ where $\phi(p) = 0$ and
$$f \circ \phi^{-1}(x_1, \dots, x_n) = f(p) - x_1^2 - \dots - x_k^2 + x_{k+1}^2 + \dots + x_n^2.$$
The proof of this lemma is covered in [18]. We use it as an equivalent definition of a regular point and a non-degenerate critical point of Morse index $k$. In the piecewise linear setting the only modification is to replace squares with absolute values.
The point p ∈ M is a non-degenerate critical point of f with Morse index k if and only if there is a chart (U, φ) with φ(p) which is of the form We now need to generalise the definitions of regular and critical points to the case of a function over a manifold with boundary (M, ∂M ).Points in the interior of M are treated exactly as above, so we need only discuss the case for points on the boundary.We again phrase the definitions using charts to make it easy to move between smooth and PL settings, following the terminology and notation in [15].Recall that a chart containing a point, A point on the boundary is critical if it is critical for f restricted to ∂M , but the definition of its index requires additional information about whether the function increases or decreases as we move into the manifold.
Please note that there is inconsistency within the literature in terms of sign conventions for critical points on the boundary and our choice may differ from sources the reader is familiar with.Now we have the definitions for all the different types of critical point, we can define what a Morse function is for both the smooth and PL settings.In the following we describe the (persistent) homology in terms of the signs of critical points so it is useful to have notation for this.Highly analogous to the well-known theory of Morse functions on manifolds, we can use the index of critical points to compute the relative homology of nearby sublevel sets of f .Proposition 4.11.Let (M, ∂M ) be a smooth (respectively PL) manifold with boundary and f : M → R a smooth (respectively PL) Morse function.We consider homology with coefficients in a field F, and use Kronecker delta notation δ k i below.
• If $t$ is a critical value of neither $f$ nor $f|_{\partial M}$ then $H_i(M_{t+\epsilon}, M_{t-\epsilon}) = 0$ for all $i$ and all $\epsilon > 0$ sufficiently small.
for all $i$ and for all $\epsilon > 0$ sufficiently small.
For the smooth case, this proposition is proved in [4] and in [16].Please note that in [16] they use the term "m-function" for Morse function.Some minor massaging is needed to convert their results to the homology statements above as they describe the changes in terms of glueing cells.The PL version of this proposition is proved in [15].
We can determine critical points and indices of (−f ) from those of f using charts, as summarised in the following lemma which holds for both the smooth and PL settings.
for all $i$ and all $\epsilon > 0$ sufficiently small.
for all $i$ and for all $\epsilon > 0$ sufficiently small.
for all i and for > 0 sufficiently small.Proof.We first want to write the superlevel sets of f in terms of sublevel sets of (−f ).We have and thus If t is not a critical value for f nor f | ∂M then by Lemma 4.12 (−t) is not a critical value of (−f ) nor (−f )| ∂M .By Proposition 4.11 we know for all i and all > 0 sufficiently small.
If p ∈ Crit(f, n − k), then by Lemma 4.12 we have p ∈ Crit((−f ), k).If p ∈ Crit(f, (n − k − 1, −1)) then by Lemma 4.12 p ∈ Crit((−f ), (k, +1)).In both cases we can apply Proposition 4.11, with (−f ) at p, which implies that )), then by Lemma 4.12 we have p ∈ Crit((−f ), (k, −1)).By Proposition 4.11, with (−f ) at p, we know for > 0 sufficiently small.As might be expected, there is a direct relationship between the critical values of Morse functions and the endpoints of intervals in the barcode decomposition of extended persistent homology.We will need to distinguish between endpoints lying in the ordinary and relative parameter spaces as they behave differently.
Let XPH(M, f ) be the extended persistence module constructed from f : M → R. To ease notation let These are the sets of parameters {(t, ord)} and {(t, rel)} respectively where a new interval begins in the interval decomposition of XPH(M, f ).Similarly let . These are the sets of parameters {(t, ord)} and {(t, rel)} respectively where an interval finishes in the interval decomposition of XPH(M, f ).Furthermore ) denote the sets of birth and death parameters respectively for the extended persistence module XPH(M, f ).In constructing these sets we use the fact that every essential class is born somewhere in the ordinary parameter range and then dies somewhere in the relative parameter range.
The following corollary follows from Proposition 4.11 and Lemma 4.13.
Corollary 4.14.Let (M, ∂M ) be an n-dimensional manifold with boundary and let f : M → R be a Morse function.Then

Relating the extended persistent homology of a manifold to that of its boundary
We can now restrict to the situation of interest for the XPHT; that of computing the extended persistent homology of a height function over a compact n-dimensional manifold with boundary embedded in R n .The results in this section start by comparing the sets of birth and death parameters for the height filtration of the manifold and for its boundary, in Propositions 4.15 and 4.16.The next step is to show that these births and deaths are paired consistently as endpoints of intervals in the relevant persistence modules (Theorem 4.17).We finish with a complete characterisation of the extended persistent homology for the manifold as a submodule of that for its boundary in Theorem 4.18.
The height function is specified in a direction v and restricted to various subsets of R n .That is, h v : R n → R with h v (x) = x • v.To ease notation let h S v denote the restriction of the height function to Proposition 4.15.Let M ⊂ R n be a compact n-manifold with boundary.Suppose that h M v : M → R, the height function in direction v, is a Morse function.For each critical value t let p(t) be the unique critical point of h M v or h ∂M v with h v (p) = t.For all k > 0 we have Proof.Choose R > 0 large enough so that M ⊂ B(0, R) where B(0, R) is the open ball of radius R centred on the origin.Let L = B(0, R)\ int(M ).As there are only finitely many critical points of h M v and h ∂M v there is an > 0 such that all the critical values are at least apart.The critical values lie within [inf(h v (M )), sup(h v (M ))] ⊂ (−R, R) so we can reduce to be small enough that no critical value is within of −R or R.
The function h v defined over all of R n has no critical points, so there will be no critical points in the interior of M .This means we need only consider critical points of h ∂M v .For each s ∈ R we consider the sublevel sets of h v restricted to the three subsets: M s , (∂M ) s and L s .By construction M s ∩ L s = (∂M ) s and M s ∪ L s = h −1 v (−∞, s] ∩ B(0, R).For each k > 0 we therefore have H k+1 (M s ∪ L s ) = 0 = H k (M s ∪ L s ).Using this in the Mayer-Vietoris sequence shows us that H k ((∂M ) s ) and H k (M s ) ⊕ H k (L s ) are isomorphic and hence for all s ∈ R. For k = 0 we know H 0 (M s ∪ L s ) = 1 whenever s ≥ −R.Mayer-Vietoris then gives the short exact sequence By comparing the ranks we have . By Proposition 4.11 we know sgn(h M v , p(t)) = +1 and this implies sgn(h L v , p(t)) = −1.Proposition 4.11 now implies that β k (L t+ ) = β k (L t− ).For k > 0 we can use (1) to calculate If k = 0 we instead use (2) to calculate This is where we use the requirement that is small enough that all critical points of M are greater than −R + .
If k = 0 then we instead use (2) to calculate We have again used t − > −R.Since t is the only critical value of h ∂A v in [t − , t] we conclude that (t, ord) ∈ b ord k (M, h v ).When considering the sets of births and deaths in the relative parameter range we need to use a relative version of the Mayer-Vietoris sequence.For this recall that M ∩ L = ∂M , and M s ∩ L s = ∂M s .The relative version of the Mayer-Vietoris sequence states that there is a long exact sequence Since t is the only critical value of h ∂M v in [t − , t + ] we conclude that (t, rel) ∈ b rel k (∂M, h v ).Now suppose that (t, rel) ∈ b k (∂M, h v ) with sgn(h M v , p(t)) = −1 These facts imply β k (∂M, (∂M ) t ) − β k (∂M, (∂M ) t+ ) = 1 and sgn(h L v , p(t)) = +1.By Corollary 4.13 we therefore have β k (L, L t− ) = β k (L, L t+ ) and can calculate The proof for the sets of death critical values is highly analogous; the difference of the Betti numbers is −1 instead of 1.
Throughout the following collection of results we fix the following sets. Let $A \subset \mathbb{R}^n$ be a compact $n$-dimensional manifold with boundary, so that its boundary $\partial A = X$ is a finite collection of disjoint $(n-1)$-manifolds. Let $R > 0$ be such that $A \subset B(0, R)$. Let $B$ be the set such that $A \cup B = B(0, R)$ and $A \cap B = X$.
Let S R denote the sphere of radius R. We can observe that ∂A = X and ∂B = X ∪ S R .Let h v be the height function in the direction v ∈ S n−1 , with v such that h X v is a Morse function.
Proposition 4.16.Let A ⊂ R n be a compact n-dimensional manifold with boundary.Let h v : R n → R be the height function in direction v such that h X v is a Morse function.Let R > 0 be such that A ⊂ B(0, R).Let B be the set such that A ∪ B = B(0, R) and A ∩ B = X.Let S R denote the sphere of radius R. Then we have the equality of the following disjoint unions: Since ∂B = X S R we can again apply Proposition 4.15 (now with B playing the role of M ) to say The critical points of h B v which lie on S R are well understood.There are two critical points; one birth in dimension 0 at p 1 = −Rv with value h v (p 1 ) = −R, and a death in dimension 0 at p 2 = Rv with h v (p 2 ) = R.We thus can rewrite the birth and death sets of B as and for k > 0 we have Since every critical point p(t) ∈ X must be either (+)-critical or (−)-critical, by taking the union we get the statement of the proposition.Propositions 4.15 and 4.16 have shown how the sets of birth and death parameters for X, A, and B are related.The following theorem proves the much stronger result that the pairing of endpoints of the bars is consistent, and so we have isomorphisms between various extended persistence modules.This theorem is not a new result -it was proved using Alexander duality in [12].We believe our Morse-theoretic proof may be more readily adapted to other scenarios.
Theorem 4.17. We have

Proof. Let us first consider the case where $k > 0$. Since $X \subset A$ and $X \subset B$ we have an induced morphism of persistence modules
$$\varphi : XPH_k(X, v) \to XPH_k(A, v) \oplus XPH_k(B, v).$$
Furthermore, from the ordinary and relative versions of the Mayer-Vietoris sequence we know that $\varphi_{(t,ord)}$ and $\varphi_{(t,rel)}$ are both isomorphisms for all $t \in \mathbb{R}$. This implies that $\varphi$ must be injective.
Injective morphisms between persistence modules were studied extensively in [3]. Bauer and Lesnick showed that an injective morphism induces an injective map $\rho$ from the set of intervals in the interval decomposition of $XPH_k(X, v)$ to those in the interval decomposition of $XPH_k(A, v) \oplus XPH_k(B, v)$. As the two persistence modules have the same number of intervals, the matching $\rho$ must in fact be a bijection. Observe that if $f : S \to S$ is a bijection from a finite set to itself such that $f(s) \le s$ for all $s \in S$ then $f$ must be the identity. This argument shows $\rho([b, d)) = [b, d)$, so the interval decompositions of $XPH_k(X, v)$ and $XPH_k(A, v) \oplus XPH_k(B, v)$ are the same and they are isomorphic as persistence modules.
For the case where $k = 0$ we need to consider the complication of the homology class corresponding to the sphere $S_R$. We know from Proposition 4.16 that $b_0(X, v) \sqcup \{(-R, ord)\} = b_0(A, v) \sqcup b_0(B, v)$, which we will denote $b$, and Observe that $[(-R, ord), (R, rel))$ is an interval in the interval decomposition of $XPH_0(B, v)$; this corresponds to the connected component containing $S_R$. This implies that $\rho((-R, ord)) = (-R, ord)$. Just as in the case for $k > 0$ we can consider the ordinary and relative Mayer-Vietoris sequences to show that the morphisms $H_0(X_t) \to H_0(A_t) \oplus H_0(B_t)$ and $H_0(X, X_t) \to H_0(A, A_t) \oplus H_0(B, B_t)$ induced by inclusions are injective for all $t$ and hence the morphism

Combining Theorem 4.17 with Proposition 4.16 allows us to express the extended persistent homology of a height function over $A$ as a nice submodule of the extended persistent homology of that same height function over $\partial A$.

Theorem 4.18. Let $A \subset \mathbb{R}^n$ be an $n$-manifold with boundary $X = \partial A$. Let $v$ be a direction such that h

We can more readily describe the essential classes in dimensions $0$ and $n - 1$ in terms of the minimum and maximum values on the different connected components of the boundary. Observe that a compact connected $(n-1)$-dimensional manifold $Y$ embedded in $\mathbb{R}^n$ separates $\mathbb{R}^n$ into two connected open sets, one of which is 'inside' $Y$ and one of which is 'outside' (the unbounded component of the two). This result is known as the Jordan-Brouwer separation theorem. We use it to define the connected components of $X = \partial A$ as interior or exterior boundary components.

Definition 4.19. Let $A \subset \mathbb{R}^n$ be a compact $n$-dimensional manifold with boundary $\partial A = X$. Let $\hat{X}$ be a connected component of $X$, and $\hat{A}$ the connected component of $A$ that contains $\hat{X}$. We say that $\hat{X}$ is an interior boundary component if $\hat{A} \setminus \hat{X}$ is contained in the unbounded connected component of $\mathbb{R}^n \setminus \hat{X}$. We say that $\hat{X}$ is an exterior boundary component if $\hat{A} \setminus \hat{X}$ is contained in the bounded connected component of $\mathbb{R}^n \setminus \hat{X}$.
and This means that it is sufficient to prove the case where A is connected.We assume A is connected for the remainder of the proof.
Observe that for M a connected (n − 1)-dimensional manifold we have β 0 (M ) = 1 = β n−1 (M ) so there is exactly one essential persistent homology interval module in each of these homology dimensions for extended persistent homology of M with respect to h v .
The interval in $\mathrm{Ess}_0(M, h_v)$ is born at the first appearance of $M$, that is, at $(\min\{h_v(M)\}, ord)$. Since $M$ is connected, this homology class is trivial in $H_0(M, L)$ for any non-empty subset $L \subset M$. This implies that the death of this interval in $\mathrm{Ess}_0(M, h_v)$ is at parameter $(\max\{h_v(M)\}, rel)$. We have shown that $\mathrm{Ess}_0(M, h_v) = I_{[(\min\{h_v(M)\}, ord),\, (\max\{h_v(M)\}, rel))}$.
Using the symmetry of extended persistent homology for manifolds (Proposition 2.8) we have Since X is the disjoint union of the interior boundary components {X 1 , . . .X k } and the exterior boundary component Y we have We can use Theorem 4.18 to deduce Ess 0 (A, h v ) and Ess n−1 (A, h v ) from the various persistence modules Ess 0 (X i , h v ), Ess n−1 (X i , h v ), Ess 0 (Y, h v ) and Ess n−1 (Y, h v ).
Consider an interior boundary component $X_i$, and let $p_i \in X_i$ be the global minimum of $h^{X_i}_v$. We know that $A$ is contained in the infinite component of $\mathbb{R}^n \setminus X_i$, and $p_i$ must be a (−)-critical point for $h^A_v$. This implies that $[(\min\{h_v(X_i)\}, ord), (\max\{h_v(X_i)\}, rel))$ is not included in $J^0_A$ (where $J^k_A$ is defined in the statement of Theorem 4.18). Similarly let $\hat{p}_i$ denote the global maximum of $h_v$ over $X_i$. We know that $A$ is contained in the infinite component of $\mathbb{R}^n \setminus X_i$, and $\hat{p}_i$ must be a (+)-critical point for $h^A_v$. This implies that

If we instead consider the exterior boundary component $Y$ then the global minimum of $h^Y_v$ will be a (+)-critical point for $h^A_v$ and the global maximum of $h^Y_v$ will be a (−)-critical point for $h^A_v$. This implies that

The Extended Persistent Homology Transform

Background
The persistent homology transform (PHT) maps the space of shapes embedded in Euclidean space into a space of topological summaries.Instead of comparing the original shapes we can compare their topological transforms.The philosophy is that the persistent homology of a height function in some direction v records geometric information from the perspective of direction v.As v changes, the persistent homology classes track geometric features in M .The key insight behind the persistent homology transform (PHT) is that by considering the persistent homology from every direction, we preserve all information about the shape.
Before giving the formal definition we should first identify the subsets of space which are allowable shapes, that is, the domain of the PHT. We will want our subsets to be reasonably nice. The most general setting in which theoretical properties of the PHT have been proved is that of compact o-minimal sets, which are called constructible in [10]. For the purposes of this paper it is enough to know that a subset of Euclidean space is constructible whenever it is compact and either semi-algebraic or piecewise linear. We will denote the space of constructible subsets of $\mathbb{R}^n$ by $CS(\mathbb{R}^n)$.
Given a constructible set $M \subset \mathbb{R}^n$ and $v \in S^{n-1}$, let $h_v : M \to \mathbb{R}$, $h_v(x) = \langle x, v \rangle$, be the height function in direction $v$, where $\langle \cdot, \cdot \rangle$ denotes the inner product. We can construct a persistence module $PH_k(M, h_v)$ by filtering $M$ by the sub-level sets of $h_v$ and taking $k$-dimensional homology groups. The underlying parameter set for the persistence module is $\mathbb{R}$, the attached vector space at $t$ is $H_k(h_v^{-1}(-\infty, t])$, and for $s \le t$ the transition map $\varphi^t_s$ is the induced map on homology from the inclusion $h_v^{-1}(-\infty, s] \subset h_v^{-1}(-\infty, t]$. Let $PM(\mathbb{R})$ denote the standard space of persistence modules over parameter space $\mathbb{R}$.
Definition 5.1. The Persistent Homology Transform PHT of a constructible set $M \in CS(\mathbb{R}^n)$ is the map $PHT(M) : S^{n-1} \to PM(\mathbb{R})^n$ that sends a direction $v$ to the set of persistence modules obtained by filtering $M$ in the direction of $v$: where

Various properties of the PHT have been proved in [23,13,10]. Stability results bound the distance between $h_v$ and $h_w$ when $v, w \in S^{n-1}$ are close. This implies that for each $M \in CS(\mathbb{R}^n)$, its persistent homology transform, $PHT(M)$, is a continuous function over $S^{n-1}$ when we equip $PM$ with a Wasserstein metric.
Another very important property of the PHT is its injectivity, that is, for $M_1, M_2 \in CS(\mathbb{R}^n)$, $PHT(M_1) = PHT(M_2)$ implies $M_1 = M_2$ as subsets of $\mathbb{R}^n$. This was originally proved in [23] for piecewise linear compact subsets of $\mathbb{R}^2$ and $\mathbb{R}^3$, and then the more general proof was given in [10] and independently in [13].
We can now define a distance between constructible sets $M_1$, $M_2$ via their persistent homology transforms, by integrating the Wasserstein distances between the corresponding persistence modules over all directions in $S^{n-1}$.

Now with extended persistence
We can define a new distance function over $CS(\mathbb{R}^n)$ by replacing ordinary persistent homology with extended persistent homology. The distance between extended persistent homology transforms is obtained by replacing the Wasserstein distances between the ordinary persistence modules with those between the extended persistence modules.
For the PHT one theoretical result was the continuity of the PHT(M ) as a function from S n−1 .This continuity justified the approximation of the PHT by a finite subset of directions.The proofs for the continuity of the PHT can be easily modified to show continuity of the XPHT.Let E denote the space of extended persistence modules.Then for all M ∈ CS(R n ), the function XPHT k (M ) : S n−1 → E is continuous when we equip E with the p-Wasserstein distance (for p ∈ [1, ∞)), or the bottleneck distance.
In [21] a stability result for the PHT was proven in the case where M 1 and M 2 were different embeddings of the same simplicial complex.This bounded the distance between PHT(M 1 ) and PHT(M 2 ) in terms of the distances between the sets of vertices in the embedding.The proof of this stability theorem can be easily modified to prove an analogous statement for the extended persistent homology transform.
Since the extended persistence module for a filtration by a height function contains strictly more information than the regular persistence module for that height function, the injectivity results for the PHT will automatically also hold for the XPHT.

Application to binary images
In this section we describe how to interpret a binary digital image as a PL-manifold with boundary, construct boundary curves as PL 1-manifolds, and adapt the results of Section 4 to this setting using a simulation of simplicity methodology.

Boundary curves
A binary digital image is a two-dimensional array, $P$, with elements called pixels taking values in $\{0, 1\}$. The array is indexed by integers $1 \le i \le m$ and $1 \le j \le n$, so that $P(i, j)$ is the element in the $i$th row and $j$th column of $P$. We can also treat pixels as points in the plane by mapping the array index to a Cartesian coordinate (the first axis is oriented down the page and the second from left to right). Those pixels taking the value '1' are defined to be the foreground $F := P^{-1}[1]$ and those with value '0' are the background $G := P^{-1}[0]$. A small patch of such a binary image array is illustrated in Fig. 2.

To answer questions about the connectivity of objects represented by the image, we must define a neighbourhood or adjacency relation for each pixel. Two standard options, called 4- and 8-connectivity in digital topology, are defined as follows.

Definition 6.1. A pixel $(k, l)$ is said to be a 4-adjacent or direct neighbour of $(i, j)$ if their $\ell^1$ distance is exactly 1: $|i - k| + |j - l| = 1$. Pixels are 8-adjacent neighbours if the $\ell^\infty$ distance is 1: $\max\{|i - k|, |j - l|\} = 1$. The 4-neighbourhood of pixel $(i, j)$ consists of its four 4-adjacent neighbours and the 8-neighbourhood is defined similarly.
The connectivity of a set of pixels is then determined according to a specified adjacency relation.If we choose to use the 8-neighbourhood for pixels in both the foreground and the background, however, counter-intuitive situations may arise such as a simple closed digital curve that does not separate the plane into two pieces.The resolution of this within digital topology is to treat pixels in the foreground as connected with respect to the 8-neighbourhood and pixels in the background with the 4-neighbourhood, or vice-versa [17].
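The effect of the adjacency choice can be checked on a tiny example. The following is a minimal Python sketch, not taken from the accompanying R package; the function names `neighbours` and `count_components` are hypothetical, and array indices are 0-based here rather than 1-based as in the text. Two diagonally touching foreground pixels form one component under 8-adjacency and two components under 4-adjacency.

```python
from collections import deque

def neighbours(pixel, connectivity):
    """Return the 4- or 8-adjacent neighbours of a pixel (no bounds check)."""
    i, j = pixel
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [(i + di, j + dj) for di, dj in offsets]

def count_components(image, value, connectivity):
    """Count connected components of pixels equal to `value` by breadth-first search."""
    rows, cols = len(image), len(image[0])
    seen = set()
    count = 0
    for i in range(rows):
        for j in range(cols):
            if image[i][j] != value or (i, j) in seen:
                continue
            count += 1
            seen.add((i, j))
            queue = deque([(i, j)])
            while queue:
                for k, l in neighbours(queue.popleft(), connectivity):
                    inside = 0 <= k < rows and 0 <= l < cols
                    if inside and image[k][l] == value and (k, l) not in seen:
                        seen.add((k, l))
                        queue.append((k, l))
    return count

# Two foreground pixels that touch only diagonally, padded with background.
image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
foreground_8 = count_components(image, 1, 8)
foreground_4 = count_components(image, 1, 4)
background_4 = count_components(image, 0, 4)
```

With the convention adopted in the text (8-adjacency for the foreground, 4-adjacency for the background), this image has one foreground component and one background component.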
We now proceed to construct a set, $C$, of piecewise-linear curves that subdivide the plane so that each connected component of $\mathbb{R}^2 \setminus C$ contains pixels of only one type (either foreground or background), and such that the digital connected components of $F$ and $G$ are in one-to-one correspondence with those of $\mathbb{R}^2 \setminus C$. As described above, we use 8-connectivity for the foreground and 4-connectivity for the background. We assume (and in practice add) a layer of background pixels around any given rectangular array, $P$, to ensure there is a single connected background component surrounding all other components.

Definition 6.2. Boundary points. For every pair of 4-adjacent pixels such that $P(i, j) = 1$ and $P(k, l) = 0$, define the boundary point $p = (\tfrac{1}{2}(i + k), \tfrac{1}{2}(j + l))$.

There are only four possible configurations. For example if $(i, j) \in F$ and its direct neighbour $(i + 1, j) \in G$, then $p = (i + \tfrac{1}{2}, j)$; the other three cases are simple adjustments to this pattern. Note that since $(i, j)$ and $(k, l)$ are 4-adjacent, the boundary point has only one coordinate with the $\tfrac{1}{2}$ offset and one remaining an integer. See Fig. 2 for an illustrative example.
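Definition 6.2 translates directly into a short loop over foreground pixels. The sketch below is an illustration in Python under stated assumptions, not the implementation in the R package: the name `boundary_points` is hypothetical, indices are 0-based, and the image is assumed to be already padded with background so that neighbours outside the array can be ignored. Exact fractions are used so that the half-integer coordinates are represented without rounding.

```python
from fractions import Fraction

def boundary_points(image):
    """Midpoints between each foreground pixel and its 4-adjacent background pixels."""
    rows, cols = len(image), len(image[0])
    points = set()
    for i in range(rows):
        for j in range(cols):
            if image[i][j] != 1:
                continue
            for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                k, l = i + di, j + dj
                if 0 <= k < rows and 0 <= l < cols and image[k][l] == 0:
                    # p = ((i + k) / 2, (j + l) / 2), as in Definition 6.2
                    points.add((Fraction(i + k, 2), Fraction(j + l, 2)))
    return points

# A single foreground pixel surrounded by background.
example = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
example_points = boundary_points(example)
```

For the single-pixel example the four boundary points are the midpoints of the four sides of the pixel, each with exactly one half-integer coordinate.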
The next step is to connect pairs of boundary points by line segments in such a way that the foreground and background pixel connectivities are respected.This is achieved by exhaustive enumeration of 2 × 2 pixel patches as illustrated in Fig. 3. Lemma 6.3.Let C ⊂ R 2 be the union of boundary points and edges derived from a binary digital array P .The set C is a disjoint union of simple closed piecewise linear curves.
Proof.Let P be the n r × n c array with rows indexed by i = 1, . . ., n r and columns by j = 1, . . ., n c .By assumption, the outermost rows and columns of P are background, i.e., P (1, j) = P (n r , j) = P (i, 1) = P (i, n c ) = 0.Each boundary point sits half-way between two 4-adjacent pixels with distinct values, so every boundary point has first coordinate 1 < p 1 < n r and second coordinate 1 < p 2 < n c .It follows that every boundary point must belong to exactly two adjacent 2 × 2 pixel patches and that every boundary point connects to exactly two boundary edges, by construction.As a combinatorial object then, each component of C is a discrete closed 1-manifold.Also by construction (see  Clearly ∂A ⊂ C, we must now show that ∂A ⊃ C. Let p ∈ C, i.e., p is an arbitrary point on one of the boundary edge segments.We can write the coordinates of p as (i + , j + η) for integers i, j and fractional parts , η ∈ [0, 1).We know that each boundary edge divides the 2 × 2 patch with corners (i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1), into two pieces such that at least one of these corners has P (k, l) = 1 and this implies that p ∈ ∂A.
The above results show that $\mathrm{cl}(A)$ and $X = \partial A$ satisfy the conditions for the theorems of Section 4.2, as $X$ is a finite union of disjoint piecewise-linear 1-manifolds. We then define $B$ to be the closed complement of $A$ in the rectangular domain of the image, $B = ([1, n_r] \times [1, n_c]) \setminus A$. A straightforward argument by contradiction shows that no background pixel lies in $A$, so we have $P^{-1}[0] \subset B$.

Remark 6.5. Given a three-dimensional binary array of voxels, $V(i, j, k)$, there are analogous definitions of direct-adjacency between elements, and results that require foreground and background to be viewed with complementary adjacencies to maintain topological consistency [17]. There are also established methods to construct a triangular mesh surface that separates the connected components of foreground and background. These are termed 'marching cubes algorithms' [19].

Breaking ties and other practical considerations
In this section we derive additional results required to extend theorems from Section 4 so that they hold for the digital boundary curves. In particular, Theorem 4.18 requires that the height function in direction $v$ is a Morse function, i.e., that the critical points are isolated and the critical values are distinct. Both these conditions are challenged by the geometry of a digital grid, as the boundary curve points lie at integer and half-integer coordinates, and the boundary curve edges are either horizontal, vertical or in one of two diagonal directions. Additionally, the directions $v$ are typically chosen at equally spaced angles that are rational multiples of $\pi$, and will often be perpendicular to some boundary edges. This means that when computing the XPHT for equiangular directions $v$ we expect many vertices of the boundary curves to have the same height with respect to any given $v$.
Our computations of persistent homology involve height filtrations of boundary curves considered as simplicial complexes.The algorithm for computing persistent homology of simplicial complexes orders simplices by their maximal value with lower-dimensional simplices added before higher-dimensional ones if their maximal values are the same.It is well understood that the persistent homology of this discrete filtration of complexes gives the same persistence diagram as that of a continuous filtration of a PL-embedding of the complex.We do, however, need to explore how a filtration with multiple simplices taking the same height with respect to direction v relates to the critical points of a piecewise-linear Morse function constructed from an arbitrarily close direction v t .
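The ordering rule described above can be illustrated on a single closed PL curve. The following Python sketch is an assumption-laden illustration rather than the algorithm used in the R package: the name `ph0_closed_curve` is hypothetical, and it computes only 0-dimensional ordinary persistence with a union-find structure and the elder rule. Simplices are sorted by their maximal height, with vertices placed before edges when heights tie, and zero-length intervals are discarded.

```python
def ph0_closed_curve(vertices, v):
    """0-dimensional persistence of the height filtration on a closed polygon.

    Returns (finite_pairs, essential_births); zero-length pairs are dropped.
    """
    m = len(vertices)
    heights = [x * v[0] + y * v[1] for (x, y) in vertices]

    # Each simplex is (filtration value, dimension, vertex indices).
    simplices = [(heights[i], 0, (i,)) for i in range(m)]
    for i in range(m):
        j = (i + 1) % m
        simplices.append((max(heights[i], heights[j]), 1, (i, j)))
    # Sort by value; at equal value vertices (dim 0) come before edges (dim 1).
    simplices.sort(key=lambda s: (s[0], s[1]))

    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    finite_pairs = []
    for value, dim, simplex in simplices:
        if dim == 0:
            parent[simplex[0]] = simplex[0]
            continue
        root_a, root_b = find(simplex[0]), find(simplex[1])
        if root_a == root_b:
            continue  # this edge closes the loop and creates a 1-cycle
        # Elder rule: the component with the lower root survives.
        if heights[root_a] > heights[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a
        if heights[root_b] < value:
            finite_pairs.append((heights[root_b], value))

    essential_births = sorted({heights[find(i)] for i in range(m)})
    return sorted(finite_pairs), essential_births

# A pentagon with two local minima and a tie at the top in direction (0, 1).
curve = [(0, 0), (1, 2), (2, 1), (3, 4), (-1, 4)]
pairs, essential = ph0_closed_curve(curve, (0, 1))
```

In this example the two top vertices share the same height, yet the tie produces only zero-length intervals, so the output is the single finite pair born at height 1 and killed at height 2, together with one essential class born at height 0.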
We first need to generalise the notion of 0-critical point to allow for line segments

Definition 6.6. Let $\gamma$ be a piecewise-linear non-intersecting curve in $\mathbb{R}^2$ with $m$ vertices traversed in cyclic order, $x_0, x_1, \dots, x_m = x_0$. Note that in the following text the indices are assumed to be given as integers modulo $m$. We say that $x_i$ is an isolated 0-critical vertex if $h_v(x_{i-1}) > h_v(x_i)$ and $h_v(x_i) < h_v(x_{i+1})$. We say that the line segment from $x_j$ to $x_k$ is a 0-critical segment if $h_v(x_i) = h_v(x_j)$ for all $i = j, j + 1, \dots, k$, and $h_v(x_{j-1}) > h_v(x_j)$ and $h_v(x_k) < h_v(x_{k+1})$. Denote this line segment as $e(x_j, x_k)$.

are close to co-linear, because we want to avoid any possible issues with floating point errors in computations.

Lemma 6.9. Let $A \subset \mathbb{R}^2$ be a bounded subset whose boundary is the disjoint union of piecewise linear closed curves. Let $\gamma = (x_0, x_1, x_2, \dots, x_m = x_0)$ be a piecewise linear boundary curve of $A$ with vertices listed anticlockwise with respect to $A$. Fix $v \in S^1$. Let $x_i$ be an isolated 0-critical vertex or a vertex in a 0-critical segment $e$ with respect to the function $h^\gamma_v$. Furthermore suppose that $\angle(x_{i-1}, x_i, x_{i+1}) > \pi/2$. Let $w_i$ denote the rotation of the vector $x_{i+1} - x_i$ anticlockwise by $\pi/2$.
Since $\gamma$ is traced anticlockwise around $A$ and $\angle(x_{i-1}, x_i, x_{i+1}) > \pi/2$ we know that $w_i$, the rotation of $x_{i+1} - x_i$ anticlockwise by $\pi/2$, will point into $A$ from $x_i$. Set $y = x_i + w_i$. If we rotate anticlockwise around $x_i$ from direction $w_i$ we encounter $x_{i-1} - x_i$ within a rotation of $\pi$. This follows from $\angle(y, x_i, x_{i+1}) = \pi/2$ and $\angle(x_{i-1}, x_i, x_{i+1}) > \pi/2$. Since $w_i$ points into $A$ from $x_i$, for small $\epsilon > 0$ we can cover $A \cap B(x_i, \epsilon)$ by the triangles $\Delta(x_{i-1}, x_i, y)$ and $\Delta(y, x_i, x_{i+1})$. If $w_i \cdot v > 0$ then every point $a \in \Delta(x_{i-1}, x_i, y)$ satisfies $h_v(a) \ge h_v(x_i)$. Similarly every point $a \in \Delta(y, x_i, x_{i+1})$ also satisfies $h_v(a) \ge h_v(x_i)$. Together these imply that $x_i$ is (+)-critical.
If $w_i \cdot v < 0$ then $h_v$ decreases along $t \mapsto x_i + t w_i$, showing that for all $\epsilon > 0$ there are points $a_\epsilon \in A \cap B(x_i, \epsilon)$ with $h_v(a_\epsilon) < h_v(x_i)$. This implies $x_i$ is not (+)-critical.
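The sign test used in this proof reduces to one dot product. The Python sketch below is a hypothetical illustration (the names `inward_normal` and `is_plus_critical` are not from the paper or the R package). It assumes the caller has already checked the hypotheses of Lemma 6.9: the curve is listed anticlockwise with respect to $A$, the vertex $x_i$ is 0-critical, and the angle condition holds. The case $w_i \cdot v = 0$ is not treated.

```python
def inward_normal(curve, i):
    """Rotate the edge vector x_{i+1} - x_i anticlockwise by pi/2."""
    m = len(curve)
    (x0, y0), (x1, y1) = curve[i], curve[(i + 1) % m]
    dx, dy = x1 - x0, y1 - y0
    return (-dy, dx)

def is_plus_critical(curve, i, v):
    """Sign test w_i . v > 0, valid only under the hypotheses of Lemma 6.9."""
    w = inward_normal(curve, i)
    return w[0] * v[0] + w[1] * v[1] > 0

v = (0, 1)

# Exterior boundary of a hexagon, listed anticlockwise (A is on the left).
outer = [(0, 0), (2, -1), (4, 0), (4, 2), (2, 3), (0, 2)]
outer_min_is_plus = is_plus_critical(outer, outer.index((2, -1)), v)

# The same hexagon as the boundary of a hole: A lies outside, so the
# curve is listed in the reverse order to keep A on the left.
hole = list(reversed(outer))
hole_min_is_plus = is_plus_critical(hole, hole.index((2, -1)), v)
```

The lowest vertex of the exterior curve is reported as (+)-critical, while the lowest vertex of the hole boundary is not, which matches the behaviour described for exterior and interior boundary components in Section 4.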
We are now ready to state a theorem related to Theorem 4.18 for PL subsets of the plane where we drop the Morse condition.

Theorem 6.10. Let $A \subset \mathbb{R}^2$ be a 2-dimensional piecewise linear manifold with boundary $X = \partial A$. Fix $v \in S^1$. The 0-dimensional persistent homology of $h^X_v : X \to \mathbb{R}$ can be written as where $y_{j_1}, \dots, y_{j_m}$ are the set of vertex representatives, and Here we have only included intervals with positive length. Let $J_{ord}$ be the subset of $\{1, 2, \dots, m\}$ such that $d_i$ is finite and Now let $J_{rel}$ be the subset of $\{1, 2, \dots, m\}$ such that $d_i$ is finite but

Proof. If $h^A_v$ is a Morse function the result follows directly from Theorem 4.18, so suppose that $h^A_v$ is not a Morse function. Recall that since $A \subset \mathbb{R}^2$ is a 2-dimensional piecewise linear manifold with boundary $X = \partial A$, a sufficient condition for $h^A_v$ to be Morse is that all the vertices in $A$ have distinct values under $h^A_v$. Let $v_t$ be the rotation of $v$ anticlockwise by $t$. Given $v$ there is an $\epsilon > 0$ such that for all $0 < t < \epsilon$ we have that $h_v(x) < h_v(y)$ implies $h_{v_t}(x) < h_{v_t}(y)$. We can now break the ties that imply $h^A_v$ is not Morse; where $h_v(x) = h_v(y)$ for distinct vertices $x$ and $y$ we have $h_{v_t}(x) \ne h_{v_t}(y)$. We choose $\epsilon > 0$ small enough that $h^A_{v_t}$ is a Morse function for all $0 < t < \epsilon$.
A vertex y ji will be an isolated 0-critical vertex for h X v if and only if it is an isolated vertex for h X vt , as the order of the heights of y ji−1 , y ji and y ji+1 are the same under both h v and h vt .Since DET (y ji+1 − y ji , y ji−1 − y ji ) > 0 is independent of v we know that whether or not y ji is (+)-critical is the same under h A v and h A vt by Lemma 6.8.For ease of reference later in the proof, set x ji = y ji .Now suppose e(x k , x l ) is a 0-critical segment for h A v with y ji ∈ e(x k , x l ).Since h A vt is Morse, all the vertices in e(x k , x l ) take distinct values for h X vt , with exactly one of x k or x l now an isolated 0-critical point.Denote this endpoint by x ji .Since we choose v t to be a small anticlockwise rotation of v, this choice will be a consistent tie-break for all 0 < t < .Again since DET (x ji+1 − x ji , x ji−1 − x ji ) > 0 is independent of v we know that whether or not x ji is (+)-critical is the same under h A v and h A vt by Lemma 6.8.By construction, h v (x ji ) = h v (y ji ) for all i so we have The remainder of the proof is an argument in continuity.For t ∈ (0, ) we have for some d i,t ∈ R. Since lim t→0 + v t = v we have lim t→0 + PH 0 (X, h X vt ) = PH 0 (X, h X v ) and thus lim t→0 + d i,t = d i for all i.Since each x ji is (+)-critical with respect to h A v if and only if it is (+)-critical with respect to h A vt we can apply Theorem 4.18 to say for all t ∈ (0, ) that and Rel 1 (A, h A vt ) = ⊕ i∈J rel I [(di,t,rel),(hv t (yj i ),rel)) .Taking the limit as t → 0 + completes the proof.
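The tie-breaking step in this proof can be checked numerically on a small vertex set. The following Python sketch is only an illustration; the names `rotate`, `height` and `breaks_ties_consistently` are hypothetical, and exact floating point comparison is used, which is adequate for this toy example but would need a tolerance in practice. It tests that a small anticlockwise rotation $v_t$ preserves every strict inequality between heights and separates every tie.

```python
import math

def rotate(v, t):
    """Rotate the vector v anticlockwise by the angle t."""
    c, s = math.cos(t), math.sin(t)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def height(p, v):
    return p[0] * v[0] + p[1] * v[1]

def breaks_ties_consistently(points, v, t):
    """True if v_t keeps all strict height orderings of v and separates all ties."""
    vt = rotate(v, t)
    for p in points:
        for q in points:
            if p == q:
                continue
            hp, hq = height(p, v), height(q, v)
            if hp < hq and not height(p, vt) < height(q, vt):
                return False  # a strict ordering was lost
            if hp == hq and height(p, vt) == height(q, vt):
                return False  # a tie was not broken
    return True

# Four grid vertices with two ties in direction (0, 1).
grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
small_rotation_ok = breaks_ties_consistently(grid, (0.0, 1.0), 0.01)
large_rotation_ok = breaks_ties_consistently(grid, (0.0, 1.0), math.pi / 2)
```

A rotation by 0.01 radians passes the check, while a quarter turn does not, since it reverses the order of vertices that were strictly ordered under $v$.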

Implementation details
Using the theory developed in the previous sections we have implemented a package in R which takes as input a binary image and outputs the extended persistent homology transform of the foreground of that image. The R-package is available at https://github.com/james-e-morgan/xpht. The paragraphs below describe a simple example to illustrate the sequence of steps followed when using the package. We finish this section with a fun application using the XPHT to cluster the shapes of letters from various standard fonts.
Let $A$ denote the foreground of the binary image and $X$ the boundary between the foreground and background as constructed in the previous section. The user chooses the number of directions $K$, and the unit vectors are set to $v_i = (\cos(2\pi i/K), \sin(2\pi i/K))$. We can compute the extended persistent homology of $A$ for directions $v$ and $-v$ from the regular persistent homology of $X$ in direction $v$ together with knowledge of the minimum and maximum values of $h^X_v$ on each boundary curve. Therefore, when the number of directions is even, the computational time for the XPHT is halved. If the user has a collection of shapes that require centring [23] then $K$ is required to be a multiple of four.
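The choice of directions and the pairing of each direction with its antipode can be sketched in a few lines. This is a Python illustration only, not code from the R package; the function names `directions` and `antipodal_pairs` are hypothetical.

```python
import math

def directions(K):
    """Return the K unit vectors v_i = (cos(2*pi*i/K), sin(2*pi*i/K)), i = 0..K-1."""
    return [(math.cos(2 * math.pi * i / K), math.sin(2 * math.pi * i / K))
            for i in range(K)]

def antipodal_pairs(K):
    """For even K, pair each index i < K/2 with the index of the direction -v_i."""
    if K % 2 != 0:
        raise ValueError("antipodal pairing needs an even number of directions")
    return [(i, i + K // 2) for i in range(K // 2)]
```

With an even $K$, only the first index of each pair needs a boundary persistence computation; the second is then obtained from the first, which is the source of the halving described above.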
The first step is to label the connected components of the foreground and construct the oriented boundary curves around each of the components, labelling which curves are interior and which are exterior. Note that by Lemma 6.3 the boundary between the foreground and background is a disconnected collection of closed curves. This set of boundary curves is independent of the choice of direction and is computed only once. For an example see Figure 4.
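The component labelling part of this step can be illustrated with a breadth-first flood fill using the 8-adjacency convention for foreground pixels. This is a minimal Python sketch under the assumption that the image is given as a list of rows of 0/1 values; `label_foreground` is a hypothetical name, and the construction of the oriented boundary curves is not shown.

```python
from collections import deque

def label_foreground(image):
    """Label 8-connected foreground components of a binary image.

    image is a list of rows containing 0 (background) or 1 (foreground).
    Returns (labels, count) where labels[i][j] is 0 for background and
    1..count for the component containing pixel (i, j).
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if image[i][j] != 1 or labels[i][j] != 0:
                continue
            count += 1
            labels[i][j] = count
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                # Visit all 8 neighbours (the centre pixel is already labelled).
                for da in (-1, 0, 1):
                    for db in (-1, 0, 1):
                        x, y = a + da, b + db
                        if (0 <= x < rows and 0 <= y < cols
                                and image[x][y] == 1 and labels[x][y] == 0):
                            labels[x][y] = count
                            queue.append((x, y))
    return labels, count
```

Because the labelling does not depend on the direction $v$, it needs to run only once per image.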
For each direction $v$ the regular 0-dimensional persistent homology of the boundary curves can be computed very efficiently using the union-find data structure. Our implementation also identifies a 0-critical vertex that represents the birth of a component in the filtration of $\partial A$; see Figure 5. Using Lemma 6.8 or 6.9 it is also quick to determine which 0-critical points are positive critical or negative critical for the foreground. We thus can label all of the ordinary persistent homology classes as either (+)-critical or (−)-critical. This is illustrated for the example in Figure 6.
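The union-find computation for a single closed boundary curve can be sketched as follows. This is an illustrative Python version, not the package's R code: `curve_ph0` is a hypothetical name, the input is assumed to be the list of vertex heights $h_v$ in cyclic order along the curve, and ties are broken by vertex index, which is an assumption rather than the tie-break used in the paper.

```python
def curve_ph0(heights):
    """0-dimensional sublevel-set persistence of a height function on one
    closed polygonal curve, given vertex heights in cyclic order.

    Returns (finite, essential): finite is a list of
    (birth_vertex, birth, death) triples with positive length, and
    essential is the (birth_vertex, birth) of the class that never dies.
    """
    n = len(heights)
    parent = list(range(n))
    rep = list(range(n))      # rep[r]: lowest vertex of the component rooted at r
    added = [False] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Assumption: ties in height are broken by vertex index.
    order = sorted(range(n), key=lambda k: (heights[k], k))
    rank = {k: r for r, k in enumerate(order)}
    finite = []
    for k in order:
        added[k] = True
        for nb in ((k - 1) % n, (k + 1) % n):
            if not added[nb]:
                continue
            a, b = find(k), find(nb)
            if a == b:
                continue
            # Elder rule: the component whose representative was born later dies.
            old, young = (a, b) if rank[rep[a]] < rank[rep[b]] else (b, a)
            y = rep[young]
            if heights[y] < heights[k]:   # keep positive-length intervals only
                finite.append((y, heights[y], heights[k]))
            parent[young] = old
    root = find(order[0])
    return finite, (rep[root], heights[rep[root]])
```

The returned birth vertex is the representative that would then be tested for being (+)-critical or (−)-critical. Since the boundary curves are disjoint, running this once per curve gives the persistence of the whole boundary as a direct sum.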
Using Theorem 6.10 we can compute the ordinary and relative persistent homology for $h^A_v$ from the persistent homology of $h^X_v$ together with information about which 0-critical isolated vertices and 0-critical segments are (+)-critical. Applying the duality result from Corollary 2.7 we deduce the ordinary and relative persistence modules for direction $-v$ from those for direction $v$. To compute the essential classes we use Proposition 4.20. Each of the boundary curves is labelled as interior or exterior. We compute the essential classes by finding the minimum and maximum values of $h^X_v$ on these boundary curves. This is illustrated for our running example in Figure 7.
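For the exterior curves, the essential 0-dimensional classes of Proposition 4.20 reduce to a minimum and a maximum per curve. The Python sketch below covers only that case; the function names and the tuple encoding of an interval as `((min, "ord"), (max, "rel"))` are assumptions made for illustration. Since $h_v$ is linear on each edge of a PL curve, its extrema are attained at vertices.

```python
def curve_heights(curve, v):
    """Heights h_v(x) = <x, v> of the vertices of a polygonal curve."""
    return [x * v[0] + y * v[1] for (x, y) in curve]

def essential_h0(exterior_curves, v):
    """One essential 0-class per exterior boundary curve, encoded as
    ((min height, "ord"), (max height, "rel"))."""
    classes = []
    for curve in exterior_curves:
        h = curve_heights(curve, v)
        classes.append(((min(h), "ord"), (max(h), "rel")))
    return classes
```

The same per-curve minimum and maximum can be computed for interior curves, which is the remaining input the essential classes need.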
Using the notation of Figure 7, the essential classes for the foreground $A$

Example 7.1. We now briefly describe results from an XPHT analysis of the capital letter A and the lower-case letter g rendered using over 90 standard fonts. Each letter was created as a small binary image (130 × 130) using an 84pt font size; these are shown in Figures 8 and 9. The XPHT for each letter was computed using $K = 32$ directions. Fonts vary in their letter placement with respect to a baseline, so we centred the XPHT summary for each shape using the method outlined in [23]. We did not scale the data as the letters have the same specified font size; this allows the different heights and widths to serve as characteristics of the font. When comparing the XPHT summaries of two shapes, we also did not need to consider angular alignment as the images are generated with a consistent orientation. We computed all pairwise distances between the XPHT summaries using both the 1- and 2-Wasserstein metrics. To demonstrate the types of shape features the XPHT captures, we use multi-dimensional scaling (MDS) to assign planar coordinates to each image.

The plots in Figures 8 and 9 show that the XPHT distances capture the difference between serif and sans-serif versions of the letter A, and between single- and double-storey versions of the letter g. Of particular note is the font 'Chalkduster' (label 32), which has a textured look with small holes and a rough boundary; the XPHT distances do not make it an outlier among the letter 'A's. The Chalkduster 'g' is an outlier for that set because its bowl does not create a closed 1-cycle. It is also worth noting that the XPHT distances vary gradually from the double-storey letter 'g's, which have $\beta_1 = 2$, to the single-storey 'g's with $\beta_1 = 1$, and that fonts such as those labelled 62 and 24, which look double-storey but have $\beta_1 = 1$, are placed in the middle of the MDS plot.
These letters are included in the R-package release and more details about the analysis are provided in the vignettes.
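To make the MDS step concrete, the following Python sketch computes planar coordinates from a matrix of pairwise distances using classical (Torgerson) MDS. The source does not state which MDS variant was used, so this choice is an assumption. The function name `classical_mds_2d` and the use of power iteration with deflation to find the top two eigenvectors are likewise illustrative, chosen to keep the example free of external libraries.

```python
import math

def classical_mds_2d(D, iters=500):
    """Planar coordinates from a symmetric n x n distance matrix D (list of lists)."""
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J with J = I - (1/n) * ones.
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def top_eigenpair(M):
        # Power iteration from a fixed, non-constant start vector.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = matvec(M, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                return 0.0, [0.0] * n
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(M, v)))  # Rayleigh quotient
        return lam, v

    lam1, v1 = top_eigenpair(B)
    # Deflate the first eigenpair, then repeat for the second.
    B2 = [[B[i][j] - lam1 * v1[i] * v1[j] for j in range(n)] for i in range(n)]
    lam2, v2 = top_eigenpair(B2)
    s1, s2 = math.sqrt(max(lam1, 0.0)), math.sqrt(max(lam2, 0.0))
    return [(s1 * v1[i], s2 * v2[i]) for i in range(n)]
```

For distances that are not exactly Euclidean, such as Wasserstein-based distances between transforms, the centred matrix can have negative eigenvalues and the planar picture is only an approximation of the distance structure.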

Future directions
This paper presents a new approach to computing persistent homology for manifolds with boundary by exploiting relationships between the extended persistent homology of a manifold with boundary and that of its boundary alone. Although the focus here has been on height functions of embedded shapes in Euclidean space, it is reasonable to expect that similar results could hold for other kinds of functions, such as radial functions. Future directions of research also include considering generalisations to stratified spaces, adapting ideas from stratified Morse theory as developed by Goresky and MacPherson [14].
Other areas to explore are theoretical properties of the XPHT. In particular, intuitively we would expect better stability results than for the PHT, as we can introduce new essential classes with small support without dramatically changing the extended persistent homology transform.

Figure 9: Lower case 'g' rendered in the same fonts and same order as used for 'A'. The 2-Wasserstein distances between each pair of letters are again visualised using MDS in 2 dimensions. In this case, there is good separation between single storey 'g' and double storey 'g' font shapes. The lower outlier labelled 32 is 'Chalkduster' and has no essential 1-cycles; its nearest neighbour is 36 'Copperplate' which renders in upper-case form. The first sample font is 'Academy Engraved'; it is placed up above the others because it has additional essential 1-cycles due to its outlined stroke style. On the far right are letters 15 and 16 (forms of 'Avantgarde'); these have the roundest bowls and largest counters, i.e., large circular upper holes.

arXiv:2007.01834, 2020.
[3] Ulrich Bauer and Michael Lesnick. Induced matchings of barcodes and the algebraic stability of persistence. In Proceedings of the thirtieth annual symposium on Computational geometry, pages 355-364, 2014.
Given a persistence module $P = \bigoplus_{I \in S_P} \mathcal{I}_I$ we will use $b(P) = \{b(\mathcal{I}_I) : I \in S_P\}$ and $d(P) = \{d(\mathcal{I}_I) : I \in S_P\}$ to denote the multiset of birth parameters and death parameters in the interval decomposition of $P$.

Lemma 4.4 (Morse Lemma). Let $M$ be a smooth $n$-manifold without boundary and $f : M \to \mathbb{R}$ a smooth function. The point $p \in M$ is a regular point of $f$ if and only if there is a chart $(U, \phi)$ where $\phi(p) = 0$ and

Definition 4.5. Let $M$ be an $n$-dimensional PL manifold without boundary and $f : M \to \mathbb{R}$ a PL function. The point $p \in M$ is a regular point of $f$ if and only if there is a chart $(U, \phi)$ containing $p$ of the form

Definition 4.6. Let $(M, \partial M)$ be a smooth (respectively PL) $n$-manifold with boundary and $f : M \to \mathbb{R}$ a smooth (respectively PL) function. The point $p \in \partial M$ is a regular point of $f$ if and only if there is a

Definition 4.7. Let $(M, \partial M)$ be a smooth $n$-manifold with boundary and $f : M \to \mathbb{R}$ a smooth function. The point $p \in \partial M$ is a non-degenerate critical point of $f$ with index $(k, \eta)$ if and only if there is a chart $(U, \phi)$ with $\phi(p) = 0$ such that

Definition 4.9. Given a smooth (respectively PL) manifold with boundary $(M, \partial M)$, we say that $f : M \to \mathbb{R}$ is a Morse function if
• $f$ is smooth (respectively PL).
• None of the critical points of $f|_{\mathrm{int}(M)}$ and $f|_{\partial M}$ are degenerate.
• All the critical values for $f|_{\mathrm{int}(M)}$ and $f|_{\partial M}$ combined are distinct and finite in number.

Definition 4.10. Suppose $f : (M, \partial M) \to \mathbb{R}$ is a Morse function. Let $\mathrm{Crit}(f, k)$ denote the set of index-$k$ critical points of $f$; these points must lie in the interior of $M$. Let $\mathrm{Crit}(f, (k, \eta))$ denote the set of critical points of $f|_{\partial M}$ with index $(k, \eta)$. If $p \in \partial M$ is a critical point of $f|_{\partial M}$ with index $(k, \eta)$, denote the sign of $p$ by $\mathrm{sgn}(f, p) = \eta$.
) with the same death time and $b' \leq b$. By Proposition 4.16 we know that the sets of start and end parameters for the barcode decompositions satisfy $\mathbf{b}$ which we will denote $\mathbf{d}$. This means that we can define a bijection $\rho : \mathbf{b} \to \mathbf{b}$ such that $\rho(b) = b'$ if there exists a $d$ such that $(b, d] \in \mathrm{XPH}_0(X, v) \oplus \mathcal{I}_{[(-R, \mathrm{ord}),\, (R, \mathrm{rel}))}$ and $(b', d] \in \mathrm{XPH}_k(A, v) \oplus \mathrm{XPH}_k(B, v)$.
Again this implies there is an injective map which pairs each interval $[b, d)$ in $\mathrm{XPH}_0(X, v)$ to an interval $[b', d)$ with the same death time and $b' \leq b$. This implies that our function $\rho : \mathbf{b} \to \mathbf{b}$ has $\rho(b) \leq b$ for all $b \in b_0(\mathrm{XPH}_0(X))$. Together these imply $\rho(b) \leq b$ for all $b \in \mathbf{b}$ which, since $\mathbf{b}$ is finite, implies $\rho$ is the identity. Hence the interval decompositions of $\mathrm{XPH}_0(X, v) \oplus \mathcal{I}_{[(-R, \mathrm{ord}),\, (R, \mathrm{rel}))}$ and $\mathrm{XPH}_0(A, v) \oplus \mathrm{XPH}_0(B, v)$ are the same and they are isomorphic as persistence modules.

Proposition 4.20. Let $A \subset \mathbb{R}^n$ be an $n$-manifold with boundary $X = \partial A$. Let $v$ be a direction such that $h^A_v : A \to \mathbb{R}$ is a Morse function. Let $\{X_1, \ldots, X_k\}$ be the interior boundary components of $X$ and $\{Y_1, \ldots, Y_l\}$ be the exterior boundary components of $X$. Then
$$\mathrm{Ess}_0(A, h_v) = \bigoplus_{j=1}^{l} \mathcal{I}_{[(\min\{h_v(Y_j)\}, \mathrm{ord}),\, (\max\{h_v(Y_j)\}, \mathrm{rel}))}$$

Definition 5.2. Fix $p \in [1, \infty)$ and ambient dimension $n$. Define the distance function $d^{\mathrm{PHT}}_p$

Definition 5.3. Fix $p \in [1, \infty)$ and ambient dimension $n$. Define the distance function $d^{\mathrm{XPHT}}_p$

Figure 2: The rows and columns of a binary digital image are indexed by $i$ and $j$ respectively. Foreground pixels are labelled '1' and connected when 8-adjacent. Background pixels are labelled '0' and connected when 4-adjacent. Segments of the boundary curves are drawn in orange and the boundary point labelled 'p' has coordinates $(i + \frac{1}{2}, j)$.

Figure 3: Each of the $2^4$ possible 2 × 2 binary-valued pixel patches, showing the associated oriented boundary edges for the case that foreground pixels connect when 8-adjacent. The edge orientation always has the foreground on the left.

Fig. 3) any two boundary edges can only intersect at their endpoints and we conclude that each component of $C$ is a simple closed PL-curve.

Lemma 6.4. Let $A$ be the union of components of $\mathbb{R}^2 \setminus C$ that contain at least one foreground pixel of the binary image array $P$. Then $A$ is a bounded manifold with boundary $\partial A = C$.

Proof. $A$ is bounded because the image array is finite. Each connected component $C_a \in C$ is a simple closed PL-curve, so $\mathbb{R}^2 \setminus C_a$ consists of two open domains. Each connected component of $\mathbb{R}^2 \setminus C$ is formed by the intersection of a finite number of these domains so is also open, and it follows that $A$ is open. Clearly $\partial A \subset C$; we must now show that $\partial A \supset C$. Let $p \in C$, i.e., $p$ is an arbitrary point on one of the boundary edge segments. We can write the coordinates of $p$ as $(i + \epsilon, j + \eta)$ for integers $i, j$ and fractional parts $\epsilon, \eta \in [0, 1)$. We know that each boundary edge divides the 2 × 2 patch with corners $(i, j)$, $(i + 1, j)$, $(i + 1, j + 1)$, $(i, j + 1)$ into two pieces such that at least one of these corners has $P(k, l) = 1$, and this implies that $p \in \partial A$.

Figure 4: The input binary image $A$ with foreground in grey. The boundary curves $\partial A$ are oriented anticlockwise with the interior curve in orange and the exterior curves in blue. These curves are constructed using the rules illustrated in Figure 3.

Figure 5: The critical points for $\mathrm{PH}_0(\partial A, v)$ for the given direction $v$. The vertices marked with crosses are 0-critical points and correspond to birth events in $\mathrm{PH}_0(\partial A, v)$. Vertices marked with circles are 1-critical and cause a death in $\mathrm{PH}_0(\partial A, v)$. The same letter label is given to the paired birth and death events of a persistent homology class from $\mathrm{PH}_0(\partial A, v)$.

Figure 6: Identifying the boundary curve local minima that are also local minima of $h_v$ on the foreground. The 0-critical points for $X$ that correspond to births of finite lifetime persistence classes in $\mathrm{PH}_0(X, v)$ are c, e, f, g, h and i. Of these, c, e, f and h are local minima for $A$ and thus (+)-critical points. The remaining points (g and i) are (−)-critical.

Figure 7: The minimum and maximum points of $h^X_v$ for each boundary curve in $X$. The exterior curves have (minimum, maximum) pairs labelled $(m_1, M_1)$ and $(m_2, M_2)$ and the interior curve has the pair $(m_3, M_3)$.

Figure 8: Upper case 'A' rendered in a variety of fonts. The letter shapes are numbered 1-95 reading left to right, top to bottom in the 10 by 10 grid. The 2-Wasserstein distances between each pair of letters are visualised using MDS in two dimensions. This shows a separation between serif 'A' and sans serif 'A' fonts. The upper outlier labelled 87 is 'Trattatello' and has the smallest height and counter in this set. The lower outlier labelled 69 is 'Noteworthy-Light', a large round simple script font. On the far right is letter 59 ('Impact'), notable for having a narrow body width but heavy weight.