Finding the Homology of Manifolds using Ellipsoids

A standard problem in applied topology is how to discover topological invariants of data from a noisy point cloud that approximates it. We consider the case where a sample is drawn from a properly embedded C1-submanifold without boundary in a Euclidean space. We show that we can deformation retract the union of ellipsoids, centered at sample points and stretching in the tangent directions, to the manifold. Hence the homotopy type, and therefore also the homology type, of the manifold is the same as that of the nerve complex of the cover by ellipsoids. By thickening sample points to ellipsoids rather than balls, our results require a smaller sample density than comparable results in the literature. They also advocate using elongated shapes in the construction of barcodes in persistent homology.


Introduction
Data is often unstructured and comes in the form of a non-empty finite metric space, called a point cloud. It is often very high dimensional even though the data points are actually samples from a low-dimensional object (such as a manifold) that is embedded in a high-dimensional space. One reason may be that many features are all measurements of the same underlying cause and are therefore closely related to each other. For example, if you take photos of a single object from multiple angles simultaneously, there is a lot of overlap in the information captured by all those cameras. One of the main tasks of 'manifold learning' is to design algorithms that estimate geometric and topological properties of the manifold from the sample points lying on this unknown manifold.
One successful framework for dealing with the problem of reconstructing shapes from point clouds is based on the notion of an $\varepsilon$-sample introduced by Amenta et al. [2]. A sampling of a shape $M$ is an $\varepsilon$-sampling if every point $P$ in $M$ has a sample point at distance at most $\varepsilon \cdot \mathrm{lfs}_M(P)$, where $\mathrm{lfs}_M(P)$ is the local feature size of $P$, i.e. the distance from $P$ to the medial axis of $M$. Surfaces smoothly embedded in $\mathbb{R}^3$ can be reconstructed homeomorphically from any 0.06-sampling using the Cocone algorithm [3].
One simple method for shape reconstruction is to output an offset of the sampling for a suitable value $\alpha$ of the offset parameter. Topologically, this is equivalent to taking the Čech complex or the $\alpha$-complex [22]. This leads to the problem of finding theoretical guarantees as to when an offset of a sampling has the same homotopy type as the underlying set. In other words, we need to find conditions on a point cloud $S$ of a shape $M$ so that the thickening of $S$ is homotopy equivalent to $M$. This only works if the point cloud is sufficiently close to $M$, i.e. when there is a bound on the Hausdorff distance between $S$ and $M$.
Niyogi, Smale and Weinberger [27] proved that this method indeed provides reconstructions having the correct homotopy type for sufficiently densely sampled smooth submanifolds of $\mathbb{R}^n$. More precisely, one can capture the homotopy type of a Riemannian submanifold $M$ without boundary of reach $\tau$ in a Euclidean space from a finite $\frac{\varepsilon}{2}$-dense sample $S \subseteq M$ whenever $\varepsilon < \sqrt{3/5}\,\tau$. Let us denote the Hausdorff distance between $S$ and $M$ by $\kappa$; that is, every point in $M$ has an at most $\kappa$-distant point in $S$. We can rephrase the above result as follows: whenever $2\kappa < \sqrt{3/5}\,\tau$, the homotopy type of $M$ is captured by a union of $\varepsilon$-balls with centers in $S$ for every $\varepsilon \in \mathbb{R}_{(2\kappa,\, \sqrt{3/5}\,\tau)}$. The resulting bound $\frac{\kappa}{\tau} < \frac{1}{2}\sqrt{3/5} \approx 0.387$ represents how dense we need the sample to be in order to be able to recover the homotopy type of $M$.

Other authors gave variants of Niyogi, Smale and Weinberger's result. In [9, Theorem 2.8], the authors relax the conditions on the set we wish to approximate (it need not be a manifold, just any non-empty compact subset of a Euclidean space) and on the sample (it need not be finite, just non-empty compact), but the price they pay for this is a much lower upper bound on $\frac{\kappa}{\tau}$, which in their case is $\frac{1}{6} \approx 0.167$. One can potentially improve the result by using local quantities ($\mu$-reach etc.) [12], [14], [16], [28], [4], [5] instead of the global reach $\tau$, at least in situations when these are large compared to $\tau$. Due to the difficulty and length of our current work, we take only global features into account, and leave the generalization to local features for future work.
In practice, producing a sufficiently dense sample can be difficult or take a long time [20], so relaxing the upper bound on $\frac{\kappa}{\tau}$ is desirable. The purpose of this paper is to prove that we can indeed relax this bound when sampling manifolds (though we allow a more general class than [27]) if we thicken sample points to ellipsoids rather than balls. The idea is that since a differentiable manifold is locally well approximated by its tangent space, an ellipsoid with its major semi-axes in the tangent directions approximates the manifold well. This idea first appeared in [8], where the authors construct a filtration of "ellipsoid-driven complexes", in which the user can choose the ratio between the major (tangent) and the minor (normal) semi-axes. Their experiments showed that computing barcodes from ellipsoid-driven complexes strengthened the topological signal, in the sense that the bars corresponding to features of the data were longer. In our paper we make the ratio dependent on the persistence parameter and give a proof that the union of ellipsoids around sample points (under suitable assumptions) deformation retracts onto the manifold. Hence our paper gives theoretical guarantees that the union of ellipsoids captures the manifold's homotopy type, and thus further justifies the use of ellipsoid-inspired shapes to construct barcodes.
In this paper we assume that the reach of the manifold and its tangent and normal spaces at the sample points are given. In practice, these quantities can be estimated from the sample, see e.g. [1, 6, 30, 25, 29].
The central theorem of this paper (Theorem 6.1) is the following.

Theorem. Let $n \in \mathbb{N}$ and let $\mathcal{M}$ be a non-empty properly embedded $C^1$-submanifold of $\mathbb{R}^n$ without boundary. Let $\mathcal{S} \subseteq \mathcal{M}$ be a subset of $\mathcal{M}$, locally finite in $\mathbb{R}^n$ (the sample from the manifold $\mathcal{M}$). Let $\tau$ be the reach of $\mathcal{M}$ in $\mathbb{R}^n$ and $\kappa$ the Hausdorff distance between $\mathcal{S}$ and $\mathcal{M}$. Then for all $p \in \mathbb{R}_{[0.5\tau,\, 0.96\tau]}$ which satisfy
$$\kappa < \sqrt{2p\big(\sqrt{\tau(p + 2\tau)} - \tau\big) - 0.55\,\tau^2}$$
there exists a strong deformation retraction from $E_p$ (the union of open ellipsoids around sample points with normal semi-axes of length $p$ and tangent semi-axes of length $\sqrt{\tau p + p^2}$) to $\mathcal{M}$. In particular, $\mathcal{M}$, $E_p$ and the nerve complex of the ellipsoid cover $(E_p(S))_{S \in \mathcal{S}}$ are homotopy equivalent, and so have the same homology.
By replacing the balls with ellipsoids, we manage to push the upper bound on $\frac{\kappa}{\tau}$ to approximately 0.913, an improvement by a factor of about 2.36 compared to [27]. In other words, our method allows samples with less than half the density.
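The two figures quoted above can be checked numerically. The following is a minimal sketch, assuming the bound in the main theorem reads $\kappa < \sqrt{2p(\sqrt{\tau(p+2\tau)} - \tau) - 0.55\tau^2}$ and that the ratio from [27] is $\frac{1}{2}\sqrt{3/5}$; the function name `kappa_bound` and the constant `NSW_RATIO` are illustrative and do not come from the paper.

```python
import math

def kappa_bound(p, tau=1.0):
    """Upper bound on kappa for a given p (assumed reading of the main theorem)."""
    inner = 2.0 * p * (math.sqrt(tau * (p + 2.0 * tau)) - tau) - 0.55 * tau ** 2
    # Outside the admissible range the expression may become negative.
    return math.sqrt(inner) if inner > 0.0 else 0.0

# Ratio kappa/tau for balls, assuming the bound 2*kappa < sqrt(3/5)*tau from [27].
NSW_RATIO = 0.5 * math.sqrt(3.0 / 5.0)

# The bound grows with p, so the best ratio is reached at p = 0.96 * tau.
ellipsoid_ratio = kappa_bound(0.96, tau=1.0)
improvement = ellipsoid_ratio / NSW_RATIO

print(round(ellipsoid_ratio, 3), round(NSW_RATIO, 3), round(improvement, 2))
```

Under these assumptions the ratio at $p = 0.96\tau$ comes out close to 0.913 and the improvement factor close to 2.36, in agreement with the numbers stated in the text.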
The paper is organized as follows. Section 2 lays the groundwork for the paper, providing requisite definitions and deriving some results for general differentiable submanifolds of Euclidean spaces. In Section 3 we calculate theoretical bounds on the persistence parameter $p$: the lower bound ensures that the union of ellipsoids covers the manifold and the upper bound ensures that the union does not intersect the medial axis. Part of our proof relies on the normal deformation retraction working on intersections of ellipsoids, which appears too difficult to prove theoretically by hand, so we resort to a computer program, explained in Section 4. In Section 5 we construct a deformation retraction from the union of ellipsoids to the manifold. The section is divided into several subsections for easier reading. Section 6 collects the results from the paper to prove the main theorem. In Section 7 we discuss our results and future work.

Glossary:
• pr: the map $\mathbb{R}^n \setminus \mathcal{A} \to \mathcal{M}$ taking a point to the unique closest point on $\mathcal{M}$
• prv: the map taking a point $X$ to the vector $\mathrm{pr}(X) - X$
• $V_S$: auxiliary vector field, defined on $E_p(S)$
• $V$: auxiliary vector field, defined on $E_p$
• $V$: the vector field of directions for the deformation retraction
• $\Phi$: the flow of the vector field $V$
• $R$: a deformation retraction from $E_p$ to a tubular neighbourhood of $\mathcal{M}$

General Definitions
All constructions in this paper are done in an ambient Euclidean space $\mathbb{R}^n$, $n \in \mathbb{N}$, equipped with the usual Euclidean metric $d$. We will use the symbol $\mathcal{N}$ for a general submanifold of $\mathbb{R}^n$. By definition each point $X$ of a manifold $\mathcal{N}$ has a neighbourhood homeomorphic to a Euclidean space or a closed Euclidean half-space. The dimension of this (half-)space is the dimension of $\mathcal{N}$ at $X$. Different points of a manifold can have different dimensions, though the dimension is constant on each connected component. In this paper, when we say that $\mathcal{N}$ is an $m$-dimensional manifold, we mean that it has dimension $m$ at every point.
We quickly recall from general topology that it is equivalent for a subset of a Euclidean space to be closed and to be properly embedded.
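Proposition 2.1. Let $(\mathcal{X}, d)$ be a metric space in which every closed ball is compact (every Euclidean space $\mathbb{R}^n$ satisfies this property). The following statements are equivalent for any subset $\mathcal{S} \subseteq \mathcal{X}$.
1. $\mathcal{S}$ is a closed subset of $\mathcal{X}$.
2. $\mathcal{S}$ is properly embedded into $\mathcal{X}$, i.e. the inclusion $\mathcal{S} \hookrightarrow \mathcal{X}$ is a proper map.
3. $\mathcal{S}$ is empty or distances from points in the ambient space to $\mathcal{S}$ are attained. That is, for every $X \in \mathcal{X}$ there exists $Y \in \mathcal{S}$ such that $d(X, \mathcal{S}) = d(X, Y)$.
Proof.
• (1 ⇒ 2) If $\mathcal{S}$ is closed in $\mathcal{X}$, then its intersection with a compact subset of $\mathcal{X}$ is compact, so $\mathcal{S}$ is properly embedded into $\mathcal{X}$.
• (2 ⇒ 3) If $\mathcal{S}$ is non-empty, pick $S \in \mathcal{S}$. For any $X \in \mathcal{X}$ we have $d(X, \mathcal{S}) \leq d(X, S)$, so $d(X, \mathcal{S}) = d\big(X, \mathcal{S} \cap \overline{B}_{d(X,S)}(X)\big)$. Since $\mathcal{S}$ is properly embedded in $\mathcal{X}$, its intersection with the compact closed ball $\overline{B}_{d(X,S)}(X)$ is also compact. A continuous map from a non-empty compact space into the reals attains its minimum, so there exists $Y \in \mathcal{S}$ such that $d(X, \mathcal{S}) = d(X, Y)$.
• (3 ⇒ 1) The empty set is closed. Assume that $\mathcal{S}$ is non-empty. Then for every point in the closure $X \in \overline{\mathcal{S}}$ we have $d(X, \mathcal{S}) = 0$. By assumption this distance is attained, i.e. we have $Y \in \mathcal{S}$ such that $d(X, Y) = 0$, so $X = Y \in \mathcal{S}$. Thus $\overline{\mathcal{S}} \subseteq \mathcal{S}$, so $\mathcal{S}$ is closed.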
In this paper we consider exclusively submanifolds of a Euclidean space which are properly embedded, and hence closed subsets. We mostly use the term 'properly embedded' instead of 'closed' to avoid confusion: the term 'closed manifold' is usually used in the sense 'compact manifold with no boundary', which is a stronger condition (a properly embedded submanifold need not be compact or without boundary, though every compact submanifold is properly embedded).
A manifold can have a smooth structure up to any order $k$. If $\mathcal{N}$ is at least a $C^1$-manifold, one may abstractly define the tangent space $T_X\mathcal{N}$ and the normal space $N_X\mathcal{N}$ at any point $X \in \mathcal{N}$ ($X$ is allowed to be a boundary point). As we restrict ourselves to submanifolds of $\mathbb{R}^n$, we also treat the tangent and the normal space as affine subspaces of $\mathbb{R}^n$, with the origins of $T_X\mathcal{N}$ and $N_X\mathcal{N}$ placed at $X$. The dimension of $T_X\mathcal{N}$ (resp. $N_X\mathcal{N}$) is the same as the dimension (resp. codimension) of $\mathcal{N}$ at $X$. Because of this and because $T_X\mathcal{N}$ and $N_X\mathcal{N}$ are orthogonal, they together generate $\mathbb{R}^n$.
Definition 2.2. Let $\mathcal{N}$ be a $C^1$-submanifold of $\mathbb{R}^n$, $X \in \mathcal{N}$ and $m$ the dimension of $\mathcal{N}$ at $X$.
• A tangent-normal coordinate system at X ∈ N is an n-dimensional orthonormal coordinate system with the origin in X, the first m coordinate axes tangent to N at X and the last n − m axes normal to N at X.
• A planar tangent-normal coordinate system at X ∈ N is a two-dimensional plane in R n containing X, together with the choice of an orthonormal coordinate system lying on it, with the origin in X, the first axis (the abscissa) tangent to N at X and the second axis (the ordinate) normal to N at X.
Recall from Proposition 2.1 that distances from points to a non-empty properly embedded submanifold are attained.However, these distances need not be attained in just one point.
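As usual, we define the medial axis $\mathcal{A}_{\mathcal{N}}$ of a submanifold $\mathcal{N} \subseteq \mathbb{R}^n$ as the set of all points in the ambient space for which the distance to $\mathcal{N}$ is attained in at least two points:
$$\mathcal{A}_{\mathcal{N}} := \big\{ X \in \mathbb{R}^n \;\big|\; \exists\, Y', Y'' \in \mathcal{N}.\ Y' \neq Y'' \wedge d(X, \mathcal{N}) = d(X, Y') = d(X, Y'') \big\}.$$
If $\mathcal{N}$ is empty, so is $\mathcal{A}_{\mathcal{N}}$, though the medial axis can be empty even for non-empty manifolds (consider for example a line or a line segment in a plane). The manifold and its medial axis are always disjoint.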
The reach of N , denoted by τ N , is the distance between the manifold N and its medial axis A N (if A N is empty, the reach is defined to be ∞).
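Definition 2.3. Let $\mathcal{N}$ be a $C^1$-submanifold of $\mathbb{R}^n$, $X \in \mathcal{N}$ and $N$ a non-zero normal vector to $\mathcal{N}$ at $X$. The $\tau_{\mathcal{N}}$-ball, associated to $X$ and $N$, is the closed ball (in $\mathbb{R}^n$, so $n$-dimensional) with radius $\tau_{\mathcal{N}}$ and centered at $X + \tau_{\mathcal{N}} \frac{N}{\|N\|}$, which therefore touches $\mathcal{N}$ at $X$. A $\tau_{\mathcal{N}}$-ball, associated to $X$, is the $\tau_{\mathcal{N}}$-ball, associated to $X$ and some non-zero normal vector to $\mathcal{N}$ at $X$.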
The significance of associated $\tau_{\mathcal{N}}$-balls is that they restrict where a manifold can be situated. Specifically, a manifold is disjoint from the interior of each of its associated $\tau_{\mathcal{N}}$-balls.
We will approximate manifolds with a union of ellipsoids (similarly to how one uses a union of balls to approximate a subspace in the case of a Čech complex). The idea is to use ellipsoids which are elongated in the directions tangent to the manifold, so that they "extend longer in the direction the manifold does" and we can make do with a sample of lower density.
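Let us define the kind of ellipsoids we use in this paper.

Definition 2.4. Let $\mathcal{N}$ be a $C^1$-submanifold of $\mathbb{R}^n$ and $p \in \mathbb{R}_{>0}$. The tangent-normal open (resp. closed) $p$-ellipsoid at $X \in \mathcal{N}$ is the open (resp. closed) ellipsoid in $\mathbb{R}^n$ with the center in $X$, the tangent semi-axes of length $\sqrt{\tau_{\mathcal{N}} p + p^2}$ and the normal semi-axes of length $p$. Explicitly, in a tangent-normal coordinate system at $X$ the tangent-normal open and closed $p$-ellipsoids are given by
$$E_p(X) := \Big\{ (x_1, \ldots, x_n) \;\Big|\; \sum_{i=1}^{m} \frac{x_i^2}{\tau_{\mathcal{N}} p + p^2} + \sum_{i=m+1}^{n} \frac{x_i^2}{p^2} < 1 \Big\}, \qquad \overline{E}_p(X) := \Big\{ (x_1, \ldots, x_n) \;\Big|\; \sum_{i=1}^{m} \frac{x_i^2}{\tau_{\mathcal{N}} p + p^2} + \sum_{i=m+1}^{n} \frac{x_i^2}{p^2} \leq 1 \Big\},$$
where $m$ denotes the dimension of $\mathcal{N}$ at $X$. If $\tau_{\mathcal{N}} = \infty$, then these "ellipsoids" are simply thickenings of $T_X\mathcal{N}$:
$$E_p(X) := \Big\{ (x_1, \ldots, x_n) \;\Big|\; \sum_{i=m+1}^{n} x_i^2 < p^2 \Big\}, \qquad \overline{E}_p(X) := \Big\{ (x_1, \ldots, x_n) \;\Big|\; \sum_{i=m+1}^{n} x_i^2 \leq p^2 \Big\}.$$
Observe that the definitions of ellipsoids are independent of the choice of the tangent-normal coordinate system; they depend only on the submanifold itself.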
The value $p$ in the definition of ellipsoids serves as a "persistence parameter" [24], [10], [11], [21], [8]. We purposefully do not take ellipsoids which are similar at all $p$ (which would mean that the ratio between the tangent and the normal semi-axes was constant). Rather, we want ellipsoids which are more elongated (have higher eccentricity) for smaller $p$. This is because on a smaller scale a smooth manifold more closely aligns with its tangent space, and then so should the ellipsoids. We want the length of the major semi-axes to be a function of $p$ with the following properties: for each $p$ its value is larger than $p$, and when $p$ goes to 0, the function value also goes to 0, but the eccentricity goes to 1. In addition, the function should allow the following argument. If we change the unit length of the coordinate system, but otherwise leave the manifold "the same", we want the ellipsoids to remain "the same" as well, but the reach of the manifold changes by the same factor as the unit length, which the function should take into account. The simplest function satisfying all these properties is arguably $\sqrt{\tau_{\mathcal{N}} p + p^2}$, which turns out to work for the results we want.
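The properties asked of the semi-axis function can be illustrated with a short sketch. It assumes the tangent semi-axis length $\sqrt{\tau p + p^2}$ and a point given by its coordinates in a tangent-normal coordinate system; the names `semi_axes` and `in_open_ellipsoid` are hypothetical helpers, not taken from the paper.

```python
import math

def semi_axes(p, tau):
    """Return (tangent, normal) semi-axis lengths of the p-ellipsoid for reach tau."""
    return math.sqrt(tau * p + p * p), p

def in_open_ellipsoid(tangent_coords, normal_coords, p, tau):
    """Membership test in tangent-normal coordinates centered at the ellipsoid center."""
    a, b = semi_axes(p, tau)
    t2 = sum(x * x for x in tangent_coords)
    n2 = sum(x * x for x in normal_coords)
    return t2 / (a * a) + n2 / (b * b) < 1.0

# Changing the unit length by a factor c scales p and tau, and both semi-axes, by c.
c = 2.0
a1, b1 = semi_axes(0.5, 1.0)
a2, b2 = semi_axes(c * 0.5, c * 1.0)
print(a2 / a1, b2 / b1)

# The elongation a/p = sqrt(tau/p + 1) grows as p shrinks.
print([round(semi_axes(p, 1.0)[0] / p, 3) for p in (0.9, 0.5, 0.1)])
```

The same point at distance 0.8 from the center lies inside the ellipsoid when it sits in a tangent direction and outside when it sits in a normal direction, which is the behaviour the elongation is meant to produce.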
Figure 1 shows an example of what a manifold, its associated balls and a tangent-normal ellipsoid look like in a tangent-normal coordinate system at some point on the manifold.
We now prove a few results that will be useful later.
Lemma 2.5. Let $\mathcal{N}$ be a properly embedded $C^1$-submanifold of $\mathbb{R}^n$. Let $X \in \mathcal{N}$ and let $m$ be the dimension of $\mathcal{N}$ at $X$. Assume $0 < m < n$.
1. For every $Y \in \mathbb{R}^n$ a planar tangent-normal coordinate system at $X \in \mathcal{N}$ exists which contains $Y$. Without loss of generality we may require that the coordinates of $Y$ in this coordinate system are non-negative ($Y$ lies in the closed first quadrant).
2. If $p \in \mathbb{R}_{>0}$, $Y \in \partial E_p(X)$ and $N$ is a vector, normal to $\partial E_p(X)$ at $Y$, then we may additionally assume that the planar tangent-normal coordinate system from the previous item contains $N$.
3. Let $O$ be a closed $(n - m + 1)$-dimensional ball, $C^1$-embedded in $\mathbb{R}^n$ (in particular $\partial O$ is a $C^1$-submanifold of $\mathbb{R}^n$, diffeomorphic to an $(n - m)$-dimensional sphere). Assume that $O \cap \partial\mathcal{N} = \emptyset$ and that $\mathcal{N}$ and $\partial O$ intersect transversely in $X$. Then $X$ is not the only intersection point, i.e. there exists $Y \in \mathcal{N} \cap \partial O \setminus \{X\}$.
4. Assume $\tau_{\mathcal{N}} < \infty$. Let $Y \in \mathbb{R}^n$ and let $(y_T, y_N)$ be the (non-negative) coordinates of $Y$ in the planar tangent-normal coordinate system from the first item. Let $D$ be the set of centers of all $\tau_{\mathcal{N}}$-balls, associated to $X$ (i.e. the $(n - m - 1)$-dimensional sphere within $N_X\mathcal{N}$ with the center in $X$ and the radius $\tau_{\mathcal{N}}$). Let $C$ be the cone which is the convex hull of $D \cup \{Y\}$, and assume that $C \cap \partial\mathcal{N} = \emptyset$. Then
$$d(\mathcal{N}, Y) \leq \sqrt{y_T^2 + (y_N + \tau_{\mathcal{N}})^2} - \tau_{\mathcal{N}}.$$
Proof.
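2. Assume that $Y \in \partial E_p(X)$ and $N$ is a direction, normal to $\partial E_p(X)$ at $Y$. In the $n$-dimensional tangent-normal coordinate system from the previous item, the boundary $\partial E_p(X)$ is given by the equation
$$\sum_{i=1}^{m} \frac{x_i^2}{\tau_{\mathcal{N}} p + p^2} + \sum_{i=m+1}^{n} \frac{x_i^2}{p^2} = 1.$$
The gradient of the left-hand side, up to a scalar factor, is
$$\Big( \frac{x_1}{\tau_{\mathcal{N}} p + p^2}, \ldots, \frac{x_m}{\tau_{\mathcal{N}} p + p^2}, \frac{x_{m+1}}{p^2}, \ldots, \frac{x_n}{p^2} \Big).$$
The vector $N$ has to be parallel to it since $\partial E_p(X)$ has codimension 1, i.e. a non-zero $\lambda \in \mathbb{R}$ exists such that $N = \frac{\lambda}{\tau_{\mathcal{N}} p + p^2}\, a + \frac{\lambda}{p^2}\, b$. Hence $N$ also lies in the plane determined by $a$ and $b$. This proof works for $\tau_{\mathcal{N}} < \infty$, but the required modification for $\tau_{\mathcal{N}} = \infty$ is trivial.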
3. Since $O$ is a compact $(n - m + 1)$-dimensional disk and $\partial\mathcal{N}$ is closed, some thickening of $O$ exists (denote it by $T$) which is diffeomorphic to an $n$-dimensional ball and is still disjoint from $\partial\mathcal{N}$. With a small perturbation of $\mathcal{N}$ around $\mathcal{N} \cap \partial T$ (but away from the intersection $\mathcal{N} \cap O$, which must remain unchanged) we can achieve that $\mathcal{N}$ and $\partial T$ only have transversal intersections [26].
Imagine $\mathbb{R}^n$ embedded into its one-point compactification $S^n$ (denote the added point by $\infty$) in such a way that $T$ is a hemisphere. Replace the part of $\mathcal{N}$ outside of $T$ with a copy of $\mathcal{N} \cap T$, reflected over $\partial T$, and denote the obtained space by $\mathcal{N}'$. This is an embedding of the so-called double of the manifold $\mathcal{N} \cap T$. Then $\mathcal{N}'$ is a manifold without boundary, closed in the sphere, and therefore compact. If necessary, perturb it slightly around the point $\infty$, so that $\infty \notin \mathcal{N}'$. Hence $\mathcal{N}'$ is a compact submanifold in $\mathbb{R}^n$ without boundary and $C^1$-smooth everywhere except possibly on $\mathcal{N}' \cap \partial T$. The double of a $C^1$-manifold can be equipped with a $C^1$-structure. Therefore we can use Whitney's approximation theorem [26] to adjust the embedding of $\mathcal{N}'$ on a neighbourhood of $\partial T$ away from $O$, so that it is $C^1$-smooth everywhere. The result is a compact manifold $\mathcal{N}''$ without boundary satisfying all the properties we required of $\mathcal{N}$, and we have $\mathcal{N}'' \cap O = \mathcal{N} \cap O$. This shows that we may without loss of generality assume that $\mathcal{N}$ is compact without boundary.
Any compact $k$-dimensional submanifold of $S^n$ without boundary represents an element in the cohomology $H^k(S^n; \mathbb{Z}_2)$ (we take the $\mathbb{Z}_2$-coefficients, so that we do not have to worry about orientation). For elements $[\mathcal{N}]$ and $[\partial O]$ the cup product $[\mathcal{N}] \smile [\partial O]$ is the intersection number of $\mathcal{N}$ and $\partial O$ (times the generator). Since the cohomology of $S^n$ is trivial except in dimensions 0 and $n$, we have $[\mathcal{N}] = [\partial O] = 0$, and hence $[\mathcal{N}] \smile [\partial O] = 0$. But the local intersection number in the transversal intersection $X$ is 1, and the intersection number is the sum of local ones, so $X$ cannot be the only point in $\mathcal{N} \cap \partial O$.

First consider the case when
Then the cone $C$ is homeomorphic to an $(n-m+1)$-dimensional closed ball. This $C$ and its boundary are smooth everywhere except in $Y$ and on $D$. Let $E$ be the $(n - m + 1)$-dimensional affine subspace which contains $Y$ and $N_X\mathcal{N}$ (thus the whole $C$). We can smooth $\partial C$ around the centers of the associated balls within $E$ without affecting the intersection with $\mathcal{N}$, since $\mathcal{N}$ is disjoint from the interiors of the associated $\tau_{\mathcal{N}}$-balls. Then we can also smooth $\partial C$ around $Y$ within $E$ without affecting the intersection with $\mathcal{N}$. The boundary smoothed in this way is diffeomorphic to an $(n - m)$-dimensional sphere, and so by the generalized Schoenflies theorem splits $E$ into the inner part, diffeomorphic to an $(n-m+1)$-dimensional ball, and the outer unbounded part. Since $\mathcal{N}$ intersects $\partial C$, and therefore also its smoothed version, orthogonally in $X$, this intersection is transversal. By the previous item another intersection point $X' \in \mathcal{N} \cap \partial C \setminus \{X\}$ exists. It cannot lie in $N_X\mathcal{N}$ since we would then have a manifold point in the interior of some associated ball, so $X'$ must lie on the lateral surface of the cone. That is, $X'$ lies on the line segment between $Y$ and some associated ball center, but it cannot lie in the interior of the associated ball, so $d(X', Y)$ is bounded by the distance between $Y$ and the furthest associated ball center, decreased by $\tau_{\mathcal{N}}$. The furthest center is the one within the starting planar tangent-normal coordinate system that has coordinates $(0, -\tau_{\mathcal{N}})$. Thus
$$d(\mathcal{N}, Y) \leq d(X', Y) \leq \sqrt{y_T^2 + (y_N + \tau_{\mathcal{N}})^2} - \tau_{\mathcal{N}}.$$

Lemma 2.6. Let $A, B \in \mathbb{R}_{\geq 0}$ which are not both 0 and let $\tau \in \mathbb{R}_{>0}$. Then a unique $q \in \mathbb{R}_{>0}$ exists which solves the equation
$$\frac{A}{\tau q + q^2} + \frac{B}{q^2} = 1.$$
Moreover, this $q$ depends continuously on $A$ and $B$, and if $(A, B) \to (0, 0)$ (with $\tau$ fixed), then $q \to 0$.
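Proof. If $A = 0$, then clearly $q = \sqrt{B} > 0$ works. If $B = 0$, then the equation reduces to the quadratic equation $q^2 + \tau q - A = 0$, the unique positive solution of which is $q = \frac{1}{2}\big(\sqrt{\tau^2 + 4A} - \tau\big)$. Assume that $A, B > 0$. Multiply the equation from the lemma by $q^2(\tau + q)$ and take all terms to one side of the equation to get
$$f(q) := q^3 + \tau q^2 - (A + B)\, q - \tau B = 0.$$
The derivative $f'(q) = 3q^2 + 2\tau q - (A + B)$ is a quadratic function; since $A + B > 0$, both of its zeros are real and one is negative, the other positive. Let $z$ denote the positive zero. We have $f(0) = -\tau B < 0$ and $f'$ is $\leq 0$ on $\mathbb{R}_{[0,z]}$, so $f$ cannot have a zero here, and $f(z) < 0$. Since $f$ is strictly increasing on $\mathbb{R}_{>z}$ and $\lim_{x \to \infty} f(x) = \infty$, we conclude that $f$ has a unique zero on $\mathbb{R}_{>z}$ and therefore also on $\mathbb{R}_{>0}$.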
Since $q$ is a root of the polynomial $q^3 + \tau q^2 - (A + B)\, q - \tau B$ and polynomial roots depend continuously on the coefficients, $q$ depends continuously on $A$ and $B$ as well. In particular, if $A$ and $B$ tend to 0, then $q$ tends to one of the roots of $q^3 + \tau q^2$. It cannot tend to $-\tau$ since it is positive, so it tends to 0.
Given a properly embedded $C^1$-submanifold $\mathcal{N} \subseteq \mathbb{R}^n$ without boundary and a point $Y \in \mathcal{N}$, the dimension of $\mathcal{N}$ at which we denote by $m$, let us define the continuous function $q_Y \colon \mathbb{R}^n \to \mathbb{R}_{\geq 0}$ in the following way.
Definition 2.7. If $\tau_{\mathcal{N}} = \infty$, then $q_Y(X) := d(X, \mathcal{N})$ (this also covers the case $m = n$ since then necessarily $\mathcal{N} = \mathbb{R}^n$). Otherwise, if $\mathcal{N}$ has dimension 0, then $q_Y(X) := d(X, Y)$. If both the dimension and codimension of $\mathcal{N}$ are positive and $\tau_{\mathcal{N}} < \infty$, we split the definition of $q_Y$ into two cases. Let $q_Y(Y) := 0$. For $X \in \mathbb{R}^n \setminus \{Y\}$ introduce a tangent-normal coordinate system with the origin in $Y$ (it exists by Lemma 2.5(1)). Let $X = (x_1, \ldots, x_n)$ be the coordinates of $X$ in this coordinate system. Define $q_Y(X)$ to be the unique element in $\mathbb{R}_{>0}$ which satisfies the equation
$$\sum_{i=1}^{m} \frac{x_i^2}{\tau_{\mathcal{N}}\, q_Y(X) + q_Y(X)^2} + \sum_{i=m+1}^{n} \frac{x_i^2}{q_Y(X)^2} = 1.$$
Since the sum of squares of coordinates is independent of the choice of an orthonormal coordinate system, this equation depends only on $X$ and $Y$. Lemma 2.6 guarantees existence, uniqueness and continuity of $q_Y(X)$.
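The value $q_Y(X)$ has no simple closed form in general, but it is easy to compute numerically. The following is a minimal sketch, assuming the equation of Lemma 2.6 reads $\frac{A}{\tau q + q^2} + \frac{B}{q^2} = 1$, equivalently $q^3 + \tau q^2 - (A+B)q - \tau B = 0$, with $A$ and $B$ the sums of squares of the tangent and the normal coordinates. The helper names `solve_q` and `q_at` are illustrative, and bisection is just one convenient root finder, not a method prescribed by the paper.

```python
import math

def solve_q(A, B, tau):
    """Unique positive root of q^3 + tau*q^2 - (A+B)*q - tau*B, found by bisection."""
    if A < 0 or B < 0 or (A == 0 and B == 0) or tau <= 0:
        raise ValueError("need A, B >= 0 not both zero, and tau > 0")

    def f(q):
        return q ** 3 + tau * q ** 2 - (A + B) * q - tau * B

    # The cubic is negative just right of 0 and positive beyond the root.
    hi = 1.0
    while f(hi) <= 0.0:
        hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def q_at(tangent_coords, normal_coords, tau):
    """q_Y(X) for X given in tangent-normal coordinates centered at Y (finite tau)."""
    A = sum(x * x for x in tangent_coords)
    B = sum(x * x for x in normal_coords)
    if A == 0 and B == 0:
        return 0.0
    return solve_q(A, B, tau)

print(solve_q(0.0, 4.0, 1.0), solve_q(2.0, 0.0, 1.0))
```

For instance, with $\tau = 1$ a point with tangent coordinate $\sqrt{0.375}$ and normal coordinate $\sqrt{0.125}$ lies on the boundary of the ellipsoid with $p = 0.5$, and `q_at` returns a value close to 0.5.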
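The point of this definition is that (except in the case $m = n$, when all ellipsoids are the whole $\mathbb{R}^n$) the unique ellipsoid of the form $E_r(Y)$ which has $X$ in its boundary has $r = q_Y(X)$.

Lemma 2.8. With $\mathcal{N}$ and $Y$ as above, for every $X \in \mathbb{R}^n$ we have $d(\mathcal{N}, X) \leq q_Y(X)$.

Proof. Let $m$ be the dimension of $\mathcal{N}$ at $Y$. For $0 < m < n$ we rely on Lemma 2.5. There is a planar tangent-normal coordinate system which has the origin in $Y$ and contains $X$. We can additionally assume that the axes are oriented so that $X$ is in the closed first quadrant. Since $X \in \partial E_{q_Y(X)}(Y)$, there exists $\varphi \in \mathbb{R}_{[0, \pi/2]}$ such that the coordinates of $X$ in this coordinate system are $\big(\sqrt{\tau_{\mathcal{N}} q + q^2}\, \cos(\varphi),\; q \sin(\varphi)\big)$, where we have shortened $q := q_Y(X)$. Hence
$$d(\mathcal{N}, X) \leq \sqrt{(\tau_{\mathcal{N}} q + q^2) \cos^2(\varphi) + \big(q \sin(\varphi) + \tau_{\mathcal{N}}\big)^2} - \tau_{\mathcal{N}} = \sqrt{q^2 + \tau_{\mathcal{N}}^2 + \tau_{\mathcal{N}}\, q \big(\cos^2(\varphi) + 2 \sin(\varphi)\big)} - \tau_{\mathcal{N}}.$$
Clearly, the last expression is the largest where the function $\varphi \mapsto \cos^2(\varphi) + 2\sin(\varphi)$ is, that is at $\varphi = \frac{\pi}{2}$. Thus the distance $d(\mathcal{N}, X)$ is the largest in the normal space at $Y$, where we get
$$d(\mathcal{N}, X) \leq \sqrt{q^2 + 2 \tau_{\mathcal{N}}\, q + \tau_{\mathcal{N}}^2} - \tau_{\mathcal{N}} = q.$$

Let us also recall some facts about Lipschitz maps that we will need later. A map $f$ between subsets of Euclidean spaces is Lipschitz when it has a Lipschitz coefficient $C \in \mathbb{R}_{\geq 0}$, so that for all $X$, $Y$ in the domain of $f$ we have
$$\|f(X) - f(Y)\| \leq C\, \|X - Y\|.$$
A function is locally Lipschitz when every point of its domain has a neighbourhood such that the restriction of the function to this neighbourhood is Lipschitz.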
Let $f$ and $g$ be maps with Lipschitz coefficients $C$ and $D$, respectively. Then clearly $C + D$ is a Lipschitz coefficient for the functions $f + g$ and $f - g$, and $C \cdot D$ is a Lipschitz coefficient for $g \circ f$ (whenever these functions exist).
For bounded functions the Lipschitz property is preserved under further operations.A function being bounded is meant in the usual way, i.e. being bounded in norm.
Lemma 2.9.Let f and g be maps between subsets of Euclidean spaces with the same domain.
Assume that f and g are bounded and Lipschitz.
is bounded Lipschitz.
4. Assume $g$ takes values in $\mathbb{R}$ and has a positive lower bound $m \in \mathbb{R}_{>0}$. Then the map $X \mapsto \frac{f(X)}{g(X)}$ is bounded Lipschitz.
Proof. Let $M$ be an upper bound for the norms of $f$ and $g$ and let $C$ be a Lipschitz coefficient for $f$ and $g$. Let $X$, $X'$ and $X''$ be elements of the domain of $f$ and $g$.
Lipschitz property:

Corollary 2.10. Let $(U_i)_{i \in I}$ be a locally finite open cover of a subset $U$ of a Euclidean space, $(f_i)_{i \in I}$ a subordinate smooth partition of unity and $(g_i \colon U_i \to \mathbb{R}^n)_{i \in I}$ a family of maps. Let $g \colon U \to \mathbb{R}^n$ be the map, obtained by gluing maps $g_i$ with the partition of unity $f_i$, i.e.
$$g(x) := \sum_{i \in I} f_i(x)\, g_i(x).$$
Then if all $g_i$ are locally Lipschitz, so is $g$.
Proof.Every continuous map is locally bounded, including the derivative of a smooth map, the bound on which is then a local Lipschitz coefficient for the map.We can apply this for f i .
Given $x \in U$, pick an open set $V \subseteq U$ for which the following holds: $x \in V$, there is a finite set of indices $F \subseteq I$ such that $V$ intersects only $U_i$ with $i \in F$ and $V \subseteq \bigcap_{i \in F} U_i$, and the maps $f_i$ and $g_i$ are bounded and Lipschitz on $V$ for every $i \in F$. Then
$$g|_V = \sum_{i \in F} f_i|_V \cdot g_i|_V,$$
which is Lipschitz on $V$ by Lemma 2.9.

Calculating Bounds on Persistence Parameter
Having derived some results for more general manifolds, we now specify the manifolds for which our main theorem holds.We reserve the symbol M for such a manifold.
Let M be a non-empty m-dimensional properly embedded C 1 -submanifold of R n without boundary, and let A be its medial axis.Let τ denote the reach of M. In this section we assume τ < ∞ and in Sections 4 and 5 we assume τ = 1.We will drop these assumptions on τ for the main theorem in Section 6.
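By Proposition 2.1 and the definition of a medial axis the map $\mathrm{pr} \colon \mathbb{R}^n \setminus \mathcal{A} \to \mathcal{M}$, which takes a point to its closest point on the manifold $\mathcal{M}$, is well defined. We also define $\mathrm{prv} \colon \mathbb{R}^n \setminus \mathcal{A} \to \mathbb{R}^n$, $\mathrm{prv}(X) := \mathrm{pr}(X) - X$. We view $\mathrm{prv}(X)$ as the vector starting at $X$ and ending in $\mathrm{pr}(X)$. This vector is necessarily normal to the manifold, i.e. it lies in $N_{\mathrm{pr}(X)}\mathcal{M}$. By the definition of the reach, the maps pr and prv are defined on the open $\tau$-thickening $\mathcal{M}_\tau$ of $\mathcal{M}$.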
Lemma 3.1. For every $r \in \mathbb{R}_{(0,\tau)}$ the maps pr and prv are Lipschitz when restricted to $\mathcal{M}_r$, with Lipschitz coefficients $\frac{\tau}{\tau - r}$ and $\frac{\tau}{\tau - r} + 1$, respectively. Hence these two maps are continuous on $\mathcal{M}_\tau$.
Proof. The map pr is Lipschitz on $\mathcal{M}_r$ by [13, Proposition 2] with a Lipschitz coefficient $\frac{\tau}{\tau - r}$ [23, Theorem 4.8(8)]. As a difference of two Lipschitz maps, the map prv is Lipschitz as well, with a Lipschitz coefficient $\frac{\tau}{\tau - r} + 1$. The maps pr and prv are therefore continuous on $\mathcal{M}_r$ for all $r \in \mathbb{R}_{(0,\tau)}$, and hence also on the union $\mathcal{M}_\tau = \bigcup_{r \in \mathbb{R}_{(0,\tau)}} \mathcal{M}_r$.
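As a sanity check of the Lipschitz coefficient, consider the unit circle in the plane, which has reach $\tau = 1$ and projection $\mathrm{pr}(X) = X / \|X\|$. The sketch below samples random pairs of points in the open $r$-thickening and records the largest observed ratio $\|\mathrm{pr}(X) - \mathrm{pr}(Y)\| / \|X - Y\|$; the choice of the circle, the sampling scheme and the function names are illustrative assumptions, not part of the paper.

```python
import math
import random

def pr(x, y):
    """Closest point on the unit circle (defined away from the origin)."""
    n = math.hypot(x, y)
    return x / n, y / n

def max_lipschitz_ratio(r, trials=2000, seed=0):
    """Largest observed |pr(X) - pr(Y)| / |X - Y| over random pairs in the r-thickening."""
    rng = random.Random(seed)

    def draw():
        radius = rng.uniform(1.0 - r, 1.0 + r)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        return radius * math.cos(angle), radius * math.sin(angle)

    worst = 0.0
    for _ in range(trials):
        X, Y = draw(), draw()
        dist = math.hypot(X[0] - Y[0], X[1] - Y[1])
        if dist == 0.0:
            continue
        PX, PY = pr(*X), pr(*Y)
        worst = max(worst, math.hypot(PX[0] - PY[0], PX[1] - PY[1]) / dist)
    return worst

r = 0.5
print(max_lipschitz_ratio(r), 1.0 / (1.0 - r))
```

The observed ratio stays below $\frac{\tau}{\tau - r} = \frac{1}{1 - r}$, as the lemma predicts for this example; a random search of this kind can only fail to find a counterexample, it does not prove the bound.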
We want to approximate the manifold M with a sample.We assume that the sample set S is a non-empty discrete subset of M, locally finite in R n (meaning, every point in R n has a neighbourhood which intersects only finitely many points of S).It follows that S is a closed subset of R n .
Let κ denote the Hausdorff distance between M and S. We assume that κ is finite.This value represents the density of our sample: it means that every point on the manifold M has a point in the sample S which is at most κ away.
Since M is properly embedded in R n and κ < ∞, the sample S is finite if and only if M is compact.A properly embedded non-compact submanifold without boundary needs to extend to infinity and so cannot be sampled with finitely many points (think for example about the hyperbola in the plane, x 2 − y 2 = 1).As it turns out, we do not need finiteness, only local finiteness, to prove our results.
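To make the role of $\kappa$ concrete, the following sketch estimates it for $k$ equally spaced sample points on the unit circle (reach $\tau = 1$). Since the sample lies on the manifold, the Hausdorff distance reduces to the largest distance from a manifold point to its nearest sample point, which is approximated here on a fine grid of the circle. The example manifold, the grid resolution and the function names are assumptions made for illustration.

```python
import math

def circle_sample(k):
    """k equally spaced points on the unit circle."""
    return [(math.cos(2.0 * math.pi * i / k), math.sin(2.0 * math.pi * i / k))
            for i in range(k)]

def kappa_estimate(sample, resolution=3600):
    """Approximate sup over circle points of the distance to the nearest sample point."""
    worst = 0.0
    for j in range(resolution):
        a = 2.0 * math.pi * j / resolution
        x, y = math.cos(a), math.sin(a)
        nearest = min(math.hypot(x - sx, y - sy) for sx, sy in sample)
        worst = max(worst, nearest)
    return worst

for k in (3, 4, 6, 9):
    print(k, round(kappa_estimate(circle_sample(k)), 4))
```

For equally spaced points the exact value is the chord over half the angular gap, $2 \sin\big(\frac{\pi}{2k}\big)$, and the estimate reproduces it. On this example four points already give $\kappa \approx 0.765$, which is below the ratio of roughly 0.913 quoted in the Introduction.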
If the sample is dense enough in the manifold, it should be a good approximation to it.Specifically, we want to recover at least the homotopy type of M from the information, gathered from S. A common way to do this is to enlarge the sample points to balls, the union of which deformation retracts to the manifold, so has the same homotopy type (in other words, we consider a Čech complex of the sample).
In this paper we use ellipsoids instead of balls. The idea is that a tangent space at some point is a good approximation for the manifold at that point, so an ellipsoid with the major semi-axes in the tangent directions should approximate the manifold better than a ball. Consequently we should require a less dense sample for the approximation. This idea indeed pans out (as demonstrated by Theorem 6.1), though it turns out that the standard methods, used to construct the deformation retraction from the union of balls to the manifold, do not work for the ellipsoids.

We want a deformation retraction from $E_p$ to $\mathcal{M}$. Clearly this will not work for all $p \in \mathbb{R}_{>0}$. If $p$ is too small, $E_p$ covers only some blobs around sample points, not the whole $\mathcal{M}$. If $p$ is too large, $E_p$ reaches over the medial axis $\mathcal{A}$ and therefore creates connections which do not exist in the manifold, so it differs from the manifold in the homotopy type. This suggests that the lower bound on $p$ will be expressed in terms of $\kappa$ (the denser the sample, the smaller the required $p$ for $E_p$ to cover $\mathcal{M}$), and the upper bound on $p$ will be expressed in terms of $\tau$ (the further away the medial axis, the larger we can make the ellipsoids so that they still do not intersect the medial axis).

Lemma 3.2.
is an open cover of M.
1. Take any $X \in \mathcal{M}$. By assumption there exists $S \in \mathcal{S}$ such that $d(X, S) \leq \kappa$. We claim that $X \in E_p(S)$. Assume hereafter that $0 < m < n$. Choose a planar tangent-normal coordinate system with the origin in $S$ which contains $X$ (use Lemma 2.5(1)). In this coordinate system the boundary of $E_p(S)$ is given by the equation $\frac{x^2}{\tau p + p^2} + \frac{y^2}{p^2} = 1$. A routine calculation shows that it intersects the boundaries of the $\tau$-balls, associated to $S$ (with centers in $C' = (0, \tau)$ and $C'' = (0, -\tau)$), given by the equations $x^2 + (y \pm \tau)^2 = \tau^2$, in the points the norm of which is $r := \sqrt{2p\big(\sqrt{\tau(p + 2\tau)} - \tau\big)} > \kappa \geq d(X, S)$. It follows that within the given two-dimensional coordinate system the closed disc with the center $S$ and the radius $d(X, S)$ is contained in the union of $E_p(S)$ and the two open $\tau$-balls. Since $S \in \mathcal{M}$ and the reach of $\mathcal{M}$ is $\tau$, the manifold $\mathcal{M}$ does not intersect the open $\tau$-balls, associated to $S$, so $X \in E_p(S)$.
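2. The derivative of the given function is positive for $p, \tau > 0$, which assures the existence of the required $\lambda$. The actual value of $\lambda$ was calculated with Mathematica.

We can strengthen this result to thickenings of $\mathcal{M}$. Given $r \in \mathbb{R}_{>0}$, we denote the open and closed $r$-thickening of $\mathcal{M}$ by
$$\mathcal{M}_r := \{ X \in \mathbb{R}^n \mid d(\mathcal{M}, X) < r \}, \qquad \overline{\mathcal{M}}_r := \{ X \in \mathbb{R}^n \mid d(\mathcal{M}, X) \leq r \}.$$

Let us now also get an upper bound on $p$.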
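Lemma 3.4. Assume $p \in \mathbb{R}_{(0,\tau)}$. Then $\overline{E}_p \subseteq \mathcal{M}_\tau$; in particular $E_p$ and $\overline{E}_p$ do not intersect the medial axis of $\mathcal{M}$.
Proof. Take any $S \in \mathcal{S}$ and $X \in \overline{E}_p(S)$. By Lemma 2.8 we have $d(\mathcal{M}, X) \leq q_S(X) \leq p < \tau$.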
The results in this section give the theoretical bounds on the persistence parameter p, within which we look for a deformation retraction from E p to M, which we summarize in the following corollary.
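The "routine calculation" behind the lower bound can be verified numerically. Substituting $x^2 = 2\tau y - y^2$ (the circle $x^2 + (y - \tau)^2 = \tau^2$) into the ellipse equation $\frac{x^2}{\tau p + p^2} + \frac{y^2}{p^2} = 1$ gives $\tau y^2 + 2\tau p\, y - p^2(\tau + p) = 0$, hence $y = \frac{p}{\tau}\big(\sqrt{\tau(p + 2\tau)} - \tau\big)$ and squared norm $2\tau y$. The sketch below checks this under those formulas; the function names are illustrative.

```python
import math

def intersection_point(p, tau=1.0):
    """First-quadrant intersection of the ellipse boundary with the upper tau-circle."""
    y = p * (math.sqrt(tau * (p + 2.0 * tau)) - tau) / tau
    x = math.sqrt(2.0 * tau * y - y * y)
    return x, y

def cover_radius(p, tau=1.0):
    """Norm of the intersection point: sqrt(2p(sqrt(tau(p + 2tau)) - tau))."""
    return math.sqrt(2.0 * p * (math.sqrt(tau * (p + 2.0 * tau)) - tau))

p, tau = 0.8, 1.0
x, y = intersection_point(p, tau)
on_ellipse = x * x / (tau * p + p * p) + y * y / (p * p)
on_circle = x * x + (y - tau) ** 2
print(on_ellipse, on_circle, math.hypot(x, y), cover_radius(p, tau))
```

For $p = 0.8$ and $\tau = 1$ the point satisfies both equations up to rounding, and its norm agrees with `cover_radius`, which is the quantity that has to exceed $\kappa$ for the ellipsoids to cover the manifold.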

Program
In this section (as well as the next one) we assume that τ = 1 and 0 < m < n.
Our goal is to prove that if we restrict the persistence parameter p to a suitable interval, the union of ellipsoids E p deformation retracts to M. Recall that the normal deformation retraction is the map retracting a point to its closest point on the manifold, i.e. the convex combination of a point and its projection: (X, t) → (1 − t) X + t pr(X) = X + t prv(X).For example, in [27] this is how the union of balls around sample points is deformation retracted to the manifold.
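A small example may help fix the formula. For the unit circle in the plane (reach 1) the projection is $\mathrm{pr}(X) = X / \|X\|$, so the normal deformation retraction moves each point radially. The circle and the function names in the sketch below are illustrative choices, not taken from the paper.

```python
import math

def pr(X):
    """Closest point on the unit circle (defined away from the origin)."""
    n = math.hypot(X[0], X[1])
    return X[0] / n, X[1] / n

def prv(X):
    """Vector from X to its projection pr(X)."""
    P = pr(X)
    return P[0] - X[0], P[1] - X[1]

def normal_retraction(X, t):
    """(X, t) -> X + t * prv(X) = (1 - t) * X + t * pr(X)."""
    V = prv(X)
    return X[0] + t * V[0], X[1] + t * V[1]

X = (0.3, 0.4)  # a point at distance 0.5 from the origin
for t in (0.0, 0.5, 1.0):
    Z = normal_retraction(X, t)
    print(t, Z, math.hypot(Z[0], Z[1]))
```

At $t = 0$ the point is unchanged, at $t = 1$ it lands on the circle at $(0.6, 0.8)$, and in between it travels along the straight segment from $X$ to $\mathrm{pr}(X)$.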
The same idea does not in general work for the union of ellipsoids, or any other sufficiently elongated figures.Figure 3 shows what can go wrong.However, it turns out that the only places where the normal deformation retraction does not work are the neighbourhoods of tips of some ellipsoids which avoid all other ellipsoids.This section is dedicated to proving the following form of this claim: for all points in at least two ellipsoids the normal deformation retraction works.This means that the line segment between a point X and pr(X) is contained in the union of ellipsoids, but actually more holds: the line segment is contained already in one of the ellipsoids.More formally, the rest of the section is the proof of the following lemma.
Lemma 4.1. For every $X \in E_p$, if there are $S', S'' \in \mathcal{S}$, $S' \neq S''$, such that $X \in \overline{E}_p(S') \cap \overline{E}_p(S'')$, then there exists $S \in \mathcal{S}$ such that $X, \mathrm{pr}(X) \in E_p(S)$. By convexity the entire line segment between $X$ and $\mathrm{pr}(X)$ is therefore in $E_p(S)$.
To prove this, we would in principle need to examine all possible configurations of ellipsoids and a point.However, we can restrict ourselves to a set of cases, which include the "worst case scenarios".
Let $S', S'' \in \mathcal{S}$ be two different sample points, and let $X \in \overline{E}_p(S') \cap \overline{E}_p(S'')$ (we purposefully take closed ellipsoids here). Denote $Y := \mathrm{pr}(X)$. We claim that there is $S \in \mathcal{S}$ (not necessarily distinct from $S'$ and $S''$) such that $X \in \overline{E}_p(S)$ and $Y \in E_p(S)$. Due to convexity of ellipsoids, the line segment $XY$ is in $\overline{E}_p(S)$; with the possible exception of the point $X$, this line segment is in $E_p(S)$.
Assuming p ∈ R (λ,1) , the point Y is covered by at least one open ellipsoid.Suppose that none of the closed ellipsoids, containing Y in their interior, contains X.Let us try to construct a situation where this is most likely to be the case.We will derive a contradiction by showing that even in these "worst case scenarios" we fail in satisfying this assumption.
To determine whether a point X is in the ellipsoid with the center S , the following two pieces of information are sufficient: the distance between X and S , and the angle between the line segment XS and the normal space N S M.Moreover, membership of X in the ellipsoid is "monotone" with respect to these two conditions: if a point is in the ellipsoid, it will remain so if we decrease its distance to S or increase the angle to the normal space.
We will produce a set of configurations which include the extremal points for these two criteria (maximal distance from the ellipsoid center, minimal angle to the normal space).If every such point is still in the ellipsoid, then all possible points are.
Consider a planar tangent-normal coordinate system with the origin in $S'$ which contains $X$ in the fourth quadrant (nonnegative tangent coordinate, nonpositive normal coordinate). In this coordinate system, the manifold passes horizontally through $S'$. Consider the part of the manifold with positive tangent coordinate (i.e. the part of the manifold rightwards of $S'$).
The fastest that this piece can turn away from $X$ is in this plane along the boundary of the upper $\tau$-ball associated to $S'$. Suppose the manifold continues along this path until some point $X'$, and consider a plane containing the points $X$, $X'$ and $S$, where the distance between $S \in \mathcal{S}$ and $Y$ is bounded by $\kappa$, so $Y \in E_p(S)$. Going from $X'$ to $S$, the quickest way to turn the normal direction towards $X$ is within this plane, and along a $\tau$-arc. While this second plane need not be the same as the first one, they intersect along the line containing $X$ and $X'$. We can turn the half-plane containing $S'$ and the half-plane containing $S$ along the line so that they form one plane, and that will be the configuration where it is equally (un)likely for $E_p(S)$ to contain $X$, but where $S'$, $X$, $X'$, $Y$ and $S$ all lie in the same plane.
We can make the same argument starting from $S''$ instead of $S'$, so we conclude the following: if our claim fails for some configuration of $X$, $Y$, $S'$, $S''$, $S$, then it fails in a planar case where the part of the manifold connecting points $S'$ and $S''$ consists of (at most) three $\tau$-arcs, as in Figure 4. We started with the assumption $X \in \overline{E}_p(S') \cap \overline{E}_p(S'')$, but we may without loss of generality additionally assume $X \in \partial E_p(S')$. If we had a counterexample $X$ to our claim in the interior of all ellipsoids containing $X$, we could project it in the opposite direction of $\mathrm{pr}(X)$ to the first ellipsoid boundary we hit, and declare the center of that ellipsoid to be $S'$.
Although the reduction of cases we have made is already a vast simplification of the necessary calculations, we find that it is still not enough to make a theoretical derivation of the desired result feasible.Instead, we produce a proof with a computer.
We can reduce the possible configurations to four parameters (see Figure 5):
• $\alpha$ denotes the angle measuring the length of the first $\tau$-arc,
• $\sigma$ denotes the angle for the second $\tau$-arc until $S$,
• $p$ is, as usual, the persistence parameter,
• $\chi$ determines the position of $X$ in the boundary $\partial E_p(S')$.
Notice that Figure 5 does not include both ellipsoids containing $X$ but not $Y$, like Figure 4 does. It turns out that as soon as $Y$ is not in the first ellipsoid, both $X$ and $Y$ will be in an ellipsoid, the center of which is within $\kappa$ distance from $Y$. This allows us to restrict ourselves to just the four aforementioned variables, which makes the program run in a reasonable time.
The space of the configurations we restricted ourselves to (let us denote it by $C$) is compact (we give its precise definition below). We want to calculate for each configuration in $C$ that $X$ is in some ellipsoid with the center within $\kappa$ distance from $Y$ (it follows automatically that $Y$ is in this ellipsoid).
Let us now define $C$ precisely and then calculate the Lipschitz coefficient of $v$. We may orient the coordinate system so that the point $X$ is in the closed fourth quadrant. Hence we have $X = \bigl(\sqrt{p + p^2}\cos(\chi), -p\sin(\chi)\bigr)$, where $\chi$ ranges over the interval $\mathbb{R}_{[0, \frac{\pi}{2}]}$.
Unfortunately due to our method we cannot allow $p$ to range over the whole interval $\mathbb{R}_{(\lambda,1)}$; if we did, the values of $v$ would come arbitrarily close to zero, in particular below $C \cdot \delta$, so the program would not prove anything. Let us set $p \in \mathbb{R}_{[m_p, M_p]}$, where we have chosen in our program $m_p := 0.5$ and $M_p := 0.96$. The closer $M_p$ is to 1, the smaller the density we prove is required. However, larger $M_p$ necessitates smaller $\delta$, which increases the computation time. Through experimentation, we have chosen bounds so that the program ran for a few days. Ultimately, with better computers (and more patience) one can improve our result. We note that experimentally we never came across any counterexample to our claims even outside of $C$ (so long as the configuration satisfied the theoretical assumptions from Corollary 3.5). We discuss this further in Section 7.
We can now calculate the upper bound on $\alpha$ (the lower bound is just 0). For fixed $p$ and $\chi$ we claim that the case $\alpha \geq \arctan\frac{\sqrt{p+p^2}\cos(\chi)}{1+p\sin(\chi)}$ is impossible. In this case the point $(0, 1) + \frac{X-(0,1)}{\|X-(0,1)\|}$ lies on the manifold, and is the closest to $X$ among points on $M$. This is because its distance to $X$ is bounded by $p$ (by Lemma 2.8), which is smaller than $\tau = 1$, so its associated $\tau$-ball includes all points closer to $X$, and $M$ cannot intersect an open associated $\tau$-ball (see Figure 6). We claim that the point $\mathrm{pr}(X) = (0, 1) + \frac{X-(0,1)}{\|X-(0,1)\|}$ lies in $E_p(S')$. This is a contradiction since then $Y = \mathrm{pr}(X) \in E_p(S')$. Clearly, it suffices to verify $\mathrm{pr}(X) \in E_p(S')$ for $\chi = 0$ (for larger $\chi$ the point $\mathrm{pr}(X)$ lies on the $\tau$-arc further towards the ellipsoid center $S'$). If we put the coordinates of $\mathrm{pr}(X)$ for $\chi = 0$ into the equation for the ellipsoid, we see that we need a certain polynomial in $p$ to be negative. The value of this polynomial at $m_p = 0.5$ is $-1.15625 < 0$, so the polynomial is negative on $\mathbb{R}_{[m_p, M_p]}$, as required.
With this we have confirmed that it suffices to restrict ourselves to $\alpha \leq \arctan\frac{\sqrt{p+p^2}\cos(\chi)}{1+p\sin(\chi)}$. As mentioned, this bound will be the largest at $\chi = 0$, so we will cover the relevant configurations for $\alpha \leq \arctan\sqrt{p + p^2}$, or equivalently (for $\alpha \in \mathbb{R}_{[0, \frac{\pi}{2})}$ and $p \in \mathbb{R}_{(0,1)}$) $\tan^2(\alpha) \leq p + p^2$, in particular $\tan^2(\alpha) \leq M_p + M_p^2$.
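The claim that the projection lies in the first ellipsoid when χ = 0 can be spot-checked numerically. The sketch below assumes (reading off the parametrization of X on the boundary) that, with τ = 1 and in tangent-normal coordinates centered at the ellipsoid center, the ellipse has tangent semi-axis √(p + p²) and normal semi-axis p. The function names are hypothetical, and sampling finitely many values of p is only a sanity check, not a replacement for the polynomial argument.

```python
import math

def in_open_ellipse(x_t, x_n, p):
    """Membership in the open ellipse with semi-axes sqrt(p + p^2) and p (assumed form)."""
    return x_t**2 / (p + p**2) + x_n**2 / p**2 < 1.0

def projection_for_chi_zero(p):
    """Point (0, 1) + (X - (0, 1)) / |X - (0, 1)| for X = (sqrt(p + p^2), 0)."""
    a = math.sqrt(p + p**2)
    norm = math.hypot(a, -1.0)
    return a / norm, 1.0 - 1.0 / norm

# Sample p over [m_p, M_p] = [0.5, 0.96].
ps = [0.5 + 0.46 * i / 100 for i in range(101)]
all_inside = all(in_open_ellipse(*projection_for_chi_zero(p), p) for p in ps)
```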
Finally, we claim that we can restrict ourselves to σ ∈ R [0,π] .If the manifold were to trace a τ-circle within a plane for longer than π, it would necessarily be that τ-circle.If it were to veer away from the circle, the medial axis would continue from the center of the circle to the area between the two parts of the manifold (see Figure 7) which would contradict that the manifold's reach is τ.
medial axis Having calculated the bounds on the variables, we may now define For the sake of a later calculation we also define a slightly bigger area, Both C and C are 4-dimensional rectangular cuboids with a small piece removed; in the α-p-plane they look as shown in Figure 8.Given (α, σ, p, χ) ∈ C , we have X = (X T , X N ) = p + p 2 cos(χ), −p sin(χ) .Let us denote the center of the τ-ball, along the boundary of which lies the arc containing S, by C. Observe from Figure 9 that C = (0, 1) + 2 sin(α), − cos(α) and It will be convenient to define v on the larger area C (although we are still only interested in positivity of v on C ). Recall that we want v to be a function, so that its 0-level set is the boundary of E p (S), and is positive on E p (S) itself.Let x, y be coordinates in our current coordinate system, x , y the coordinates in the coordinate system, translated by S, and x , y the coordinates if we rotate the translated coordinate system by α − σ in the positive direction.Hence In the rotated translated coordinate system, the equation for the boundary of the ellipse is Recall that it follows from the multivariate Lagrange mean value theorem that for any where the maximum of the norm of the gradient is taken over the line segment connecting the points a and b.In particular, the maximum over the entire C is a Lipschitz coefficient for v.
This theorem holds for any pair of conjugate norms.We take the ∞-norm on C , and the 1-norm for the gradient.The reason is that we cover the region C by cuboids which are almost cubes (in the centers of which we calculate the function values).The smaller the distance between the center of a cube and any of its points, the better the estimate we obtain.Hence Before we estimate the absolute values of partial derivatives, let us make several preliminary calculations.
First we put the function into a more convenient form.
Now we calculate the bound on X − S and its partial derivatives.
We can now estimate the partial derivatives of v.
Hence a Lipschitz coefficient for v on C is which is a little less than 125.
The idea behind the program is that it accepts a value $\delta \in \mathbb{R}_{>0}$, sets each of the variables $\alpha$, $\sigma$, $p$, $\chi$ at $\delta$ away from the edge of $C$ and calculates the values of $v$ in a lattice of points, of which any two consecutive ones differ in the values of the variables by $2\delta$. The $\infty$-balls (cubes) with centers in the lattice points and radius $\delta$ cover $C$, so if the values of $v$ at these points are greater than $L\delta$, where $L$ is a Lipschitz coefficient for $v$, then $v$ is positive. This requires two remarks, however. First, if one tries to evenly cover a cuboid by cubes with edge length $2\delta$ and with centers within the cuboid, the cuboid will not be covered if dividing any cuboid edge length by $2\delta$ yields a remainder greater than $\delta$. For this reason, in the program we decrease each cube edge length slightly (by reducing the step of each variable) so that the now slightly distorted cubes exactly cover the cuboid enclosing both regions defined above if we take their centers from the lattice spanning the cuboid (though since we are trying to only cover $C$, we do not need to take these centers from the entire cuboid).
The second problem is that C is not actually a cuboid and might not get covered by the distorted cubes if we only took those with the centers in C .However, we claim that the distorted cubes cover C if we take the centers from C , as long as δ is small enough.
Recall Figure 8; since the dependence of the lower bound for p on α is increasing for both C and C , it suffices to check that if α, σ, p, χ ∈ C , then min α + δ, arctan( We have arctan( Hence tan 2 (α) has a Lipschitz coefficient of 9 on the relevant region.Similarly, p 2 has a Lipschitz coefficient of 2. Then for δ ∈ R (0, mp  13 ] .In particular, δ ∈ R (0,0.01] suffices, also in the cases where we hit the edges at arctan( M p + M 2 p ) and/or m p (we did not have to be too picky about these particular estimates; the actual value of δ we run the program with is far smaller, at 0.0004, as we explain below).
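The covering argument can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not the program used for the proof: it works on a plain rectangular box (the removed corner of the actual region is not modelled), uses a toy function in place of v, and all names are hypothetical. It does show the step reduction described above, in which the cell width is shrunk so that the cells tile each edge exactly.

```python
import itertools
import math

def lattice_axis(lo, hi, delta):
    """Cell centers on [lo, hi] whose cells have half-width <= delta and tile the edge exactly."""
    n = max(1, math.ceil((hi - lo) / (2.0 * delta)))
    step = (hi - lo) / n                      # reduced step, at most 2 * delta
    return [lo + (i + 0.5) * step for i in range(n)], step / 2.0

def certified_lower_bound(v, box, delta, lipschitz):
    """Lower bound for v on the box: smallest lattice value minus lipschitz * cell radius."""
    axes, halves = zip(*(lattice_axis(lo, hi, delta) for lo, hi in box))
    radius = max(halves)                      # infinity-norm radius of a cell
    smallest = min(v(*point) for point in itertools.product(*axes))
    return smallest - lipschitz * radius

def toy(x, y):
    # The 1-norm of the gradient is at most 0.5, so 1.0 is a valid Lipschitz coefficient.
    return 1.0 + 0.5 * math.sin(x) * math.cos(y)

bound = certified_lower_bound(toy, [(0.0, 3.0), (0.0, 2.0)], 0.05, 1.0)
centers, half = lattice_axis(0.0, 1.0, 0.3)
```

If the returned bound is positive, the function is positive on the whole box, which is the same inference the text draws for v.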
There is one more issue which prevents us from getting as nice a result with a computer program as we would get with a theoretical derivation. Recall the requirement on $\kappa$ under which $S$ is chosen within $\kappa$-distance from $Y$, so that the entire line segment from $X$ to $Y$ is in $E_p(S)$. However, if we allow $\kappa$ to get arbitrarily close to $\sqrt{2p(\sqrt{p+2}-1)}$, then the value of $v$ gets arbitrarily close to zero, and we cannot use our method to prove that $v$ is positive. To avoid this, we decrease the upper bound on the distance between $S$ and $Y$ to $\sqrt{2p(\sqrt{p+2}-1) - \kappa_{\mathrm{off}}}$ for $\kappa_{\mathrm{off}} = 0.55$ (we chose this value experimentally, so that the result of the program was sufficiently good).
After some experimentation, we ran the program with δ = 0.0004.The resulting smallest value of v that the program returned, was 0.068546.
Recall that $v$ has a Lipschitz coefficient of 125. Since any possible configuration is at most $\delta = 0.0004$ away from some point in the lattice where the program calculates $v$, the values that $v$ can take are at most $125 \cdot 0.0004 = 0.05$ smaller than the values calculated by the program. Since the smallest calculated value exceeds 0.05, $v$ is necessarily positive.
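The arithmetic behind this conclusion is short enough to spell out. The numbers are the ones quoted in the text; the variable names are chosen for this example.

```python
lipschitz = 125.0                   # Lipschitz coefficient of v
delta = 0.0004                      # lattice half-step used in the final run
smallest_lattice_value = 0.068546   # smallest value of v reported by the program

worst_case_drop = lipschitz * delta
guaranteed_lower_bound = smallest_lattice_value - worst_case_drop
```

The remaining margin of roughly 0.0185 is what keeps v positive between lattice points.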
The price of this method is that we had to shrink the theoretical interval $\mathbb{R}_{(\lambda,\tau)}$ for the persistence parameter $p$, which in particular requires a denser sample for the proof than is strictly necessary. We discuss this in Section 7.
Let us summarize the results we have obtained in this section.We have seen that if a point X ∈ E p is in at least two of the closed ellipsoids, then there exists S ∈ S such that X ∈ E p (S) and pr(X) ∈ E p (S).This happened in one of two ways.The first closed ellipsoid we took X from could already satisfy this property, or we could find an ellipsoid with the center close to pr(X) which contained both X and pr(X) in its interior.If we start with X ∈ E p though, we can pick as the first ellipsoid one that has X in its interior, which means that we can always conclude the statement of Lemma 4.1.

Construction of the Deformation Retraction
In this section we show that under the same assumptions on τ and p as in the previous section, the union of the open ellipsoids around sample points deformation retracts onto the manifold M.
Informally, the idea of the deformation retraction is as follows.For a point X in an open ellipsoid E p (S), consider the closed ellipsoid E q (S) where q := q S (X) (Definition 2.7), the boundary of which contains X.If the vector prv(X) points into the interior of this closed ellipsoid, we move in the direction of prv(X), i.e. we use the normal deformation retraction.Otherwise, we move in the direction of the projection of the vector prv(X) onto the tangent space T X ∂E q (S).This causes us to slide along the boundary ∂E q (S).Either way, we remain within E q (S) (and therefore within E p ) and eventually reach the manifold M.This procedure is problematic for points which are in more than one ellipsoid, but we can glue together the directions of the deformation retraction with a suitable partition of unity.Figure 10 illustrates this idea.
To make this work, we will need precise control over the partition of unity, which is the topic of Subsection 5.1.Then in Subsection 5.2 we define the vector field which gives directions, in which we deformation retract.Subsection 5.3 proves that the flow of this vector field has desired properties.We then use this flow to explicitly give the definition of the requisite deformation retraction in Subsection 5.4.

The Partition of Unity
For each S ∈ S define Proof.For any The only way B S could be empty is if the sample S is a singleton which can only happen when M is a singleton, but this possibility is excluded by the assumption that the dimension of M is positive.The distance to any non-empty set is a well-defined real-valued function, the zeroes of which form the closure of the set.Define The sets A S and B S are disjoint.If we had X ∈ A S ∩ B S , then d(A S , X) ≤ 1 2 d(B S , X) ≤ 3 4 d(A S , X), so d(A S , X) = d(B S , X) = 0, meaning X ∈ A S ∩ B S , a contradiction.Note also that B S ⊆ B S and The sets A S and B S are closed in E p because they are (empty or) preimages of R ≥0 under continuous maps X → 1 2 d(B S , X) − d(A S , X) and X → 3 2 d(A S , X) − d(B S , X).Using the smooth version of Urysohn's lemma [26], choose a smooth function f S : E p → R [0,1] such that f S is constantly 1 on A S and constantly 0 on B S .
Recall that the support supp(f ) of a continuous real-valued function f is defined as the closure of the complement of the zero set, where both the complementation and the closure are calculated in the domain of f .Proposition 5.2.For every S ∈ S and Proof.Since X ∈ supp(f S ), the support of f S is non-empty, so Since supp(f S ) ⊆ E p (S) for all S ∈ S, the family of supports of f S is also locally finite.Thus any X ∈ E p has a neighbourhood U ⊆ E p which intersects only finitely many supports, at most one of which contains X.The intersection of the complements of the rest with the set U is a neighbourhood of X which intersects at most one support.
From these results we can conclude that X → S∈S f S (X) gives a well-defined smooth map E p → R [0,1] .We may therefore define a smooth map f P : Thus the family of maps f S , S ∈ S, together with f P , forms a smooth partition of unity on E p .
We will need two more subsets of E p : Assume that X ∈ E p \ V.If X was in any A S , we would have 0 = d(A S , X) ≥ 1 2 d(B S , X), so X ∈ A S ∩ B S , a contradiction.Since X is in no A S , it must be in at least two E p (S), so X ∈ W by Lemma 4.1.

The Velocity Vector Field
Let us define for each S ∈ S the vector field V S : E p (S) → R n as follows.Given X ∈ E p (S), let H S X denote the n-dimensional closed half-space which is bounded by the hyperplane T X ∂E qS(X) (S) and which contains E qS(X) (S).Define V S (X) to be the projection of the vector prv(X) to the closest point in H S X. Explicitly, if we introduce any orthonormal coordinate system with the origin in X such that the last coordinate axis points orthogonally to ∂E qS(X) (S) into the interior of E qS(X) (S), then the projection in these coordinates is given by (x 1 , . . ., x n−1 , x n ) → (x 1 , . . ., x n−1 , max {x n , 0}).
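The projection onto the half-space can be written without choosing coordinates. The sketch below treats vectors as based at X (so the bounding hyperplane passes through the origin) and takes the inward normal of the ellipsoid boundary as an input. Computing that normal from the ellipsoid is not shown, and the function name is hypothetical.

```python
import numpy as np

def project_to_half_space(w, inward_normal):
    """Closest point to w in the closed half-space {y : <y, n> >= 0}."""
    n = inward_normal / np.linalg.norm(inward_normal)
    normal_part = np.dot(w, n)
    # Remove the normal component only when it points out of the half-space.
    return w - min(normal_part, 0.0) * n

e3 = np.array([0.0, 0.0, 1.0])
flattened = project_to_half_space(np.array([1.0, 2.0, -3.0]), e3)
unchanged = project_to_half_space(np.array([1.0, 2.0, 3.0]), e3)
```

With the normal along the last axis this reduces to replacing the last coordinate by its maximum with 0, matching the coordinate formula above.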
Proposition 5.5.The vector field V S : E p (S) → R n is Lipschitz with a bound on a Lipschitz coefficient independent from S.
Proof. The projection onto a half-space is 1-Lipschitz. By setting $\tau = 1$ and $r = p$ in Lemma 3.1, we see that the map $\mathrm{prv}$ is $\bigl(\frac{1}{1-p} + 1\bigr)$-Lipschitz on $E_p(S) \subseteq M_{1-p}$. As the composition of these two maps, the vector field $V_S$ is Lipschitz with the product Lipschitz coefficient, i.e. also $\frac{1}{1-p} + 1$.
For any S ∈ S and X ∈ E p (S) \ M let α S X denote the angle between the vectors prv(X) and V S (X), and let hl S X denote the closed half-line which starts at X, is orthogonal to ∂E qS(X) (S) and points into the exterior of E qS(X) (S).
Proof.Let q := q S (X).We have q ≤ p < 1 and since X / ∈ M, in particular X = S, we have q > 0. Let n be the unit vector, orthogonal to the boundary of E q (S) and pointing into the exterior of E q (S), so that hl S X = {X + t n | t ∈ R ≥0 }.Let := prv(X) ; by assumption X / ∈ M, so > 0, and we may define m := prv(X) .Also, since E p ⊆ M τ = M 1 , we have < 1.
Use Lemma 2.5 to introduce a planar tangent-normal coordinate system with the origin at S which contains X as well as n, hence the whole hl S X.Without loss of generality assume that X lies in the closed first quadrant, so that we have χ ∈ R [0, π 2 ] with X = q + q 2 sin(χ), q cos(χ) (the angle is measured from N S M).
Let us first prove that pr(X) / ∈ hl S X. Assume to the contrary that this were the case, so that m = n.We will derive the contradiction by showing that the open τ-ball with the center in X − (1 − ) m, associated to M at pr(X), intersects all open τ-balls, associated to M at S. Two of those have their centers in the tangent-normal plane we are considering, and necessarily one of those is the τ-ball at S which is the furthest away from the τ-ball with the center in X − (1 − ) m.It thus suffices to check that the latter intersects the former two.
We verify that this expression is < 4 for q, ∈ R (0,1) and χ ∈ R [0, π 2 ] with the help from In particular, these two scalar products are non-zero outside M. Hence the fields V S and V have no zeros outside M. Proof.
1. Assume first that prv(X) points into the half-space bounded by T X ∂E qS(X) (S) which contains E qS(X) (S).Then V S (X) = prv(X) and α S X = 0, so the statement is clear.
Otherwise, V S (X) is the orthogonal projection of prv(X) onto T X ∂E q (S), so V S (X) = prv(X) • cos(α S X) and For the inequality, we use Lemma 5.6 to get cos 2 (α S X) ≥ 2 3 .2. We have There is one more problem with taking V as the direction vector field of the deformation retraction.The closer X is to the manifold, the shorter the vector prv(X), and thus V (X), is.
If we used V as the velocity vector field for the flow, we would need infinite time to reach the manifold M. If we scale the vector field in the way that the distance to the manifold decreases with speed 1, we are sure to reach the manifold within time 1 which is how one usually gives a deformation retraction (or more generally any homotopy).
Since d(X, M) = d(X, pr(X)), we need to divide V (X) with the length of its projection onto the vector prv(X).Hence the following definition of the vector field V : Corollary 5.7 ensures that the vector field V is well defined and that it has the same direction as V .
Proposition 5.8.For every S ∈ S the field V S : E p (S) → R n is bounded Lipschitz.The fields V : E p → R n and V : E p \ M → R n are bounded and locally Lipschitz.
Proof. The projection onto a half-space is 1-Lipschitz; since the map $\mathrm{prv}$ is bounded in norm (by Lemma 2.8 we have $\|\mathrm{prv}(X)\| = d(X, \mathrm{pr}(X)) = d(X, M) \leq q_S(X) < p$), the field $V_S$ is also bounded. Lemma 3.1 tells us that the map $\mathrm{prv}$ is $\bigl(\frac{1}{1-p} + 1\bigr)$-Lipschitz on $E_p(S) \subseteq M_{1-p}$. As the composition of two Lipschitz maps, the vector field $V_S$ is Lipschitz with the product Lipschitz coefficient, i.e. also $\frac{1}{1-p} + 1$.
Since the norm of the map prv, as well as all V S , has the same bound p, this is also a bound on the norm of V : The field V is locally Lipschitz by Corollary 2.10.
Assume now that X ∈ E p \ M. Recall from Lemma 5.6 that cos(α S X) ≥ 2 3 .Thus .
It follows that the norm of V is bounded by 3 2 .
Let U be a neighbourhood of X, where V is Lipschitz.Let r ∈ R >0 be such that B r (X) ⊆ U and r < d(M, X) = prv(X) and r < 1 − d(M, X).We claim that V is Lipschitz on B r (X) and therefore locally Lipschitz.
By Lemma 3.1 the map prv is Lipschitz on B r (X) ⊆ M 1−d(M,X) .The map prv(-) is a composition of Lipschitz maps and therefore Lipschitz on B r (X).Clearly, it is also bounded.
Since V is also bounded and Lipschitz on B r (X) ⊆ U, so is the scalar product Y → V (Y), prv(Y) by Lemma 2.9.Recall from Corollary 5.7 that Hence Lemma 2.9 also tells us that the map Y → prv(Y) V (Y),prv(Y) is bounded Lipschitz on B r (X), and then so is its product with V , i.e. the field V .
The reason we consider the local Lipschitz property is that it allows us to define the flow of the field V .

The Flow of the Vector Field
We will use the flow of the vector field V as part of the definition of the desired deformation retraction. Generally the flow of a vector field need not exist globally, and in our case the whole point is that the flow takes us to the manifold where the vector field is not defined. However, before we can establish what the exact domain of definition for the flow is, we will already need to refer to the flow to prove some of its properties. As such, it will be convenient to treat the flow as a partial function. Also, it is convenient to use Kleene equality in the context of partial functions: $a \simeq b$ means that $a$ is defined if and only if $b$ is, and if they are defined, they are equal.
The flow of the vector field V : E p \ M → R n can thus be given as a partial map Φ : (E p \ M) × R ≥0 E p \ M which satisfies the following for all X ∈ E p \M and t, u ∈ R ≥0 : 1. the domain of definition of Φ is an open subset of (E p \ M) × R ≥0 , 2. the flow Φ is continuous everywhere on its domain of definition, . if Φ(X, u) is defined, the derivative of the function Φ(X, -) exists at u, and is equal to V Φ(X, u) .
A standard result [18] tells us that if a vector field is locally Lipschitz, it has a local vector flow. That is, for every $X \in E_p \setminus M$ there exists $\epsilon \in \mathbb{R}_{>0}$ such that $\Phi(X, t)$ is defined for all $t \in \mathbb{R}_{[0,\epsilon)}$.
We claim that if we move with the flow Φ of the vector field V , we approach the manifold M with constant speed.
Proof.Consider the functions R [0,u] → R, given by t → d M, Φ(X, t) and t → d(M, X) − t.
To show that these two functions are the same (and thus in particular coincide for t = u), it suffices to show that they match in one point and have the same derivative.
For t = 0, we have d M, Φ(X, 0) = d(M, X).The derivative of the second function is constantly −1.We calculate the derivative of the first function via the chain rule.Take t ∈ R [0,u] and introduce an orthonormal n-dimensional coordinate system with the origin in Y := Φ(X, t), such that the first coordinate axis points in the direction of prv(Y).In this coordinate system, the Jacobian matrix of the map d(M, -) at Y is a matrix row with the first entry −1 and the rest 0. We need to multiply this matrix with the column, the first entry of which is Φ (X, t), prv(Y) prv(Y) , i.e. the scalar projection onto the direction prv(Y) prv(Y) of the derivative of Φ(X, -) at Y. • L is a lower subset of I (i.e.∀t, u ∈ I .u ∈ L ∧ t ≤ u ⇒ t ∈ L), • 0 ∈ L, • for every t ∈ L <a there exists u ∈ I >t such that u ∈ L, • for every t ∈ I, if R [0,t) ⊆ L, then t ∈ L. Because L contains 0, it is non-empty.Since L is a lower subset of I, the third assumption on L is equivalent to openness of L, and the fourth assumption is equivalent to closedness of L.
interior of or is at worst tangent to E qS(Φ(X,t)) (S) on this neighbourhood, so the flow stays for awhile in E p− which gives us the requisite t .On the other hand, if Φ(X, t) ∈ W, then we have t by Lemma 5.11.
We are now ready to prove that the domain of definition of Φ is D := (X, t) ∈ (E p \ M) × R ≥0 t < d(M, X) .Proposition 5.13.The flow Φ is defined on D.
Proof.The flow is defined as long as it remains within the domain of V , i.e.E p \ M. Take any X ∈ E p \ M and define L := t ∈ R [0,d(M,X)) Φ is defined at (X, t) .
We verify the properties for L from Lemma 5.10 to get L = R [0,d(M,X)) .The basic properties of the flow tell us that 0 ∈ L and that L is an open lower subset of R [0,d(M,X)) .Take t ∈ R [0,d(M,X)) such that R [0,t) ⊆ L. Because the vector field V is bounded (Proposition 5.8), the map Φ(X, -) : R [0,t) → E p \ M (of which the field is the derivative) is Lipschitz, in particular uniformly continuous.Hence it has a (uniformly) continuous extension R [0,t] → E p (since E p , as a closed subspace of R n , is complete).Thus the limit Y := lim t t Φ(X, t ) exists and is in E p .
We need to show that Y ∈ E p \ M. Using Lemma 5.9, we get We also have Y ∈ E p .Before the flow could leave E p , it would have to get arbitrarily close to ∂E p which would contradict Lemma 5.12.

The Deformation Retraction
We can now define a deformation retraction from E p to M. The flow Φ takes us arbitrarily close to the manifold without actually reaching it, so we will define the deformation retraction in two parts: first from E p to a small neighbourhood of M, and then from this neighbourhood to M itself.
Recall that by assumption p > λ, and we have the reach is half the distance between two closest points.In this case we necessarily have S = M and the set E p is a union of p-balls around points in M which clearly deformation retracts to M.
We now consider the case τ < ∞ and 0 < m < n.
All our conditions and results are homogeneous in the sense that they are preserved under uniform scaling.In particular, we may rescale the whole space R n by the factor 1 τ and may thus without loss of generality assume τ = 1.The result now follows from Proposition 5.14.

Discussion
As already mentioned in the introduction, the ratio $\frac{\kappa}{\tau}$ is a measure of the density of the sample. We want the required sample density to be small, i.e. the ratio $\frac{\kappa}{\tau}$ should be as large as possible. Corollary 6.2 gave us the upper bound $\frac{\kappa}{\tau} < 0.913$. For comparison, recall that Niyogi, Smale and Weinberger [27] obtained the bound $\frac{\kappa}{\tau} < \frac{1}{2}\sqrt{\frac{3}{5}} \approx 0.387$. Hence our result allows approximately 2.36-times lower density of the sample.
There is clear room for improvement of our result. The bounds we obtained from the theoretical parts of the proof yield $\frac{\kappa}{\tau} < \sqrt{2(\sqrt{3}-1)} \approx 1.21$, which would be a further improvement of the above result $\frac{\kappa}{\tau} < 0.913$ by around a third (more than three times an improvement over the result of Niyogi, Smale and Weinberger). The only reason we had to settle for the worse result is that we used a computer program to prove Lemma 4.1 in Section 4. We can get closer to the theoretical bound by increasing $M_p$ and decreasing $m_p$ and $\kappa_{\mathrm{off}}$, in which case the program yields a smaller lower bound on the values of the calculated function. Hence we would need to run the program with smaller $\delta$. However, the loops in the program are nested four levels deep, and the number of steps in each is inversely proportional to $\delta$, so dividing $\delta$ by some $t$ makes the program run approximately $t^4$-times longer. The parameters given in Section 4 are what we settled for in this paper in order for the program to complete the calculation in a reasonable amount of time: the program, which computes in parallel, ran for around 2.7 days on a 4-core Intel i7-7500 processor. Our bound on the sample density can thus be improved by anyone with more patience and better hardware.
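To get a feeling for the cost of refining the lattice, the fourth-power scaling can be turned into a rough estimate. This is a back-of-the-envelope sketch: it assumes the runtime is dominated by the four nested loops and scales exactly as the fourth power of the refinement factor, and the function name is hypothetical.

```python
def estimated_runtime_days(base_days, base_delta, new_delta, nested_loops=4):
    """Scale a measured runtime by (base_delta / new_delta) ** nested_loops."""
    return base_days * (base_delta / new_delta) ** nested_loops

# Halving delta from the value used in the paper, starting from the reported 2.7 days.
halved = estimated_runtime_days(2.7, 0.0004, 0.0002)
```

Under these assumptions, halving δ already multiplies the runtime by 16, to roughly 43 days on the same hardware.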
One of the questions we do not answer in this paper concerns the robustness of our results to noise. This is relevant because in practice, the reach and the tangent spaces are estimated from the sample, and are thus only approximately known. The sample points might also lie only in the vicinity of the manifold, not exactly on it. In this paper we wanted to establish the new methods, and we leave their refinement to take noise into account for future work.
Our result is expressed in terms of the reach of a manifold, which is a global feature.As the classical sampling theory advanced, researchers refined the notion of reach to local feature size, weak feature size, µ-reach and related concepts [2], [15], [17], [12], [28], [19].A natural question arises whether we can apply these concepts to improve the bounds on a (local) density of a sample when using ellipsoids.In particular, it would be interesting to see whether we can improve our result by allowing differently sized ellipsoids around different sample points, with the upper bound on the size given in terms of the local feature size (local distance to the medial axis) or the distance to critical points.

$\epsilon < \sqrt{\frac{3}{5}}\,\tau$ by showing that the union of $\epsilon$-balls with centers in sample points deformation retracts to $M$.
$\mathcal{S}$: sample (a subset of $M$), non-empty and locally finite
$\kappa$: the Hausdorff distance between $M$ and $\mathcal{S}$
$A$: the medial axis of $M$
$\tau$: the reach of $M$
$p$: persistence parameter
$E_p(S)$: open ellipsoid with the center in a sample point $S \in \mathcal{S}$ with the major semi-axes tangent to $M$
$\overline{E}_p(S)$: closed ellipsoid with the center in a sample point $S \in \mathcal{S}$ with the major semi-axes tangent to $M$
$\partial E_p(S)$: the boundary of $E_p(S)$, i.e. $\overline{E}_p(S) \setminus E_p(S)$
$E_p$: the union of open ellipsoids over the sample, $\bigcup_{S \in \mathcal{S}} E_p(S)$
$\overline{E}_p$: the union of closed ellipsoids over the sample, $\bigcup_{S \in \mathcal{S}} \overline{E}_p(S)$

Given a persistence parameter $p \in \mathbb{R}_{>0}$, let us denote the unions of open and closed tangent-normal $p$-ellipsoids around sample points by $E_p := \bigcup_{S \in \mathcal{S}} E_p(S)$, $\overline{E}_p := \bigcup_{S \in \mathcal{S}} \overline{E}_p(S)$. As a union of open sets, $E_p$ is open in $\mathbb{R}^n$. As a locally finite union of closed sets, $\overline{E}_p$ is closed in $\mathbb{R}^n$.

Figure 3: Normal deformation retraction does not always work.

Figure 4: Point in two ellipsoids, whose projection is in another ellipsoid

Figure 5: Notation of parameters in the program

Figure 7: Medial axis of a manifold tracing an arc for longer than π

Figure 8: Regions C and C

Figure 9: Position of C and S

Figure 10: Idea for the deformation retraction

Proposition 5.1. The sets $A_S$ and $B_S$ are closed in $E_p$ because they are complements within $E_p$ of open sets. Note that $A_S \subseteq E_p(S) = E_p \setminus B_S$. In particular, $A_S$ and $B_S$ are disjoint. If $S', S'' \in \mathcal{S}$ and $S' \neq S''$, then $A_{S'} \subseteq B_{S''}$ and $A_{S'} \cap A_{S''} = \emptyset$.

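Proposition 5.3. The supports of functions $f_S$ are pairwise disjoint. Hence every point in $E_p$ has a neighbourhood which intersects the support of at most one $f_S$.
Proof. Take $S', S'' \in \mathcal{S}$, $S' \neq S''$, and let $X \in \operatorname{supp} f_{S'} \cap \operatorname{supp} f_{S''}$. Since $A_{S''} \subseteq B_{S'}$, we have $d(A_{S'}, X) \leq \frac{2}{3}\, d(B_{S'}, X) \leq \frac{2}{3}\, d(A_{S''}, X)$ and likewise $d(A_{S''}, X) \leq \frac{2}{3}\, d(A_{S'}, X)$, implying $d(A_{S'}, X) = d(A_{S''}, X) = 0$. As both sets are closed, $X \in A_{S'} \cap A_{S''}$, a contradiction.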
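The step from the two inequalities to the vanishing of both distances can be spelled out as below. The shorthand $a$ and $b$ is introduced only for this display, with the two distinct sample points written as $S'$ and $S''$.

```latex
\[
a := d(A_{S'}, X), \qquad b := d(A_{S''}, X), \qquad
a \le \tfrac{2}{3}\, b \le \tfrac{2}{3} \cdot \tfrac{2}{3}\, a = \tfrac{4}{9}\, a
\;\Longrightarrow\; \tfrac{5}{9}\, a \le 0
\;\Longrightarrow\; a = 0,
\]
and then $0 \le b \le \tfrac{2}{3}\, a = 0$, so $b = 0$ as well.
```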

Lemma 5.4. The sets $V$ and $W$ are open in $E_p$ and in $\mathbb{R}^n$, and $V \cup W = E_p$.
Proof. The given sets are open in

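By the chain rule, the derivative of the function $t \mapsto d\big(M, \Phi(X, t)\big)$ is therefore
$$(-1) \cdot \left\langle \Phi'(X, t), \frac{\mathrm{prv}(Y)}{\|\mathrm{prv}(Y)\|} \right\rangle = -\left\langle V(Y), \frac{\mathrm{prv}(Y)}{\|\mathrm{prv}(Y)\|} \right\rangle = -\left\langle \frac{\|\mathrm{prv}(Y)\|}{\langle V(Y), \mathrm{prv}(Y) \rangle}\, V(Y), \frac{\mathrm{prv}(Y)}{\|\mathrm{prv}(Y)\|} \right\rangle = -1,$$
as required.

The next lemma is a tool which serves as a form of induction for real intervals.

Lemma 5.10. Let $a \in \mathbb{R}_{\geq 0}$ and let $I$ be either the interval $\mathbb{R}_{[0,a)}$ or the interval $\mathbb{R}_{[0,a]}$. Let $L \subseteq I$ have the following properties:
Then $L = I$.
Proof. To prove $L = I$, it suffices to show that $L$ is non-empty, open and closed in $I$, since $I$ is connected.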

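Corollary 6.2. Let $n \in \mathbb{N}$ and let $M$ be a non-empty properly embedded $C^1$-submanifold of $\mathbb{R}^n$ without boundary. Let $M$ have the same dimension $m$ around every point. Let $\mathcal{S} \subseteq M$ be a subset of $M$, locally finite in $\mathbb{R}^n$ (the sample from the manifold $M$). Let $\tau$ be the reach of $M$ in $\mathbb{R}^n$ and $\kappa$ the Hausdorff distance between $\mathcal{S}$ and $M$. Then whenever
$$\frac{\kappa}{\tau} < \sqrt{2 M_p \big(\sqrt{2 + M_p} - 1\big) - \kappa_{\mathrm{off}}} \approx 0.913,$$
there exists $p \in \mathbb{R}_{>0}$ such that $M$ is homotopy equivalent to $E_p$.
Proof. The expression $\sqrt{2p\big(\sqrt{\tau(p + 2\tau)} - \tau\big) - \kappa_{\mathrm{off}}\,\tau^2}$ is increasing in $p$. Hence we get the required result from Theorem 6.1 by taking $p = M_p \tau$, for which the expression equals $\tau \sqrt{2 M_p \big(\sqrt{2 + M_p} - 1\big) - \kappa_{\mathrm{off}}}$.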
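The numbers quoted around this corollary can be checked with a few lines of arithmetic. The sketch below rests on assumptions: it reads the bound as $\sqrt{2 M_p(\sqrt{2 + M_p} - 1) - \kappa_{\mathrm{off}}}$, and the values `ASSUMED_M_P = 0.96` and `ASSUMED_KAPPA_OFF = 0.55` are not given in this section; they are used here only because they reproduce the quoted 0.913. The comparison value is the ball-based bound of about 0.387 mentioned in the text.

```python
import math


def density_bound(m_p, kappa_off):
    """Upper bound on kappa / tau, under the assumed reading of the corollary."""
    return math.sqrt(2.0 * m_p * (math.sqrt(2.0 + m_p) - 1.0) - kappa_off)


# Assumed values, not stated in this section (see the lead-in).
ASSUMED_M_P = 0.96
ASSUMED_KAPPA_OFF = 0.55

ellipsoid_bound = density_bound(ASSUMED_M_P, ASSUMED_KAPPA_OFF)  # about 0.913
ball_bound = 0.5 * math.sqrt(3.0 / 5.0)                          # about 0.387
ratio = ellipsoid_bound / ball_bound                             # about 2.36

# With M_p = 1 and no offset, the same formula gives about 1.21.
theoretical_bound = density_bound(1.0, 0.0)

print(f"ellipsoids: {ellipsoid_bound:.3f}, balls: {ball_bound:.3f}, ratio: {ratio:.2f}")
```

Under these assumptions the formula is consistent with the three figures quoted in the text (0.913, 0.387 and the factor 2.36), as well as with the value 1.21 attributed to the theoretical parts of the proof.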

$\frac{\kappa}{\tau} < \frac{1}{2}\sqrt{\frac{3}{5}} \approx 0.387$. Hence our result allows approximately 2.36-times lower density of the sample.
There is clear room for improvement of our result. The bounds we obtained from the theoretical parts of the proof yield $\frac{\kappa}{\tau} < \sqrt{2(\sqrt{3} - 1)} \approx 1.21$,
The boundary of the ellipsoid is a level set of a smooth function. We can compose it with a suitable linear function so that X is in the open ellipsoid if and only if the value of the adjusted function is positive. Let us denote this adjusted function by v : C → R; we have our claim if we show that v is positive for all configurations in C.
Of course, the program cannot calculate the function values for all of the infinitely many configurations in C. We note that the (continuous) partial derivatives of v are bounded on the compact set C, hence the function is Lipschitz. If we change the parameters by at most δ, the function value changes by at most C • δ, where C is the Lipschitz coefficient. The program calculates the function values in a finite lattice of points, chosen so that each point in C is at most a suitable δ away from the lattice, and verifies that all these values are larger than C • δ. This shows that v is positive on the whole C.
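The lattice argument can be illustrated with a short sketch. It is not the authors' program: the function names, the box-shaped configuration set, the use of the maximum norm, and the toy function in the example are all assumptions made for illustration. The idea is the one described above: if a function is Lipschitz with a known coefficient and every lattice value exceeds the coefficient times δ, then the function is positive on the whole box.

```python
import itertools
import math


def lattice_axis(lo, hi, delta):
    """Points in [lo, hi] such that every x in [lo, hi] is within delta of one."""
    cells = max(1, math.ceil((hi - lo) / (2.0 * delta)))
    step = (hi - lo) / cells  # step <= 2 * delta
    # Cell midpoints: any x in a cell is at most step / 2 <= delta away.
    return [lo + (i + 0.5) * step for i in range(cells)]


def verify_positive(v, box, lipschitz, delta):
    """Return True if the lattice check proves v > 0 on the whole box.

    v:         function of len(box) real arguments
    box:       list of (lo, hi) intervals, one per parameter (assumed shape)
    lipschitz: Lipschitz coefficient of v with respect to the maximum norm
    delta:     maximal allowed distance from any point to the lattice

    A False result is inconclusive: it only means the check did not succeed.
    """
    axes = [lattice_axis(lo, hi, delta) for lo, hi in box]
    threshold = lipschitz * delta
    return all(v(*point) > threshold for point in itertools.product(*axes))


# Toy example: this function is bounded below by 0.5 and is 1-Lipschitz
# in the maximum norm, so the check succeeds with delta = 0.1.
def toy(x, y):
    return 1.5 + 0.5 * math.sin(x) + 0.5 * math.cos(y)


toy_is_positive = verify_positive(toy, [(0.0, 6.3), (0.0, 6.3)], 1.0, 0.1)
```

Placing lattice points at cell midpoints keeps the number of evaluations small for a given δ; a smaller δ lowers the threshold but increases the number of lattice points exponentially in the number of parameters.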