Dimensionality reduction for k-distance applied to persistent homology

Given a set P of n points and a constant k, we are interested in computing the persistent homology of the Čech filtration of P for the k-distance, and investigate the effectiveness of dimensionality reduction for this problem, answering an open question of Sheehy (The persistent homology of distance functions under random projection. In Cheng, Devillers (eds), 30th Annual Symposium on Computational Geometry, SOCG'14, Kyoto, Japan, June 08–11, p 328, ACM, 2014). We show that any linear transformation that preserves pairwise distances up to a $(1\pm\varepsilon)$ multiplicative factor must preserve the persistent homology of the Čech filtration up to a factor of $(1-\varepsilon)^{-1}$. Our results also show that the Vietoris-Rips and Delaunay filtrations for the k-distance, as well as the Čech filtration for the approximate k-distance of Buchet et al. [J Comput Geom, 58:70–96, 2016], are preserved up to a $(1\pm\varepsilon)$ factor.
We also prove extensions of our main theorem for point sets (i) lying in a region of bounded Gaussian width or (ii) on a low-dimensional submanifold, obtaining embeddings having the dimension bounds of Lotz (Proc R Soc A Math Phys Eng Sci, 475(2230):20190081, 2019) and Clarkson (Tighter bounds for random projections of manifolds. In Teillaud (ed) Proceedings of the 24th ACM Symposium on Computational Geometry, College Park, MD, USA, June 9–11, pp 39–48, ACM, 2008), respectively. Our results also hold in the terminal dimensionality reduction setting, where the distance of any point in the original ambient space to any point in P needs to be approximately preserved.


Introduction
Persistent homology is one of the main tools to extract information from data in topological data analysis. Given a data set as a point cloud in some ambient space, the idea is to construct a filtration sequence of topological spaces from the point cloud, and extract topological information from this sequence. The topological spaces are usually constructed by considering balls around the data points, in some given metric of interest, as the open sets. However, the usual distance function is highly sensitive to the presence of outliers and noise. One approach is to use distance functions that are more robust to outliers, such as the distance-to-a-measure and the related k-distance (for finite data sets), proposed by Chazal et al. [9]. Although this is a promising direction, an exact implementation can have significant cost in run-time. To overcome this difficulty, approximations of the k-distance have been proposed that led to certified approximations of persistent homology [23,7]. Other approaches involve using kernels [33] and de-noising algorithms [8,38].
In all the above settings, the sub-routines required for computing persistent homology have exponential or worse dependence on the ambient dimension, and rapidly become unusable in practice once the dimension grows beyond a few dozen, which is indeed the case in many applications, for example in image processing, neuro-biological networks, and data mining (see e.g. [21]). This phenomenon is often referred to as the curse of dimensionality.
The Johnson-Lindenstrauss Lemma. One of the simplest and most commonly used mechanisms to mitigate this curse is that of random projections, as applied in the celebrated Johnson-Lindenstrauss lemma (JL Lemma for short) [25]. The JL Lemma states that any set of n points in Euclidean space can be embedded into a space of dimension $O(\varepsilon^{-2}\log n)$ with (1 ± ε) distortion. Since the initial non-constructive proof of this fact by Johnson and Lindenstrauss [25], several authors have given successive improvements, e.g., Indyk, Motwani, Raghavan and Vempala [24], Dasgupta and Gupta [14], Achlioptas [1], Ailon and Chazelle [2], Matoušek [30], Krahmer and Ward [27], and Kane and Nelson [26]. These address the issues of efficient construction and implementation, using random matrices that support fast multiplication. Dirksen [15] gave a unified theory for dimensionality reduction using subgaussian matrices.
In a different direction, variants of the Johnson-Lindenstrauss lemma giving embeddings into spaces of lower dimension than the JL bound have been given in several specific settings. For point sets lying in regions of bounded Gaussian width, a theorem of Gordon [22] implies that the dimension of the embedding can be reduced to a function of the Gaussian width, independent of the number of points. Sarlos [34] showed that points lying on a d-flat can be mapped to $O(d/\varepsilon^2)$ dimensions independently of the number of points. Baraniuk and Wakin [5] proved an analogous result for points on a smooth submanifold of Euclidean space, which was subsequently sharpened by Clarkson [12] (see also Verma [36]), whose version directly preserves geodesic distances on the submanifold. Other related results include those of Indyk and Naor [12] for sets of bounded doubling dimension and Alon and Klartag [3] for general inner products, with additive error only. Recently, Narayanan and Nelson [31], building on earlier results [18,29], showed that for a given set of points or terminals, using just one extra dimension over the Johnson-Lindenstrauss bound, it is possible to achieve dimensionality reduction in a way that preserves not only inter-terminal distances, but also distances between any terminal and any point in the ambient space.
Remark 1. Our results are based on the notion of weighted points and, as in most applications of the JL lemma, give a reduced dimensionality typically of the order of hundreds. This is very useful when the ambient dimensionality is of much higher order of magnitude (e.g., $10^6$). Moreover, in some of the above-mentioned variants and generalizations, such as for point sets having bounded Gaussian width or lying on a lower-dimensional submanifold, the reduced dimensionality is independent of the number of input points, which allows for still better reductions.
Dimension Reduction and Persistent Homology. The JL Lemma has also been used by Sheehy [35] and Lotz [28] to reduce the complexity of computing persistent homology. Both Sheehy and Lotz show that the persistent homology of a point cloud is approximately preserved under random projections [35,28], up to a (1 ± ε) multiplicative factor, for any ε ∈ [0, 1]. Sheehy proves this for an n-point set, whereas Lotz's generalization applies to sets of bounded Gaussian width, and also implies dimensionality reductions for sets of bounded doubling dimension, in terms of the spread (ratio of the maximum to minimum interpoint distance). However, their techniques involve only the usual distance to a point set and therefore remain sensitive to outliers and noise, as mentioned earlier. The question of adapting the method of random projections in order to reduce the complexity of computing persistent homology using the k-distance is therefore a natural one, and has been raised by Sheehy [35], who observed that "One notable distance function that is missing from this paper [i.e. [35]] is the so-called distance to a measure or . . . k-distance . . . it remains open whether the k-distance itself is (1 ± ε)-preserved under random projection."
Our Contribution. In this paper, we combine the method of random projections with the k-distance and show its applicability in computing persistent homology. It is not very hard to see that for a given point set P, the random Johnson-Lindenstrauss mapping preserves the pointwise k-distance to P (Theorem 17). However, this is not enough to preserve intersections of balls at varying scales of the radius parameter, and thus does not suffice to preserve the persistent homology of Čech filtrations, as noted by Sheehy [35] and Lotz [28]. We show how the squared radius of a set of weighted points can be expressed as a convex combination of pairwise squared distances. From this, it follows that the Čech filtration under the k-distance will be preserved by any linear mapping that preserves pairwise distances.
Extensions. Further, as our main result applies to any linear mapping that approximately preserves pairwise distances, the analogous versions for bounded Gaussian width, points on submanifolds of $\mathbb{R}^D$, terminal dimensionality reduction, and others apply immediately. Thus, we give several extensions of our results. The extensions provide bounds which do not depend on the number of points in the sample. The first one, analogous to [28], shows that the persistent homology with respect to the k-distance, of point sets contained in regions having bounded Gaussian width, can be preserved via dimensionality reduction, using an embedding with dimension bounded by a function of the Gaussian width. Another result is that for points lying on a low-dimensional submanifold of a high-dimensional Euclidean space, the dimension of the embedding preserving the persistent homology with k-distance depends linearly on the dimension of the submanifold. Both these settings are commonly encountered in high-dimensional data analysis and machine learning (see, e.g., the manifold hypothesis [19]). We mention that, analogous to [31], it is possible to preserve the k-distance based persistent homology while also preserving the distance from any point in the ambient space to every point (i.e., terminal) in P (and therefore the k-distance to P), using just one extra dimension.

Run-time and Efficiency
In many other applications of the Johnson-Lindenstrauss dimensionality reduction, multiplying by a dense Gaussian matrix is a significant overhead, and can seriously affect any gains resulting from working in a lower dimensional space. However, as is pointed out in [28], in the computation of persistent homology the dimensionality reduction step is carried out only once for the n data points at the beginning of the construction. Having said that, it should still be observed that most of the recent results on dimensionality reduction using sparse subgaussian matrices [2,26,27] can also be used to compute the k-distance persistent homology, with little to no extra cost.

Remark 2. It should be noted that the approach of using dimensionality reduction for the k-distance is complementary to denoising techniques such as [8], as we do not try to remove noise, only to be more robust to noise. Therefore, it can be used in conjunction with denoising techniques, as a pre-processing tool when the dimensionality is high.
Outline. The rest of this paper is organized as follows. In Section 2, we briefly summarize some basic definitions and background. Our theorems are stated and proven in Section 3. Some applications of our results are derived in Section 4. We end with some final remarks and open questions in Section 5.

Preliminaries
We need a well-known identity for the variance of bounded random variables, which will be crucial in the proof of our main theorem. Let A be a set of points $p_1, \ldots, p_l \in \mathbb{R}^m$. A point $b \in \mathbb{R}^m$ is a convex combination of the points in A if there exist non-negative reals $\lambda_1, \ldots, \lambda_l \ge 0$ such that $b = \sum_{i=1}^{l} \lambda_i p_i$ and $\sum_{i=1}^{l} \lambda_i = 1$. In that case, for any point $x \in \mathbb{R}^m$,
$$\sum_{i=1}^{l} \lambda_i \|x - p_i\|^2 = \|x - b\|^2 + \sum_{i=1}^{l} \lambda_i \|b - p_i\|^2. \qquad (1)$$
A short probabilistic proof of (1) is given in the Appendix. In particular, if $\lambda_i = 1/k$ for all i (so that l = k and b is the iso-barycenter of A), we have
$$\frac{1}{k}\sum_{i=1}^{k} \|x - p_i\|^2 = \|x - b\|^2 + \frac{1}{k}\sum_{i=1}^{k} \|b - p_i\|^2. \qquad (2)$$
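The identity (2) is easy to check numerically; the following pure-Python sketch (the helper names `sq_dist` and `barycenter` are ours, not from the paper) verifies it on random points.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def barycenter(points):
    """Iso-barycenter (coordinate-wise mean) of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

# Identity (2): for b the iso-barycenter of p_1, ..., p_k and any x,
#   (1/k) sum_i ||x - p_i||^2 = ||x - b||^2 + (1/k) sum_i ||b - p_i||^2
random.seed(0)
pts = [tuple(random.gauss(0, 1) for _ in range(3)) for _ in range(5)]
x = tuple(random.gauss(0, 1) for _ in range(3))
b = barycenter(pts)
lhs = sum(sq_dist(x, p) for p in pts) / len(pts)
rhs = sq_dist(x, b) + sum(sq_dist(b, p) for p in pts) / len(pts)
assert abs(lhs - rhs) < 1e-9
```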

The Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss Lemma [25] states that any subset of n points of Euclidean space can be embedded in a space of dimension $O(\varepsilon^{-2}\log n)$ with (1 ± ε) distortion. We use the notion of an ε-distortion map with respect to P (also commonly called a Johnson-Lindenstrauss map).
Definition 1. Given a point set $P \subset \mathbb{R}^D$ and ε ∈ (0, 1), a mapping $f : \mathbb{R}^D \to \mathbb{R}^d$ for some d ≤ D is an ε-distortion map with respect to P if, for all x, y ∈ P,
$$(1-\varepsilon)\|x-y\|^2 \le \|f(x)-f(y)\|^2 \le (1+\varepsilon)\|x-y\|^2.$$
A random variable X with mean zero is said to be subgaussian with subgaussian norm K if $\mathbb{E}\exp(X^2/K^2) \le 2$. In this case, the tails of the random variable satisfy $\Pr[|X| \ge t] \le 2\exp(-ct^2/K^2)$ for an absolute constant c > 0. We focus on the case where the Johnson-Lindenstrauss embedding is carried out via random subgaussian matrices, i.e., matrices where, for some given K > 0, each entry is an independent subgaussian random variable with subgaussian norm K. This case is general enough to include the mappings of Achlioptas [1], Ailon and Chazelle [2], Dasgupta and Gupta [14], Indyk, Motwani, Raghavan, and Vempala [24], and Matoušek [30] (see Dirksen for a unified treatment [15]).
Lemma 2 (JL Lemma). Given 0 < ε, δ < 1 and a finite point set $P \subset \mathbb{R}^D$ of n points, let $d = O\!\left(\varepsilon^{-2}\log(n/\delta)\right)$. Then a random linear mapping $f : \mathbb{R}^D \to \mathbb{R}^d$, $v \mapsto \frac{1}{\sqrt{d}}\, G v$, where G is a d × D subgaussian random matrix, is an ε-distortion map with respect to P, with probability at least 1 − δ.

Definition 3. For ease of recall, we shall refer to a random linear mapping $f : v \mapsto \frac{1}{\sqrt{d}}\, G v$ as above, with G a d × D subgaussian random matrix, as a subgaussian ε-distortion map.

While in the version given here the dimension of the embedding depends on the number of points in P and uses subgaussian projections, the JL lemma has been generalized and extended in several different directions, some of which are briefly outlined below. The generalization of the results of this paper to these more general settings is straightforward. In several areas like geometric functional analysis, compressed sensing, and machine learning, the Gaussian width is a very useful measure of the width of a set in Euclidean space (see e.g. [20] and the references therein). It is also closely related to the statistical dimension of a set (see e.g. [37, Chapter 7]). The following analogue of the Johnson-Lindenstrauss lemma for sets of bounded Gaussian width was given in [28]. It essentially follows from a result of Gordon [22].
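As an illustration of Lemma 2, the following pure-Python sketch draws a Gaussian matrix (one instance of a subgaussian matrix), applies $v \mapsto \frac{1}{\sqrt{d}} G v$, and measures the worst observed distortion of squared pairwise distances on a small random point set. All names and parameter values are ours, chosen only for the demonstration.

```python
import random

def gaussian_map(D, d, rng):
    """A d x D matrix with N(0,1) entries, scaled by 1/sqrt(d)."""
    return [[rng.gauss(0, 1) / d ** 0.5 for _ in range(D)] for _ in range(d)]

def apply_map(G, v):
    """Matrix-vector product G v, with v given as a tuple."""
    return tuple(sum(g * x for g, x in zip(row, v)) for row in G)

rng = random.Random(1)
D, d, n = 200, 100, 10
pts = [tuple(rng.gauss(0, 1) for _ in range(D)) for _ in range(n)]
G = gaussian_map(D, d, rng)
imgs = [apply_map(G, p) for p in pts]

# Worst multiplicative distortion of squared distances over all pairs.
distortions = []
for i in range(n):
    for j in range(i + 1, n):
        orig = sum((a - b) ** 2 for a, b in zip(pts[i], pts[j]))
        proj = sum((a - b) ** 2 for a, b in zip(imgs[i], imgs[j]))
        distortions.append(abs(proj / orig - 1))
eps_observed = max(distortions)
assert 0 < eps_observed < 1  # with d = 100 the distortion stays mild
```

Note that a single random draw gives no guarantee; Lemma 2 only asserts that the distortion bound holds with probability at least 1 − δ.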

Sets of Bounded Gaussian Width
Theorem 5 (Gordon [22]; see [28]). Let $P \subset \mathbb{R}^D$, let $S := \{(x-y)/\|x-y\| : x, y \in P,\ x \neq y\}$, and let w(S) denote the Gaussian width of S. Given ε, δ ∈ (0, 1), if
$$d = \Omega\!\left(\varepsilon^{-2}\left(w(S) + \sqrt{\log(1/\delta)}\right)^{2}\right),$$
then the random linear mapping $f : \mathbb{R}^D \to \mathbb{R}^d$, $v \mapsto \frac{1}{\sqrt{d}}\, G v$, where G is a random d × D Gaussian matrix, is a subgaussian ε-distortion map with respect to P, with probability at least 1 − δ.
The result extends to subgaussian matrices with slightly worse constants. One of the benefits of this version is that the set P does not need to be finite. We refer to [28] for more on the Gaussian width in our context.

Submanifolds of Euclidean Space
For point sets lying on a low-dimensional submanifold of a high-dimensional Euclidean space, one can obtain an embedding with a smaller dimension using the bounds of Baraniuk and Wakin [5] or Clarkson [12], which depend only on the parameters of the submanifold. Clarkson's theorem is summarised below.
Theorem 6 (Clarkson [12]). There exists an absolute constant c > 0 such that, given a connected, compact, orientable, differentiable µ-dimensional submanifold $M \subset \mathbb{R}^D$ and ε, δ ∈ (0, 1), a random projection map $f : \mathbb{R}^D \to \mathbb{R}^d$, given by $v \mapsto \sqrt{\tfrac{D}{d}}\, G v$, where G is a d × D subgaussian random matrix, is an ε-distortion map with respect to any finite subset P ⊂ M, with probability at least 1 − δ, for
$$d \ge \frac{c}{\varepsilon^{2}}\left(\mu \log \frac{1}{\varepsilon} + \log \frac{1}{\delta} + C(M)\right),$$
where C(M) depends only on M.
Terminal Dimensionality Reduction. In a recent breakthrough result, Narayanan and Nelson [31] showed that it is possible to (1 ± O(ε))-preserve distances from a set of n terminals in a high-dimensional space to every point in the space, using only one dimension more than the Johnson-Lindenstrauss bound. A summarized version of their theorem is as follows. The derivation of the second statement is given in the Appendix.
Theorem 7 ([31], Theorem 3.2, Lemma 3.2). Given terminals $x_1, \ldots, x_n \in \mathbb{R}^D$ and ε ∈ (0, 1), there exists a non-linear map $f : \mathbb{R}^D \to \mathbb{R}^d$, where $d = O(\varepsilon^{-2}\log n) + 1$ is one more than the bound given in Lemma 2, such that f is an ε-distortion map for any pairwise distance between $x_i, x_j \in P$, and an O(ε)-distortion map for the distances between any pairs of points (x, u), where x ∈ P and $u \in \mathbb{R}^D$. Further, the projection of f to its first d − 1 coordinates is a subgaussian ε-distortion map.
As noted in [31], any such map must necessarily be non-linear. Suppose not; then, on translating the origin to a terminal, it follows that the Euclidean norm of each point on the unit sphere around the origin must be O(ε)-preserved, so the linear map would be a near-isometry of $\mathbb{R}^D$, which means that the dimension of any embedding given by a linear map could not be any less than the original dimension.

k-Distance
The distance to a finite point set P is usually taken to be the minimum distance to a point in the set. For the computations involved in geometric and topological inference, however, this distance is highly sensitive to outliers and noise. To handle this problem of sensitivity, Chazal et al. [9] introduced the distance to a probability measure which, in the case of a uniform probability on P, is called the k-distance.
Definition 8 (k-distance). For k ∈ {1, ..., n} and $x \in \mathbb{R}^D$, the k-distance of x to P is
$$d_{P,k}(x) = \sqrt{\frac{1}{k}\sum_{p \in \mathrm{NN}_P^k(x)} \|x - p\|^2},$$
where $\mathrm{NN}_P^k(x) \subset P$ denotes the k nearest neighbours in P to the point $x \in \mathbb{R}^D$.
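Definition 8 translates directly into code. The sketch below (pure Python; the helper name is ours) also illustrates the robustness to outliers that motivates the k-distance: an outlier that belongs to P has ordinary distance 0 to P, but a large k-distance.

```python
import math

def k_distance(x, P, k):
    """d_{P,k}(x): root of the mean of the squared distances from x to
    its k nearest neighbours in P (Definition 8)."""
    sq = sorted(sum((a - b) ** 2 for a, b in zip(x, p)) for p in P)
    return math.sqrt(sum(sq[:k]) / k)

# A small cluster plus one outlier. For k = 1 the outlier is at distance 0
# from P (it is itself a data point); for k = 3 the k-distance reports
# that it is far from the bulk of the data.
P = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
assert k_distance((10.0, 10.0), P, 1) == 0.0
assert k_distance((10.0, 10.0), P, 3) > 5.0
```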
It was shown in [4] that the k-distance can be expressed in terms of weighted points and power distance. A weighted point $\tilde p$ is a point p of $\mathbb{R}^D$ together with a (not necessarily positive) real number called its weight and denoted by w(p). The power distance between a point $x \in \mathbb{R}^D$ and a weighted point $\tilde p = (p, w(p))$, denoted by $D(x, \tilde p)$, is $\|x-p\|^2 - w(p)$, i.e. the power of x with respect to a ball of radius $\sqrt{w(p)}$ centered at p. The distance between two weighted points $\tilde p_i = (p_i, w(i))$ and $\tilde p_j = (p_j, w(j))$ is defined as
$$D(\tilde p_i, \tilde p_j) = \|p_i - p_j\|^2 - w(i) - w(j).$$
This definition encompasses the case where the two weights are 0, in which case we have the squared Euclidean distance, and the case where one of the points has weight 0, in which case we have the power distance of a point to a ball. We say that two weighted points are orthogonal when their weighted distance is zero. Let $B_{P,k}$ be the set of iso-barycentres of all subsets of k points in P. To each barycenter b of a k-subset $S_b \subseteq P$ we associate the weight $w(b) = -\frac{1}{k}\sum_{p \in S_b} \|b - p\|^2$. Note that, despite the notation, this weight does not only depend on b, but also on the set of points in P for which b is the barycenter. Writing $\tilde B_{P,k} = \{\tilde b = (b, w(b)),\ b \in B_{P,k}\}$, we see from (2) that the k-distance is the square root of a power distance [4]:
$$d_{P,k}^2(x) = \min_{\tilde b \in \tilde B_{P,k}} D(x, \tilde b). \qquad (4)$$
Observe that in general the squared distance between a pair of weighted points can be negative, but the above assignment of weights ensures that the k-distance $d_{P,k}$ is a real function. Since $d_{P,k}$ is the square root of a non-negative power distance, the α-sublevel set of $d_{P,k}$ is the union of balls $B\big(b, \sqrt{\alpha^2 + w(b)}\big)$, $b \in B_{P,k}$.
However, some of the balls may be included in the union of others and be redundant. In fact, the number of barycenters (or equivalently of balls) required to define a level set of $d_{P,k}$ is equal to the number of non-empty cells in the kth-order Voronoi diagram of P. Hence the number of non-empty cells can be $\Omega\big(n^{\lfloor (D+1)/2 \rfloor}\big)$ [13], and computing them in high dimensions is intractable. It is then natural to look for approximations of the k-distance, as proposed in [7].

Definition 9 (Approximate k-distance [7]). To each point p ∈ P, associate the weight $w(p) = -d_{P,k}^2(p)$, and let $\tilde P = \{\tilde p = (p, w(p)),\ p \in P\}$. The approximate k-distance is defined by
$$\tilde d_{P,k}^2(x) := \min_{\tilde p \in \tilde P} D(x, \tilde p). \qquad (5)$$

In other words, we replace the set of barycenters with P. As in the exact case, $\tilde d_{P,k}$ is the square root of a power distance and its α-sublevel set, α ∈ R, is a union of balls, specifically the balls $B\big(p, \sqrt{\alpha^2 - d_{P,k}^2(p)}\big)$, p ∈ P. The major difference with the exact case is that, since we consider only balls around the points of P, their number is n instead of $\binom{n}{k}$ in the exact case (compare Eq. (5) and Eq. (4)). Still, $\tilde d_{P,k}(x)$ approximates the k-distance up to constant factors [7]:
$$\frac{1}{\sqrt{2}}\, d_{P,k}(x) \le \tilde d_{P,k}(x) \le \sqrt{3}\, d_{P,k}(x). \qquad (6)$$
We now make an observation for the case when the weighted points are barycenters, which will be useful in proving our main theorem.
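A sketch of the approximate k-distance (pure Python; helper names are ours), together with an empirical check of a two-sided constant-factor bound. The constants $1/\sqrt{2}$ and $\sqrt{3}$ asserted below are provable from the 1-Lipschitz property of $d_{P,k}$ and identity (2); the constants stated in [7] may differ slightly.

```python
import math, random

def k_distance(x, P, k):
    """d_{P,k}(x) as in Definition 8."""
    sq = sorted(sum((a - b) ** 2 for a, b in zip(x, p)) for p in P)
    return math.sqrt(sum(sq[:k]) / k)

def approx_k_distance(x, P, k):
    """Power distance to the n weighted points (p, -d_{P,k}^2(p)), p in P,
    instead of the (n choose k) weighted barycenters."""
    return math.sqrt(min(
        sum((a - b) ** 2 for a, b in zip(x, p)) + k_distance(p, P, k) ** 2
        for p in P))

rng = random.Random(2)
P = [tuple(rng.gauss(0, 1) for _ in range(2)) for _ in range(20)]
for x in [(0.5, -0.5), (3.0, 3.0), (-2.0, 0.0)]:
    d = k_distance(x, P, 3)
    dt = approx_k_distance(x, P, 3)
    # Two-sided constant-factor approximation of the k-distance.
    assert d / math.sqrt(2) - 1e-9 <= dt <= math.sqrt(3) * d + 1e-9
```

Note the quadratic cost of this naive sketch: `approx_k_distance` recomputes `k_distance` for every p; in practice the weights $-d_{P,k}^2(p)$ would be precomputed once.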
Lemma 10. Let $\tilde b_1 = (b_1, w(b_1))$ and $\tilde b_2 = (b_2, w(b_2))$ be the weighted barycenters associated to two k-subsets $\{p_{1,1}, \ldots, p_{1,k}\}$ and $\{p_{2,1}, \ldots, p_{2,k}\}$ of P. Then
$$D(\tilde b_1, \tilde b_2) = \frac{1}{k^2}\sum_{r=1}^{k}\sum_{s=1}^{k} \|p_{1,r} - p_{2,s}\|^2.$$

Proof. We have
$$D(\tilde b_1, \tilde b_2) = \|b_1 - b_2\|^2 + \frac{1}{k}\sum_{r=1}^{k}\|b_1 - p_{1,r}\|^2 + \frac{1}{k}\sum_{s=1}^{k}\|b_2 - p_{2,s}\|^2.$$
Applying the identity (2) with x = b₁ and the barycenter b₂, we get
$$D(\tilde b_1, \tilde b_2) = \frac{1}{k}\sum_{s=1}^{k}\|b_1 - p_{2,s}\|^2 + \frac{1}{k}\sum_{r=1}^{k}\|b_1 - p_{1,r}\|^2 \qquad (7)$$
$$= \frac{1}{k^2}\sum_{r,s=1}^{k}\|p_{1,r} - p_{2,s}\|^2,$$
where in (7), we again applied (2) to each of the points $p_{2,s}$, with respect to the barycenter $b_1$. ∎
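Lemma 10 can likewise be checked numerically. The sketch below (pure Python; helper names are ours) compares the weighted distance between two barycenters with the average of the k² pairwise squared distances.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def barycenter(S):
    """Iso-barycenter of the k-subset S."""
    return tuple(sum(c) / len(S) for c in zip(*S))

def weight(S):
    """w(b) = -(1/k) sum_{p in S} ||b - p||^2 for the k-subset S."""
    b = barycenter(S)
    return -sum(sq_dist(b, p) for p in S) / len(S)

def weighted_dist(S1, S2):
    """D(b1~, b2~) = ||b1 - b2||^2 - w(b1) - w(b2)."""
    return sq_dist(barycenter(S1), barycenter(S2)) - weight(S1) - weight(S2)

rng = random.Random(3)
k = 4
S1 = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(k)]
S2 = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(k)]
lhs = weighted_dist(S1, S2)
rhs = sum(sq_dist(p, q) for p in S1 for q in S2) / k ** 2
assert abs(lhs - rhs) < 1e-9  # Lemma 10
```

This is exactly the expression that the main theorem exploits: the right-hand side is a convex combination of squared pairwise distances of points of P.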

Persistent Homology
Simplicial Complexes and Filtrations. Let V be a finite set. An (abstract) simplicial complex with vertex set V is a set K of finite subsets of V such that if A ∈ K and B ⊆ A, then B ∈ K. The sets in K are called the simplices of K. A simplex F ∈ K that is strictly contained in a simplex A ∈ K is said to be a face of A.
A simplicial complex K with a function f : K → R such that f(σ) ≤ f(τ) whenever σ is a face of τ is a filtered simplicial complex. The sublevel set of f at r ∈ R, $f^{-1}(-\infty, r]$, is a subcomplex of K. By considering different values of r, we get a nested sequence of subcomplexes of K (called a filtration), $\emptyset = K_0 \subseteq K_1 \subseteq \ldots \subseteq K_m = K$, where $K_i$ is the sublevel set at value $r_i$.
The Čech filtration associated to a finite set P of points in $\mathbb{R}^D$ plays an important role in Topological Data Analysis.

Definition 11 (Čech complex). For σ ⊆ P, let
$$\mathrm{rad}(\sigma) = \min_{x \in \mathbb{R}^D} \max_{p \in \sigma} \|x - p\|$$
denote the radius of the smallest enclosing ball of σ. The α-Čech complex of P, Čα(P), is the set of all simplices σ ⊆ P with rad(σ) ≤ α.
When the threshold α goes from 0 to +∞, we obtain the Čech filtration of P. Čα(P) can be equivalently defined as the nerve of the closed balls B(p, α), centered at the points in P and of radius α:
$$\check{C}_\alpha(P) = \Big\{\sigma \subseteq P : \bigcap_{p \in \sigma} B(p, \alpha) \neq \emptyset\Big\}.$$
By the nerve lemma, we know that the union of balls $U_\alpha = \bigcup_{p\in P} B(p, \alpha)$ and Čα(P) have the same homotopy type.
Persistence Diagrams. Persistent homology is a means to compute and record the changes in the topology of the filtered complexes as the parameter α increases from zero to infinity. Edelsbrunner, Letscher and Zomorodian [17] gave an algorithm to compute the persistent homology, which takes a filtered simplicial complex as input, and outputs a sequence $(\alpha_{\mathrm{birth}}, \alpha_{\mathrm{death}})$ of pairs of real numbers. Each such pair corresponds to a topological feature, and records the values of α at which the feature appears and disappears, respectively, in the filtration. Thus the topological features of the filtration can be represented using this sequence of pairs, either as points in the extended plane $\bar{\mathbb{R}}^2 = (\mathbb{R} \cup \{-\infty, \infty\})^2$, called the persistence diagram, or as a sequence of barcodes (the persistence barcode) (see, e.g., [16]). A pair of persistence diagrams G and H corresponding to the filtrations $(G_\alpha)$ and $(H_\alpha)$ respectively are multiplicatively β-interleaved (β ≥ 1) if, for all α, we have $G_{\alpha/\beta} \subseteq H_\alpha \subseteq G_{\alpha\beta}$. We shall crucially rely on the fact that a given persistence diagram is closely approximated by another one if they are multiplicatively c-interleaved, with c close to 1 (see e.g. [10]). The Persistent Nerve Lemma [11] shows that the persistent homology of the Čech complex is the same as the homology of the α-sublevel filtration of the distance function.
The Weighted Case. Our goal is to extend the above definitions and results to the case of the k-distance. As we observed earlier, the k-distance is a power distance in disguise. Accordingly, we need to extend the definition of the Čech complex to sets of weighted points.
Definition 12 (Weighted Čech Complex). Let $\tilde P = \{\tilde p_1, ..., \tilde p_n\}$ be a set of weighted points, where $\tilde p_i = (p_i, w(i))$. The α-Čech complex of $\tilde P$, $\check{C}_\alpha(\tilde P)$, is the set of all simplices σ satisfying
$$\bigcap_{\tilde p_i \in \sigma} B(p_i, r_i) \neq \emptyset, \qquad \text{where } r_i^2 = w(i) + \alpha^2.$$
In other words, the α-Čech complex of $\tilde P$ is the nerve of the closed balls $B(p_i, r_i)$, centered at the $p_i$ and of squared radius $w(i) + \alpha^2$ (if negative, $B(p_i, r_i)$ is imaginary). The notions of weighted Čech filtrations and their persistent homology now follow naturally. Moreover, it follows from (4) that the Čech complex Čα(P) for the k-distance is identical to the weighted Čech complex $\check{C}_\alpha(\tilde B_{P,k})$, where $\tilde B_{P,k}$ is, as above, the set of weighted iso-barycenters of all subsets of k points in P.
In the Euclidean case, we equivalently defined the α-Čech complex as the collection of simplices whose smallest enclosing balls have radius at most α. We can proceed similarly in the weighted case. Let $\tilde X \subseteq \tilde P$. We define the squared radius of $\tilde X$ as
$$\mathrm{rad}^2(\tilde X) = \min_{x \in \mathbb{R}^D} \max_{\tilde p_i \in \tilde X} D(x, \tilde p_i),$$
and the weighted center, or simply the center, of $\tilde X$ as the point, noted $c(\tilde X)$, where the minimum is reached.
Our goal is to show that preserving smallest enclosing balls in the weighted scenario under a given mapping also preserves the persistent homology. Sheehy [35] and Lotz [28] proved this for the unweighted case. Their proofs also work for the weighted case, but only under the assumption that the weights stay unchanged under the mapping. In our case, however, the weights need to be recomputed in $f(\tilde P)$. We therefore need a version of [28, Lemma 2.2] for the weighted case which does not assume that the weights stay the same under f. This is Lemma 16, which follows at the end of this section. The following lemmas will be instrumental in proving Lemma 16 and in proving our main result. Let $\tilde X \subseteq \tilde P$ and assume without loss of generality that $\tilde X = \{\tilde p_1, ..., \tilde p_m\}$, where $\tilde p_i = (p_i, w(i))$.
Lemma 13. $c(\tilde X)$ and $\mathrm{rad}(\tilde X)$ are uniquely defined.
Proof of Lemma 13. The proof follows from the convexity of D (see Lemma 10). Assume, for a contradiction, that there exist two centers $c_0$ and $c_1 \neq c_0$ for $\tilde X$. For convenience, write $r = \mathrm{rad}(\tilde X)$. By the definition of the center of $\tilde X$, we have
$$\max_i D(c_0, \tilde p_i) = \max_i D(c_1, \tilde p_i) = r^2.$$
For any λ ∈ (0, 1), let $c_\lambda = (1-\lambda)c_0 + \lambda c_1$. Moreover, for any i, since $x \mapsto D(x, \tilde p_i) = \|x - p_i\|^2 - w(i)$ is strictly convex and $c_0 \neq c_1$,
$$D(c_\lambda, \tilde p_i) < \max\big(D(c_0, \tilde p_i),\, D(c_1, \tilde p_i)\big) \le r^2.$$
Thus, for any i and any λ ∈ (0, 1), $D(c_\lambda, \tilde p_i) < r^2$. Hence $c_\lambda$ is a better center than $c_0$ and $c_1$, and r is not the minimal possible value for $\mathrm{rad}(\tilde X)$. We have obtained a contradiction. ∎

Lemma 14. Let $c = c(\tilde X)$, let I be the set of indices for which $D(c, \tilde p_i) = \mathrm{rad}^2(\tilde X)$, and let $X_I = \{p_i,\ i \in I\}$. Then there exist $(\lambda_i > 0)_{i \in I}$ such that $c(\tilde X) = \sum_{i\in I} \lambda_i p_i$ with $\sum_{i\in I} \lambda_i = 1$.
Proof of Lemma 14. We write for convenience $c = c(\tilde X)$ and $r = \mathrm{rad}(\tilde X)$, and prove that $c \in \mathrm{conv}(X_I)$ by contradiction. Suppose $c \notin \mathrm{conv}(X_I)$; let $c' \neq c$ be the point of $\mathrm{conv}(X_I)$ closest to c, and let $\bar c \neq c$ be a point on the segment $[cc']$. Since $\|\bar c - p_i\| < \|c - p_i\|$ for all $i \in I$, we have $D(\bar c, \tilde p_i) < D(c, \tilde p_i) = r^2$ for all $i \in I$. Moreover, since $D(c, \tilde p_j) < r^2$ strictly for $j \notin I$, for $\bar c$ sufficiently close to c we still have $D(\bar c, \tilde p_j) < r^2$ for the weighted points $\tilde p_j$, $j \notin I$. We thus have
$$\max_i D(\bar c, \tilde p_i) < r^2.$$
It follows that c is not the center of $\tilde X$, a contradiction. ∎
Combining the above results with [28, Lemma 4.2] gives the following lemma.
Lemma 15. Let I, $(\lambda_i)_{i\in I}$ be as in Lemma 14. Then the following holds:
$$\mathrm{rad}^2(\tilde X) = \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j\, D(\tilde p_i, \tilde p_j).$$
Proof of Lemma 15. From Lemma 14, and writing $c = c(\tilde X)$ for convenience, we have
$$\mathrm{rad}^2(\tilde X) = \sum_{i\in I} \lambda_i D(c, \tilde p_i) = \sum_{i\in I} \lambda_i \|c - p_i\|^2 - \sum_{i\in I} \lambda_i w(i).$$
We use the following simple fact from [28, Lemma 4.5] (a probabilistic proof is included in the Appendix, Lemma 25): since $c = \sum_{i\in I}\lambda_i p_i$,
$$\sum_{i\in I} \lambda_i \|c - p_i\|^2 = \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j \|p_i - p_j\|^2.$$
Substituting in the expression for $\mathrm{rad}^2(\tilde X)$, and using $\sum_{i\in I}\lambda_i w(i) = \frac{1}{2}\sum_{i,j\in I}\lambda_i\lambda_j (w(i) + w(j))$, we get
$$\mathrm{rad}^2(\tilde X) = \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j \big(\|p_i - p_j\|^2 - w(i) - w(j)\big) = \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j\, D(\tilde p_i, \tilde p_j). \qquad \square$$

Let $X \subset \mathbb{R}^D$ be a finite set of points and $\tilde X$ be the associated weighted points, where the weights are computed according to a weighting function $w : X \to \mathbb{R}^-$. Given a mapping $f : \mathbb{R}^D \to \mathbb{R}^d$, we define $\widetilde{f(X)}$ as the set of weighted points $\{(f(x), w(f(x))),\ x \in X\}$. Note that the weights are recomputed in the image space $\mathbb{R}^d$.

Lemma 16. In the above setting, if f is such that for some ε ∈ (0, 1) and for all subsets $S \subseteq X$ we have
$$(1-\varepsilon)\,\mathrm{rad}^2(\tilde S) \le \mathrm{rad}^2(\widetilde{f(S)}) \le (1+\varepsilon)\,\mathrm{rad}^2(\tilde S),$$
then the weighted Čech filtrations of $\tilde X$ and $\widetilde{f(X)}$ are multiplicatively $(1-\varepsilon)^{-1/2}$-interleaved.

ε-Distortion maps preserve k-distance Čech filtrations
For the subsequent theorems, we denote by P a set of n points in $\mathbb{R}^D$. Our first theorem shows that for the points in P, the pointwise k-distance $d_{P,k}$ is approximately preserved by a random subgaussian matrix satisfying Lemma 2.
Theorem 17. Given ε ∈ (0, 1], any ε-distortion map with respect to P, $f : \mathbb{R}^D \to \mathbb{R}^d$, where $d = O(\varepsilon^{-2}\log n)$, satisfies, for all points x ∈ P:
$$(1-\varepsilon)\, d_{P,k}^2(x) \le d_{f(P),k}^2(f(x)) \le (1+\varepsilon)\, d_{P,k}^2(x).$$

Proof of Theorem 17. The proof follows from the observation that the squared k-distance from any point p ∈ P to the set P is a convex combination (with coefficients 1/k) of the squares of the Euclidean distances to the k nearest neighbours of p. Since the mapping in the JL Lemma 2 is linear and (1 ± ε)-preserves squared pairwise distances, their convex combinations also get (1 ± ε)-preserved. ∎
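Theorem 17 can be observed empirically. In the sketch below (pure Python; all names and parameter choices are ours), we project a random point set with a Gaussian map and compare the squared pointwise k-distances before and after the projection.

```python
import math, random

def k_distance(x, P, k):
    """d_{P,k}(x) as in Definition 8."""
    sq = sorted(sum((a - b) ** 2 for a, b in zip(x, p)) for p in P)
    return math.sqrt(sum(sq[:k]) / k)

rng = random.Random(4)
D, d, n, k = 300, 150, 12, 3
P = [tuple(rng.gauss(0, 1) for _ in range(D)) for _ in range(n)]
G = [[rng.gauss(0, 1) / d ** 0.5 for _ in range(D)] for _ in range(d)]

def project(v):
    """The random linear map v -> (1/sqrt(d)) G v (scaling folded into G)."""
    return tuple(sum(g * c for g, c in zip(row, v)) for row in G)

fP = [project(p) for p in P]
# Worst relative distortion of the squared pointwise k-distance over P.
worst = max(
    abs(k_distance(project(x), fP, k) ** 2 / k_distance(x, P, k) ** 2 - 1)
    for x in P)
assert worst < 1  # the squared k-distance suffers only mild distortion
```

As the text notes next, this pointwise preservation alone does not preserve the Čech filtration; that is what Theorem 18 addresses.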
As mentioned previously, the preservation of the pointwise k-distance does not imply the preservation of the Čech complex formed using the points in P. Nevertheless, the following theorem shows that this can always be done in dimension $O(\varepsilon^{-2}\log n)$.

Let $\tilde B_{P,k}$ be the set of weighted iso-barycenters of every k-subset of P, weighted as in Section 2.2. Recall from Section 2.3 that the weighted Čech complex $\check C_\alpha(\tilde B_{P,k})$ is identical to the Čech complex Čα(P) for the k-distance. We now want to apply Lemma 16, for which the following theorem will be needed.
Theorem 18 (k-distance). Let $\sigma \subseteq \tilde B_{P,k}$ be a simplex in the weighted Čech complex $\check C_\alpha(\tilde B_{P,k})$. Then, given d ≤ D such that there exists an ε-distortion map $f : \mathbb{R}^D \to \mathbb{R}^d$ with respect to P, it holds that
$$(1-\varepsilon)\,\mathrm{rad}^2(\sigma) \le \mathrm{rad}^2(\widetilde{f(\sigma)}) \le (1+\varepsilon)\,\mathrm{rad}^2(\sigma).$$

Proof of Theorem 18. Let $\sigma = \{\tilde b_1, \tilde b_2, ..., \tilde b_m\}$, where $\tilde b_i$ is the weighted barycenter defined in Section 2.2. Applying Lemma 15 to σ, we have that
$$\mathrm{rad}^2(\sigma) = \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j\, D(\tilde b_i, \tilde b_j).$$
By Lemma 10, the distance between $\tilde b_i$ and $\tilde b_j$ is
$$D(\tilde b_i, \tilde b_j) = \frac{1}{k^2}\sum_{r,s=1}^{k} \|p_{i,r} - p_{j,s}\|^2.$$
As this last expression is a convex combination of squared pairwise distances of points in P, it is (1 ± ε)-preserved by any ε-distortion map with respect to P, which implies that the convex combination $\mathrm{rad}^2(\sigma) = \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j D(\tilde b_i, \tilde b_j)$, corresponding to the squared radius of σ in $\mathbb{R}^D$, will be (1 ± ε)-preserved.
Let $f : \mathbb{R}^D \to \mathbb{R}^d$ be an ε-distortion map with respect to P, where d will be chosen later. By Lemma 15, the centre of $\widetilde{f(\sigma)}$ is a convex combination of the points $(f(b_i))_{i=1}^m$: for some index set I′ and coefficients $\nu_i \ge 0$, $i \in I'$, with $\sum_{i\in I'} \nu_i = 1$,
$$c(\widetilde{f(\sigma)}) = \sum_{i\in I'} \nu_i f(b_i), \qquad \mathrm{rad}^2(\widetilde{f(\sigma)}) = \frac{1}{2}\sum_{i,j\in I'} \nu_i \nu_j\, D\big(\widetilde{f(b_i)}, \widetilde{f(b_j)}\big). \qquad (9)$$
Consider the convex combination of power distances $\frac{1}{2}\sum_{i,j\in I'} \nu_i \nu_j D(\tilde b_i, \tilde b_j)$. Since f is an ε-distortion map with respect to P, by Lemmas 10 and 2 we get
$$\frac{1}{2}\sum_{i,j\in I'} \nu_i \nu_j\, D\big(\widetilde{f(b_i)}, \widetilde{f(b_j)}\big) \le (1+\varepsilon)\,\frac{1}{2}\sum_{i,j\in I'} \nu_i \nu_j\, D(\tilde b_i, \tilde b_j). \qquad (10)$$
On the other hand, since the squared radius is a minimizing function by definition, every such convex combination of power distances is at most the squared radius, so
$$\frac{1}{2}\sum_{i,j\in I'} \nu_i \nu_j\, D(\tilde b_i, \tilde b_j) \le \mathrm{rad}^2(\sigma). \qquad (11)$$
Combining the inequalities (9), (10), (11) gives
$$\mathrm{rad}^2(\widetilde{f(\sigma)}) \le (1+\varepsilon)\,\mathrm{rad}^2(\sigma).$$
For the lower bound, applying the same argument to the combination with coefficients $(\lambda_i)_{i\in I}$ of Lemma 15 in the original space gives
$$\mathrm{rad}^2(\widetilde{f(\sigma)}) \ge \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j\, D\big(\widetilde{f(b_i)}, \widetilde{f(b_j)}\big) \ge (1-\varepsilon)\,\frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j\, D(\tilde b_i, \tilde b_j) = (1-\varepsilon)\,\mathrm{rad}^2(\sigma),$$
where the final inequality follows by Lemma 2, since f is an ε-distortion map with respect to P. Thus, we have that
$$(1-\varepsilon)\,\mathrm{rad}^2(\sigma) \le \mathrm{rad}^2(\widetilde{f(\sigma)}) \le (1+\varepsilon)\,\mathrm{rad}^2(\sigma),$$
which completes the proof of the theorem. ∎
Theorem 19 (Approximate k-distance). Let $\tilde P$ be the weighted points associated with P, introduced in Definition 9 (Eq. (5)). Let, in addition, $\sigma \subseteq \tilde P$ be a simplex in the associated weighted Čech complex $\check C_\alpha(\tilde P)$. Then an ε-distortion mapping with respect to P, $f : \mathbb{R}^D \to \mathbb{R}^d$, satisfies:
$$(1-\varepsilon)\,\mathrm{rad}^2(\sigma) \le \mathrm{rad}^2(\widetilde{f(\sigma)}) \le (1+\varepsilon)\,\mathrm{rad}^2(\sigma).$$

Proof of Theorem 19. Recall that, in Section 2.2, we defined the approximate k-distance to be $\tilde d_{P,k}(x) := \min_{\tilde p \in \tilde P} \sqrt{D(x, \tilde p)}$, where $\tilde p = (p, w(p))$ is a weighted point having weight $w(p) = -d_{P,k}^2(p)$. So the Čech complex would be formed by the intersections of the balls around the weighted points in $\tilde P$. The proof follows on the lines of the proof of Theorem 18. Let $\sigma = \{\tilde p_1, \tilde p_2, ..., \tilde p_m\}$, where $\tilde p_1, \ldots, \tilde p_m$ are weighted points in $\tilde P$, and let $c(\sigma)$ be the center of σ. Applying again Lemma 15, we get
$$\mathrm{rad}^2(\sigma) = \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j\, D(\tilde p_i, \tilde p_j) = \frac{1}{2}\sum_{i,j\in I} \lambda_i \lambda_j\left(\|p_i - p_j\|^2 + d_{P,k}^2(p_i) + d_{P,k}^2(p_j)\right),$$
where $w(p_i) = -d_{P,k}^2(p_i)$. (The summand corresponding to a fixed pair of distinct indices i < j appears twice, while for indices i = j only the weight terms contribute, since then $\|p_i - p_j\| = 0$.)

An ε-distortion map with respect to P preserves pairwise distances, and by Theorem 17 it also preserves the squared k-distances $d_{P,k}^2(p_i)$, in dimension $O(\varepsilon^{-2}\log n)$. The result then follows as in the proof of Theorem 18. ∎
Applying Lemma 16 to Theorems 18 and 19, we get the following corollary.
Corollary 20. The persistent homology for the Čech filtrations of P and its image f(P) under any ε-distortion mapping with respect to P, using (i) the exact k-distance, as well as (ii) the approximate k-distance, is preserved up to a multiplicative factor of $(1-\varepsilon)^{-1/2}$.

However, note that the approximation in Corollary 20 (ii) is with respect to the approximate k-distance, which is itself only a constant-factor approximation of the k-distance (i.e. bounded away from 1; see (6)).

Extensions
As Theorem 18 applies to arbitrary ε-distortion maps, it naturally follows that many of the extensions and variants of the JL Lemma, e.g. those discussed in Section 2.1, have their corresponding versions for the k-distance as well. In this section we elucidate some of the corresponding extensions of Theorem 18. These can yield better bounds for the dimension of the embedding, stronger dimensionality reduction results, or easier-to-implement reductions in their respective settings.

The first result in this section is for point sets contained in a region of bounded Gaussian width.
Theorem 21. Let $P \subset \mathbb{R}^D$ be a finite set of points, and define $S := \{(x-y)/\|x-y\| : x, y \in P\}$. Let w(S) denote the Gaussian width of S. Then, given any ε, δ ∈ (0, 1), any subgaussian ε-distortion map from $\mathbb{R}^D$ to $\mathbb{R}^d$ preserves the persistent homology of the k-distance based Čech filtration associated to P, up to a multiplicative factor of $(1-\varepsilon)^{-1/2}$, given that
$$d = \Omega\!\left(\varepsilon^{-2}\left(w(S) + \sqrt{\log(1/\delta)}\right)^{2}\right).$$
Note that the above theorem is not stated for an arbitrary ε-distortion map. Also, since the Gaussian width of an n-point subset of the unit sphere is at most $O(\sqrt{\log n})$ (using e.g. the Gaussian concentration inequality), this bound subsumes the Johnson-Lindenstrauss bound up to constants.

Proof. Under the above bound on d, the Gaussian-width analogue of the JL lemma from Section 2.1 gives a subgaussian ε-distortion map f with respect to P, with probability at least 1 − δ. Now applying Theorem 18 to the point set P with the mapping f immediately gives us that for any simplex $\sigma \in \check C_\alpha(\tilde B_{P,k})$, where $\check C_\alpha(\tilde B_{P,k})$ is the weighted Čech complex with parameter α, the squared radius $\mathrm{rad}^2(\sigma)$ is preserved up to a multiplicative factor of (1 ± ε). By Lemma 16, this implies that the persistent homology for the Čech filtration is $(1-\varepsilon)^{-1/2}$-multiplicatively interleaved. ∎
For point sets lying on a low-dimensional submanifold of a high-dimensional Euclidean space, one can obtain an embedding of smaller dimension using the bounds of Baraniuk and Wakin [5] or Clarkson [12], which depend only on parameters of the submanifold.
Theorem 22. There exists an absolute constant c > 0 such that, given a finite point set P lying on a connected, compact, orientable, differentiable µ-dimensional submanifold M ⊂ R^D, and ε, δ ∈ (0, 1), an ε-distortion map f : R^D → R^d preserves the persistent homology of the Čech filtration computed on P using the k-distance, provided d is at least c times the dimension bound of Clarkson (Theorem 6), which depends only on ε, δ, µ, and a quantity C(M) that depends only on M.
Proof. The proof follows directly by applying the map of Clarkson's bound (Theorem 6) as the ε-distortion map in Theorem 18.
Next, we state the terminal dimensionality reduction version of Theorem 18. This is a useful result when we wish to preserve the distance (or k-distance) from any point in the ambient space to the original point set.

Theorem 23. Let P ⊂ R^D be a set of n points. Then, given any ε ∈ (0, 1], there exists a map f : R^D → R^d, where d = O(log n / ε²), such that the persistent homology of the k-distance based Čech filtration associated to P is preserved up to a multiplicative factor of (1 − ε)^{−1/2}, and the k-distance of any point in R^D to P is preserved up to a (1 ± O(ε)) factor.
Proof. The second part of the theorem follows immediately by applying Theorem 7, with the point set P as the set of terminals. By Theorem 7 (ii), the dimensionality reduction map of [31] is an outer extension of a subgaussian ε-distortion map Π : R^D → R^{d−1}. Now applying Theorem 18 to Π gives the first part of the theorem.

Conclusion and Future Work
k-Distance Vietoris-Rips and Delaunay filtrations. Since the Vietoris-Rips filtration [32, Chapter 4] depends only on pairwise distances, it follows from Theorem 17 that this filtration with k-distances is preserved up to a multiplicative factor of (1 − ε)^{−1/2} under a Johnson-Lindenstrauss mapping. Furthermore, the k-distance Delaunay and Čech filtrations [32, Chapter 4] have the same persistent homology. Corollary 20 (i) therefore implies that the k-distance Delaunay filtration of a given finite point set P is also (1 − ε)^{−1/2}-preserved under an ε-distortion map with respect to P. Likewise, Corollary 20 (ii) applies to the Vietoris-Rips and Delaunay filtrations built on the approximate k-distance.

Kernels. Other distance functions, defined using kernels, have proved successful in overcoming issues due to outliers. Using a result analogous to Theorem 17, we can show that random projections preserve the persistent homology for kernels up to a C(1 − ε)^{−1/2} factor, where C is a constant. We do not know whether C can be taken to be 1, as for the k-distance.
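The observation about the Vietoris-Rips filtration can be checked numerically: a scaled Gaussian projection preserves, up to small relative error, both the pairwise distances and the k-distance values of the points, which are the only quantities the k-distance Vietoris-Rips filtration depends on. A toy sketch (illustrative names and data, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(2)

def k_dists(P, k):
    """k-distance of every point of P to the set P itself."""
    d2 = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    return np.sqrt(np.mean(np.sort(d2, axis=1)[:, :k], axis=1))

n, D, d, k = 50, 300, 100, 4
P = rng.normal(size=(n, D))
G = rng.normal(size=(d, D)) / np.sqrt(d)   # scaled Gaussian JL matrix
Q = P @ G.T

# Relative change of all pairwise distances and all k-distance values.
iu = np.triu_indices(n, 1)
pd_before = np.linalg.norm(P[:, None] - P[None, :], axis=-1)[iu]
pd_after = np.linalg.norm(Q[:, None] - Q[None, :], axis=-1)[iu]
ratios = np.concatenate([pd_after / pd_before, k_dists(Q, k) / k_dists(P, k)])
print(ratios.min(), ratios.max())   # concentrated around 1 for d = O(log n / eps^2)
```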
Lemma 25 (Lotz [28], Lemma 4.2). Let P = {p₁, ..., p_l} ⊂ R^D be a set of points, and let c ∈ R^D be such that c = Σ_{i∈I} λ_i p_i, where I is a subset of indices from [l], λ_i ≥ 0, and Σ_{i∈I} λ_i = 1. Then

Σ_{i∈I} λ_i ‖c − p_i‖² = Σ_{i∈I} λ_i ‖p_i‖² − ‖c‖².

Consequently, Σ_{i∈I} λ_i ‖c − p_i‖² ≤ rad²(P), where rad²(P) is the squared radius of the minimum enclosing ball of the set P of points.
Proof. The proof again follows directly from Eqn. (13). Suppose we choose two random points X₁, X₂ independently from P, with the point p_i being chosen with probability λ_i. Then E‖X₁ − X₂‖² = Σ_{i,j∈I} λ_i λ_j ‖p_i − p_j‖². Evaluating, we get E‖X₁ − X₂‖² = 2 Σ_{i∈I} λ_i ‖p_i‖² − 2⟨E[X₁], E[X₂]⟩ = 2 Σ_{i∈I} λ_i ‖p_i‖² − 2‖c‖², where in the last line we used the fact that X₁ and X₂ are independent. Substituting the above values into the variance identity (13) completes the proof.
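The variance identity behind this lemma is easy to verify numerically: for a convex combination c = Σ λ_i p_i, the λ-weighted average squared distance to c equals Σ λ_i ‖p_i‖² − ‖c‖², which is also half the expected squared distance between two independent λ-distributed draws from P. A small sketch (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

l, D = 8, 5
P = rng.normal(size=(l, D))
lam = rng.random(l); lam /= lam.sum()      # convex weights lambda_i
c = lam @ P                                # c = sum_i lambda_i p_i

# Weighted average squared distance to c, and its closed form.
lhs = np.sum(lam * np.sum((P - c) ** 2, axis=1))
rhs = np.sum(lam * np.sum(P ** 2, axis=1)) - np.sum(c ** 2)

# Half the expected squared distance between two independent draws
# X1, X2 with P(X = p_i) = lambda_i.
pd2 = np.sum((P[:, None] - P[None, :]) ** 2, axis=-1)
half_e = 0.5 * lam @ pd2 @ lam

print(lhs, rhs, half_e)   # all three agree
```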
A probabilistic proof of Lemma 10 is also provided below.
Lemma 10 — a probabilistic proof. Consider the following random experiment: pick a random point X from p₁, ..., p_k according to the distribution (λ_i)_{i=1}^k, and another, independent, random point Y from q₁, ..., q_k according to (µ_i)_{i=1}^k. Using the law of total variance on the variable X − Y, conditioning on Y, we get

E‖X − Y‖² = ‖E[X − Y]‖² + Var_Y(E_X[X − Y | Y]) + E_Y[Var_X(X − Y | Y)].   (14)

Let us consider the terms in the above equation one by one.
1. The LHS has E‖X − Y‖², which by the independence of X and Y is clearly equal to Σ_{i,j=1}^k λ_i µ_j ‖p_i − q_j‖².
2. On the RHS, the first term is ‖E[X − Y]‖² = ‖b₁ − b₂‖².

3. The second term is Var_Y(E_X[X − Y | Y]), which is equal to Var_Y(b₁ − Y) = Var(Y) = Σ_{i=1}^k µ_i ‖b₂ − q_i‖², where the last expression was evaluated directly from the definition of variance, i.e. Var(Z) = E[‖Z − E[Z]‖²], together with the fact that, for constant a, Var(a − Z) = Var(Z).

4. The final term is E_Y[Var_X(X − Y | Y)]. Conditioning on Y, the variance Var_X(X − Y | Y) = Var(X) = Σ_{i=1}^k λ_i ‖b₁ − p_i‖². Since this holds for each value of Y, we get E_Y[Var_X(X − Y | Y)] = E_Y[Var(X)] = Σ_{i=1}^k λ_i ‖b₁ − p_i‖².

Substituting the above expressions for the terms in (14), we get

Σ_{i,j=1}^k λ_i µ_j ‖p_i − q_j‖² = ‖b₁ − b₂‖² + Σ_{i=1}^k λ_i ‖b₁ − p_i‖² + Σ_{i=1}^k µ_i ‖b₂ − q_i‖².
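The total-variance identity in this proof can be verified numerically with arbitrary weights (a quick sketch; the data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

k, D = 6, 4
P = rng.normal(size=(k, D))                # points p_1, ..., p_k
Q = rng.normal(size=(k, D))                # points q_1, ..., q_k
lam = rng.random(k); lam /= lam.sum()      # distribution (lambda_i) for X
mu = rng.random(k); mu /= mu.sum()         # distribution (mu_i) for Y
b1, b2 = lam @ P, mu @ Q                   # barycenters E[X], E[Y]

pd2 = np.sum((P[:, None] - Q[None, :]) ** 2, axis=-1)
lhs = lam @ pd2 @ mu                       # E||X - Y||^2
rhs = (np.sum((b1 - b2) ** 2)                          # ||E[X - Y]||^2
       + np.sum(lam * np.sum((P - b1) ** 2, axis=1))   # E_Y[Var_X(. | Y)]
       + np.sum(mu * np.sum((Q - b2) ** 2, axis=1)))   # Var_Y(E_X[. | Y])
print(lhs, rhs)   # the two sides agree
```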

Definition 4. Given a set S ⊂ R^D, the Gaussian width of S is w(S) := E[sup_{x∈S} ⟨x, g⟩], where g ∈ R^D is a random standard D-dimensional Gaussian vector.

Definition 9 (Approximation). Let P ⊂ R^D and x ∈ R^D. The approximate k-distance d̃_{P,k}(x) is defined as

d̃_{P,k}(x) := min_{p∈P} √(D(x, p̂)),   (5)

where p̂ = (p, w(p)) with w(p) = −d²_{P,k}(p), the negative of the squared k-distance of p.

Lemma 10. Let b₁, b₂ ∈ B_{P,k}, and p_{i,1}, ..., p_{i,k} ∈ P for i = 1, 2, be such that b_i = (1/k) Σ_{l=1}^k p_{i,l} and w(b_i) = −(1/k) Σ_{l=1}^k ‖b_i − p_{i,l}‖². Then

(1/k²) Σ_{l,m=1}^k ‖p_{1,l} − p_{2,m}‖² = ‖b₁ − b₂‖² − w(b₁) − w(b₂).

