A topological approach to inferring the intrinsic dimension of convex sensing data

We consider a common measurement paradigm, where an unknown subset of an affine space is measured by unknown continuous quasi-convex functions. Given the measurement data, can one determine the dimension of this space? In this paper, we develop a method for inferring the intrinsic dimension of the data from measurements by quasi-convex functions, under natural assumptions. The dimension inference problem depends only on discrete data, namely the ordering of the measured points of space induced by the sensor functions. We construct a filtration of Dowker complexes, associated to measurements by quasi-convex functions. Topological features of these complexes are then used to infer the intrinsic dimension. We prove convergence theorems that guarantee obtaining the correct intrinsic dimension in the limit of large data, under natural assumptions. We also illustrate the usability of this method in simulations.


1 Introduction
Data in many scientific applications are often obtained by "sensing" the phase space via sensors, i.e. functions, that are convex. Convex sensing is the class of problems of inferring the geometry of data that are sampled via such functions. The following is perhaps the shortest, albeit naive and incomplete, formulation of a convex sensing problem. A collection of n points X = {x_a}_{a=1}^n in an open convex region K ⊂ R^d is sensed by measuring the values of m sensors, i.e. quasi-convex functions F = {f_i : K → R}_{i=1}^m. Suppose that one has access only to the m × n data matrix M = [M_{ia}] of sensor values, where M_{ia} = f_i(x_a)  (1), but does not have direct access to the information about the dimension d of the underlying space, the open convex region K, the points x_a ∈ K, or any further details of the quasi-convex functions f_i. Can one recover any geometric information about the sampled region K? At the very minimum, can one infer the dimension d?
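For concreteness, the following minimal sketch simulates the measurement paradigm of equation (1); the unit-ball domain and the distance-to-center sensors are hypothetical choices of K and F for illustration only, not part of the problem statement.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 3, 10, 350   # hypothetical dimension, number of sensors, number of points

# Sample n points uniformly from the open unit ball in R^d (the convex region K).
directions = rng.standard_normal((n, d))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = rng.uniform(0.0, 1.0, size=(n, 1)) ** (1.0 / d)
x = directions * radii

# Each sensor reports the distance to a random center: a convex
# (hence quasi-convex) function of position.
centers = rng.uniform(-1.0, 1.0, size=(m, d))
M = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=2)   # data matrix, m x n
```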

1.1 Motivation from neuroscience
While convex sensing problems are not uncommon in many scientific applications, our chief motivation comes from neuroscience. Neurons in the brain regions that represent sensory information often possess receptive fields. A paradigmatic example of a receptive field is that of a hippocampal place cell [9]. Place cells are a class of neurons in the rodent hippocampus that act as position sensors. Here the relevant stimulus space K ⊂ R^d is the animal's physical environment [13], with d ∈ {1, 2, 3}, and x ∈ K is the animal's location in this space. Each neuron is activated with a certain probability that is a continuous function f : K → R_{≥0} of the animal's position in space.
In other words, the probability of a single neuron's activation at time t is given by p(t) = f(x(t)), where x(t) is the animal's position. For each neuron, the function f is called its place field, and is approximately quasi-concave (see examples of place fields in Figure 1). Place fields can be easily computed when both the neuronal activity data and the relevant stimulus space are available. A number of other classes of sensory neurons in the brain also possess quasi-concave receptive fields; that is, each such neuron responds according to a quasi-concave probability density function f : K → R_{≥0} on the stimulus space.
In many situations, the relevant stimulus space for a given neural population may be unknown. This raises a natural question: can one infer the dimension of a stimulus space with quasi-concave receptive fields from neural activity alone? More precisely, given the neural activity of m neurons with quasi-concave receptive fields f_i : K → R, can one "sense" the stimulus space by sampling the neural activity at n moments of time as M_{ia} = f_i(x(t_a))? Here one has access to the measurements M_{ia}, but not to the objects on the right-hand side. This motivates the naive formulation of the convex sensing problem above.

1.2 The geometry of convex sensing data
The convex sensing problem possesses a natural transformation group. If φ : R → R is a strictly monotone-increasing function, then the sub-level sets of the composition φ∘f and of f are identical, up to an order-preserving re-labeling. Thus, if φ is a strictly monotone-increasing function, then f is quasi-convex if and only if φ∘f is quasi-convex. It is easy to show that two sets of real numbers have the same ordering, that is, a_1 < a_2 < ··· < a_n and b_1 < b_2 < ··· < b_n, if and only if there exists a strictly monotone function φ : R → R such that b_i = φ(a_i) for all i. It thus follows that it is only the total order of each row of the matrix M in equation (1) that constrains the geometric features of the point cloud X_n = {x_1, ..., x_n} in a convex sensing problem. This motivates the following definition.
Definition 1.2. Let V be a finite set. A sequence of length k in V is a k-tuple s = (v_1, ..., v_k) of elements of V without repetitions. We denote by S_k[V] the set of all sequences of length k in V.
If M is an m × n real matrix that has distinct entries in each row, then each row yields a sequence of length n. For the sake of an example, consider a real-valued matrix whose first row is (8.23, 4.19, 2.56, 3.96). Since this row has the ordering 2.56 < 3.96 < 4.19 < 8.23, the total order <_1 on V = {1, 2, 3, 4} is 3 <_1 4 <_1 2 <_1 1. Thus, the order sequence for the first row is s_1 = (3, 4, 2, 1) ∈ S_4[V]. Similarly, the order sequence for the second row is s_2 = (2, 1, 3, 4).
It is easy to see that if the sampled points X_n and the quasi-convex functions {f_i}_{i∈[m]} are generic in some natural sense, then each row of the data matrix M_{ia} = f_i(x_a) has no repeated values with probability 1. We denote the set of all "generic" data matrices as M^o_{m,n} def= {m × n real-valued matrices with no repeated entries in each row}.
For any such matrix M = [M_{ia}] ∈ M^o_{m,n}, one can define a collection S(M) of m maximal-length sequences as S(M) = {s_1, ..., s_m}, where each sequence s_i = (a_{i1}, ..., a_{in}) ∈ S_n[n] is obtained from the total order of the i-th row: M_{i a_{i1}} < M_{i a_{i2}} < ··· < M_{i a_{in}}. The geometry of a convex sensing problem for a data matrix M ∈ M^o_{m,n} is constrained only by the set of m sequences S(M) ⊂ S_n[n]. The following observation makes it possible to re-state any convex sensing problem purely in terms of embedding a set of points that satisfy certain convex hull non-containment conditions. Let conv(x_1, ..., x_k) denote the convex hull of a collection of points x_1, ..., x_k in R^d.

Lemma 1.3. For any collection of n distinct points {x_1, x_2, ..., x_n} ⊂ R^d, the following statements are equivalent: (i) There exists a continuous quasi-convex function f : R^d → R such that f(x_1) < f(x_2) < ··· < f(x_n). (ii) For each k = 2, ..., n, x_k ∉ conv(x_1, ..., x_{k−1}).
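In code, passing from a data matrix M ∈ M^o_{m,n} to its order sequences S(M) is a row-wise argsort; a minimal sketch (0-based column indices, whereas the text is 1-based):

```python
import numpy as np

def order_sequences(M):
    """S(M): for each row of M, the tuple of column indices sorted by
    increasing value (assumes no repeated entries per row, i.e. M lies
    in M^o_{m,n})."""
    return [tuple(int(j) for j in row.argsort()) for row in np.asarray(M)]

# Worked example from the text: the row (8.23, 4.19, 2.56, 3.96) yields
# (2, 3, 1, 0) in 0-based indexing, i.e. s_1 = (3, 4, 2, 1) in the paper's
# 1-based convention.
print(order_sequences(np.array([[8.23, 4.19, 2.56, 3.96]])))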
An important implication of Corollary 1.4 is that a convex sensing problem without any further constraints always has a two-dimensional solution. Recall that a set of points is convexly independent if none of these points lies in the convex hull of the others.
Corollary 1.5. For every matrix M ∈ M^o_{m,n} and convexly independent points x_1, ..., x_n ∈ R^2, there exist continuous quasi-convex functions f_i : R^2 → R, i ∈ [m], realizing the order sequences S(M).

Note that the situation where all the sampled points are convexly independent is non-generic for large n. If one explicitly excludes this situation, then the combinatorics of S(M) constrains the minimal possible dimension d of the geometric realization, as illustrated by the following example.
Example 1.6. Let n > 2, and let M ∈ M^o_{n−1,n} be a matrix obtained as in equation (1) with continuous quasi-convex functions f_i, whose (n − 1) sequences S(M) = {s_1, s_2, ..., s_{n−1}} are of the form s_i = (···, n, i), where each "···" in s_i is an arbitrary permutation of [n] \ {n, i}. Assume that at least one point lies in the convex hull of the other points. Then the minimal possible embedding dimension is d = n − 2. The proof is given in Section 6.1 of the Appendix.

1.3 Dimension inference in a convex sensing problem
It is clear from Corollary 1.5 and Example 1.6 that the problem of dimension inference is well-posed only in the presence of some genericity assumptions that guarantee convex dependence of the sampled points. Instead of making such an assumption explicit, we take a probabilistic perspective, wherein points are drawn from a probability distribution that is generic in some natural sense. We assume that there are three (unknown) objects that underlie any "convex sensing" data: (i) an open convex region K ⊆ R^d; (ii) a collection F = {f_i : K → R}_{i∈[m]} of continuous quasi-convex functions; (iii) a probability measure P_K on K.
In relation to the neuroscience motivation in Section 1.1, K ⊆ R^d is the stimulus space, each function f_i is the negative of the receptive field of a neuron, and P_K is the measure that describes the probability distribution of the stimuli. To guarantee that the convex sensing data are generic, we impose the following regularity assumptions.
Definition 1.7. A regular pair is a pair (F, P_K) that satisfies the conditions (i)-(iii) above, as well as the following two conditions: (R1) The probability measure P_K is equivalent to the Lebesgue measure on K.
(R2) Level sets of all functions in F are of measure zero, i.e. for every i ∈ [m] and ℓ ∈ R, P_K(f_i^{-1}(ℓ)) = 0.

The assumption (R1) ensures that the domain K is well-sampled, and thus the probability that the points x_1, ..., x_n are convexly independent approaches zero in the limit of large n. The assumption (R2) guarantees that, with probability 1, the data matrix M has no repeated values in each row, and thus lies in M^o_{m,n}.
In this paper, we develop a method for estimating the dimension of convex sensing data. Intuitively, such an estimator needs to be consistent, i.e. "behave well" in the limit of large data. In addition to the conditions imposed on a regular pair, other properties of a pair (F, P_K) may be needed, depending on the context. It is therefore natural to define a consistent dimension estimator in relation to a particular class of regular pairs. Since an estimator may rely on different parameters for different regular pairs, we consider a one-parameter family of such estimators, motivating the following definition of consistency.

Definition 1.9. Let RP be a class of regular pairs. For each regular pair (F, P_K) ∈ RP we denote by d(F, P_K) the dimension d of the ambient space of the open convex set K ⊆ R^d. A one-parameter family of functions d̂(ε) : M^o_{m,n} → N is called an asymptotically consistent estimator in RP if, for every regular pair (F, P_K) ∈ RP, there exists l > 0 such that, for every ε ∈ (0, l) and each sequence of matrices M_n ∈ M^o_{m,n} sampled from (F, P_K), lim_{n→∞} Pr[d̂(ε)(M_n) = d(F, P_K)] = 1.

The structure of this paper is as follows. In Section 2, we define two multi-dimensional filtrations of simplicial complexes: the empirical Dowker complex Dow(S(M)), associated to a data matrix M, and the Dowker complex Dow(F, P_K), associated to a regular pair (F, P_K). Using an interleaving distance between multi-filtered complexes, we prove (Theorem 2.9) that for a sequence {M_n} of data matrices sampled from a regular pair (F, P_K), Dow(S(M_n)) → Dow(F, P_K) in probability, as n → ∞.
In Section 3, we develop tools for estimating the dimension of (F, P_K) using persistent homology. We define a set of maximal persistence lengths associated to Dow(F, P_K) and prove (Lemma 3.8) that a lower bound on the dimension of (F, P_K) can be derived from these persistence lengths. Next we define another set of maximal persistence lengths from Dow(S(M_n)) and prove (Theorem 3.10) that they converge in probability to the maximal persistence lengths associated to Dow(F, P_K), in the limit of large sampling of the data. The rest of Section 3 is devoted to two subsampling procedures for different practical situations, as well as simulation results that illustrate that the correct dimension can be inferred with these two methods.
In Section 4, we introduce complete regular pairs and prove (Theorem 4.3) that the lower bound in Lemma 3.8 is equal to the dimension d(F, P_K) for complete regular pairs. This establishes (Theorem 4.4) that the dimension estimator introduced in Section 3.3 is an asymptotically consistent estimator in the class of complete regular pairs. In Section 5, we define an estimator that can be used to test (Theorem 5.5) whether the data matrix is sampled from a complete regular pair. The Appendix (Section 6) contains the proofs of the main theorems, as well as some technical supporting lemmas.

2 Empirical Dowker complex and the interleaving convergence theorem
In this section, we define the empirical Dowker complex from the m sequences induced by the rows of the data matrix M, and the Dowker complex from the regular pair (F, P_K), and we prove that the empirical Dowker complex converges to the Dowker complex in probability. These complexes are both examples of multi-filtered simplicial complexes. We first define the empirical Dowker complex from a collection of sequences of maximal length.

Definition 2.2. Let S = {s_1, ..., s_m} be a collection of sequences on [n] of length n. Define a multi-filtered complex Dow(S), indexed over [0, 1]^m, by Dow(S)(t_1, ..., t_m) def= Δ({σ_a(t_1, ..., t_m)}_{a∈[n]}), where σ_a(t_1, ..., t_m) def= {i ∈ [m] : #{b ∈ [n] : b ≤_i a} ≤ n t_i} and ≤_i is the total order on [n] induced by s_i. Here Δ({σ_a}_{a∈[n]}) denotes the smallest simplicial complex containing the faces {σ_a}_{a∈[n]}. This filtered complex is called the empirical Dowker complex of S.
Recall from Section 1.2 that the relevant geometric information of the m × n data matrix M ∈ M^o_{m,n} is contained in the collection of m sequences S(M) = {s_1, ..., s_m}, where s_i ∈ S_n[n] is of length n and records the total order induced by the i-th row of M. Therefore, we can consider the empirical Dowker complex Dow(S(M)) derived from the data matrix M. Note that our definition of the empirical Dowker complex is a multi-parameter generalization of the Dowker complex defined in [5]. Specifically, the one-dimensional filtration of simplicial complexes Dow(S(M))(t, ..., t), indexed over t and rescaled by n, is equal to the Dowker complex defined in [5].
Recall that, for a collection A = {A_i}_{i∈[m]} of sets, the nerve of A, denoted nerve(A), is the simplicial complex on the vertex set [m] defined as nerve(A) def= {σ ⊆ [m] : ∩_{i∈σ} A_i ≠ ∅}. The following lemma is immediate from Definition 2.2.
Lemma 2.3. Let S = {s_1, ..., s_m} be a collection of sequences on [n] of length n. For each i ∈ [m] and t ∈ R, consider A^{(i)}(t) def= {a ∈ [n] : #{b ∈ [n] : b ≤_i a} ≤ n t}, where ≤_i is the total order on [n] induced by s_i. Then Dow(S)(t_1, ..., t_m) = nerve({A^{(i)}(t_i)}_{i∈[m]}).
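A brute-force sketch of Lemma 2.3, under our reading of the garbled displays above (in particular, the normalization of A^{(i)}(t) by n is our assumption): the empirical Dowker complex at (t_1, ..., t_m) is computed as the nerve of the sets A^{(i)}(t_i).

```python
from itertools import combinations
import numpy as np

def empirical_dowker(S, t):
    """Dow(S)(t_1, ..., t_m) as the nerve of the sets A^(i)(t_i), where
    A^(i)(t) holds the first floor(n * t) entries of the sequence s_i.
    Returns the simplices as frozensets of sensor indices (0-based).
    Exponential in m; intended only as an illustrative sketch."""
    n = len(S[0])
    A = [set(s[: int(np.floor(n * ti))]) for s, ti in zip(S, t)]
    m = len(S)
    simplices = set()
    for k in range(1, m + 1):
        for sigma in combinations(range(m), k):
            if set.intersection(*(A[i] for i in sigma)):   # non-empty overlap
                simplices.add(frozenset(sigma))
    return simplices
```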
Next we connect the combinatorics of Dow(S(M)) to the geometry. From Lemma 2.3, we know that Dow(S(M)) is the nerve of {A^{(i)}(t_i)}_{i∈[m]}. To define an analogue of Dow(S(M)) from the regular pair (F, P_K), we use the following lemma (see the proof in Section 6.2) to define an analogue of A^{(i)}(t) from (F, P_K).

Lemma 2.4. Let f : K → R be a continuous function with P_K(f^{-1}(ℓ)) = 0 for all ℓ ∈ R, where P_K is a probability measure on a convex open set K and P_K is equivalent to the Lebesgue measure on K. Then there exists a unique strictly increasing continuous function λ : (0, 1) → R such that, for all t ∈ (0, 1), P_K(f^{-1}((−∞, λ(t)))) = t.

For a regular pair (F, P_K) = ({f_i : K → R}_{i∈[m]}, P_K), by Lemma 2.4, for each i ∈ [m] there exists a unique strictly increasing continuous function λ_i : (0, 1) → R such that P_K(f_i^{-1}((−∞, λ_i(t)))) = t. Using λ_i(t), the following definition provides a continuous analogue of A^{(i)}(t).

Definition 2.5. Let (F, P_K) = ({f_i}_{i∈[m]}, P_K) be a regular pair. For each i ∈ [m] and t ∈ (0, 1), define K^{(i)}(t) def= f_i^{-1}((−∞, λ_i(t))), where λ_i : (0, 1) → R is the unique function that satisfies P_K(f_i^{-1}((−∞, λ_i(t)))) = t. For convenience, we also define K^{(i)}(0) def= ∅ and K^{(i)}(1) def= K. In other words, K^{(i)}(t) is the sublevel set of f_i whose P_K measure is equal to t.
An illustration of K^{(i)}(t) can be found in Figure 2. These sets are simply sublevel sets of f_i, rescaled with respect to the P_K measure. On the other hand, for a point cloud X_n = {x_1, ..., x_n} sampled from P_K, if we identify [n] with X_n via a ↔ x_a, then A^{(i)}(t) may be interpreted as the set of points of X_n that lie inside an approximation of K^{(i)}(t). Informed by Lemma 2.3, we use K^{(i)}(t) to define the continuous version of the Dowker complex.

Definition 2.6. Let (F, P_K) be a regular pair. Define a multi-filtered complex Dow(F, P_K), indexed over [0, 1]^m, by Dow(F, P_K)(t_1, ..., t_m) def= nerve({K^{(i)}(t_i)}_{i∈[m]}). This multi-filtered complex is called the Dowker complex induced from (F, P_K).

The complex Dow(S(M )
) is what we can obtain from the data matrix M , but it does not capture the whole geometric information of (F, P K ).On the other hand, Dow(F, P K ) reflects the whole geometric information but is not directly computable.Since A (i) (t i ) is an approximation of K (i) (t i ), we might expect Dow(S(M )) approximates Dow(F, P K ).As we shall see, this is the case but, for comparing them formally, we need the concept of the interleaving distance.
Definition 2.7. For a multi-filtered complex K indexed over R^m and ε > 0, the ε-shift of K, denoted K + ε, is the multi-filtered complex defined by (K + ε)(t_1, ..., t_m) def= K(t_1 + ε, ..., t_m + ε). For two multi-filtered complexes K and L indexed over R^m, the simplicial interleaving distance between K and L is defined as d_INT(K, L) def= inf{ε > 0 : K(T) ⊆ (L + ε)(T) and L(T) ⊆ (K + ε)(T) for all T ∈ R^m}.

Note that this interleaving distance is between multi-filtered simplicial complexes, while the standard interleaving distance in topological data analysis is between persistence modules, namely, at the level where the homology functor has been applied to the multi-filtered complex (see, e.g., [8] for the standard definition of the interleaving distance between multi-dimensional persistence modules). Similar to the standard interleaving distance, the simplicial interleaving distance d_INT defined here is only a pseudo-metric; namely, d_INT(K, L) = 0 does not imply K = L. The definition of the simplicial interleaving distance involves a shift of indices, which is why the two multi-filtered complexes being compared are required to be indexed over the whole of R^m. Since both Dow(S(M)) and Dow(F, P_K) are indexed only over [0, 1]^m, to compare them in terms of the interleaving distance, we first need to extend their indexing domain to R^m. The definition below is a natural way to extend the indexing domain.
Definition 2.8. For D = Dow(S(M)) or Dow(F, P_K) and (t_1, ..., t_m) ∈ R^m, define D(t_1, ..., t_m) def= D(π(t_1), ..., π(t_m)), where π(t) def= min{max{t, 0}, 1} clamps each coordinate to [0, 1].

With the above notations, we state one of our main theorems.

Theorem 2.9 (Interleaving Convergence Theorem). Let (F, P_K) be a regular pair and M_n be an m × n data matrix sampled from (F, P_K). Then the simplicial interleaving distance between Dow(S(M_n)) and Dow(F, P_K) converges to 0 in probability as n → ∞; namely, for all ε > 0, lim_{n→∞} Pr[d_INT(Dow(S(M_n)), Dow(F, P_K)) ≤ ε] = 1.

The proof of Theorem 2.9 is given in Section 6.3. In Section 3, we use Theorem 2.9 to infer a lower bound for the dimension of (F, P_K).
3 Estimating the stimulus space dimension

3.1 Persistence modules and maximal persistence length
First we recall the definitions of persistence modules, persistence intervals and persistence diagrams; for more details see, e.g., Chapter 1 of [11]. Then we define the maximal persistence length for a 1-dimensional filtration of simplicial complexes. We fix a ground field F, which is normally taken to be F_2 for computational reasons; none of the statements here depend on the choice of the field.

Definition 3.1. A persistence module M indexed over an interval [0, T] is a collection {M_t}_{t∈[0,T]} of vector spaces over F, along with linear maps φ^t_s : M_s → M_t for every s ≤ t in [0, T], such that φ^u_s = φ^u_t ∘ φ^t_s and φ^t_t = id_{M_t} for all s ≤ t ≤ u in [0, T].
A well-known structural characterization of a persistence module is via its persistence intervals (or, equivalently, its persistence diagram). To talk about persistence intervals, we need to define the direct sum of persistence modules and interval modules.

Definition 3.2. Let M = {M_t}_{t∈[0,T]} and N = {N_t}_{t∈[0,T]} be persistence modules over the same index interval [0, T]. Let {φ^t_s : s, t ∈ [0, T], s ≤ t} and {ψ^t_s : s, t ∈ [0, T], s ≤ t} be the linear maps of M and N. The direct sum of M and N, denoted M ⊕ N, is the persistence module defined by (M ⊕ N)_t def= M_t ⊕ N_t, along with the linear maps (φ^t_s) ⊕ (ψ^t_s) : M_s ⊕ N_s → M_t ⊕ N_t. For an interval J ⊆ [0, T], the interval module I_J is defined by (I_J)_t def= F for all t ∈ J and (I_J)_t def= 0 for all t ∉ J, along with the identity linear maps from (I_J)_s to (I_J)_t for every s ≤ t in J, and zero maps otherwise.

The next decomposition theorem is a structural theorem that characterizes persistence modules and guarantees the existence and uniqueness of persistence intervals (see, e.g., Sections 1.1 and 1.2 of [11] and references therein). An important class of persistence modules is obtained from a 1-dimensional filtration of simplicial complexes by applying the homology functors H_k(·; F), k = 0, 1, 2, .... Specifically, for a 1-dimensional filtration of simplicial complexes K = {K_t}_{t∈[0,T]} and a fixed nonnegative integer k, we have the persistence module H_k(K; F) def= {H_k(K_t; F)}_{t∈[0,T]}, with linear maps induced by the inclusions K_s ⊆ K_t. For each k, we may use the persistence diagram of H_k(K; F) for analysis. For our purpose, instead of the whole diagram, we summarize it by the longest length among all persistence intervals, which we formally define below:

l_max(k, K) def= max{length(J) : J is a persistence interval of H_k(K; F)},

and call it the maximal persistence length in dimension k.
This definition is similar to the one used in Section 3 of [2]. Normally, the length of a persistence interval in H_k(K; F) is viewed as its significance in dimension k. Therefore, l_max(k, K), the maximum among such interval lengths, is viewed as the significance of K in dimension k.
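The quantity l_max(k, K) is a one-number summary of a persistence diagram and is easy to compute with any persistent-homology package. A minimal sketch, assuming the GUDHI library is available; essential classes are truncated at the final filtration value:

```python
import numpy as np
import gudhi  # assumed available: pip install gudhi

def max_persistence_length(simplex_tree, k, t_end):
    """l_max(k, K): the longest lifetime among the k-dimensional persistence
    intervals of a filtered complex, truncating infinite deaths at t_end."""
    simplex_tree.persistence()   # computes intervals in all dimensions
    intervals = simplex_tree.persistence_intervals_in_dimension(k)
    if len(intervals) == 0:
        return 0.0
    deaths = np.minimum(intervals[:, 1], t_end)   # cap essential classes
    return float(np.max(deaths - intervals[:, 0]))
```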
3.2 L_k(F, P_K) and its relation to the dimension of (F, P_K)

In this section, from the regular pair (F, P_K), we define quantities that we use to bound the dimension d(F, P_K) from below. We start with the following notation.

Definition 3.6. Given (F, P_K), where F = {f_i : K → R}_{i∈[m]} is a collection of quasi-convex functions defined on a convex open set K and P_K is a probability measure on K, for x ∈ K we define T_i(x) def= P_K(f_i^{-1}((−∞, f_i(x)))).

The function T_i(x) may be regarded as the P_K-rescaled version of f_i (see Figure 3 for an illustration). Now we define a one-dimensional filtration of simplicial complexes K_x that is used to infer a lower bound on the dimension d(F, P_K). The geometry underlying the definition is depicted in Figure 4. Throughout Section 3, we fix an arbitrary coefficient field F when taking homology; namely, for a filtration of simplicial complexes K and a nonnegative integer k, H_k(K) def= H_k(K; F).

Definition 3.7. Let (F, P_K) be a regular pair, where F = {f_i : K → R}_{i∈[m]}. For x ∈ K, let t_max(x) def= max_{i∈[m]} T_i(x). Define a one-dimensional filtered complex K_x, indexed over t ∈ [0, t_max(x)], by

K_x(t) def= Dow(F, P_K)(T_1(x) − t_max(x) + t, ..., T_m(x) − t_max(x) + t).   (9)

For every nonnegative integer k, we define L_k(F, P_K) def= sup_{x∈K} l_max(k, K_x).

As illustrated in Figure 4, if x is "central" in some appropriate sense (see Definition 4.1 in Section 4), a (d(F, P_K) − 1)-dimensional sphere is expected to show up and persist for a significant amount of time. In general, L_k(F, P_K) can at least be used to derive a lower bound for the dimension of the regular pair (F, P_K), due to the following lemma.

Lemma 3.8. Let (F, P_K) be a regular pair. Then, for k ≥ d(F, P_K), L_k(F, P_K) = 0. In particular, d_low(F, P_K) def= 1 + max{k : L_k(F, P_K) > 0} satisfies d_low(F, P_K) ≤ d(F, P_K).

Proof. For notational simplicity, in this proof we denote d = d(F, P_K). Since the functions f_i are quasi-convex, intersections of convex sets are convex, and convex sets are contractible, the collection {K^{(i)}(t_i)}_{i∈[m]} is a good cover of its union. Thus, by the nerve lemma (see, e.g., Theorem 10.7 in [1] or Corollary 4G.3 in [7]), we have the homotopy equivalence Dow(F, P_K)(t_1, ..., t_m) ≃ ∪_{i∈[m]} K^{(i)}(t_i). (11) The union is an open subset of R^d, so its homology vanishes in dimensions k ≥ d (see, e.g., Proposition 3.29 in [7]). Combining with (11), we obtain, for k ≥ d and (t_1, ..., t_m) ∈ [0, 1]^m, H_k(Dow(F, P_K)(t_1, ..., t_m)) = 0. Therefore, for k ≥ d, l_max(k, K_x) = 0 for all x ∈ K, and L_k(F, P_K) = 0.

The quantity L_k(F, P_K) is defined with respect to a regular pair (F, P_K) and thus is not directly computable from discrete data. In Section 3.3, we follow an analogous approach to define L_k(M) and prove that L_k(M) converges to L_k(F, P_K).

3.3 L_k(M) and its convergence to L_k(F, P_K)
In Theorem 2.9, we saw that, for the data matrix M, Dow(S(M)) approximates Dow(F, P_K) with high probability. Thus, it is natural to use Dow(S(M)) to define an analogue L_k(M) of L_k(F, P_K).

Definition 3.9. Let M ∈ M^o_{m,n} and let S(M) = {s_1, ..., s_m} be the collection of m sequences induced from the rows of M (s_i corresponds to row i). For a ∈ [n] and i ∈ [m], denote T̂_i(a) def= #{b ∈ [n] : b ≤_i a}/n, the normalized rank of column a in the i-th row. For a ∈ [n], which corresponds to the a-th column of the data matrix M, let t̂_max(a) def= max_{i∈[m]} T̂_i(a). Define a one-dimensional filtered complex K̂_a, indexed over t ∈ [0, t̂_max(a)], by

K̂_a(t) def= Dow(S(M))(T̂_1(a) − t̂_max(a) + t, ..., T̂_m(a) − t̂_max(a) + t).   (15)

(See Definition 2.2 for the definition of Dow(S(M)).) For every nonnegative integer k, we define L_k(M) def= max_{a∈[n]} l_max(k, K̂_a).

Since Dow(S(M)) approximates Dow(F, P_K), intuitively, K̂_a(t) approximates K_{x_a}(t) and L_k(M) approximates L_k(F, P_K). With the help of Theorem 2.9 and the Isometry Theorem in topological data analysis (see, e.g., Theorem 6.16 in Section 6 of [11]), these intuitions are justified as follows.

Theorem 3.10. Let (F, P_K) be a regular pair. Assume that K is bounded and each f_i ∈ F can be continuously extended to the closure K̄. Let M_n be an m × n matrix sampled from (F, P_K). Then, for all nonnegative integers k, L_k(M_n) converges to L_k(F, P_K) in probability as n → ∞. Moreover, the rate of convergence is independent of k.

The proof of Theorem 3.10 is given in Section 6.4. According to Theorem 3.10, for each nonnegative integer k, L_k(M_n) is a consistent estimator of L_k(F, P_K), and the convergence is uniform in probability. Thus, by Lemma 3.8 and Theorem 3.10, we can estimate a lower bound for the dimension of (F, P_K) from the data matrix M by looking at the values of L_k(M_n). Formally, we can define the following estimator of d_low(F, P_K).

Definition 3.11. For ε > 0 and M ∈ M^o_{m,n}, we define d̂_low(M, ε) def= 1 + max{k : L_k(M) > ε}.

As a consequence of Lemma 3.8 and Theorem 3.10, it is immediate that d̂_low(M_n, ε) is a consistent estimator for appropriately chosen ε; this is recorded as Corollary 3.12. From Corollary 3.12, d̂_low(M_n, ε) can be used as a consistent estimator of d_low(F, P_K). However, we need to know how to choose an appropriate ε for d̂_low(M_n, ε), and hence estimation of L_k(F, P_K) is still necessary. Therefore, in practice, we suggest using a statistical approach to estimating L_k(F, P_K) to infer d_low(F, P_K), instead of using d̂_low(M_n, ε) directly. The details are discussed in Section 3.5.
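Given the vector (L_0(M), L_1(M), ...), the estimator of Definition 3.11 is a one-liner; the formula d̂_low(M, ε) = 1 + max{k : L_k(M) > ε} below is our reading of the garbled display:

```python
def d_low_hat(L, eps):
    """\\hat d_low(M, eps): one plus the largest k with L_k(M) > eps.
    L is the list [L_0(M), L_1(M), ...]; returns 0 if no L_k exceeds eps."""
    above = [k for k, lk in enumerate(L) if lk > eps]
    return 1 + max(above) if above else 0
```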

3.4 Algorithms for K̂_a and L_k(M)
For ease of implementation, we combine Definition 3.9 and Definition 2.2 and summarize them as algorithms for the computation of K̂_a and L_k(M). Algorithm 1 computes K̂_a. OUTPUT: [K̂_a(t)]_t: a filtration of simplicial complexes, where t ∈ {0, 1/n, 2/n, ..., t̂_max(a)} (see Step 2 or Definition 3.9 for the definition of t̂_max(a)).
The next algorithm, Algorithm 2, computes L_k(M). Note that, in the algorithm, PersistenceIntervals is a function with two inputs: a filtration of simplicial complexes, and a positive integer that limits the dimension up to which persistent homology is computed, to avoid possibly intractable computational complexity. As the name suggests, the output of PersistenceIntervals is the set of persistence intervals of the first input in dimensions less than or equal to the second input.
(2) d_up: a positive integer, used to limit the dimension of the computation of persistent homology; namely, the persistent homology is only computed in dimensions 0, 1, ..., d_up, to keep the computation feasible.

STEPS:
Step 1: For a ∈ [n], compute I_a def= PersistenceIntervals(K̂_a, d_up), the persistence intervals of K̂_a in dimensions up to d_up.

Step 2: For each k = 0, 1, ..., d_up, output L_k(M) = max_{a∈[n]} max{length(J) : J ∈ I_a is a persistence interval in dimension k}.
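The following sketch implements Algorithms 1 and 2 end-to-end, under our reading of Definition 3.9 (the displayed formulas were damaged in the source, so the birth-time formula is an assumption); it reuses max_persistence_length from the sketch in Section 3.1.

```python
from itertools import combinations
import numpy as np
import gudhi

def Ka_births(S, a, d_up):
    """Algorithm 1 (sketch): birth times of the simplices of \\hat K_a,
    restricted to dimensions <= d_up + 1.  A simplex sigma is born at the
    first t for which some column b lies in every
    A^(i)(t + \\hat T_i(a) - \\hat t_max(a)), i in sigma."""
    m, n = len(S), len(S[0])
    rank = np.array([[s.index(b) + 1 for b in range(n)] for s in S]) / n
    T = rank[:, a]                # \hat T_i(a), the normalized ranks of column a
    tmax = float(T.max())         # \hat t_max(a)
    births = {}
    for k in range(1, min(d_up + 2, m) + 1):
        for sigma in combinations(range(m), k):
            t = min(max(rank[i, b] - T[i] + tmax for i in sigma)
                    for b in range(n))
            births[frozenset(sigma)] = t   # always <= tmax (column a witnesses it)
    return births, tmax

def L_k_of_M(S, d_up):
    """Algorithm 2 (sketch): L_k(M) = max_a l_max(k, \\hat K_a), k <= d_up."""
    n = len(S[0])
    L = np.zeros(d_up + 1)
    for a in range(n):
        births, tmax = Ka_births(S, a, d_up)
        st = gudhi.SimplexTree()
        # insert in increasing filtration order so faces precede cofaces
        for sigma, t in sorted(births.items(), key=lambda kv: kv[1]):
            st.insert(list(sigma), filtration=t)
        for k in range(d_up + 1):
            L[k] = max(L[k], max_persistence_length(st, k, tmax))
    return L
```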

3.5 How to use the algorithms under different situations
The worst-case complexity of a standard algorithm for computing the persistent homology of a 1-dimensional filtration of simplicial complexes is cubic in the number of simplices (see, e.g., Section 5.3.1 in [10] and references therein). Since each K̂_a starts from the empty simplicial complex and ends at the full simplex Δ^{m−1}, we would need to go through all faces of Δ^{m−1}. However, since we limit the computation to dimensions 0, 1, ..., d_up, where d_up ≤ m − 1 is pre-set, we only need to consider the (min{d_up + 1, m − 1})-skeleton of Δ^{m−1}. Therefore, for our algorithm, the number of faces in the 1-dimensional filtration is ∑_{k=0}^{min{d_up+2, m}} C(m, k), which is O(m^{d_up+2}). Since there are n such a ∈ [n], the worst-case complexity of computing L_k(M) is O(n · m^{3(d_up+2)}), which is of degree 3d_up + 6 in m but only linear in n.
Since the algorithm is linear in n, even when n is large, as long as m is not too large, the algorithm remains tractable. Moreover, to use the full power of Theorem 3.10, we want n to be large. When n is large, we may subsample the points (i.e. the columns) to see how large the variance of L_k(M_n) is; this is called the bootstrap in statistics. Moreover, we can run the subsampling for different numbers of columns and observe the convergence trend.
On the other hand, to infer the dimension d(F, P_K), we need at least m ≥ d(F, P_K) + 1 sensors. Thus, we want m to be not too small. However, since the computational complexity of L_k(M_n) grows with a high-degree polynomial in m, we cannot let m be too large either. When m is too large, we can overcome the computational difficulty by subsampling the functions (i.e. the rows); namely, pick m_s (say, m_s = 10) functions at random, which correspond to m_s rows of M_n, compute the L_k of the submatrix thus formed, and repeat this process many times to see how the results are distributed.
We elaborate on these two methods (subsampling points or functions) in the following two subsections. We also apply the methods for estimating the embedding dimension in their appropriate situations, plot the results, and give some principles for decision making (i.e. deciding, given the plot and k, whether to accept L_k(F, P_K) > 0 or not).

Subsample points when n is sufficiently large
When n is sufficiently large, say n ≥ 300, we can subsample n_s points (i.e. columns of M_n) and obtain variance information. Moreover, letting n_s grow, we can further observe the trend of convergence, which, by Theorem 3.10, should approach the true L_k(F, P_K). This subsampling technique is called the bootstrap in statistics.
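A sketch of the column-subsampling procedure, reusing order_sequences and L_k_of_M from the earlier sketches; the first-quartile decision rule discussed next is included as a comment:

```python
import numpy as np

def bootstrap_L(M, n_s, n_rep, d_up, seed=0):
    """Subsample n_s columns of M (without replacement) n_rep times and
    collect the vector (L_0, ..., L_{d_up}) of each submatrix; the spread
    of these values plays the role of the boxplots in Figure 5."""
    rng = np.random.default_rng(seed)
    M = np.asarray(M)
    out = []
    for _ in range(n_rep):
        cols = rng.choice(M.shape[1], size=n_s, replace=False)
        S = order_sequences(M[:, cols])
        out.append(L_k_of_M(S, d_up))
    return np.array(out)                  # shape (n_rep, d_up + 1)

# First-quartile decision rule from the text:
# samples = bootstrap_L(M, n_s=50, n_rep=100, d_up=4)
# accept_k = np.percentile(samples, 25, axis=0) > 0   # accept L_k > 0 iff Q1 > 0
```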
Figure 5 shows the boxplots of L_k(M_{n_s}) obtained by implementing this idea under different settings of (d, m, n), where d = d(F, P_K) is the dimension of (F, P_K), m is the number of functions, and n is the number of data points. Here, we choose (m, n) = (10, 350) throughout, where m is moderate for computation and n is sufficiently large for subsampling. (The boxplot of a collection of real numbers is a box together with an upper whisker and a lower whisker attached to the top and bottom of the box, and possibly some dots above the upper whisker or below the lower whisker. From the box, one can read off the first quartile Q1 (25th percentile), the median Q2 (50th percentile), and the third quartile Q3 (75th percentile), which are the bottom end, the line in between, and the top end of the box. The value Q3 − Q1 is called the interquartile range (IQR). The lower and upper whiskers label the values Q1 − 1.5 IQR and Q3 + 1.5 IQR, respectively. Values outside the whiskers are regarded as outliers and marked by dots.) Subsampling is repeated 100 times for each boxplot. To compare with the result for a purely random matrix, we also generate a 10 × 350 matrix whose entries are i.i.d. from Unif(0, 1) and compute its L_k's. The details of how the boxplots are generated are in the caption of Figure 5. Let us elaborate a little more on Figure 5. The decision principle we propose is the following: on each boxplot of L_k(M_{n_s}), if the first quartile Q1 is not greater than 0, reject L_k(F, P_K) > 0; otherwise, accept L_k(F, P_K) > 0.
For panel (a), where d = 3, we can see that, as n_s goes up, the variance of L_k(M_{n_s}) goes down for each k. For k = 2, the first quartile of L_k(M_{n_s}) is greater than 0 even for n_s = 50; for k ≥ 3, L_k(M_{n_s}) stays at 0, with only some noise-like dots, throughout. According to this principle, we conclude d ≥ 3 for this regular pair. In fact, as we know in advance, d = 3.
It is observed that, for higher d, we need n_s to be larger to draw the best conclusion (i.e. to infer the true dimension). However, as n_s goes up, the variance of L_k(M_{n_s}) goes down, and we may also use this information. Therefore, in the small-sample case, one may rely on this convergence behavior and develop other principles by quantifying the trend of convergence. For example, in panel (c), where d = 5, when n_s goes up from 50 to 150, we observe that L_4(M_{n_s}) pokes out from noise-like outliers to a filled box. This trend suggests that we "may accept" L_4(F, P_K) > 0. We leave it to practitioners to decide their own principles for using the convergence-trend information in their fields of interest.

Subsample functions when m is large
As mentioned earlier, the worst-case computational complexity of L_k(M_n) grows polynomially but with degree 3d_up + 6 (a high degree) in m, the number of rows of M_n. To overcome this difficulty, we propose to subsample the rows (i.e. the collection of functions) of M_n. Specifically, for a fixed number m_s < m, we randomly choose m_s rows of M_n, construct the m_s × n submatrix M_{m_s×n} accordingly, compute L_k(M_{m_s×n}), and repeat the process as many times as desired. Figure 6 shows the boxplots of L_k(M_{m_s×n}) with m_s = 10, repeated N_rep = 1000 times, under different settings. Throughout the plots, m = 100, n = 150, m_s = 10 and N_rep = 1000. We still adopt the principle of the last subsection: we accept L_k(F, P_K) > 0 only when the first quartile Q1 is above 0. Therefore, the concluded lower bounds for the plots are 2, 3, 4 and 4, respectively, for panels (a), (b), (c) and (d) of Figure 6. A lower bound for d(F, P_K) may not be fully satisfactory. In Sections 4 and 5, we develop theory and methods to decide whether the lower bound obtained in this section is indeed the dimension d(F, P_K).

4 d̂_low(M, ε) as an asymptotically consistent dimension estimator in the class of complete regular pairs
We established in Section 3 that a lower bound d_low(F, P_K) of d(F, P_K) is generally inferable from sampled data. Here we provide a sufficient condition for d_low(F, P_K) = d(F, P_K); this ensures that the dimension d(F, P_K) can be inferred with high probability. Recall that the conic hull of a set S ⊆ R^d, denoted cone(S), is the set cone(S) def= {∑_j c_j v_j : v_j ∈ S, c_j ≥ 0}.

Definition 4.1. Let (F, P_K) be a regular pair, where F = {f_i}_{i∈[m]} and each f_i is differentiable. The subset Cent_1 def= {x ∈ K : cone({∇f_i(x)}_{i∈[m]}) = R^d} is called the type 1 central region of (F, P_K).
Definition 4.2. A regular pair (F, P_K) is said to be complete if its Cent_1 is non-empty.
It is perhaps intuitive (see Figure 4) that, for a sufficiently nice complete regular pair, the lower bound in Lemma 3.8 is indeed the dimension d(F, P_K). More precisely:

Theorem 4.3. Let (F, P_K) be a regular pair, where F = {f_i}_{i∈[m]} and each f_i : K → R is differentiable. If (F, P_K) is complete, then the lower bound in Lemma 3.8 is indeed the dimension of the regular pair, i.e. d_low(F, P_K) = d(F, P_K).

The proof is given in Section 4.1. An immediate corollary of the above theorem and Corollary 3.12 is the following.

Theorem 4.4. Let (F, P_K) be a regular pair satisfying the conditions in Theorem 3.10, where F = {f_i}_{i∈[m]} and each f_i : K → R is differentiable. If (F, P_K) is a complete regular pair with dimension d = d(F, P_K), and matrices M_n ∈ M^o_{m,n} are sampled from (F, P_K), then for every sufficiently small ε > 0, lim_{n→∞} Pr[d̂_low(M_n, ε) = d] = 1. In other words, d̂_low(ε) : M^o_{m,n} → N is an asymptotically consistent estimator in the class of complete regular pairs.

Proof. By Theorem 4.3, d_low(F, P_K) = d(F, P_K). Moreover, by Corollary 3.12, d̂_low(M_n, ε) converges in probability to d_low(F, P_K) for sufficiently small ε. Thus, the result follows.

4.1 Proof of Theorem 4.3
Recall the following notation from Section 3. Let (F, P_K) = ({f_i}_{i∈[m]}, P_K) be a regular pair; for any t ∈ [0, 1], we denote K^{(i)}(t) = f_i^{-1}((−∞, λ_i(t))), where λ_i(t) is a monotone-increasing function that satisfies P_K(f_i^{-1}((−∞, λ_i(t)))) = t. For any x ∈ K, we also denote T_i(x) = P_K(f_i^{-1}((−∞, f_i(x)))). Theorem 4.3 follows from the following key lemma.
Lemma 4.5. Let ({f_i}_{i=1}^m, P_K) be a complete regular pair and suppose x_0 ∈ Cent_1. Then there exists ε > 0 such that nerve({K^{(i)}(T_i(x_0) − t)}_{i∈[m]}) ≃ S^{d−1} for all t ∈ [0, ε).   (20)

Proof of Theorem 4.3. By Lemma 3.8, d_low(F, P_K) ≤ d(F, P_K). We therefore only need to prove that L_{d−1}(F, P_K) > 0. Since (F, P_K) is complete, there exists x_0 ∈ Cent_1, and by Lemma 4.5 the filtration K_{x_0} carries a (d − 1)-dimensional homology class over an interval of length at least ε. Therefore, L_{d−1}(F, P_K) > 0, completing the proof.
Proof of Lemma 4.5. For each i ∈ [m], denote Ω_i def= K^{(i)}(T_i(x_0)) = f_i^{-1}((−∞, f_i(x_0))). Then K^{(i)}(T_i(x_0) − t) ⊆ Ω_i for any t ≥ 0, and the nerve on the left-hand side of (20) is a subcomplex of the nerve of {Ω_i}_{i∈[m]}. For each non-empty σ ∈ nerve({Ω_i}_{i∈[m]}), the subset ∩_{i∈σ} Ω_i is open and non-empty, and hence has nonzero P_K measure. Thus, by Lemma 6.24 in Section 6.5, there exists ε_σ > 0 such that ∩_{i∈σ} K^{(i)}(T_i(x_0) − t) is non-empty for any t ∈ [0, ε_σ). Choosing ε to be the minimum of all such ε_σ thus guarantees that nerve({K^{(i)}(T_i(x_0) − t)}_{i∈[m]}) = nerve({Ω_i}_{i∈[m]}) for all t ∈ [0, ε). It thus suffices to prove (20) for t = 0. Since each Ω_i is open and convex, by the nerve lemma (see, e.g., Theorem 10.7 in [1] or Corollary 4G.3 in [7]) it is enough to show that ∪_{i∈[m]} Ω_i ≃ S^{d−1}. Moreover, since x_0 lies on the boundary of each Ω_i, the union {x_0} ∪ (∪_{i∈[m]} Ω_i) is star-shaped. Therefore, it suffices to prove that there exists η > 0 such that B(x_0, η) \ {x_0} ⊆ ∪_{i∈[m]} Ω_i, where B(x_0, η) denotes the open ball of radius η centered at x_0. Suppose no such η > 0 exists. Then, for all n ∈ N, there exists a unit vector v_n such that x_0 + (1/n) v_n ∉ ∪_{i∈[m]} Ω_i. By compactness of S^{d−1}, there is an infinite subsequence {v_{n_j}} that converges to a particular v* ∈ S^{d−1}. Since all f_i are differentiable, it follows that ⟨∇f_i(x_0), v*⟩ ≥ 0 for all i ∈ [m]. Since x_0 ∈ Cent_1, we may write −v* = ∑_i c_i ∇f_i(x_0) with c_i ≥ 0; then −1 = ⟨−v*, v*⟩ = ∑_i c_i ⟨∇f_i(x_0), v*⟩ ≥ 0, a contradiction.
5 Testing the completeness of (F, P_K) from sampled data

Theorem 4.3 establishes that completeness of (F, P_K) implies d_low(F, P_K) = d(F, P_K), and thus that the data dimension d(F, P_K) can be inferred from sampled data. Unfortunately, completeness cannot be directly tested from sampled data, since gradient information is not directly accessible from discrete samples. Here we consider a different notion of central region, Cent_0 ⊆ K, which, under some generic assumption, is indistinguishable from Cent_1 in the probability measure P_K (Lemma 5.3). We also establish that the probability measure of Cent_0 can be approximated from sampled data (Theorem 5.5). This enables one to test the completeness of a regular pair from sampled data.
Definition 5.1. Let (F, P_K) be a regular pair. The subset Cent_0 def= {x ∈ K : ∩_{i∈[m]} f_i^{-1}((−∞, f_i(x))) = ∅} is called the type 0 central region of (F, P_K).
Lemma 5.3. Let (F, P_K) be a regular pair, where each function in F is differentiable and F is in general position. Then P_K(Cent_1 \ Cent_0) = P_K(Cent_0 \ Cent_1) = 0; in particular, P_K(Cent_0) = P_K(Cent_1).

The proof is given in Section 5.1. It can be shown that Cent_1 of a regular pair is an open set (see Lemma 6.25 in the Appendix). Thus completeness of a regular pair (F, P_K) is equivalent to P_K(Cent_1) > 0. Lemma 5.3 ensures that completeness of a regular pair in general position is equivalent to P_K(Cent_0) > 0. In order to test whether P_K(Cent_0) > 0, one can use the following natural discretization.

Definition 5.4. For a matrix M ∈ M^o_{m,n}, the set Ĉent_0(M) def= {a ∈ [n] : there is no b ∈ [n] with M_{ib} < M_{ia} for all i ∈ [m]} is called the discretized central region.
If a matrix M ∈ M^o_{m,n} is sampled from a regular pair, then for each a ∈ [n], the set {b ∈ [n] : M_{ib} < M_{ia} for all i ∈ [m]} is a discrete approximation of ∩_{i∈[m]} f_i^{-1}((−∞, f_i(x_a))), and Ĉent_0(M) can be thought of as an approximation of Cent_0. The following theorem confirms this intuition.
Theorem 5.5. Let M_n ∈ M^o_{m,n} be sampled from a regular pair. Then (1/n) #(Ĉent_0(M_n)) converges to P_K(Cent_0) in probability.

The proof involves the technicalities used in proving the Interleaving Convergence Theorem (Theorem 2.9) and is given in Section 6.7 of the Appendix. Theorem 5.5 establishes that (1/n) #(Ĉent_0(M_n)) serves as an approximation of P_K(Cent_0), and thus enables one to test whether P_K(Cent_0) > 0. Thus, by Lemma 5.3, this provides a way to test the completeness of the underlying regular pair (F, P_K).
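The discretized central region, and hence the completeness test of Theorem 5.5, is directly computable from the data matrix; a sketch under our reading of Definition 5.4 (column a is central iff no other column is strictly smaller in every row):

```python
import numpy as np

def discretized_central_region(M):
    """\\hat Cent_0(M): columns a of M such that no column b satisfies
    M[i, b] < M[i, a] for every row i."""
    M = np.asarray(M)
    central = []
    for a in range(M.shape[1]):
        dominated = (M < M[:, [a]]).all(axis=0)   # columns strictly below a in every row
        if not dominated.any():
            central.append(a)
    return central

# Theorem 5.5: len(discretized_central_region(M)) / n estimates P_K(Cent_0);
# a value bounded away from 0 indicates a complete regular pair.
```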

5.1 Proof of Lemma 5.3
First we prove the first part of Lemma 5.3.

Lemma 5.6. Let (F, P_K) = ({f_i}_{i∈[m]}, P_K) be a regular pair, where each function in F is differentiable. Assume that F is in general position. Then P_K(Cent_1 \ Cent_0) = 0.

Proof. Let K′ denote the union of the critical points of the functions in F. Since F is in general position, K′ has Lebesgue measure zero. Assume x_0 ∈ Cent_1 \ K′, so that ∇f_i(x_0) ≠ 0 for every i ∈ [m] and cone({∇f_i(x_0)}_{i∈[m]}) = R^d. It can be easily shown (see, e.g., Theorem 3.2.3 in [4]) that if f is differentiable and quasi-convex on an open convex K with ∇f(x_0) ≠ 0, then ⟨∇f(x_0), u⟩ > 0 implies f(x_0 + u) > f(x_0) whenever x_0 + u ∈ K. For every x ∈ K \ {x_0}, since cone({∇f_i(x_0)}_{i∈[m]}) = R^d, there exists i ∈ [m] with ⟨∇f_i(x_0), x − x_0⟩ > 0; choosing u = x − x_0, we obtain f_i(x) > f_i(x_0), i.e. x ∉ f_i^{-1}((−∞, f_i(x_0))). This implies x_0 ∈ Cent_0. Therefore, Cent_1 \ K′ ⊆ Cent_0 and P_K(Cent_1 \ Cent_0) ≤ P_K(K′) = 0.
To prove the second half of Lemma 5.3, we first recall that a convex cone C ⊆ R^d is called flat if there exists w ≠ 0 such that both w ∈ C and −w ∈ C; otherwise, it is called salient. If a convex cone C is closed and salient, then there exists w ∈ R^d such that ⟨u, w⟩ < 0 for all non-zero u ∈ C (see the remark preceding Lemma 5.8 below).

Lemma 5.7. Let (F, P_K) = ({f_i}_{i∈[m]}, P_K) be a regular pair, where each f_i is differentiable. Then

Cent_0 ⊆ {x ∈ K : cone({∇f_i(x)}_{i∈[m]}) is flat}.   (25)

Proof. Let x_0 ∈ Cent_0 and write C_0 def= cone({∇f_i(x_0)}_{i∈[m]}). It suffices to prove that the cone C_0 is flat. Suppose that C_0 is not flat; then, being finitely generated and hence closed, it is salient, so there exists w such that ⟨u, w⟩ < 0 for all non-zero u ∈ C_0. In particular, ⟨∇f_i(x_0), w⟩ < 0 for all i ∈ [m]. Let us show that, for every i ∈ [m], there exists α_i > 0 such that x_0 + α_i w ∈ f_i^{-1}((−∞, f_i(x_0))). Suppose not; then there exists i ∈ [m] such that f_i(x_0 + αw) ≥ f_i(x_0) for all α > 0, and we would have ⟨∇f_i(x_0), w⟩ = lim_{α→0+} (f_i(x_0 + αw) − f_i(x_0))/α ≥ 0, which is a contradiction. Thus such positive α_i's exist; moreover, since ⟨∇f_i(x_0), w⟩ < 0 for every i, we may choose a single α > 0 with f_i(x_0 + αw) < f_i(x_0) for all i ∈ [m], and we obtain that x_0 + αw ∈ ∩_{i∈[m]} f_i^{-1}((−∞, f_i(x_0))), so this intersection is non-empty. This contradicts the assumption that x_0 ∈ Cent_0. Therefore the cone C_0 is flat.
It can be shown that the inclusion in (25) is in fact an equality; however, since we do not need the equality here, its proof is omitted. To finish the proof of Lemma 5.3, we use the following lemma. (Salient cones are also called pointed cones. It is well known, see, e.g., Section 2.6.1 in [3], that if C is closed and salient, then −C*, the negative of the dual cone of C, has nonempty interior. Note that, if d > 0, then {w ∈ R^d : ⟨w, u⟩ = 0, ∀ u ∈ C} has measure 0, and hence {w ∈ R^d : ⟨w, u⟩ < 0, ∀ non-zero u ∈ C} is nonempty, and any vector in it satisfies the wanted property.)

Lemma 5.8. Let V = {v_1, ..., v_m} ⊂ R^d be a set of vectors in general direction. Then cone(V) = R^d or cone(V) is salient.
To prove Lemma 5.8, we use the following lemma.

Lemma 5.9 (see, e.g., Theorem 2.5 in [12]). Let V = {v_1, ..., v_m} be a set of non-zero vectors in R^d. Then the following two statements are equivalent: (i) cone(V) = R^d; (ii) span(V) = R^d and, for every j ∈ [m], −v_j ∈ cone(V \ {v_j}).

Proof of Lemma 5.8. For m ≤ d, the vectors {v_1, ..., v_m} are linearly independent, since V is in general direction. Suppose there exists w ∈ R^d such that w, −w ∈ cone(V). Then there exist a_i, b_i ≥ 0 such that w = ∑_{i=1}^m a_i v_i and −w = ∑_{i=1}^m b_i v_i, so ∑_{i=1}^m (a_i + b_i) v_i = 0. Since the vectors {v_1, ..., v_m} are linearly independent, a_i + b_i = 0 for all i ∈ [m], and thus w = 0. Therefore cone(V) is salient.
For m > d, we proceed by induction on the size of V. Suppose the result holds for any set of m ≥ d vectors in general direction. Let V = {v_1, ..., v_{m+1}} be a set of m + 1 vectors in general direction. Since any d vectors in V form a basis of R^d, span(V) = R^d. Suppose the result is false for V; equivalently, cone(V) ≠ R^d and cone(V) is flat. By Lemma 5.9, there exists j ∈ [m + 1] such that

−v_j ∉ cone(V \ {v_j}).   (26)

Since cone(V) is flat, there exists a nonzero w ∈ R^d such that w, −w ∈ cone(V), and thus w = ∑_{i=1}^{m+1} a_i v_i = −∑_{i=1}^{m+1} b_i v_i, with a_i, b_i ≥ 0 for all i. Let us prove that a_j + b_j > 0. If a_j + b_j = 0, then a_j = b_j = 0. Thus w, −w ∈ cone(V \ {v_j}) and cone(V \ {v_j}) is not salient. Since |V \ {v_j}| = m, by the induction hypothesis, we must have cone(V \ {v_j}) = R^d. However, cone(V) ≠ R^d, and hence cone(V \ {v_j}) ≠ R^d, a contradiction. Therefore a_j + b_j > 0, and from ∑_{i=1}^{m+1} (a_i + b_i) v_i = 0 we conclude that −v_j = ∑_{i≠j} ((a_i + b_i)/(a_j + b_j)) v_i ∈ cone(V \ {v_j}), contradicting (26). Therefore, the result holds for any V in general direction of size |V| = m + 1. This completes the proof by induction.
We now finish the proof of Lemma 5.3.
Proof of Lemma 5.3. The first half of Lemma 5.3 was proved in Lemma 5.6. To prove the second half, we combine Lemma 5.7 and Lemma 5.8 to obtain

Cent_0 \ Cent_1 ⊆ {x ∈ K : the vectors {∇f_i(x)}_{i∈[m]} are not in general direction}.   (27)

Since {f_i}_{i∈[m]} is in general position, the right-hand side of (27) has measure zero, completing the proof.
6 Appendix: proofs of the main theorems and supporting lemmas

6.1 Proof of the dimension bound in Example 1.6

Proof. For any a ≤ n − 1, the point x_a is ordered last in the sequence s_a = (···, n, a); thus, by Lemma 1.3, each such point x_a cannot be in the convex hull of the other points. Since, by assumption, at least one point lies in the convex hull of the others, it follows that x_n ∈ conv(x_1, ..., x_{n−1}). Assume that the embedding dimension is d ≤ n − 3. Then, by Carathéodory's theorem, we conclude that there exists b ∈ [n − 1] such that

x_n ∈ conv({x_1, ..., x_{n−1}} \ {x_b}).   (28)

However, by the assumptions in (3), there exists a continuous quasi-convex function realizing the sequence s_b = (···, n, b), in which x_n is preceded by all the points of {x_1, ..., x_{n−1}} \ {x_b}; applying Lemma 1.3 then yields a contradiction with (28). Therefore, the matrix is not embeddable in dimension d ≤ n − 3.
To prove that these sequences are embeddable in dimension d = n − 2, one can place the points x_1, ..., x_{n−1} at the vertices of an (n − 2)-simplex in R^{n−2}, and place x_n at the barycenter of that simplex. By construction, {x_1, ..., x_{n−1}} are convexly independent, and we have the following convex hull relations for every i < n: x_n ∉ conv({x_1, ..., x_{n−1}} \ {x_i}) and x_i ∉ conv({x_1, ..., x_n} \ {x_i}). Therefore, by Lemma 1.3, there exist quasi-convex continuous functions that realize the sequences in (3).

6.2 Existence and Continuity of λ_i(t) for t ∈ (0, 1)
Lemma (Lemma 2.4). Let f : K → R be a continuous function with P_K(f^{-1}(ℓ)) = 0 for all ℓ ∈ R, where P_K is a probability measure on a convex open set K and P_K is equivalent to the Lebesgue measure on K. Then there exists a unique strictly increasing continuous function λ : (0, 1) → R such that, for all t ∈ (0, 1), P_K(f^{-1}((−∞, λ(t)))) = t.   (5)

Proof. For ℓ ∈ R, define p_f(ℓ) def= P_K(f^{-1}((−∞, ℓ))). Rewriting Equation (5) as p_f(λ(t)) = t, we note that λ(t) (if it exists) is the inverse of p_f, proving the uniqueness of λ(t). For the existence and continuity of λ(t), it suffices to prove that p_f is continuous and strictly increasing.
To prove that p_f is continuous, we prove that it is continuous from the right and from the left. Let ℓ_n ↑ ℓ; then f^{-1}((−∞, ℓ_n)) ↗ f^{-1}((−∞, ℓ)). Since P_K is a finite measure, taking P_K on both sides, we obtain p_f(ℓ_n) ↑ p_f(ℓ). Thus p_f is continuous from the left. On the other hand, for ℓ_n ↓ ℓ in f(K), from the definition, ∩_n f^{-1}((−∞, ℓ_n)) = f^{-1}((−∞, ℓ]); since P_K(f^{-1}(ℓ)) = 0, we obtain p_f(ℓ_n) ↓ p_f(ℓ), and p_f is continuous from the right. Therefore, p_f is a continuous function. Now we turn to proving that p_f is strictly increasing. For ℓ_1 < ℓ_2 in f(K), we need to prove p_f(ℓ_1) < p_f(ℓ_2), i.e. that f^{-1}([ℓ_1, ℓ_2)) has positive P_K measure. By the continuity of f, this set contains a nonempty open subset of K, where the openness and convexity of U_{ℓ_1} def= f^{-1}((−∞, ℓ_1)) are used; since P_K is equivalent to the Lebesgue measure, this subset has positive P_K measure. Hence, p_f is strictly increasing.

6.3 Proof of the Interleaving Convergence Theorem
The goal of this subsection is to prove Theorem 2.9, the Interleaving Convergence Theorem. The asymptotic behavior of Dow(S(M)) follows from the asymptotic behavior of several building blocks of Dow(S(M)). We first define these building blocks and prove their respective asymptotic theorems, and then combine them to prove the Interleaving Convergence Theorem.
We start with an object that, as will be seen, can be used to express Dow(F, P_K). Recall that, for t ∈ [0, 1] and i ∈ [m], K^{(i)}(t) = f_i^{-1}((−∞, λ_i(t))), where λ_i is as in Definition 2.5.

Definition 6.1. For a regular pair (F, P_K), define R_∞ : [0, 1]^m → [0, 1] by R_∞(t_1, ..., t_m) def= P_K(∩_{i∈[m]} K^{(i)}(t_i)).

It is easy to see that R_∞ is a cumulative distribution function (CDF). We next introduce another CDF, denoted R_n, which will be used as an intermediate between Dow(F, P_K) and Dow(S(M)).

Definition 6.2. Let X_n = {x_1, ..., x_n} be a sample from a regular pair (F, P_K). Define R_n : [0, 1]^m → [0, 1] by R_n(t_1, ..., t_m) def= (1/n) #{a ∈ [n] : x_a ∈ ∩_{i∈[m]} K^{(i)}(t_i)}.
For those familiar with nonparametric statistics, it is easy to see that R_n is in fact the empirical cumulative distribution function (empirical CDF) of R_∞. However, R_n is still not obtainable from the m × n data matrix M_n = [f_i(x_a)], since K^{(i)}(t_i) is not directly accessible from M_n. The next definition is introduced to solve this problem, by considering a step-function approximation of K^{(i)}.

Definition 6.4. Let M_n ∈ M^o_{m,n} be sampled from a regular pair (F, P_K). For i ∈ [m] and t ∈ (0, 1), define K̂^{(i)}_n(t) def= f_i^{-1}((−∞, λ̂_i(t))), where λ̂_i(t) is the (⌊nt⌋ + 1)-th smallest entry of the i-th row of M_n (so that exactly ⌊nt⌋ sample points lie strictly below it); for convenience, K̂^{(i)}_n(0) def= ∅ and K̂^{(i)}_n(1) def= K. Define R̂_n(t_1, ..., t_m) def= (1/n) #(∩_{i∈[m]} A^{(i)}(t_i)), with A^{(i)} as in Lemma 2.3.

In the following lemma, we rewrite R̂_n in a form that is similar to the definition of R_n, which helps build a connection between them.

Lemma 6.5. Let X_n = {x_1, ..., x_n} ⊂ K be a point cloud sampled from a regular pair, and let M_n be the corresponding m × n data matrix. Then, for all (t_1, ..., t_m) ∈ [0, 1]^m, R̂_n(t_1, ..., t_m) = (1/n) #{a ∈ [n] : x_a ∈ ∩_{i∈[m]} K̂^{(i)}_n(t_i)}.

Proof. For t ∈ (0, 1), a column a belongs to A^{(i)}(t) if and only if f_i(x_a) is among the ⌊nt⌋ smallest entries of the i-th row, which holds if and only if x_a ∈ K̂^{(i)}_n(t). For t_i = 1, K̂^{(i)}_n(t_i) = K and the above equality still holds. Thus, for any (t_1, ..., t_m) ∈ [0, 1]^m, #(∩_{i∈[m]} A^{(i)}(t_i)) = #{a ∈ [n] : x_a ∈ ∩_{i∈[m]} K̂^{(i)}_n(t_i)}. By the definition of R̂_n, the equality follows.
Now the intuition behind the approximations is quite clear: since K̂^{(i)}_n is an approximation of K^{(i)}, by Lemma 6.5, R̂_n is an approximation of R_n. Therefore, R̂_n also approximates R_∞.
Next we connect R_∞ and R̂_n with our target objects Dow(S(M)) and Dow(F, P_K). For simplicity, we introduce the following convenient notation.

Definition 6.6. For (t_1, ..., t_m) ∈ [0, 1]^m and σ ⊆ [m], define R^σ_∞(t_1, ..., t_m) def= P_K(∩_{i∈σ} K^{(i)}(t_i)) and R̂^σ_n(t_1, ..., t_m) def= (1/n) #(∩_{i∈σ} A^{(i)}(t_i)).

With these notations, we have:

Theorem 6.7. Let (F, P_K) be a regular pair and M_n be an m × n data matrix sampled from (F, P_K). Then, for all (t_1, ..., t_m) ∈ [0, 1]^m, we have σ ∈ Dow(F, P_K)(t_1, ..., t_m) if and only if R^σ_∞(t_1, ..., t_m) > 0, and σ ∈ Dow(S(M_n))(t_1, ..., t_m) if and only if R̂^σ_n(t_1, ..., t_m) > 0.

Proof. For the first equivalence, recall that Dow(F, P_K)(t_1, ..., t_m) = nerve({K^{(i)}(t_i)}_{i∈[m]}), so σ belongs to it if and only if ∩_{i∈σ} K^{(i)}(t_i) ≠ ∅. Since the sets K^{(i)}(t_i) are open and P_K is equivalent to the Lebesgue measure, the intersection is non-empty if and only if it has positive P_K measure. Therefore, the first equivalence follows.
For the second equivalence, recall from Lemma 2.3 that Dow(S(M_n))(t_1, ..., t_m) = nerve({A^{(i)}(t_i)}_{i∈[m]}), so σ belongs to it if and only if ∩_{i∈σ} A^{(i)}(t_i) ≠ ∅. Thus, by the definition of R̂^σ_n, the second equivalence follows.

Lemma 6.8. The function R_∞ is uniformly continuous; in fact, |R_∞(t_1, ..., t_m) − R_∞(t′_1, ..., t′_m)| ≤ ∑_{i∈[m]} |t_i − t′_i|.

Proof. Since P_K(K^{(i)}(s) \ K^{(i)}(s′)) = s − s′ for s′ ≤ s, changing the i-th argument by |t_i − t′_i| changes the measure of the intersection by at most |t_i − t′_i|. Using this inequality, it is now easy to obtain that R_∞ is uniformly continuous.
Now we arrive at a theorem that is key to the proof of the Interleaving Convergence Theorem. In the rest of the discussion, we use w.h.p. to refer to "with high probability"; namely, if we state that, as n → ∞, w.h.p. a sequence (A_n)_{n=1}^∞ of events holds, this means that the probability Pr[A_n] approaches 1 as n → ∞.

Theorem 6.9 (1st Asymptotic Theorem). The sup-norm ‖R_n − R_∞‖_∞ converges to 0 in probability. In other words, for any ε > 0, lim_{n→∞} Pr[‖R_n − R_∞‖_∞ < ε] = 1.

For the proof, we recall an intuitive fact from probability theory: the empirical mean of i.i.d. indicator random variables converges in probability to their common expectation (the weak law of large numbers).

Proof of Theorem 6.9. Fix (t_1, ..., t_m) ∈ [0, 1]^m and let I be the indicator function of the set ∩_{i∈[m]} K^{(i)}(t_i) ⊆ K.
In other words, I : K → {0, 1} is the function defined by I(x) = 1 if x ∈ ∩_{i∈[m]} K^{(i)}(t_i) and I(x) = 0 otherwise. Notice that, since (K, P_K) is a probability space (with the Borel σ-algebra), I is a random variable. Moreover, by Definition 6.2, if I_1, ..., I_n are i.i.d. copies of I, then R_n(t_1, ..., t_m) is distributed as (1/n) ∑_{a=1}^n I_a, whose expectation is R_∞(t_1, ..., t_m); by the law of large numbers, R_n(t_1, ..., t_m) → R_∞(t_1, ..., t_m) in probability. Thus we have obtained the pointwise convergence version of the result.
To prove uniform convergence, consider the following. By Lemma 6.8, R_∞ is uniformly continuous. Thus there exists δ > 0 such that, for all (t_1, ..., t_m) and (t′_1, ..., t′_m) with max_{i∈[m]} |t_i − t′_i| < δ, we have |R_∞(t_1, ..., t_m) − R_∞(t′_1, ..., t′_m)| < ε. Subdivide [0, 1]^m into finitely many (m-dimensional) rectangles with sides shorter than δ. Let V be the collection of all vertices of all rectangles in the subdivision. Since V is a finite set, by the above pointwise result, as n → ∞, w.h.p.,

|R_n(v) − R_∞(v)| < ε for all v ∈ V.   (30)

We now claim that Equation (30) implies ‖R_n − R_∞‖_∞ < 2ε. Let t = (t_1, ..., t_m) be an arbitrary element of [0, 1]^m. Then t lies in some small rectangle of the subdivision. Let t^1 and t^0 be the unique maximum and minimum, respectively, of that rectangle. Then, by monotonicity, R_n(t) − R_∞(t) ≤ R_n(t^1) − R_∞(t^0) = (R_n(t^1) − R_∞(t^1)) + (R_∞(t^1) − R_∞(t^0)) < 2ε, and similarly R_n(t) − R_∞(t) > −2ε. Rescaling 2ε to ε, the uniform result follows.
For readers familiar with non-parametric statistics, it is immediate that Theorem 6.9 is a natural m-dimensional generalization of the standard Glivenko-Cantelli theorem under the present conditions.
Recall that, for x ∈ K and i ∈ [m], T_i(x) = P_K(f_i^{-1}((−∞, f_i(x)))).

Lemma 6.10. For all ε > 0, as n → ∞, w.h.p., K^{(i)}(t − ε) ⊆ K̂^{(i)}_n(t) ⊆ K^{(i)}(t + ε) for all i ∈ [m] and t ∈ [0, 1], where K^{(i)}(t) def= ∅ for t < 0 and K^{(i)}(t) def= K for t > 1.

Proof. For t = 1, K̂^{(i)}_n(1) = K by definition. By the monotonicity of K^{(i)}, it suffices to prove that, w.h.p., |T_i(x_{a+1}) − t| < ε, where x_{a+1} denotes the sample point realizing the empirical t-quantile of the i-th row. In the last expression, by Theorem 6.9, w.h.p., the first term is less than ε/2, not depending on t, and the second term is less than ε/2 by our choice of sufficiently large n. Thus, w.h.p., |T_i(x_{a+1}) − t| < ε for all t ∈ [0, 1], and the result follows.
Corollary 6.11. For all ε > 0, as n → ∞, w.h.p., R_n(t_1 − ε, ..., t_m − ε) ≤ R̂_n(t_1, ..., t_m) ≤ R_n(t_1 + ε, ..., t_m + ε) for all (t_1, ..., t_m) ∈ [0, 1]^m, where, for the variables of R_n, negative inputs are automatically replaced by 0 and inputs greater than 1 are automatically replaced by 1.
Motivated by Theorem 6.7, the key to proving the Interleaving Convergence Theorem is the zero sets of R_∞, R_n, and R̂_n, explicitly defined below.

Definition 6.12. Let R_∞, R_n, and R̂_n be defined as in Definitions 6.1, 6.2, and 6.4. Define the following subsets of [0, 1]^m: Z_∞ def= R_∞^{-1}(0), Z_n def= R_n^{-1}(0), and Ẑ_n def= R̂_n^{-1}(0). For Z ⊆ R^m and ε > 0, define Z + ε def= {z + ε′(1, ..., 1) : z ∈ Z, ε′ ∈ [0, ε]}.

Note that, since R_∞, R_n, and R̂_n are all monotone, Z_∞, Z_n, and Ẑ_n are closed under the lower partial order; namely, for Z = Z_∞, Z_n or Ẑ_n, if t* ∈ Z, then t ∈ Z for all t ≤ t*.

Lemma 6.13. Let Z_∞, Z_n, and Ẑ_n be defined as in Definition 6.12. Then, for all ε > 0, as n → ∞, w.h.p., (i) Z_n ⊆ Ẑ_n + ε and Ẑ_n ⊆ Z_n + ε.
We are now able to prove Theorem 2.9.
Theorem (Theorem 2.9, Interleaving Convergence Theorem). Let M_n ∈ M^o_{m,n} be sampled from a regular pair (F, P_K). Then, for all ε > 0, as n → ∞, w.h.p., d_INT(Dow(S(M_n)), Dow(F, P_K)) ≤ ε.

Proof. Let ε > 0. We need to prove that, as n → ∞, w.h.p., Dow(F, P_K)(T) ⊆ Dow(S(M_n))(T + ε(1, ..., 1)) and Dow(S(M_n))(T) ⊆ Dow(F, P_K)(T + ε(1, ..., 1)) for all T ∈ R^m; this follows by combining Theorem 6.7, Corollary 6.11, and Lemma 6.13.
6.4 Proof of Theorem 3.10, the convergence of L_k(M_n) to L_k(F, P_K)

In this subsection, we state the well-known Isometry Theorem in topological data analysis and use it to prove Theorem 3.10. We begin with the definition of a quadrant-tame persistence module.

Definition 6.15 (Definition 1.12 in [11]). A persistence module V = (V_i, v^j_i) over R is quadrant-tame if rank v^j_i < ∞ for all i < j.
Theorem 6.16 (Isometry Theorem, Theorem 3.1 in [11]). Let V, W be quadrant-tame persistence modules over R. Then d_b(dgm(V), dgm(W)) = d_i(V, W), where d_b is the bottleneck distance between persistence diagrams and d_i is the interleaving distance between persistence modules.
Notice that, throughout the paper, all simplicial complexes are subcomplexes of 2^{[m]}, and hence all vector spaces in the persistence modules we consider are finite-dimensional and thus quadrant-tame. Therefore, the Isometry Theorem is available to us. In the rest of this section, the proof of Theorem 3.10 is broken into several lemmas based on some newly developed tools. Since the presentation is in logical order instead of the order of ideas, we give a quick overview of how the pieces fit together.
The central observation throughout the proof is Lemma 6.21, which writes both L_k(F, P_K) and L_k(M_n) in terms of double-supremum expressions. Notice that their expressions differ only in Dow(F, P_K) versus Dow(S(M_n)), and in T(F, P_K)^+ versus T(M_n)^+, which are introduced in Definition 6.18 and Definition 6.17.
With this in mind, it is easy to see that we need a result bounding the variation of the double-supremum expression when Dow(F, P_K) is replaced by Dow(S(M_n)); this is exactly Lemma 6.23. Similarly, we need a result bounding the variation of the double-supremum expression when T(F, P_K)^+ is replaced by T(M_n)^+; this is Lemma 6.22. We still need to justify the applicability of Lemma 6.23 and Lemma 6.22, which corresponds to the Interleaving Convergence Theorem (Theorem 2.9) and Lemma 6.19, respectively. Now the pieces can be connected and combined to complete the proof of Theorem 3.10. Notice that the Isometry Theorem (Theorem 6.16) is lurking in the proofs of Lemma 6.23 and Lemma 6.22, and thus plays an important role in the proof of Theorem 3.10.

Definition 6.17. Let T ⊆ [0, 1]^m. Define the set of diagonal rays of T, denoted T^+, by T^+ def= {ray_T : T ∈ T}, where ray_T def= {T + s(1, ..., 1) : s ∈ R} is the diagonal line through T.

Definition 6.18. Let (F, P_K) be a regular pair and M_n ∈ M^o_{m,n} be sampled from (F, P_K). Define the following two subsets of [0, 1]^m: T(F, P_K) def= {(T_1(x), ..., T_m(x)) : x ∈ K} and T(M_n) def= {(T̂_1(a), ..., T̂_m(a)) : a ∈ [n]}.

Recall that the Hausdorff distance between two subsets S_1, S_2 of R^m is defined as d_H(S_1, S_2) def= inf{ε > 0 : S_1 ⊆ S_2 + B(0, ε) and S_2 ⊆ S_1 + B(0, ε)}, where B(0, ε) is the ε-ball in R^m centered at 0 and the + inside the inf denotes the Minkowski sum. In the next lemma, we prove that T(M_n) approximates T(F, P_K) in the Hausdorff distance.
Lemma 6.19. Let M_n ∈ M^o_{m,n} be sampled from a regular pair (F, P_K) satisfying the assumptions of Theorem 3.10. Then, as n → ∞, d_H(T(M_n), T(F, P_K)) converges to 0 in probability.

Proof. Recall that, for each i ∈ [m], T_i = φ_i ∘ f_i, where φ_i is a monotone-increasing function. Since there is no measure jump in a regular pair (i.e. P_K(f_i^{-1}(ℓ)) = 0 for all i ∈ [m] and ℓ ∈ R), each φ_i is continuous, and so is each T_i. Since each f_i can be extended continuously to K̄, each T_i is also continuously extendable to K̄. Since K̄ is compact, the function (T_1, ..., T_m) : K̄ → R^m is uniformly continuous.
Let ε > 0. We need to prove that, as n → ∞, w.h.p., d_H(T(M_n), T(F, P_K)) < ε. By uniform continuity, there exists δ > 0 such that, for all x, y ∈ K̄ with ‖x − y‖_2 ≤ δ, max_{i∈[m]} |T_i(x) − T_i(y)| < ε. Let X_n = {x_1, ..., x_n} be a sample of size n, i.i.d. from (F, P_K). Let us prove that, as n → ∞, w.h.p., K ⊆ X_n + B(0, δ). Since K is bounded, we may cover K by finitely many small rectangles of diameter smaller than δ, where each rectangle intersects K and the rectangles intersect each other only along their boundaries.
In the following, we develop the convention of restricting a multi-filtered complex to a diagonal ray, as defined in Definition 6.17.

Definition 6.20. Let T ⊆ [0, 1]^m and let K = {K(T) = K(t_1, ..., t_m)}_{T∈R^m} be a multi-filtered complex indexed over R^m with K(T) ⊆ 2^{[m]} for all T ∈ R^m. For T = (t_1, ..., t_m) ∈ T, let ray_T be as in Definition 6.17. Define the restriction of K to ray_T as the 1-dimensional filtered complex K|_{ray_T} = {K|_{ray_T}(t)}_t, indexed over t ∈ [0, max_{i∈[m]} t_i], by

K|_{ray_T}(t) def= K(t_1 − max_{i∈[m]} t_i + t, ..., t_m − max_{i∈[m]} t_i + t).

Since we usually need to use the interleaving distance to compare two filtered complexes, we extend the indexing set of K|_{ray_T} to R by K|_{ray_T}(t) def= K|_{ray_T}(0) for t < 0 and K|_{ray_T}(t) def= K|_{ray_T}(max_{i∈[m]} t_i) for t > max_{i∈[m]} t_i. With these conventions, we state the following lemma.

Lemma 6.21. For each k ∈ {0} ∪ N,

L_k(F, P_K) = sup_{ray ∈ T(F, P_K)^+} l_max(k, Dow(F, P_K)|_ray)  and  L_k(M_n) = max_{ray ∈ T(M_n)^+} l_max(k, Dow(S(M_n))|_ray).

Proof. The first equality follows from Equation (9) in Definition 3.7, and the second equality can be obtained from Equation (15) in Definition 3.9.
The next lemma will be used to connect Lemma 6.19 and Lemma 6.21; it bounds the change in the supremum of Lemma 6.21 when the underlying set of rays is perturbed in the Hausdorff distance.

Lemma 6.22. Let K be a multi-filtered complex indexed over R^m, let T_1, T_2 ⊆ [0, 1]^m, and let k ∈ {0} ∪ N. Then |sup_{ray∈(T_1)^+} l_max(k, K|_ray) − sup_{ray∈(T_2)^+} l_max(k, K|_ray)| ≤ 2 d_H(T_1, T_2).
Proof. For any constant η_1 > 0, we may choose ray_1 ∈ (T_1)^+ such that

sup_{ray∈(T_1)^+} l_max(k, K|_ray) ≤ l_max(k, K|_{ray_1}) + η_1.   (43)

Let T_1 be the element of T_1 through which ray_1 passes. For any constant η_2 > 0, there exists T_2 ∈ T_2 with ‖T_1 − T_2‖_∞ ≤ d_H(T_1, T_2) + η_2, and the restrictions of K to the corresponding rays are (d_H(T_1, T_2) + η_2)-interleaved. By Equation (44), i.e. the Isometry Theorem, there exists ray_2 ∈ (T_2)^+ with l_max(k, K|_{ray_1}) ≤ l_max(k, K|_{ray_2}) + 2(d_H(T_1, T_2) + η_2). Combining Equations (43), (45) and (46), we obtain sup_{ray∈(T_1)^+} l_max(k, K|_ray) ≤ sup_{ray∈(T_2)^+} l_max(k, K|_ray) + η_1 + 2(d_H(T_1, T_2) + η_2). Since η_1, η_2 > 0 are arbitrary, we obtain sup_{ray∈(T_1)^+} l_max(k, K|_ray) ≤ sup_{ray∈(T_2)^+} l_max(k, K|_ray) + 2 d_H(T_1, T_2). Reversing the roles of T_1 and T_2, we obtain the other direction, completing the proof.
Lemma 6.23. Let K and L be multi-filtered complexes indexed over R^m, let T ⊆ [0, 1]^m, and let k ∈ {0} ∪ N. Then |sup_{ray∈T^+} l_max(k, K|_ray) − sup_{ray∈T^+} l_max(k, L|_ray)| ≤ 2 d_INT(K, L).

Proof. For each ray ∈ T^+, the restrictions K|_ray and L|_ray are d_INT(K, L)-interleaved; by the Isometry Theorem, |l_max(k, K|_ray) − l_max(k, L|_ray)| ≤ 2 d_INT(K, L). Taking suprema over T^+ gives one direction; reversing the roles of K and L, the other direction can be obtained, completing the proof.
With all the above lemmas, we can now present a rigorous proof of Theorem 3.10. Let us restate Theorem 3.10 for easy reference.
Theorem (Theorem 3.10). Let M_n ∈ M^o_{m,n} be sampled from a regular pair (F, P_K). Assume that K is bounded and each f_i can be continuously extended to the closure K̄. Then, for all k ∈ {0} ∪ N, as n → ∞, L_k(M_n) converges to L_k(F, P_K) in probability; namely, for all ε > 0, lim_{n→∞} Pr[|L_k(M_n) − L_k(F, P_K)| < ε] = 1. Moreover, the rate of convergence is independent of k.

Proof. By Lemma 6.21, Lemma 6.22 and Lemma 6.23,

|L_k(M_n) − L_k(F, P_K)| ≤ 2 d_H(T(M_n), T(F, P_K)) + 2 d_INT(Dow(S(M_n)), Dow(F, P_K)).   (50)

By Lemma 6.19 and Theorem 2.9, both terms on the right-hand side converge to 0 in probability, at a rate that does not involve k.   (51)

Hence, combining Equation (50) and Equation (51), the result follows.
Proof. Let (ε_n) be a sequence with ε_n ↘ 0. Let us first prove that K^{(i)}(t_i − ε_n) ↗ K^{(i)}(t_i); equivalently, K^{(i)}(t_i) = ∪_{n=1}^∞ K^{(i)}(t_i − ε_n). For any n, K^{(i)}(t_i − ε_n) ⊆ K^{(i)}(t_i) by definition; therefore ∪_{n=1}^∞ K^{(i)}(t_i − ε_n) ⊆ K^{(i)}(t_i). For the other inclusion, assume x ∈ K^{(i)}(t_i) = f_i^{−1}((−∞, λ_i(t_i))). Then f_i(x) < λ_i(t_i). By Lemma 2.4, λ_i is continuous and strictly increasing. Hence, there exists n such that λ_i(t_i − ε_n) > f_i(x); in other words, x ∈ K^{(i)}(t_i − ε_n). Therefore, x ∈ ∪_{n=1}^∞ K^{(i)}(t_i − ε_n), proving the claim. Now we have, as n → ∞, K^{(i)}(t_i − ε_n) ↗ K^{(i)}(t_i). Thus ∩_{i∈σ} K^{(i)}(t_i − ε_n) ↗ ∩_{i∈σ} K^{(i)}(t_i). Since ∩_{i∈σ} K^{(i)}(t_i) ≠ ∅, there must exist n such that ∩_{i∈σ} K^{(i)}(t_i − ε_n) ≠ ∅. Taking ε = ε_n, the result follows.
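Numerically, the claim says that if the common strict sublevel-set intersection is nonempty at levels λ_i(t_i), then it is already nonempty after each level is lowered by a small ε. A grid-based sanity check, with illustrative distance functions standing in for the f_i and with λ_i taken to be the identity (both our assumptions):

```python
import numpy as np

# Illustrative quasi-convex sensors on [-1, 1]^2: distances to three centers.
centers = [np.array([0.3, 0.0]), np.array([-0.2, 0.1]), np.array([0.0, -0.3])]
fs = [lambda x, c=c: np.linalg.norm(x - c, axis=-1) for c in centers]

grid = np.mgrid[-1:1:200j, -1:1:200j].reshape(2, -1).T

def common_sublevel_nonempty(levels, eps=0.0):
    """True if the intersection of the strict sublevel sets
    {f_i < levels[i] - eps} contains a grid point."""
    mask = np.ones(len(grid), dtype=bool)
    for f, t in zip(fs, levels):
        mask &= f(grid) < t - eps
    return bool(mask.any())

# Nonempty at the given levels, hence also at levels - eps for small eps.
levels = (0.5, 0.5, 0.5)
print(common_sublevel_nonempty(levels))            # True
for eps in (0.2, 0.1, 0.05, 0.01):
    print(eps, common_sublevel_nonempty(levels, eps))
```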

Cent_1 is open
This subsection is devoted to the proof of Lemma 6.25, the openness of Cent_1.

Proof. Define a function h : K × S^{d−1} → R by h(x, u) = max_{i∈[m]} ⟨u, ∇f_i(x)⟩. Since each f_i is C^1, the functions (x, u) ↦ ⟨u, ∇f_i(x)⟩ are continuous, and hence h is continuous as well. For x ∈ K, we define ρ(x) = min_{u∈S^{d−1}} h(x, u). Let us first prove that, for x ∈ K, ρ(x) > 0 if and only if cone({∇f_i(x)}_{i∈[m]}) = R^d.
For one direction, let x ∈ K satisfy ρ(x) > 0 or, equivalently, max_{i∈[m]} ⟨u, ∇f_i(x)⟩ > 0 for all u ∈ S^{d−1}. If 0 ∈ bd(conv({∇f_i(x)}_{i∈[m]})), then the nonzero vector v pointing outward of conv({∇f_i(x)}_{i∈[m]}) and orthogonal to the hyperface containing 0 satisfies max_{i∈[m]} ⟨v, ∇f_i(x)⟩ ≤ 0, a contradiction; hence 0 lies in the interior of conv({∇f_i(x)}_{i∈[m]}), and cone({∇f_i(x)}_{i∈[m]}) = R^d. Conversely, if cone({∇f_i(x)}_{i∈[m]}) = R^d, then every u ∈ S^{d−1} is a nonnegative combination of the ∇f_i(x), so ⟨u, ∇f_i(x)⟩ > 0 for some i; by compactness of S^{d−1} and continuity of h(x, ·), this gives ρ(x) > 0.

Now let x_0 ∈ Cent_1, so that cone({∇f_i(x_0)}_{i∈[m]}) = R^d. By what has been claimed, this is equivalent to ρ(x_0) > 0. We want to prove that there exists ε > 0 such that, for all x ∈ B(x_0, ε), cone({∇f_i(x)}_{i∈[m]}) = R^d or, equivalently, ρ(x) > 0. Suppose not; then there exists a sequence (x_n, u_n) ∈ K × S^{d−1} such that x_n → x_0 and h(x_n, u_n) ≤ 0 for all n. By compactness of S^{d−1}, there is a subsequence u_{n_j} → u_0, and thus, by continuity of h, h(x_0, u_0) ≤ 0. However, h(x_0, u_0) ≥ min_{u∈S^{d−1}} h(x_0, u) = ρ(x_0) > 0, a contradiction. Thus the proof is complete.
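The quantity ρ(x) = min_{u∈S^{d−1}} max_{i∈[m]} ⟨u, ∇f_i(x)⟩ is a computable certificate that the gradients positively span R^d. A minimal numerical sketch, approximating the minimum over the sphere by sampling (the gradient vectors below are illustrative placeholders, not derived from any particular f_i):

```python
import numpy as np

rng = np.random.default_rng(3)

def rho(gradients, n_dirs=20000):
    """Monte Carlo approximation of min_{u in S^{d-1}} max_i <u, g_i>.

    gradients: array of shape (m, d) holding the vectors grad f_i(x).
    A strictly positive value certifies (up to sampling error) that
    cone({grad f_i(x)}) = R^d, i.e. that x lies in Cent_1.
    """
    m, d = gradients.shape
    U = rng.normal(size=(n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform directions on S^{d-1}
    return (U @ gradients.T).max(axis=1).min()

# Gradients that positively span R^2 ...
G_good = np.array([[1.0, 0.0], [-0.5, 1.0], [-0.5, -1.0]])
# ... and gradients contained in a half-plane (cone is not all of R^2).
G_bad = np.array([[1.0, 0.0], [0.8, 0.6], [0.9, -0.2]])

print(rho(G_good) > 0)   # True: positive spanning
print(rho(G_bad) > 0)    # False: some direction has all inner products <= 0
```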
6.7 Proof of Theorem 5.5

Throughout this subsection, Cent_0 is as defined in Definition 5.1, Cent_0(M_n) is as defined in Definition 5.4, and Ẑ_n and Z_∞ are as defined in Definition 6.12. The following two functions play a crucial role throughout the proof of Theorem 5.5. In order to prove Theorem 5.5, we first prove the following lemmas (Lemmas 6.27–6.29).

Proof. To prove (i), … To prove (ii), … (by Lemma 6.28) ≤ P_K(Cent_0) + ε/2 (by Lemma 6.29).

Figure 1: The activities of three different experimentally recorded place cells in a rat's hippocampus. The color represents the probability of each neuron's firing as a function of the animal's location.

Definition 2.1. Let I = ∏_{i∈[m]} I_i be an m-orthotope in R^m, where each I_i is an interval (open, closed, half-open, finite, or infinite are all allowed) in R. Let ≤ be the natural partial order on I induced from R^m. A multi-filtered simplicial complex D indexed over I is a collection {D_α}_{α∈I} of simplicial complexes on a fixed finite vertex set such that D_α ⊆ D_β for all α ≤ β in I.
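For intuition, a multi-filtered simplicial complex over a finite grid of grades can be stored as a map from grades to sets of simplices, with the monotonicity condition D_α ⊆ D_β checked directly. A small sketch (the representation choices are ours, not the paper's):

```python
from itertools import product

# A multi-filtered complex over the grid {0, 0.5, 1}^2, as grade -> simplices.
grades = list(product([0.0, 0.5, 1.0], repeat=2))
D = {g: set() for g in grades}
for g in grades:
    if g[0] >= 0.5:
        D[g].add(frozenset({0}))            # vertex 0 enters at grade (0.5, *)
    if g[1] >= 0.5:
        D[g].add(frozenset({1}))            # vertex 1 enters at grade (*, 0.5)
    if g[0] >= 0.5 and g[1] >= 0.5:
        D[g].add(frozenset({0, 1}))         # the edge needs both coordinates

def is_monotone(D):
    """Check D_alpha <= D_beta for all alpha <= beta (coordinatewise)."""
    return all(D[a] <= D[b]
               for a in D for b in D
               if all(x <= y for x, y in zip(a, b)))

print(is_monotone(D))  # True
```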

Theorem 3.4. Let M = {M_t}_{t∈[0,T]} be a persistence module over a closed interval [0, T]. If, for each t ∈ [0, T], M_t is a finite-dimensional vector space over F, then M can be decomposed as a direct sum of interval modules; namely, M = ⊕_J I_J, where {J} is a collection of intervals (open, closed, or half-open) in [0, T]. The decomposition is unique in the sense that, for every such decomposition, the collection of intervals is the same.

Each interval J in the decomposition stated in Theorem 3.4 is called a persistence interval of M. We may summarize all persistence intervals in a 2D diagram in [0, T] × [0, T], called the persistence diagram of M, denoted dgm(M): for each persistence interval with left end α and right end β, we mark the point (α, β) in [0, T] × [0, T]. Rigorously speaking, one should distinguish open, closed, and half-open intervals; however, since we only use the lengths of the persistence intervals, this distinction does not matter for our purposes.
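Concretely, the persistence intervals of a 1-parameter filtration can be computed with standard software. A toy example using the gudhi library (an illustrative tool choice; the paper does not prescribe any software):

```python
import gudhi  # pip install gudhi

# A filtration on vertices {0, 1, 2}: vertices enter at t = 0.1, 0.2, 0.3;
# edges {0,1} and {1,2} enter at t = 0.5 and 0.6.
st = gudhi.SimplexTree()
st.insert([0], filtration=0.1)
st.insert([1], filtration=0.2)
st.insert([2], filtration=0.3)
st.insert([0, 1], filtration=0.5)
st.insert([1, 2], filtration=0.6)

# Each (dim, (birth, death)) pair is a persistence interval; the lengths
# death - birth are the raw material for summaries such as L_k.
for dim, (birth, death) in st.persistence():
    print(dim, birth, death)
```

Here the H_0 classes born at 0.2 and 0.3 die at 0.5 and 0.6 respectively, while the class born at 0.1 persists forever.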

Figure 4: From left to right, the filtration K_x(t), defined as the nerve of the sublevel sets of {f_i}_{i∈[m]}, starts at t = 0 as the empty simplicial complex and grows as t increases up to t_max(x), where the sublevel sets of {f_i}_{i∈[m]} touch the point x on their boundaries. The formal formulation of this process is in Equation (8) of Definition 3.7.

Corollary 3.12. Let (F, P_K) be a regular pair satisfying the conditions of Theorem 3.10 and let M_n ∈ M^o_{m,n} be sampled from (F, P_K). Denote d_low = d_low(F, P_K). Then, for all 0 < ε < L_{d_low−1}(F, P_K),
lim_{n→∞} Pr(d̂_low(M_n, ε) = d_low(F, P_K)) = 1.  (17)

Proof. For notational simplicity, in this proof we denote d = d(F, P_K). By Lemma 3.8 and Theorem 3.10, as n → ∞, L_k(M_n) → 0 in probability for k ≥ d, and L_{d_low−1}(M_n) → L_{d_low−1}(F, P_K) > 0 in probability, with the same rate of convergence. Since 0 < ε < L_{d_low−1}(F, P_K), as n → ∞, w.h.p., L_{d_low−1}(M_n) > ε and L_k(M_n) < ε. Therefore, w.h.p., d̂_low(M_n, ε) = d_low, and the result follows.
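One plausible reading of the estimator d̂_low(M_n, ε), consistent with the corollary's conclusion but an assumption on our part, is to report the largest k such that L_{k−1}(M_n) exceeds the threshold ε. A sketch under that assumption, taking the values L_k(M_n) as given:

```python
def d_low_hat(L, eps):
    """Dimension lower-bound estimate from topological summaries L_k.

    L: list of values [L_0, L_1, ..., L_K] computed from the data matrix.
    eps: decision threshold, 0 < eps < L_{d_low - 1}.
    Returns the largest k with L_{k-1} > eps (our reading of the decision
    principle; 0 if no value exceeds eps).
    """
    ks = [k for k in range(1, len(L) + 1) if L[k - 1] > eps]
    return max(ks, default=0)

# Example: L_0, L_1, L_2 clearly positive; L_3 and beyond near zero.
L = [0.9, 0.6, 0.4, 0.02, 0.01]
print(d_low_hat(L, eps=0.1))  # 3: the data support dimension at least 3
```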

Algorithm 1: Computation of K_a.
INPUTS:
(1) M: an m × n real matrix without repeated values in any row; namely, a matrix in M^o_{m,n}.
(2) a: an integer in [n], referring to the a-th column of M.
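As a rough illustration of the inputs' role, the sketch below computes a filtered complex from M and a under a Dowker-style nerve condition on rank-normalized rows; these modeling choices are our assumptions, not the paper's Algorithm 1:

```python
import numpy as np
from itertools import combinations

def filtration_K_a(M, a):
    """An assumption-laden sketch of computing a filtered complex K_a.

    Rows of M are rank-normalized to T with values in (0, 1]; a simplex
    sigma enters at threshold t if some column b witnesses T[i, b] <= t
    for all i in sigma (a Dowker-style nerve condition).  Thresholds are
    taken from column a.  Returns a list of (t, set of simplices).
    """
    m, n = M.shape
    T = (np.argsort(np.argsort(M, axis=1), axis=1) + 1) / n
    filtration = []
    for t in sorted(T[:, a]):
        witnessed = T <= t  # (m, n) boolean: sensor i is "on" at column b
        simplices = {
            frozenset(sigma)
            for r in range(1, m + 1)
            for sigma in combinations(range(m), r)
            if bool(witnessed[list(sigma), :].all(axis=0).any())
        }
        filtration.append((t, simplices))
    return filtration

M = np.random.default_rng(4).normal(size=(4, 30))
for t, cx in filtration_K_a(M, a=0)[:4]:
    print(f"t={t:.3f}: {len(cx)} simplices")
```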

Figure 5: The four panels are boxplots of L_k obtained from subsampling the points. Throughout the panels, m = 10 and n = 350, where m is the number of functions and n is the total number of sample points. The panels correspond to (a) d = 3, (b) d = 4, (c) d = 5, where d = d(F, P_K) is the dimension of (F, P_K). The functions are chosen to be random quadratic functions defined on the unit d-ball in R^d. Panel (d) is obtained by computing the L_k's of an m × n matrix M_n with entries i.i.d. from Unif(0, 1), treated as a purely random matrix and included for comparison with the other panels. Each figure in each panel is generated by subsampling n_s = 50, 150, 200 columns of M_n, repeated 100 times. By the decision principle, every figure in panel (a) successfully infers the true dimension; in panel (b), n_s = 50 fails to infer the true dimension 4 and only infers the lower bound 3, while n_s = 150, 200 successfully infer the true dimension 4; in panel (c), both n_s = 50 and n_s = 150 fail to infer the true dimension 5 and only infer the lower bound 4, while n_s = 200 successfully infers the true dimension 5. In panel (d), the behavior is quite different.

In panel (b), where d = 4, the same shrinking-variance behavior can be observed. Moreover, the principle concludes d ≥ 4 after n_s = 150, where the first quartile starts to stay away from 0. Similarly, for panel (c), where d = 5, at n_s = 50 and n_s = 150 our principle concludes d ≥ 4, and at n_s = 200 it concludes d ≥ 5.
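The subsampling experiment behind Figure 5 can be organized as a short loop; compute_L below is a hypothetical placeholder for the computation of the L_k summaries, which is not spelled out here:

```python
import numpy as np

rng = np.random.default_rng(5)

def compute_L(M, k_max=6):
    """Hypothetical placeholder for the paper's computation of
    [L_0(M), ..., L_{k_max}(M)]; not implemented here."""
    raise NotImplementedError

def subsampling_runs(M, sizes=(50, 150, 200), reps=100):
    """Collect L_k values over repeated column subsamples, as in Figure 5."""
    m, n = M.shape
    out = {}
    for n_s in sizes:
        runs = [compute_L(M[:, rng.choice(n, size=n_s, replace=False)])
                for _ in range(reps)]
        out[n_s] = np.array(runs)  # shape (reps, k_max + 1), one boxplot per k
    return out
```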

Figure 6: Boxplots of L_k obtained from subsampling functions. Throughout the panels, m = 60, n = 150, and m_s = 10, where m is the number of functions (rows of M_n), n is the total number of sample points, and m_s is the number of functions used in each function subsampling. Each panel is generated under a fixed regular pair with (a) d = 2, (b) d = 3, (c) d = 4, (d) d = 5, where d is the true dimension of the regular pair. Function subsampling is repeated 1000 times for each panel.

Figure 7: An illustration of K_n(t), defined in Definition 6.3.

Definition 6.2. For a point cloud X_n ⊂ K of size n, sampled from a regular pair, we define a function R_n : [0, 1]^m → [0, 1] by …

For an illustration of K_n(t), please refer to Figure 7. Notice that there is a subscript n in K_n(t), indicating its dependence on the sampled matrix M_n. The object in the next definition is obtainable solely from the data matrix M_n, sampled from a regular pair.

Definition 6.4. Let M_n = [M_ia] be an m × n data matrix, sampled from a regular pair. Define a function R̂_n : [0, 1]^m → [0, 1] by …
Lemma 6.25. Let {f_i : K → R}_{i∈[m]} be a collection of quasi-convex C^1 functions, where K is open and convex in R^d. Then the set Cent_1 = {x ∈ K : cone({∇f_i(x)}_{i∈[m]}) = R^d} is open in K. In particular, Cent_1 ≠ ∅ is equivalent to P_K(Cent_1) > 0.