1 Introduction

Data in many scientific applications are often obtained by “sensing” the phase space via sensors/functions that are convex. Convex sensing is a class of problems of inferring the underlying geometry of data that was sampled via such functions. To be precise, let us recall the following

Definition 1.1

Let \(K\subseteq {\mathbb {R}}^d\) be open and convex. A function \(f:K\rightarrow {\mathbb {R}}\) is quasi-convex if, for every \(\ell \in {\mathbb {R}}\), the sublevel set \(f^{-1}(-\infty ,\ell )=\{ x \in K \, \vert \, f(x)<\ell \}\) is convex or empty.

The following is perhaps the shortest, albeit naive, formulation of a convex sensing problem. A collection of n points \(X=\{x_a\}_{a=1}^n\) in an open convex region \(K\subset {\mathbb {R}}^d\) is sensed by measuring the values of m sensors, i.e. continuous quasi-convex functions \({\mathcal {F}}=\{f_i:K\rightarrow {\mathbb {R}}\}_{i=1}^m\) . Suppose that one has access only to the \(m\times n\) data matrix \(M = [M_{ia}]\) of sensor values, where

$$\begin{aligned} M_{ia} = f_i(x_a), \end{aligned}$$
(1)

but does not have direct access to the information about the dimension d of the underlying space, the open convex region K, the points \(x_a\in K\), or any further details of the quasi-convex functions \(f_i\). Can one recover any geometric information about the sampled region K? At the very minimum, can one infer the dimension d?
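As a concrete illustration of this setup, the following minimal Python sketch (assuming only numpy; all names and parameter values are ours and purely illustrative) generates such a data matrix: the points are drawn uniformly from the open unit disk in \({\mathbb {R}}^2\) and the sensors are distances to randomly placed centers, which are convex and hence quasi-convex.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 2, 10, 350                   # dimension, number of sensors, number of points

# Sample n points uniformly from the open unit disk K (rejection sampling).
X = []
while len(X) < n:
    x = rng.uniform(-1, 1, size=d)
    if np.linalg.norm(x) < 1:
        X.append(x)
X = np.array(X)                        # shape (n, d)

# Quasi-convex sensors: f_i(x) = ||x - c_i||, the distance to a random center c_i.
C = rng.uniform(-1, 1, size=(m, d))

# Data matrix M_{ia} = f_i(x_a); only M is assumed to be observable.
M = np.linalg.norm(X[None, :, :] - C[:, None, :], axis=2)   # shape (m, n)
```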

1.1 Motivation from neuroscience

While convex sensing problems may arise in many scientific applications, our chief motivation comes from neuroscience. Neurons in the brain regions that represent sensory information often possess receptive fields. A paradigmatic example of a receptive field is that of a hippocampal place cell (O'Keefe and Dostrovsky 1971). Place cells are a class of neurons in the rodent hippocampus that act as position sensors. Here the relevant stimulus space \(K\subset {\mathbb {R}}^d\) is the animal's physical environment (Yartsev and Ulanovsky 2013), with \(d\in \{1,2,3\}\), and \(x\in K \) is the animal's location in this space. Each neuron is activated with a probability that is a continuous function \(f:K\rightarrow {\mathbb {R}}_{\ge 0}\) of the animal's position in space. In other words, the probability of a single neuron's activation at time t is given by \(p(t)=f(x(t))\), where x(t) is the animal's position.

For each place cell, the function f is called its place field. In neuroscience experiments, place fields are observed to be approximately quasi-concaveFootnote 1; in other words, a place field can be approximated by a quasi-concave function (see Fig. 1). For example, in Fig. 1, quasi-concavity is slightly violated in cells 1 and 3, as the superlevel sets bounded by the light blue level are not perfectly convex. Note that these violations of quasi-concavity occur at levels where the function values are close to zero. Regardless of the physical reasons why place fields have this property, such violations are often thought to be a result of the intrinsic noisiness of the neurons as well as of the measurement noise. Place fields can be easily computed when both the neuronal activity data and the relevant stimulus space are available. A number of other classes of sensory neurons in the brain also possess quasi-concave receptive fields; that is, each such neuron responds according to a quasi-concave probability density function \(f:K\rightarrow {\mathbb {R}}_{\ge 0}\) on the stimulus space.

Fig. 1 The activities of three different experimentally recorded place cells in a rat's hippocampus. The colors represent the probability of each neuron's firing as a function of the animal's location; redder colors correspond to higher probability and bluer colors to lower probability. In particular, pixels in dark blue represent zero probability

In many situations, unlike place cells, the relevant stimulus space for a given neural population is unknown. This raises a natural question: can one infer the dimension of the stimulus space with quasi-concave receptive fields from neural activity alone? More precisely, given the neural activity of m neurons with quasi-concave receptive fields \(f_i: K \rightarrow {\mathbb {R}} \), can one “sense” the stimulus space by sampling the neuronal activity at n moments of time as \(M_{ia} =f_i(x(t_a))\)? Here one has access only to the measurements \(M_{ia}\), but not the points \(x_a=x(t_a)\) in the domain K. The above natural question motivates our naive formulation of the convex sensing problem, as introduced in the beginning of Sect. 1.

Here we would like to emphasize that, for the case of the experimentally observed place cells (Fig. 1), we have perfect knowledge of the geometry of the stimulus space K (the rectangular environment), the coordinates of the stimuli \(x_a=x(t_a)\), and the functions \(f_i:K\rightarrow {\mathbb {R}}\). In contrast, in the convex sensing problem, we only assume access to the \(m\times n\) matrix \(M = [M_{ia}] = [f_i(x_a)]\), recording the function values on the stimuli.

1.2 The geometry of convex sensing data

Convex sensing problems possess a natural transformation group. If \(\phi :{\mathbb {R}} \rightarrow {\mathbb {R}} \) is a strictly monotone-increasing function,Footnote 2 then the sublevel sets of the composition \(\phi \circ f \) and of f are identical up to an order-preserving re-labeling. Thus, if \(\phi \) is a strictly monotone-increasing function, then f is quasi-convex if and only if \(\phi \circ f \) is quasi-convex. It is easy to show that, for two n-tuples of distinct real numbers \((a_1,\ldots , a_n)\) and \((b_1,\ldots , b_n)\), there exists a strictly monotone-increasing function \(\phi :{\mathbb {R}} \rightarrow {\mathbb {R}} \) with \(b_i=\phi (a_i)\) for all i if and only if the two tuples are ordered in the same way, i.e. \(a_i<a_j\) exactly when \(b_i<b_j\). It thus follows that it is only the total order of each row of the matrix M in Eq. (1) that constrains the geometric features of the point cloud \(X_n = \{x_1,\ldots , x_n\}\) in a convex sensing problem. This motivates the following definition.

Definition 1.2

Let V be a finite set. A sequence of length k in V is a k-tuple \(s = (v_1,\ldots , v_k)\) of elements in V without repetitions. We denote by \(S_k[V]\) the set of all sequences of length k on V.

If M is an \(m\times n\) real matrix that has distinct entries in each row, then each row yields a sequence of length n. For the sake of an example, consider a real-valued matrix

$$\begin{aligned} M = \begin{bmatrix} 8.23 &{} 4.19 &{} 2.56 &{} 3.96\\ 4.78 &{} 2.88 &{} 5.76 &{} 13.43 \end{bmatrix}. \end{aligned}$$

Since the first row has the ordering \(2.56<3.96<4.19<8.23\), the total order \(<_1\) on \( V= \{1, 2, 3, 4\}\) is \( 3<_1 4<_1 2<_1 1 \). Thus, the order sequence for the first row is \(s_1 = (3,4,2,1)\in S_4[V]\). Similarly, the order sequence for the second row is \(s_2 = (2,1,3,4)\).
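In code, the sequences S(M) can be read off by sorting each row; a minimal numpy sketch for the matrix above (argsort returns 0-based positions, so 1 is added to recover the labels in [n]):

```python
import numpy as np

M = np.array([[8.23, 4.19, 2.56, 3.96],
              [4.78, 2.88, 5.76, 13.43]])

# Each row yields a sequence: the column labels listed in increasing order of value.
S = [tuple(int(j) for j in np.argsort(row) + 1) for row in M]
print(S)   # [(3, 4, 2, 1), (2, 1, 3, 4)]
```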

It is easy to see that if the sampled points \(X_n\) and the quasi-convex functions \(\{f_i\}_{i\in [m]}\) satisfy some regularity assumptions,Footnote 3 then each row of the data matrix \(M_{ia} = f_i(x_a)\) has no repeated values with probability 1. We denote the set of all such data matrices as

$$\begin{aligned} {\mathcal {M}}_{m,n}^o{\mathop {=}\limits ^{\mathrm {def}}}\{m\times n\text { real-valued matrices with no repeated entries in each row}\}. \end{aligned}$$

For \(M = [M_{ia}]\in {\mathcal {M}}_{m,n}^o\), one can define a collection S(M) of m maximal-length sequences as \(S(M)=\{s_1,\dots , s_m\}\), where each sequence,Footnote 4\(s_i\in { S_n[n] } \),

$$\begin{aligned} s_i = (a_{i1},\ldots , a_{in}) \end{aligned}$$

is obtained from the total order of the i-th row:

$$\begin{aligned} M_{ia_{i1}}< M_{ia_{i2}}<\cdots < M_{ia_{in}}. \end{aligned}$$

The following observation makes it possible to re-state any convex sensing problem purely in terms of embedding a set of points that satisfy certain convex hull non-containment conditions. Let \({\mathrm{conv}}(x_1,\ldots , x_k)\) denote the convex hull of a collection of points \(x_1,\ldots , x_k\) in \({\mathbb {R}}^d\).

Lemma 1.3

For any collection of n distinct points \( \{x_1,x_2,\dots , x_n\}\subset {\mathbb {R}}^d\), the following statements are equivalent:

(i) There exists a continuous quasi-convex function \(f:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) such that

$$\begin{aligned} f (x_{1})<f (x_{2})<\cdots <f (x_{ n}), \end{aligned}$$
(2)

(ii) For each \(k = 2,\ldots , n\), \(x_{ k }\notin {\mathrm{conv}}(x_{ 1 },\ldots , x_{ k-1 })\).

Proof

The implication \((i)\!\!\implies \!\! (ii)\) follows from Definition 1.1: if \(x_k\) belonged to \({\mathrm{conv}}(x_{ 1 },\ldots , x_{ k-1 })\), then the convex sublevel set \(\{x\,\vert \, f(x)<f(x_k)\}\), which contains \(x_{ 1 },\ldots , x_{ k-1 }\), would also contain \(x_k\), a contradiction. To prove that \((ii)\!\!\implies \!\! (i)\), denote \(C_k = {\mathrm{conv}}(x_{ 1 },\ldots , x_{ k })\), \(d_k(x){\mathop {=}\limits ^{\mathrm {def}}}{\mathrm{dist}}(x,C_k)\) for \(k=1,\dots, n\), and define \(f (x) = \sum _{k=1}^n h_k\cdot d_k(x)\), where \(h_1 = h_n = 1\) and

$$\begin{aligned} h_k{\mathop {=}\limits ^{\mathrm {def}}}1+ \frac{1}{d_k(x_{ k+1 })} \max \left\{ \sum _{j = 1}^{k-1} h_j\left( d_j\left( x_{ k }\right) -d_j\left( x_{ k+1 }\right) \right) , 0 \right\} , \text { for } 2\le k\le n-1. \end{aligned}$$

Note that (ii) implies that \(d_k(x_{ k+1 })>0\) for all \(2\le k\le n-1\) (the value of \(h_n\) is immaterial, since \(d_n(x_a)=0\) for every a). Recall that, for any convex set \(C\subset {\mathbb {R}}^d\), the function \(x\mapsto {\mathrm{dist}}(x,C) \) is continuous and convex.Footnote 5 Thus, since the \(h_k \) are positive, f(x) is a continuous convexFootnote 6 (and thus quasi-convex) function. Moreover, \( f(x_{ 1 }) =0 < {\mathrm{dist}}(x_{ 2 },x_{ 1 })= f (x_{ 2 }) \), and

$$\begin{aligned} h_k>\frac{1}{d_k(x_{ k+1 })} \sum _{j = 1}^{k-1} h_j\left( d_j\left( x_{ k }\right) -d_j\left( x_{ k+1 }\right) \right) \text { for } 2\le k\le n-1. \end{aligned}$$

The last inequality is equivalent to \(f(x_{ k+1 })>f(x_{ k })\). Thus inequalities (2) hold. \(\square \)
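Condition (ii) of Lemma 1.3 is easy to check numerically, since membership of a point in a convex hull is a linear feasibility problem. Below is a minimal sketch assuming scipy; the helper names in_convex_hull and satisfies_condition_ii are ours, and the points are assumed to be listed in the order of the inequalities (2).

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """Decide whether x lies in conv(points) by solving the linear feasibility
    problem: find lambda >= 0 with sum(lambda) = 1 and points^T lambda = x."""
    points = np.asarray(points, dtype=float)                 # shape (k, d)
    k = points.shape[0]
    A_eq = np.vstack([points.T, np.ones((1, k))])
    b_eq = np.concatenate([np.asarray(x, dtype=float), [1.0]])
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * k, method="highs")
    return res.status == 0                                   # feasible <=> x in hull

def satisfies_condition_ii(xs):
    """Condition (ii) of Lemma 1.3: x_k not in conv(x_1, ..., x_{k-1}) for k >= 2."""
    return all(not in_convex_hull(xs[k], xs[:k]) for k in range(1, len(xs)))

print(satisfies_condition_ii([[0.0], [1.0], [0.5]]))   # False: 0.5 lies in conv(0, 1)
print(satisfies_condition_ii([[0.0], [1.0], [2.0]]))   # True
```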

Corollary 1.4

A matrix \(M=[M_{ia}]\in {\mathcal {M}}_{m,n}^o\) can be obtained as \(M_{ia} =f_i(x_a)\) from a collection of m continuous quasi-convex functions \( f_i:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and n points \( x_1, \dots , x_n\in {\mathbb {R}}^d\) if and only if there exist n points \( x_1, \dots , x_n\in {\mathbb {R}}^d\) such that, for each sequence \(s=(a_1, a_2,\dots , a_n)\in S(M)\) and each \(k = 2,\ldots , n\), \(x_{ a_k }\notin {\mathrm{conv}}(x_{ a_1 },\ldots , x_{ a_{k-1} })\).

We recall that a set of points is convexly independent if none of these points lies in the convex hull of the others. An important implication of Corollary 1.4 is the following

Corollary 1.5

For every matrix \(M\in {\mathcal {M}}_{m,n}^o\) and convexly independent points \(x_1,x_2,\dots , x_n \in {\mathbb {R}}^d\), there exist m continuous quasi-convex functions \(f_i:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) such that \(M_{ia} = f_i(x_a)\).

Note that any n distinct points in \({\mathbb {R}}^2\) on the unit circle \(S^1\) are convexly independent. Therefore, by Corollary 1.5, for any matrix \(M\in {\mathcal {M}}_{m,n}^o\), any choice of n distinct points \(X_n\subset S^1\) along with appropriately chosen quasi-convex functions yields a two-dimensional solution to the “naive” convex sensing problem on M.

The situation where all the sampled points are convexly independent is highly improbable, for large n. If one explicitly excludes this situation, then the combinatorics of S(M) constrains the minimal possible dimension d of the geometric realization, as illustrated by the following example.

Example 1

Let \(n>2\), and \(M\in {\mathcal {M}}_{n-1,n}^o\) be a matrix obtained as in Eq. (1) with continuous quasi-convex functions \(f_i\), whose \((n-1)\) sequences \(S(M)=\{s_1,s_2,\dots , s_{n-1}\} \) are of the form

$$\begin{aligned} s_i = (\cdots , n, i), \quad \text {for all } i\in [n-1], \end{aligned}$$
(3)

where each of the “\(\cdots \)” in \(s_i\) is an arbitrary permutation of \([n]\setminus \{n, i\}\). If at least one point of \( X_n = \{x_1,\ldots , x_n\}\subset {\mathbb {R}}^d\) is contained in the interior of the convex hull \({\mathrm{conv}}(X_n)\), then the minimal dimension in which M can be obtained as in Corollary 1.4 is \(d= n-2\).

Proof

For any \(a\le n-1\), the point \(x_a\) is ordered last in the sequence \(s_a=(\cdots , n,a) \); thus, by Lemma 1.3, each such point \(x_a\) is not contained in the convex hull of the other points. Since a point in the interior of \({\mathrm{conv}}(X_n)\) necessarily lies in the convex hull of the remaining points, the assumed interior point must be \(x_n\), and therefore \(x_n\in {\mathrm{conv}}(x_1, \dots , x_{n-1})\). Assume that the embedding dimension is \(d\le n-3\); then, by Carathéodory's theorem, there exists \(b\in [n-1]\) such that

$$\begin{aligned} x_n\in {\mathrm{conv}}(x_1,\ldots , {\hat{x}}_b,\ldots , x_{n-1}). \end{aligned}$$
(4)

However, by the conditions in Eq. (3), there exists a continuous quasi-convex function \(f_b\) such that \(f_b(x_a)< f_b(x_{n})\) for all \(a\in [n-1]\setminus \{ b\}\); thus Lemma 1.3 yields a contradiction with (4). Therefore, the matrix is not embeddable in dimension \(d\le n-3\).

To prove that these sequences are embeddable in dimension \(d=n-2\), one can place the points \(x_1,\ldots , x_{n-1}\) at the vertices of an \((n-2)\)-simplex in \({\mathbb {R}}^{n-2}\), and place \(x_{n}\) at the barycenter of that simplex. By construction, \(\{x_1,\ldots , x_{n-1}\}\) are convexly independent and we have the following convex hull relations for every \(i<n\): \(x_n \notin {\mathrm{conv}}\left( \left\{ x_1,\ldots , x_{n-1}\right\} \setminus \{x_i\}\right) \), and \(x_i \notin {\mathrm{conv}}(\{x_1,\ldots , x_{n}\}\setminus \{x_i\})\). Therefore, by Lemma 1.3 there exist continuous quasi-convex functions that realize the sequences in (3). \(\square \)

1.3 Dimension inference in a convex sensing problem

It is clear from Corollary 1.5 and Example 1 that the problem of dimension inference is well-posed only in the presence of some additional assumptions that guarantee convex dependence of the sampled points. Instead of making such an assumption explicit, we take a probabilistic perspective, wherein points are drawn from a probability distribution that is equivalent to the Lebesgue measure. We assume that there are three (unknown) objects that underlie any “convex sensing” data:

(i) an open convex set \(K\subseteq {\mathbb {R}}^d\),

(ii) m quasi-convex continuous functions \({\mathcal {F}}=\{f_i:K\rightarrow {\mathbb {R}}\}_{i=1}^m\), and

(iii) a probability measure \(P_K\) on K.

In relation to the neuroscience motivation in Sect. 1.1, \(K\subseteq {\mathbb {R}}^d\) is the stimulus space, each function \(f_i\) is the negative of the receptive field of a neuron, and \(P_K\) is the measure that describes the probability distribution of the stimuli. To guarantee that the convex sensing problem is well-posed, we impose the following regularity assumptions.

Definition 1.6

A regular pair is a pair \(({\mathcal {F}}, P_K)\) that satisfies the conditions (i)-(iii) above as well as the following two conditions:

(R1) The probability measure \(P_K\) is equivalent to the Lebesgue measure on K.

(R2) Level sets of all functions in \( {\mathcal {F}}\) are of measure zero, i.e. for every \(i\in [m]\) and \(\ell \in {\mathbb {R}}\), \(P_K(f_i^{-1}(\ell ))=0\).

Definition 1.7

A point cloud \(\{x_1,\ldots , x_n\}\subset K\) is sampled from a regular pair \(({\mathcal {F}}, P_K)\) if it is i.i.d. from \(P_K\). A matrix \(M=[M_{ia}]\in {\mathcal {M}}_{m,n}^o\) is sampled from a regular pair \(({\mathcal {F}}, P_K)\), if for all \(i\in [m]\), and \(a\in [n]\), \(M_{ia} =f_i(x_a), \) where \(\{x_1,\ldots , x_n\}\subset K \) is sampled from \(({\mathcal {F}}, P_K)\).

The assumption (R1) ensures that the domain K is well-sampled, and thus the probability that the points \( x_1,\ldots , x_n\) are convexly independent approaches zero in the limit of large n. The assumption (R2) guarantees, with probability 1, that the data matrix M has no repeated values in each row, and thus is in \( {\mathcal {M}}_{m,n}^o\).

In this paper, we develop a method for estimating the dimension of convex sensing data. Intuitively, such an estimator needs to be consistent, i.e. “behave well” in the limit of large data. In addition to the conditions imposed on a regular pair, other properties of a pair \(({\mathcal {F}}, P_K)\) may be needed, depending on the context. It is therefore natural to define a consistent dimension estimator in relation to a particular class of regular pairs. Since an estimator may rely on different parameters for different regular pairs, we consider a one-parameter family of such estimators, motivating the following definition of consistency.

Definition 1.8

Let \({\mathcal {RP}}\) be a class of regular pairs. For each regular pair \({({\mathcal {F}},P_K)}\in {\mathcal {RP}}\) we denote by \(d {({\mathcal {F}},P_K)}\) the dimension d of the ambient space \({\mathbb {R}}^d\) in which the open convex set K is embedded. A one-parameter family of functions \({{\hat{d}}}(\varepsilon ) :{\mathcal {M}}_{m,n}^o\rightarrow {\mathbb {N}}\) is called an asymptotically consistent estimator in \({\mathcal {RP}}\) if, for every regular pair \({({\mathcal {F}},P_K)}\in {\mathcal {RP}}\), there exists \(l>0\) such that, for every \(\varepsilon \in (0,l)\) and each sequence of matrices \(M_n \in {\mathcal {M}}_{m,n}^o \) sampled from \({({\mathcal {F}},P_K)}\),

$$\begin{aligned} \lim _{n\rightarrow \infty } P\left( {{\hat{d}}}\left( \varepsilon \right) \left( M_n\right) = d {({\mathcal {F}},P_K)}\right) =1. \end{aligned}$$
(5)

The structure of this paper is as follows. In Sect. 2, we define two multi-dimensional filtrations of simplicial complexes: the empirical Dowker complex \({\mathrm{Dow}}(S(M))\) that can be associated to a data matrix M, and the Dowker complex \({{\mathbb {D}}{\mathrm{ow}}}{({\mathcal {F}},P_K)}\) that can be associated to a regular pair \({({\mathcal {F}},P_K)}\). Using an interleaving distance between multi-filtered complexes, we prove (Theorem 2.10) that, for a sequence \(\{ M_n\}\) of data matrices sampled from a regular pair \({({\mathcal {F}},P_K)}\), \({\mathrm{Dow}}(S(M_n))\rightarrow {{\mathbb {D}}{\mathrm{ow}}}{({\mathcal {F}},P_K)}\) in probability, as \(n\rightarrow \infty \).

In Sect. 3, we develop tools for estimating the dimension of \({({\mathcal {F}},P_K)}\) using persistent homology. We define a set of maximal persistence lengths associated to \({{\mathbb {D}}{\mathrm{ow}}}{({\mathcal {F}},P_K)}\) and prove (Lemma 3.6) that a lower bound for the dimension of \({({\mathcal {F}},P_K)}\) can be derived from these persistence lengths. Next we define an analogous set of maximal persistence lengths from \({\mathrm{Dow}}(S(M_n))\) and prove (Theorem 3.8) that they converge to the maximal persistence lengths associated to \({{\mathbb {D}}{\mathrm{ow}}}{({\mathcal {F}},P_K)}\) in probability, in the limit of large sampling of the data. The rest of Sect. 3 is devoted to two subsampling procedures for different practical situations, as well as simulation results illustrating that the correct dimension can be inferred with these two methods.

In Sect. 4, we introduce complete regular pairs and prove (Theorem 4.3) that the lower bound in Lemma 3.6 is equal to the dimension \(d {({\mathcal {F}},P_K)}\) for complete regular pairs. This establishes (Theorem 4.4) that the dimension estimator introduced in Sect. 3.3 is an asymptotically consistent estimator in the class of complete regular pairs. In Sect. 5, we define an estimator that can be used to test (Theorem 5.6) whether the data matrix is sampled from a complete regular pair. The Appendix (Sect. 7) contains the proofs of the main theorems as well as some technical supporting lemmas.

2 Empirical Dowker complex and the interleaving convergence theorem

In this section, we define the empirical Dowker complex from the m sequences induced by the rows of the data matrix M, define the Dowker complex from the regular pair \({({\mathcal {F}},P_K)}\), and prove that the former converges to the latter in probability. These complexes are both examples of multi-filtered simplicial complexes.

Definition 2.1

Let \(I = \prod _{i\in [m]} I_i\) be an m-orthotope in \({\mathbb {R}}^m\), where each \(I_i\) is an interval (open, closed, half-open, finite, or infinite are all allowed) in \({\mathbb {R}}\). Let \(\le \) be the natural partial order on I induced from \({\mathbb {R}}^m\). A multi-filtered simplicial complex \({\mathcal {D}}\) indexed over I is a collection \(\{{\mathcal {D}}_\alpha \}_{\alpha \in I}\) of simplicial complexes on a fixed finite vertex set, such that, \({\mathcal {D}}_\alpha \subseteq {\mathcal {D}}_\beta \), for all \(\alpha \le \beta \) in I.

We define the empirical Dowker complexes from a collection of sequences of maximal lengthFootnote 7 on the vertex set [n].

Definition 2.2

Let \(S=\{s_1,\ldots , s_m\}\) be a collection of sequences on [n] of length n. Let \(\le _i\) be the total order on [n] induced from \(s_i\); namely, for \(a,b\in [n]\), \(a\le _i b\) if and only if a is before or equal to b in \(s_i\). We define the following multi-filtered simplicial complex, with vertex set [m] and indexed over \([0,1]^m\):

$$\begin{aligned} {\mathrm{Dow}}(S) {\mathop {=}\limits ^{\mathrm {def}}}\left\{ {\mathrm{Dow}}\left( S\right) \left( t_1,\ldots , t_m\right) :\left( t_1,\ldots , t_m\right) \in [0,1]^m\right\} , \end{aligned}$$

where

$$\begin{aligned} {\mathrm{Dow}}(S)(t_1,\ldots , t_m) {\mathop {=}\limits ^{\mathrm {def}}}\varDelta (\{\sigma _a:a=1,\ldots , n\}), \end{aligned}$$

and

$$\begin{aligned} \sigma _a =\left\{ i\in [m]: \#(\{b\in [n]:b\le _i a\})\le n t_i\right\} . \end{aligned}$$
(6)

Here \(\varDelta (\{ \sigma _a\}_{a\in [n]})\) denotes the smallest simplicial complex containing the faces \(\{ \sigma _a\}_{a\in [n]}\). This filtered complex is called the empirical Dowker complex of S.

Example 2.3

For \(n = 3\), \(m = 4\), consider \(S = \{s_1, s_2, s_3, s_4\} = \{(1,2,3), (2,1,3), (3,1,2), (1,3,2)\}\). The four total orders on [3] induced by sequences in S are \(1\le _1 2\le _1 3\), \(2\le _2 1\le _2 3\), \(3\le _3 1\le _3 2\), and \(1\le _4 3\le _4 2\). Let us compute \({\mathrm{Dow}}(S)(t_1, t_2, t_3, t_4)\) for \((t_1, t_2, t_3, t_4) = \left( \frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}\right) \).

First, \(n(t_1, t_2, t_3, t_4)= (3/2, 1, 2, 3/4)\), hence, for \(a = 1\),

$$\begin{aligned} \{b\in [n]:b\le _1 a\}&= \{1\}, \\ \{b\in [n]:b\le _2 a\}&= \{2,1\}, \\ \{b\in [n]:b\le _3 a\}&= \{3,1\},\\ \{b\in [n]:b\le _4 a\}&= \{1\}. \end{aligned}$$

Hence \(\#(\{b\in [n]:b\le _i a\}) = 1,2,2,1\), for \(i = 1,2,3,4\). Since \(1\le 3/2, 2\nleq 1, 2\le 2\) and \(1\nleq 3/4\), Eq. (6) yields \( \sigma _1 = \{1,3\}\).

Similarly, for \(\sigma _2\), we compute \(\#(\{b\in [n]:b\le _i 2\}) = 2,1,3,3\), for \(i = 1,2,3,4\); since \(2\nleq 3/2, 1\le 1, 3\nleq 2\), and \(3\nleq 3/4\), \(\sigma _2 = \{2\}\). Finally, for \(\sigma _3\), we compute \(\#(\{b\in [n]:b\le _i 3\}) = 3,3,1,2\), for \(i = 1,2,3,4\); since \(3\nleq 3/2, 3\nleq 1,1\le 2\), and \(2\nleq 3/4\), \(\sigma _3 = \{3\}\). Therefore, we conclude that

$$\begin{aligned} {\mathrm{Dow}}(S)\left( \frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}\right) = \varDelta (\{\sigma _1, \sigma _2, \sigma _3\}) = \varDelta (\{\{1,3\}, \{2\}, \{3\}\}) = \varDelta (\{\{1,3\}, \{2\}\}), \end{aligned}$$

i.e. the simplicial complex on the vertex set \([4] = \{1,2,3,4\}\) with facets \(\{1,3\}\) and \(\{2\}\) (the vertex 4 does not belong to any face at these parameter values).
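The computation in Example 2.3 can be reproduced in a few lines of Python (the function name dowker_faces is ours; exact fractions are used for the thresholds so that the comparisons in Eq. (6) are not affected by floating-point rounding):

```python
from fractions import Fraction

def dowker_faces(S, t):
    """Faces sigma_a of Eq. (6) for the empirical Dowker complex Dow(S)(t):
    S is a list of m sequences on [n]; t is a list of m thresholds in [0, 1]
    (given as fractions.Fraction so the comparisons are exact)."""
    m, n = len(S), len(S[0])
    # rank[i][a] = #{b in [n] : b <=_i a}, the 1-based position of a in s_i.
    rank = [{a: pos + 1 for pos, a in enumerate(s)} for s in S]
    faces = []
    for a in range(1, n + 1):
        sigma_a = {i + 1 for i in range(m) if rank[i][a] <= n * t[i]}
        if sigma_a:
            faces.append(sigma_a)
    return faces

S = [(1, 2, 3), (2, 1, 3), (3, 1, 2), (1, 3, 2)]
t = [Fraction(1, 2), Fraction(1, 3), Fraction(2, 3), Fraction(1, 4)]
print(dowker_faces(S, t))   # [{1, 3}, {2}, {3}], matching Example 2.3
```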

Recall from Sect. 1.2 that the relevant geometric information of the \(m\times n\) data matrix \(M\in {\mathcal {M}}_{m,n}^o\) is contained in the collection of m sequences \(S(M) = \{s_1,\ldots , s_m\}\), where \(s_i\in S_n[n] \) is of length n and records the total order induced by the i-th row of M. Therefore, we can consider the empirical Dowker complex \({\mathrm{Dow}}(S(M))\) derived from the data matrix M.

Dowker complexes first appeared in the paper (Dowker 1952) by C.H. Dowker, defined as a simplicial complex induced by a relation between two sets. For modern treatments and generalizations, we refer the readers to Section 4.9 of Ghrist (2014) and Section 4.2.3 of Chazal et al. (2014), Chowdhury and Mémoli (2018), and Brun and Blaser (2019). In particular, our definition of the empirical Dowker complex is a multi-parameter generalization of the Dowker complex defined in Chazal et al. (2014). Specifically, the one-dimensional filtration of simplicial complexes (indexed over t) \({\mathrm{Dow}}(S(M))(n\cdot t,\ldots , n\cdot t)\) is equal to the Dowker complex defined in Chazal et al. (2014).

Recall that, for a collection \({\mathcal {A}} = \{A_i\}_{i\in [m]}\) of sets, the nerve of \({\mathcal {A}}\), denoted \({\mathrm{nerve}}({\mathcal {A}})\), is the simplicial complex on the vertex set [m] defined as

$$\begin{aligned} {\mathrm{nerve}}({\mathcal {A}}) {\mathop {=}\limits ^{\mathrm {def}}}\left\{ \sigma \subseteq [m]: \bigcap _{i\in \sigma } A_i\ne \varnothing \right\} . \end{aligned}$$

The following lemma is immediate from Definition 2.2.

Lemma 2.4

Let \(S=\{s_1,\ldots , s_m\}\) be a collection of sequences on [n] of length n. For each \(i\in [m]\) and \(t\in {\mathbb {R}}\), consider

$$\begin{aligned} A^{(i)}(t) {\mathop {=}\limits ^{\mathrm {def}}}\left\{ a\in [n]: \#\left( \left\{ b\in [n]:b\le _i a\right\} \right) \le n t\right\} \subset [n], \end{aligned}$$

where \(\le _i\) is the total order on [n] induced by \(s_i\). Then

$$\begin{aligned} {\mathrm{Dow}}(S)(t_1,\ldots , t_m) = {\mathrm{nerve}}\left( \left\{ A^{(i)}(t_i)\right\} _{i\in [m]}\right) . \end{aligned}$$
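A brute-force sketch of this nerve characterization, applied to the sets \(A^{(i)}(t_i)\) of Example 2.3 (the helper name nerve is ours; the enumeration is exponential in m and intended only for small examples):

```python
from itertools import combinations

def nerve(sets):
    """The nerve of a collection of sets indexed by 1, ..., m (Lemma 2.4):
    all nonempty index subsets whose members have a common element."""
    m = len(sets)
    return [sigma for r in range(1, m + 1)
            for sigma in combinations(range(1, m + 1), r)
            if set.intersection(*(sets[i - 1] for i in sigma))]

# The sets A^{(i)}(t_i) of Lemma 2.4 for Example 2.3 at t = (1/2, 1/3, 2/3, 1/4):
# A^{(1)} = {1}, A^{(2)} = {2}, A^{(3)} = {1, 3}, A^{(4)} = {} (empty).
A = [{1}, {2}, {1, 3}, set()]
print(nerve(A))   # [(1,), (2,), (3,), (1, 3)] -- the faces of Dow(S)(t)
```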

Next we connect the combinatorics of \({\mathrm{Dow}}(S(M))\) to the geometry. From Lemma 2.4, we know that \({\mathrm{Dow}}(S(M))\) is the nerve of \(\{A^{(i)}(t_i)\}_{i\in [m]}\). To define an analogue of \({\mathrm{Dow}}(S(M))\) from the regular pair \(({\mathcal {F}},P_K)\), we use the following

Lemma 2.5

Assume that \(P_K\) is a probability measure on a convex open set K that is equivalent to the Lebesgue measure. Let \(f:K\rightarrow {\mathbb {R}}\) be a continuous function with \(P_K(f^{-1}(\ell )) = 0\) for all \(\ell \in {\mathbb {R}}\). Then there exists a unique strictly increasing continuous function \(\lambda :(0,1)\rightarrow {\mathbb {R}}\) such that, for all \(t\in (0,1)\),

$$\begin{aligned} P_K\left( f^{-1}(-\infty ,\lambda (t))\right) = t. \end{aligned}$$
(7)

Proof

Since K is path-connected, by the intermediate value theorem, f(K) is an interval in \({\mathbb {R}}\). Define a function \(p_f:f(K)\rightarrow [0,1]\) by \(p_f(\ell ) {\mathop {=}\limits ^{\mathrm {def}}}P_K(f^{-1}(-\infty ,\ell ))\). Rewriting Eq. (7) as \(p_f(\lambda (t)) = t\), we note that \(\lambda \) (if it exists) is the inverse of \(p_f\), proving the uniqueness of \(\lambda \). For the existence and continuity of \(\lambda \), it suffices to prove that \(p_f\) is continuous and strictly increasing.

To prove that \(p_f\) is continuous, we prove that it is continuous from the right and from the left. Let \(\ell \in f(K)\). For \(\ell _n\nearrow \ell \) in f(K),Footnote 8 by definition, \(f^{-1}(-\infty ,\ell _n)\nearrow f^{-1}(-\infty ,\ell )\).Footnote 9 Since \(P_K\) is a finite measure, taking \(P_K\) on both sides, we obtain \(p_f(\ell _n)\nearrow p_f(\ell )\). Thus \(p_f\) is continuous from the left. On the other hand, for \(\ell _n\searrow \ell \) in f(K), by definition,

$$\begin{aligned} \left( \bigcap _{n = 1}^\infty f^{-1}(-\infty ,\ell _n)\right) \setminus f^{-1}(-\infty ,\ell ) = f^{-1}(\ell ). \end{aligned}$$

Thus \(p_f(\ell _n)\searrow P_K(\bigcap _{n = 1}^\infty f^{-1}(-\infty ,\ell _n))=P_K(f^{-1}(-\infty ,\ell ))+P_K(f^{-1}(\ell )) = p_f(\ell )+0 = p_f(\ell )\) and \(p_f\) is continuous from the right. Therefore, \(p_f\) is a continuous function.

Now we prove that \(p_f\) is strictly increasing. For \(\ell _1<\ell _2\) in f(K), we need to prove \(p_f(\ell _1)<p_f(\ell _2)\). Since f(K) is an interval, there exists \(\ell \in (\ell _1,\ell _2)\cap f(K)\), and hence \(f^{-1}(\ell _1,\ell _2)\) is a nonempty open subset of K; since \(P_K\) is equivalent to the Lebesgue measure, \(P_K(f^{-1}(\ell _1,\ell _2))>0\). As \(f^{-1}(\ell _1,\ell _2)\subseteq f^{-1}(-\infty ,\ell _2)\setminus f^{-1}(-\infty ,\ell _1)\), we obtain \(p_f(\ell _2)\ge p_f(\ell _1)+P_K(f^{-1}(\ell _1,\ell _2))>p_f(\ell _1)\). Hence, \(p_f\) is strictly increasing. \(\square \)
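In other words, if \(X\sim P_K\), then \(\lambda (t)\) is the t-quantile of the random variable f(X), so it can be estimated empirically. A Monte Carlo sketch assuming numpy, for \(P_K\) uniform on the unit disk and \(f(x) = \Vert x\Vert \), where the exact answer is \(\lambda (t)=\sqrt{t}\):

```python
import numpy as np

rng = np.random.default_rng(1)

# P_K: uniform on the open unit disk; f(x) = ||x|| is convex, hence quasi-convex.
N = 200_000
pts = rng.uniform(-1, 1, size=(N, 2))
pts = pts[np.linalg.norm(pts, axis=1) < 1]      # keep points inside the disk
vals = np.linalg.norm(pts, axis=1)              # sample of f(X)

t = 0.25
lam_hat = np.quantile(vals, t)                  # empirical estimate of lambda(t)
print(lam_hat, np.sqrt(t))                      # P_K(||x|| < r) = r^2, so lambda(t) = sqrt(t)
```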

For a regular pair \({({\mathcal {F}},P_K)}=(\{f_i:K\rightarrow {\mathbb {R}}\}_{i\in [m]},P_K)\), Lemma 2.5 guarantees that for each \(i\in [m]\), there exists a unique strictly increasing continuous function \(\lambda _i:(0,1)\rightarrow {\mathbb {R}}\) that satisfies (7). The following definition provides a continuous analogue of \(A^{(i)}(t)\).

Definition 2.6

Let \(({\mathcal {F}},P_K)=(\{f_i\}_{i\in [m]},P_K)\) be a regular pair. For each \(i\in [m]\) and \(t\in (0,1)\), define

$$\begin{aligned} K^{(i)}(t){\mathop {=}\limits ^{\mathrm {def}}}f_i^{-1}(-\infty ,\lambda _i(t)), \end{aligned}$$

where \(\lambda _i :(0,1)\rightarrow {\mathbb {R}}\) is the unique function that satisfies (7). For convenience, we also define \(K^{(i)}(0){\mathop {=}\limits ^{\mathrm {def}}}\varnothing \) and \(K^{(i)}(1){\mathop {=}\limits ^{\mathrm {def}}}K\).

Fig. 2 \(K^{(i)}(t)\) is the sublevel set of \(f_i\) whose \(P_K\) measure is equal to t

In other words, \(K^{(i)}(t)\) are the sublevel sets of \(f_i\) rescaled with respect to the measure \(P_K\) (see Fig. 2). Analogously to Lemma 2.4, we use \(K^{(i)}(t)\) to define the continuous version of the Dowker complex.

Definition 2.7

Let \(({\mathcal {F}}, P_K)\) be a regular pair. Define a multi-filtered complex \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K)\), indexed over \([0,1]^m\), by

$$\begin{aligned} {{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K)(t_1,\ldots , t_m){\mathop {=}\limits ^{\mathrm {def}}}{\mathrm{nerve}}\left( \{K^{(i)}(t_i)\}_{i=1}^m\right) . \end{aligned}$$

This multi-filtered complex is called the Dowker complex induced from \(({\mathcal {F}}, P_K)\).

The complex \({\mathrm{Dow}}(S(M))\) is what we can obtain from the data matrix M, but it does not capture the whole geometric information of \(({\mathcal {F}},P_K)\). On the other hand, \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K)\) reflects the whole geometric information but is not directly computable. Since \(A^{(i)}(t_i)\) is an approximation of \(K^{(i)}(t_i)\), one might expect that \({\mathrm{Dow}}(S(M))\) approximates \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K)\). To compare these two complexes, we need the concept of the interleaving distance.

Definition 2.8

For a multi-filtered complex \({\mathcal {K}}\) indexed over \({\mathbb {R}}^m\) and \(\epsilon >0\), the \(\epsilon \)-shift of \({\mathcal {K}}\), denoted \({\mathcal {K}}+\epsilon \), is the multi-filtered complex defined by

$$\begin{aligned} ({\mathcal {K}}+\epsilon )(t_1,\ldots , t_m) {\mathop {=}\limits ^{\mathrm {def}}}{\mathcal {K}}(t_1+\epsilon ,\ldots , t_m+\epsilon ). \end{aligned}$$

For two multi-filtered complexes \({\mathcal {K}}\) and \({\mathcal {L}}\) indexed over \({\mathbb {R}}^m\), the simplicial interleaving distance between \({\mathcal {K}}\) and \({\mathcal {L}}\) is defined as

$$\begin{aligned} d_{{\mathrm{INT}}}({\mathcal {K}}, {\mathcal {L}}) {\mathop {=}\limits ^{\mathrm {def}}}\inf \{\epsilon >0: {\mathcal {K}}\subseteq {\mathcal {L}}+\epsilon \text { and } {\mathcal {L}}\subseteq {\mathcal {K}}+\epsilon \}. \end{aligned}$$

Note that this interleaving distance is between multi-filtered simplicial complexes (see, e.g. Lesnick 2015, for the standard definition of interleaving distance between multi-dimensional persistence modules). Similar to the standard interleaving distance, the simplicial interleaving distance \(d_{\mathrm{INT}}\) defined here is also a pseudo-metric; namely, \(d_{\mathrm{INT}}({\mathcal {K}},{\mathcal {L}}) = 0\) does not imply \({\mathcal {K}}={\mathcal {L}}\).

The definition of simplicial interleaving distance involves a shift of indices and that is why the two multi-filtered complexes to be compared are required to be indexed over the whole \({\mathbb {R}}^m\). Since both \({\mathrm{Dow}}(S(M))\) and \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K)\) are indexed only over \([0,1]^m\), we need to extend their indexing domain to \({\mathbb {R}}^m\). This motivates the definition below.

Definition 2.9

For \({\mathcal {D}}={\mathrm{Dow}}(S(M))\) or \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K)\) and \((t_1,\ldots , t_m)\in {\mathbb {R}}^m\), define

$$\begin{aligned} {\mathcal {D}}(t_1,\ldots , t_m) {\mathop {=}\limits ^{\mathrm {def}}}{\mathcal {D}}(\theta (t_1),\ldots ,\theta (t_m)) \end{aligned}$$

where \(\theta :{\mathbb {R}}\rightarrow [0,1]\) is defined by \(\theta (t) = t\), if \(0\le t\le 1\); \(\theta (t) =0\), if \(t<0\); \(\theta (t) =1\), if \(t>1\).

With the above notations, we state one of our main theorems.

Theorem 2.10

(Interleaving Convergence Theorem) Let \(({\mathcal {F}}, P_K)\) be a regular pair and \(M_n\) be an \(m\times n\) data matrix sampled from \({({\mathcal {F}},P_K)}\). Then the simplicial interleaving distance between \({\mathrm{Dow}}(S(M_n))\) and \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K)\) converges to 0 in probability as \(n\rightarrow \infty \); namely, for all \(\epsilon >0\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \Pr \left[ d_{{\mathrm{INT}}}({\mathrm{Dow}}(S(M_n)), {{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K))>\epsilon \right] =0. \end{aligned}$$

The proof of Theorem 2.10 is given in Sect. 6.2. In Sect. 3, we use Theorem 2.10 to infer a lower bound for the dimension of \({({\mathcal {F}},P_K)}\).

3 Estimating the stimulus space dimension

In this section we define the topological features of \({\mathrm{Dow}}(S(M))\) that we use to infer a lower bound for the intrinsic dimension \(d{({\mathcal {F}},P_K)}\) of a regular pair \({({\mathcal {F}},P_K)}\). Specifically, in Sect. 3.1, we review the basics of persistence modules and persistent homology and define the maximal persistence lengths. In Sect. 3.2 we define the features of \({{\mathbb {D}}{\mathrm{ow}}}{({\mathcal {F}},P_K)}\) that can be used to infer the embedding dimension. Lemma 3.6 introduces a lower bound \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}\) of the intrinsic dimension \(d{({\mathcal {F}},P_K)}\) that is defined in terms of these features. In Sect. 3.3, we define the analogous features of \({\mathrm{Dow}}(S(M))\) and prove their convergence to the respective features of \({{\mathbb {D}}{\mathrm{ow}}}{({\mathcal {F}},P_K)}\) in the limit of large sampling (Theorem 3.8). Section 3.4 provides explicit algorithms for the computation of the features of \({\mathrm{Dow}}(S(M))\) that allow one to infer the dimension via Theorem 3.8. Lastly, Sect. 3.5 discusses the computational aspects of the method and provides two practical approaches, each suited to its respective scenario. We also illustrate the usability of these methods for dimension inference using synthetic data.

3.1 Persistence modules and maximal persistence length

First we recall the definitions of persistence modules, persistence intervals and persistence diagrams; for more details see, e.g. Chapter 1 of Oudot (2015). Then we define the maximal persistence length for a 1-dimensional filtration of simplicial complexes. We consider a ground field \({\mathbb {F}}\) (which is normally taken to be \({\mathbb {F}}_2\) for computational reasons). None of the results in this paper depend on the choice of the field.

Definition 3.1

A persistence module \({\mathcal {M}}\) indexed over an interval [0, T] is a collection \(\{{\mathcal {M}}_t\}_{t\in [0,T]}\) of vector spaces over \({\mathbb {F}}\) along with linear maps \(\phi _s^t:{\mathcal {M}}_s\rightarrow {\mathcal {M}}_t\) for every \(s\le t\) in [0, T] such that \(\phi _s^u = \phi _t^u\circ \phi _s^t\), and \(\phi _t^t ={\text {id}}_{{\mathcal {M}}_{\mathrm{t}}}\) for all \(s\le t\le u\) in [0, T].

The direct sum of two persistence modules \({\mathcal {M}}\) and \({\mathcal {N}}\) over [0, T] is the persistence module \(({\mathcal {M}}\oplus {\mathcal {N}})_t{\mathop {=}\limits ^{\mathrm {def}}}{\mathcal {M}}_t\oplus {\mathcal {N}}_t\). For an interval \(J\subseteq [0,T]\), the interval module \({\mathbb {I}}_J = \{({\mathbb {I}}_J)_t\} \) is defined as

$$\begin{aligned} ({\mathbb {I}}_J)_t{\mathop {=}\limits ^{\mathrm {def}}}{\left\{ \begin{array}{ll} {\mathbb {F}},&{} \text { if } t\in J, \\ 0,&{} \text { if } t\notin J,\end{array}\right. } \end{aligned}$$

along with the identity maps \(\phi _s^t: {\mathbb {F}}\rightarrow {\mathbb {F}}\) for every \(s\le t\) in J and zero maps \(\phi _s^t=0 \) for all other \(s< t\) in [0, T].

A well-known structural characterization of a persistence module is via its persistence intervals. The following structural theorem characterizes persistence modules and guarantees the existence and uniqueness of persistence intervals (Crawley-Boevey 2015) (see also Oudot 2015 and references therein).

Theorem 3.2

Let \({\mathcal {M}} = \{{\mathcal {M}}_t\}_{t\in [0,T]}\) be a persistence module over [0, T]. If, for each \(t\in [0,T]\), \({\mathcal {M}}_t\) is a finite-dimensional vector space over \({\mathbb {F}}\), then \({\mathcal {M}}\) can be decomposed as a direct sum of interval modules; namely,

$$\begin{aligned} {\mathcal {M}} = \bigoplus _{J} {\mathbb {I}}_J \end{aligned}$$

where \(\{J\}\) is a collection of some intervals (open, closed, or half-open) in [0, T]. The decomposition is unique in the sense that, for every such decomposition, the collection of intervals is the same.

Rigorously speaking, one should distinguish open, closed, and half-open intervals. In what follows, we only use the lengths of the persistence intervals, and hence those distinctions will not matter. An important class of persistence modules is obtained from a 1-dimensional filtration of simplicial complexes by applying the homology functors \(H_k(\ \cdot \ ;{\mathbb {F}})\), \(k = 0, 1, 2, ...\). Specifically, for a 1-dimensional filtration of simplicial complexes \({\mathcal {K}} = \{{\mathcal {K}}_t\}_{t\in [0,T]}\) and a fixed nonnegative integer k, we have the persistence module \(H_k({\mathcal {K}};{\mathbb {F}}) = \{H_k({\mathcal {K}}_t;{\mathbb {F}})\}_{t\in [0,T]}\) along with the linear maps \((i_s^t)_*:H_k({\mathcal {K}}_s;{\mathbb {F}})\rightarrow H_k({\mathcal {K}}_t;{\mathbb {F}})\) for every \(s\le t\) in [0, T], where \(i_s^t\) is the inclusion map from \({\mathcal {K}}_s\) to \({\mathcal {K}}_t\). Since \(H_k(\ \cdot \ ;{\mathbb {F}})\) is a covariant functor, the equality \((i_t^u)_*\circ (i_s^t)_* = (i_s^u)_*\) holds for every \(s\le t\le u\) in [0, T].

For each k, we may use the persistence diagram of \(H_k({\mathcal {K}};{\mathbb {F}})\) for analysis. For our purposes, instead of the whole diagram, we summarize the diagram by looking only at the longest length among all persistence intervals, which we formally define below:

Definition 3.3

Let \({\mathcal {K}} = \{{\mathcal {K}}_t\}_{t\in [0,T]}\) be a 1-dimensional filtration of simplicial complexes. For each nonnegative integer k, we define

$$\begin{aligned} l_{\max }(k,{\mathcal {K}}){\mathop {=}\limits ^{\mathrm {def}}}\sup \{\beta -\alpha :(\alpha ,\beta )\text { is a persistence interval of } H_k({\mathcal {K}};{\mathbb {F}})\} \end{aligned}$$
(8)

and call it the maximal persistence length in dimension \({\varvec{k}}\).

This definition is similar to the one used in Section 3 of Bobrowski et al. (2015).Footnote 10 Normally, the length of a persistence interval in \(H_k({\mathcal {K}};{\mathbb {F}})\) is viewed as its significance in dimension k. Therefore, \(l_{\max }(k,{\mathcal {K}})\), the maximum among such interval lengths, is viewed as the significance of \({\mathcal {K}}\) in dimension k.
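As a small illustration, \(l_{\max }(k,{\mathcal {K}})\) can be computed from a list of persistence intervals; the sketch below assumes the (dimension, (birth, death)) format used, e.g., by the GUDHI library, and truncates at T the intervals that are still alive at the end of a filtration indexed over [0, T].

```python
def l_max(k, intervals, t_end):
    """Maximal persistence length in dimension k, Eq. (8).  `intervals` is a list
    of (dim, (birth, death)) pairs; essential classes (death = inf) are truncated
    at t_end, the right endpoint of the filtration."""
    lengths = [min(death, t_end) - birth
               for dim, (birth, death) in intervals if dim == k]
    return max(lengths, default=0.0)
```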

3.2 The length \(L_k{({\mathcal {F}},P_K)}\) and its relation to the dimension of \({({\mathcal {F}},P_K)}\)

In this section, from the regular pair \({({\mathcal {F}},P_K)}\), we define quantities that we use to bound the dimension \(d{({\mathcal {F}},P_K)}\) from below. We start with the following notation.

Definition 3.4

Let \(({\mathcal {F}}, P_K)\) be a pair, where \({\mathcal {F}} = \{f_i:K\rightarrow {\mathbb {R}}\}_{i\in [m]}\) is a collection of quasi-convex functions defined on a convex open set K and \(P_K\) is a probability measure on K. For \(x\in K\), we define

$$\begin{aligned} T_i(x) {\mathop {=}\limits ^{\mathrm {def}}}P_K(f_i^{-1}(-\infty ,f_i(x))). \end{aligned}$$
Fig. 3 \(T_i(x)\) is the \(P_K\) measure of the shaded area \(f_i^{-1}(-\infty ,f_i(x))\)

The function \(T_i(x)\) may be regarded as the \(P_K\)-rescaled version of \(f_i\) (see Fig. 3 for an illustration). Now we define a one-dimensional filtration of simplicial complexes \({\mathcal {K}}_x\) that is used to infer a lower bound for the dimension \(d{({\mathcal {F}},P_K)}\). The geometry underlying the definition is depicted in Fig. 4.

Fig. 4 From left to right, the filtration \({\mathcal {K}}_x(t)\), as the nerve of the sublevel sets of \(\{f_i\}_{i\in [m]}\), starts at \(t=0\) as the empty simplicial complex and increases as t goes up to \(t_{\max }(x)\), where the sublevel sets of \(\{f_i\}_{i\in [m]}\) touch the point x on their boundaries. The formal formulation of the process is in (10) of Definition 3.5

Throughout Sect. 3, we fix an arbitrary coefficient field \({\mathbb {F}}\) when taking homology; namely, for a filtration of simplicial complexes \({\mathcal {K}}\) and a nonnegative integer k, \(H_k({\mathcal {K}}){\mathop {=}\limits ^{\mathrm {def}}}H_k({\mathcal {K}};{\mathbb {F}})\).

Definition 3.5

Let \({({\mathcal {F}},P_K)}\) be a regular pair, where \({\mathcal {F}} = \{f_i:K\rightarrow {\mathbb {R}}\}_{i\in [m]}\). For \(x\in K\), let

$$\begin{aligned} t_{\max }(x) {\mathop {=}\limits ^{\mathrm {def}}}\max _{i\in [m]} T_i(x). \end{aligned}$$
(9)

Define a one dimensional filtered complex \({\mathcal {K}}_x\), indexed over \(t\in [0,t_{\max }(x)]\), by

$$\begin{aligned} {\mathcal {K}}_x(t) {\mathop {=}\limits ^{\mathrm {def}}}{{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}},P_K)\left( T_1(x)-(t_{\max }(x)-t),\ldots ,T_m(x)-(t_{\max }(x)-t)\right) . \end{aligned}$$
(10)

For every nonnegative integer k, we define

$$\begin{aligned} L_k({\mathcal {F}},P_K) {\mathop {=}\limits ^{\mathrm {def}}}\sup _{x\in K} l_{\max }(k,{\mathcal {K}}_x). \end{aligned}$$
(11)

As illustrated in Fig. 4, if x is “central” in some appropriate sense (see Definition 4.1 in Sect. 4), a \((d{({\mathcal {F}},P_K)}-1)\)-dimensional sphere is expected to show up and persist for a significant amount of time. In general, \(L_k{({\mathcal {F}},P_K)}\) can at least be used to derive a lower bound for the dimension of the regular pair \(({\mathcal {F}}, P_K)\) due to the following lemma.

Lemma 3.6

Let \({({\mathcal {F}},P_K)}\) be a regular pair. Then, for \(k\ge d{({\mathcal {F}},P_K)}\), \(L_k({\mathcal {F}},P_K)=0\). In particular,

$$\begin{aligned} d_{{\mathrm{low}}}{({\mathcal {F}},P_K)}{\mathop {=}\limits ^{\mathrm {def}}}1+\max \{k:L_k({\mathcal {F}},P_K)> 0\} \le d{({\mathcal {F}},P_K)}. \end{aligned}$$
(12)

Proof

For notational simplicity, in this proof, we denote \(d_{{\mathrm{low}}}=d_{{\mathrm{low}}}{({\mathcal {F}},P_K)}\) and \(d = d{({\mathcal {F}},P_K)}\). Recall that \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}},P_K)(t_1,\ldots , t_m) = {\mathrm{nerve}}\left( \{f_i^{-1}(-\infty ,\lambda _i(t_i))\}_{i=1}^m\right) \). Since the functions \(f_i\) are quasi-convex, the sublevel sets \(f_i^{-1}(-\infty ,\lambda _i(t_i))\) are convex; since intersections of convex sets are convex and convex sets are contractible, the collection \(\{f_i^{-1}(-\infty ,\lambda _i(t_i))\}_{i=1}^m\) is a good cover (of its union). Thus, by the nerve lemma (see, e.g., Theorem 10.7 in Björner 1995 or Corollary 4G.3 in Hatcher 2002), we have the following homotopy equivalence:

$$\begin{aligned} {{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}},P_K)(t_1,\ldots , t_m)\sim \bigcup _{i\in [m]} f_i^{-1}(-\infty ,\lambda _i(t_i)). \end{aligned}$$
(13)

Notice that \(\bigcup _{i\in [m]} f_i^{-1}(-\infty ,\lambda _i(t_i))\) is open in \({\mathbb {R}}^d\) and it is well-known that, for every open set \(U\subseteq {\mathbb {R}}^d\), \(H_k(U) = 0\), for all \(k\ge d\) (see, e.g., Proposition 3.29 in Hatcher 2002). Thus, for \(k\ge d\), \(H_k\left( \bigcup _{i\in [m]} f_i^{-1}(-\infty ,\lambda _i(t_i))\right) = 0\). Combining with (13), we obtain, for \(k\ge d\) and \((t_1,\ldots , t_m)\in [0,1]^m\), \(H_k({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}},P_K)(t_1,\ldots , t_m)) = 0\). Therefore, for \(k\ge d\), \(l_{\max }(k,{\mathcal {K}}_x)=0\), for all \(x\in K\), and \(L_k{({\mathcal {F}},P_K)}=0\). Thus \(d_{{\mathrm{low}}}-1 = \max \{k:L_k({\mathcal {F}},P_K)> 0\}\le d-1\) or, equivalently, \(d_{{\mathrm{low}}}\le d\). \(\square \)

\(L_k({\mathcal {F}},P_K)\) is defined in terms of the regular pair \(({\mathcal {F}}, P_K)\) and thus is not directly computable from discrete data. In Sect. 3.3, we define \(L_k(M)\) by an approach analogous to the definition of \(L_k{({\mathcal {F}},P_K)}\), and prove that \(L_k(M)\) converges to \(L_k{({\mathcal {F}},P_K)}\).

3.3 The length \(L_k(M)\) and its convergence to \(L_k{({\mathcal {F}},P_K)}\)

In Theorem 2.10, we see that, for the data matrix M, \({\mathrm{Dow}}(S(M))\) approximates \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}}, P_K)\) with high probability. Thus, it is natural to use \({\mathrm{Dow}}(S(M))\) to define an analogue \(L_k(M)\) of \(L_k{({\mathcal {F}},P_K)}\).

Definition 3.7

Let \(M\in {\mathcal {M}}_{m,n}^o\) and \(S(M)=\{s_1,\ldots , s_m\}\) be the collection of m sequences induced from the rows of M (\(s_i\) corresponds to row i). For \(a\in [n]\) and \(i\in [m]\), denote

$$\begin{aligned} {\mathrm{ord}}_i(M,a){\mathop {=}\limits ^{\mathrm {def}}}\#(\{b\in [n]:M_{ib}\le M_{ia}\}). \end{aligned}$$
(14)

For \(a\in [n]\), which corresponds to the a-th column of the data matrix M, let

$$\begin{aligned} {\hat{t}}_{\max }(a){\mathop {=}\limits ^{\mathrm {def}}}\max _{i\in [m]} \frac{{\mathrm{ord}}_i(M,a)}{n}. \end{aligned}$$
(15)

Define a one dimensional filtered complex \(\hat{{\mathcal {K}}}_{a}\), indexed over \(t\in [0,{\hat{t}}_{\max }(a)]\), by

$$\begin{aligned} \hat{{\mathcal {K}}}_a(t) {\mathop {=}\limits ^{\mathrm {def}}}{\mathrm{Dow}}(S(M))\left( \frac{{\mathrm{ord}}_1(M,a)}{n}-({\hat{t}}_{\max }(a)-t),\ldots , \frac{{\mathrm{ord}}_m(M,a)}{n}-({\hat{t}}_{\max }(a)-t)\right) . \end{aligned}$$
(16)

See Definition 2.2 for the definition of \({\mathrm{Dow}}(S(M))\). For every nonnegative integer k, we define

$$\begin{aligned} L_k(M) {\mathop {=}\limits ^{\mathrm {def}}}\max _{a\in [n]} l_{\max }(k,\hat{{\mathcal {K}}}_a). \end{aligned}$$
(17)

Since \({\mathrm{Dow}}(S(M))\) approximates \({{\mathbb {D}}{\mathrm{ow}}}({\mathcal {F}},P_K)\), intuitively, \(\hat{{\mathcal {K}}}_a(t)\) approximates \({\mathcal {K}}_{x_a}(t)\) and \(L_k(M)\) approximates \(L_k({\mathcal {F}},P_K)\). With the help of Theorem 2.10 and the Isometry Theorem in topological data analysis (see e.g. Theorem 7.16 in Sect. 7 of Oudot 2015), these intuitions are justified as follows:

Theorem 3.8

Let \(({\mathcal {F}}, P_K)\) be a regular pair. Assume that K is bounded and each \(f_i\in {\mathcal {F}}\) can be continuously extended to the closure \({\bar{K}}\). Let \(M_n\) be an \(m\times n\) matrix sampled from \({({\mathcal {F}},P_K)}\). Then, for all \(k\in \{0\}\cup {\mathbb {N}}\), as \(n\rightarrow \infty \), \(L_k(M_n)\) converges to \(L_k({\mathcal {F}},P_K)\) in probability; namely, for all \(\epsilon >0\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \Pr \left[ \left| L_k(M_n)-L_k({\mathcal {F}},P_K)\right| <\epsilon \right] =1. \end{aligned}$$

Moreover, the rate of convergence is independent of k.Footnote 11

The proof of Theorem 3.8 is given in Sect. 7.2. According to Theorem 3.8, for each nonnegative integer k, \(L_k(M_n)\) is a consistent estimator of \(L_k({\mathcal {F}},P_K)\), and the convergence in probability is uniform in k. Thus, by Lemma 3.6 and Theorem 3.8, we can estimate a lower bound for the dimension of \({({\mathcal {F}},P_K)}\) from the data matrix M by examining the values of \(L_k(M_n)\). Formally, we can define the following estimator of \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}\).

Definition 3.9

For \(\epsilon >0\) and \(M\in {\mathcal {M}}_{m,n}^o\), we define

$$\begin{aligned} {\hat{d}}_{{\mathrm{low}}}(M,\epsilon ){\mathop {=}\limits ^{\mathrm {def}}}1 + \max \{k:L_k(M)>\epsilon \}. \end{aligned}$$
(18)

As a consequence of Lemma 3.6 and Theorem 3.8, it is immediate that \({\hat{d}}_{{\mathrm{low}}}(M_n,\epsilon )\) is a consistent estimator, for appropriately chosen \(\epsilon \).

Corollary 3.10

Let \({({\mathcal {F}},P_K)}\) be a regular pair satisfying the conditions in Theorem 3.8 and \(M_n\in {\mathcal {M}}_{m,n}^o\) be sampled from \({({\mathcal {F}},P_K)}\). Denote \(d_{\mathrm{low}}= d_{\mathrm{low}}{({\mathcal {F}},P_K)}\). Then, for all \(0<\epsilon <L_{d_{{\mathrm{low}}}-1}{({\mathcal {F}},P_K)}\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \Pr \left[ {\hat{d}}_{{\mathrm{low}}}(M_n,\epsilon ) = d_{{\mathrm{low}}}{({\mathcal {F}},P_K)}\right] = 1. \end{aligned}$$
(19)

Proof

By the definition of \(d_{\mathrm{low}}\) and Lemma 3.6, \(L_k{({\mathcal {F}},P_K)}=0\) for all \(k\ge d_{\mathrm{low}}\). Hence, by Theorem 3.8, \(L_k(M_n)\rightarrow 0\) in probability for all \(k\ge d_{\mathrm{low}}\), and \(L_{d_{\mathrm{low}}-1}(M_n)\rightarrow L_{d_{\mathrm{low}}-1}{({\mathcal {F}},P_K)}>0\) in probability, as \(n\rightarrow \infty \), with a rate of convergence that does not depend on k. Since \(0<\epsilon <L_{d_{{\mathrm{low}}}-1}{({\mathcal {F}},P_K)}\), as \(n\rightarrow \infty \), w.h.p., \(L_{d_{\mathrm{low}}-1}(M_n)>\epsilon \) and \(L_k(M_n)<\epsilon \) for all \(k\ge d_{\mathrm{low}}\). Therefore, w.h.p., \({\hat{d}}_{{\mathrm{low}}}(M_n,\epsilon ) = d_{{\mathrm{low}}}\), and the result follows. \(\square \)

From Corollary 3.10, \({\hat{d}}_{{\mathrm{low}}}(M_n,\epsilon )\) can be used as a consistent estimator of \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}\). However, one needs to know how to choose an appropriate \(\epsilon \) for \({\hat{d}}_{{\mathrm{low}}}(M_n,\epsilon )\), and hence an estimate of \(L_k{({\mathcal {F}},P_K)}\) is still necessary. Therefore, in practice, we suggest using a statistical approach that estimates \(L_k{({\mathcal {F}},P_K)}\) to infer \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}\), instead of using \({\hat{d}}_{{\mathrm{low}}}(M_n,\epsilon )\) directly. The details are discussed in Sect. 3.5.

3.4 Algorithm for \(\hat{{\mathcal {K}}}_a\) and \(L_k(M)\)

For ease of implementation, we combine Definitions 3.7 and 2.2 and summarize them as algorithms for the computation of \(\hat{{\mathcal {K}}}_a\) and \(L_k(M)\). Algorithm 1 is for \(\hat{{\mathcal {K}}}_a\).

Algorithm 1: Computation of \(\hat{{\mathcal {K}}}_a\)
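A minimal Python sketch of this computation, obtained by unwinding Definitions 2.2 and 3.7 (the helper name khat_filtration and the output format, a list of (simplex, filtration value) pairs, are ours and not the paper's notation):

```python
import numpy as np
from itertools import combinations

def khat_filtration(M, a, d_up):
    """The filtered complex \\hat{K}_a of Definition 3.7, restricted to simplices
    of dimension <= d_up + 1.  M is the m x n data matrix (numpy array), a is a
    0-based column index."""
    m, n = M.shape
    # ord_i(M, b) = #{c : M_{ic} <= M_{ib}} (Eq. (14)); double argsort gives ranks.
    ranks = np.argsort(np.argsort(M, axis=1), axis=1) + 1
    t_max = np.max(ranks[:, a]) / n                       # Eq. (15)
    # Plugging Eq. (16) into Definition 2.2, sensor i enters the face sigma_b at
    #   t_entry(i, b) = t_max - (ord_i(M, a) - ord_i(M, b)) / n.
    t_entry = t_max - (ranks[:, [a]] - ranks) / n         # m x n array
    filtration = {}
    for r in range(1, min(d_up + 2, m) + 1):
        for sigma in combinations(range(m), r):
            # sigma appears as soon as it is contained in some face sigma_b(t).
            value = float(np.min(np.max(t_entry[list(sigma), :], axis=0)))
            filtration[sigma] = value
    return t_max, sorted(filtration.items(), key=lambda kv: kv[1])
```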

The next algorithm, Algorithm 2, is for computing \(L_k(M)\). Note that, in the algorithm, PersistenceIntervals is a function with two inputs: a filtration of simplicial complexes and a positive integer that limits the dimensions in which persistent homology is computed, in order to avoid possibly intractable computational complexity. As the name suggests, the output of PersistenceIntervals is the set of persistence intervals of the first input in dimensions less than or equal to the second input.

Algorithm 2: Computation of \(L_k(M)\)
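A corresponding sketch for \(L_k(M)\), with the GUDHI Python library playing the role of PersistenceIntervals and reusing the khat_filtration and l_max helpers sketched above (the name L_k_all is ours):

```python
import gudhi
import numpy as np

def L_k_all(M, d_up):
    """Compute L_0(M), ..., L_{d_up}(M) of Eq. (17) by taking, for each column a,
    the maximal persistence lengths of the filtered complex \\hat{K}_a."""
    m, n = M.shape
    L = np.zeros(d_up + 1)
    for a in range(n):
        t_max, filtration = khat_filtration(M, a, d_up)
        st = gudhi.SimplexTree()
        for simplex, value in filtration:
            st.insert(list(simplex), filtration=value)
        intervals = st.persistence()          # list of (dim, (birth, death)) pairs
        for k in range(d_up + 1):
            L[k] = max(L[k], l_max(k, intervals, t_max))
    return L
```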

3.5 How to use the algorithms in different situations

The worst-case complexity of a standard algorithm for computing the persistent homology of a 1-dimensional filtration of simplicial complexes is cubic in the number of simplices (see, e.g. Section 5.3.1 in Otter et al. 2017 and references therein). Since each \(\hat{{\mathcal {K}}}_a\) starts from the empty simplicial complex and ends at the full simplex \(\varDelta ^{m-1}\), we would need to go through all faces of \(\varDelta ^{m-1}\). However, since we limit the computation to dimensions \(0, 1,\ldots , d_{{\mathrm{up}}}\), where \(d_{{\mathrm{up}}}\le m-1\) is pre-set, we only need to consider the \((\min \{d_{{\mathrm{up}}}+1, m-1\})\)-skeleton of \(\varDelta ^{m-1}\). Therefore, for our algorithm, the number of faces in the 1-dimensional filtration is

$$\begin{aligned} \sum _{k = 0}^{\min \{d_{{\mathrm{up}}}+2, m\}} \left( {\begin{array}{c}m\\ k\end{array}}\right) \end{aligned}$$

which is \(O(m^{d_{{\mathrm{up}}}+2})\). Since there are n choices of \(a\in [n]\), the worst-case complexity of computing \(\{L_k(M)\}_{k = 0}^{d_{\mathrm{up}}}\) is \(O(n\cdot m^{3(d_{{\mathrm{up}}}+2)}) = O(n\cdot m^{3d_{{\mathrm{up}}}+6})\), which is of degree \(3d_{{\mathrm{up}}}+6\) in m but only linear in n.

Since the algorithm is linear in n, even when n is large, the computation remains tractable as long as m is not too large. Moreover, to use the full power of Theorem 3.8, we would want n to be large. When n is large, we may subsample the points (i.e. the columns) to assess how large the variance of \(L_k(M_n)\) is; this resampling technique is known as the bootstrap in statistics. Moreover, we can repeat the subsampling for different numbers of columns to observe the trend of convergence.

On the other hand, to infer the dimension \(d{({\mathcal {F}},P_K)}\), we need at least \(m\ge d{({\mathcal {F}},P_K)}+1\) functions. Thus, we do not want m to be too small. However, since the computational complexity of \(L_k(M_n)\) grows as a high-degree polynomial in m, we cannot let m be too large either. When m is too large, we can overcome the computational difficulty by subsampling the functions (i.e. the rows); namely, randomly pick \(m_s\) functions (say \(m_s = 10\)), which correspond to \(m_s\) rows of \(M_n\), compute the \(L_k\) of the submatrix thus formed, repeat this process many times, and examine how the results are distributed.

We elaborate on these two methods (subsampling points or functions) in the following two subsections. We also implement the methods for estimating the embedding dimension in their appropriate situations, plot the results and give some principles for decision making (i.e. deciding, given the plot and k, whether we accept \(L_k{({\mathcal {F}},P_K)}>0\) or not).

3.5.1 Subsample points when n is sufficiently large

When n is sufficiently large, say \(n\ge 300\), we can subsample \(n_s\) points (i.e. columns of \(M_n\)) and obtain information about the variance. Moreover, by letting \(n_s\) increase, we can further observe the trend of convergence, which, by Theorem 3.8, should approach the true \(L_k{({\mathcal {F}},P_K)}\). This subsampling technique is known as the bootstrap in statistics.
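A sketch of this point-subsampling loop, reusing the L_k_all helper sketched in Sect. 3.4 (the name subsample_columns and the output layout are ours); the columns are drawn without replacement so that each submatrix stays in \({\mathcal {M}}_{m,n_s}^o\).

```python
import numpy as np

rng = np.random.default_rng(2)

def subsample_columns(M, n_s, n_rep, d_up):
    """Repeatedly draw n_s columns (points) of M at random and recompute
    L_0, ..., L_{d_up} on the submatrix; the n_rep x (d_up + 1) output can be
    summarized with boxplots as in Fig. 5."""
    n = M.shape[1]
    results = np.zeros((n_rep, d_up + 1))
    for r in range(n_rep):
        cols = rng.choice(n, size=n_s, replace=False)
        results[r] = L_k_all(M[:, cols], d_up)
    return results
```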

Figure 5 shows boxplotsFootnote 12 of \(L_k(M_{n_s})\) obtained by implementing this idea under different settings of (d, m, n), where \(d=d{({\mathcal {F}},P_K)}\) is the dimension of \({({\mathcal {F}},P_K)}\), m is the number of functions and n is the number of data points. Here, we choose (m, n) to be (10, 350) throughout, where m is moderate for computation and n is sufficiently large for subsampling. Subsampling is repeated 100 times for each boxplot. To compare with the result of a purely random matrix, we also generate a \(10\times 350\) matrix whose entries are i.i.d. from \(\mathrm {Unif}(0,1)\) and compute its \(L_k\)'s. The details of how the boxplots are generated are in the caption of Fig. 5.

Fig. 5 The four panels are boxplots of \(L_k\) obtained from subsampling the points. Throughout the panels, \(m = 10\) and \(n = 350\), where m is the number of functions and n is the total number of sample points. The panels correspond to a \(d=3\), b \(d=4\), c \(d=5\), where \(d=d{({\mathcal {F}},P_K)}\) is the dimension of \({({\mathcal {F}},P_K)}\). The functions are chosen to be random quadratic functions defined on the unit d-ball in \({\mathbb {R}}^d\). Panel (d) is obtained by computing the \(L_k\)'s of an \(m\times n\) matrix \(M_n\) with entries i.i.d. from Unif(0,1), which is treated as a purely random matrix and serves as a comparison with the other panels. Each figure in each panel is generated by subsampling \(n_s = 50, 150, 200\) columns of \(M_n\), repeated 100 times. By the decision principle, every figure in panel (a) successfully infers the true dimension; in panel (b), \(n_s=50\) fails to infer the true dimension 4 and only infers the lower bound 3, while \(n_s = 150, 200\) successfully infer the true dimension 4; in panel (c), both \(n_s = 50\) and \(n_s=150\) fail to infer the true dimension 5 and only infer the lower bound 4, while \(n_s = 200\) successfully infers the true dimension 5. Panel (d) exhibits quite different behavior. (It can be proved that, if the entries of \(M_n\in {\mathcal {M}}_{m,n}^o\) are i.i.d. from \(\mathrm {Unif}(0,1)\), then \(L_k(M_n)\) behaves as positive for \(k \le m-2\) and tends to zero for \(k\ge m-1\); thus \(L_k(M_n)\) depends on m rather than on an intrinsic d.)

Let us elaborate a little more on Fig. 5. The decision principle we propose to follow is that,

on each boxplot of \(L_k(M_{n_s})\), if the first quartileFootnote 13 (Q1) is not greater than 0, reject \(L_k{({\mathcal {F}},P_K)}>0\); otherwise, accept \(L_k{({\mathcal {F}},P_K)}>0\).
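As an illustration, this decision principle can be encoded in a few lines. Here `samples` is an array of subsampled \(L_k\) values, one column per k, as produced for instance by the sketch above; the translation of the accepted indices into a dimension bound follows the convention that \(L_k>0\) implies \(d\ge k+1\), and is our paraphrase rather than the paper's code.

```python
import numpy as np

def accept_Lk_positive(samples):
    """samples: (n_rep, K) array; column k collects subsampled values of L_k.
    Accept L_k > 0 exactly when the first quartile Q1 of column k exceeds 0."""
    q1 = np.percentile(samples, 25, axis=0)
    return q1 > 0

def inferred_lower_bound(samples):
    """Lower bound on the dimension: largest accepted index k, plus one."""
    accepted = np.flatnonzero(accept_Lk_positive(samples))
    return int(accepted.max()) + 1 if accepted.size else 0
```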

For panel (a), where \(d = 3\), we see that, as \(n_s\) increases, the variance of \(L_k(M_{n_s})\) decreases for each k. For \(k = 2\), the first quartile of \(L_k(M_{n_s})\) is greater than 0 even for \(n_s = 50\); for \(k\ge 3\), \(L_k(M_{n_s})\) stays at 0, apart from a few noise-like dots. According to the principle, we conclude \(d\ge 3\) for this regular pair. In fact, as we know in advance, \(d = 3\).

For panel (b), where \(d = 4\), the same shrinking-variance behavior is observed. Moreover, the principle concludes \(d\ge 4\) from \(n_s = 150\) onward, where the first quartile starts to stay away from 0. Similarly, for panel (c), where \(d = 5\), the principle concludes \(d\ge 4\) for \(n_s = 50\) and \(n_s = 150\), and \(d\ge 5\) for \(n_s = 200\).

We observe that, for higher d, a larger \(n_s\) is needed to reach the best conclusion (i.e. to infer the true dimension). However, as \(n_s\) increases, the variance of \(L_k(M_{n_s})\) decreases, and this information can also be exploited. Therefore, in the small-sample case, one may rely on this convergence behavior and develop other principles by quantifying the trend of convergence. For example, in panel (c), where \(d = 5\), as \(n_s\) increases from 50 to 150, \(L_4(M_{n_s})\) grows from noise-like outliers into a filled box. This trend suggests that we “may accept” \(L_4{({\mathcal {F}},P_K)}>0\). We leave it to practitioners to decide their own principles on how to use the convergence-trend information in their fields of interest.

3.5.2 Subsample functions when m is large

As mentioned earlier, the worst-case computational complexity of \(L_k(M_n)\) is polynomial in m, the number of rows of \(M_n\), but of high degree \(3 d_{\mathrm{up}}+6\). To overcome this difficulty, we propose to subsample the rows (i.e. the collection of functions) of \(M_n\). Specifically, for a fixed number \(m_s<m\), we randomly choose \(m_s\) rows of \(M_n\), construct the corresponding \(m_s\times n\) submatrix \(M_{m_s\times n}\), compute \(L_k(M_{m_s\times n})\), and repeat the process the assigned number of times. Figure 6 shows boxplots of \(L_k(M_{m_s\times n})\) with \(m_s = 10\), repeated \(N_{{\mathrm{rep}}} = 1000\) times, under different settings. Throughout the plots, \(m = 100\), \(n = 150\), \(m_s = 10\) and \(N_{{\mathrm{rep}}} = 1000\). We adopt the same principle as in the previous subsection: we accept \(L_k{({\mathcal {F}},P_K)}>0\) only when the first quartile Q1 is above 0. The resulting lower bounds are 2, 3, 4 and 4 for panels (a), (b), (c) and (d) of Fig. 6, respectively.
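A row-subsampling sketch is analogous to the column-subsampling one above; again `compute_Lk` is only a placeholder for the persistence computation and not part of the paper's code.

```python
import numpy as np

def subsample_rows(M, m_s, compute_Lk, k_max, n_rep=1000, rng=None):
    """Row (function) subsampling: draw m_s of the m rows of M without
    replacement, compute L_k of the resulting m_s x n submatrix, repeat."""
    rng = np.random.default_rng(rng)
    m, n = M.shape
    out = np.empty((n_rep, k_max + 1))
    for r in range(n_rep):
        rows = rng.choice(m, size=m_s, replace=False)
        out[r] = [compute_Lk(M[rows, :], k) for k in range(k_max + 1)]
    return out
```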

Fig. 6
figure 6

Boxplots of \(L_k\) obtained by subsampling functions. Throughout the panels, \(m = 60\), \(n = 150\) and \(m_s = 10\), where m is the number of functions (rows of \(M_n\)), n is the total number of sample points and \(m_s\) is the number of functions used in each subsampled submatrix. Each panel is generated from a fixed regular pair with a \(d=2\), b \(d=3\), c \(d=4\), d \(d=5\), where d is the true dimension of the regular pair. Function subsampling is repeated 1000 times for each panel

A lower bound for \(d{({\mathcal {F}},P_K)}\) may not be very satisfactory. In Sects. 4 and 5, we develop some theory and methods to decide whether the lower bound obtained in this section is indeed the dimension \(d{({\mathcal {F}},P_K)}\).

4 Complete regular pairs and \({\hat{d}}_{\mathrm{low}}(M,\varepsilon )\) as an asymptotically consistent dimension estimator

We established in Sect. 3 that a lower bound \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}\) of \(d{({\mathcal {F}},P_K)}\) is generally inferable from sampled data. Here we provide a sufficient condition for \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}= d{({\mathcal {F}},P_K)}\); this ensures that the dimension \(d{({\mathcal {F}},P_K)}\) can be inferred with high probability. Recall that the conic hull of a set \(S\subseteq {\mathbb {R}}^d\), denoted \({\mathrm{cone}}(S)\), is the set

$$\begin{aligned} {\mathrm{cone}}(S){\mathop {=}\limits ^{\mathrm {def}}}\left\{ \sum _{i=1}^k c_iv_i \, \, \vert \,\, c_1,\ldots , c_k\ge 0,\, \, v_1,\ldots , v_k\in S, \, \, k\in {\mathbb {N}}\right\} . \end{aligned}$$
(20)

Definition 4.1

Let \({({\mathcal {F}},P_K)}\) be a regular pair, where \({\mathcal {F}} = \{f_i\}_{i\in [m]} \) and each \(f_i:K\rightarrow {\mathbb {R}}\) is differentiable. The set

$$\begin{aligned} {\mathrm{Cent}}_1{\mathop {=}\limits ^{\mathrm {def}}}\left\{ x\in K\, \, \vert \,\, {\mathrm{cone}}\left( \{\nabla f_1(x), \dots , \nabla f_m(x)\}\right) = {\mathbb {R}}^d \right\} , \end{aligned}$$

is called the type 1 central region of \(({\mathcal {F}}, P_K)\).
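When the gradients at a point x are available (e.g. for synthetic regular pairs), membership of x in \({\mathrm{Cent}}_1\) can be checked numerically. The sketch below is only an illustration and not part of the inference pipeline, which works from sampled data without gradient access: it tests whether \({\mathrm{cone}}(\{\nabla f_i(x)\}_{i\in [m]})={\mathbb {R}}^d\) by verifying, via one linear-programming feasibility problem per direction, that each vector \(\pm e_j\) of the standard basis is a nonnegative combination of the gradients.

```python
import numpy as np
from scipy.optimize import linprog

def positively_spans_Rd(grads):
    """grads: (m, d) array whose rows are the gradients at a point x.
    Returns True iff the conic hull of the rows is all of R^d, i.e. x
    would lie in Cent_1.  Test: every +/- standard basis vector must be
    a nonnegative combination of the rows."""
    m, d = grads.shape
    for j in range(d):
        for sign in (1.0, -1.0):
            target = np.zeros(d)
            target[j] = sign
            res = linprog(c=np.zeros(m), A_eq=grads.T, b_eq=target,
                          bounds=[(0, None)] * m, method="highs")
            if not res.success:  # infeasible: target is not in the cone
                return False
    return True

# Example in R^2: three gradients at 120-degree angles positively span R^2.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
print(positively_spans_Rd(np.c_[np.cos(angles), np.sin(angles)]))  # True
```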

Fig. 7
figure 7

Two examples illustrating the definition of \({\mathrm{Cent}}_1\). a x is a point in \({\mathrm{Cent}}_1\), because the gradients at x positively span the whole Euclidean space \({\mathbb {R}}^2\). b x is not a point in \({\mathrm{Cent}}_1\), since the gradients at x do not positively span the whole Euclidean space \({\mathbb {R}}^2\). In both panels, the colored arrows are the gradients at x of the functions of their respective colors

Figure 7 illustrates two possibilities: (a) \(x\in {\mathrm{Cent}}_1\), and (b) \(x\notin {\mathrm{Cent}}_1\). Since the gradient of a function f at a point x is the outward-pointing normal direction to the sublevel set \(f^{-1}(-\infty ,f(x))\), every point \(x\in {\mathrm{Cent}}_1\) is “enclosed” from every direction by the union of the sublevel sets \(\{f_i^{-1}(-\infty ,f_i(x))\}_{i\in [m]}\) (panel (a) of Fig. 7). It is therefore intuitive that at every point \(x\in {\mathrm{Cent}}_1\), the filtration \({\mathcal {K}}_x(t)\) (see Fig. 4 on page 13) produces a persistent (homological) \((d-1)\)-sphere. Therefore, for \(x\in {\mathrm{Cent}}_1\), \(l_{\max }(d-1,{\mathcal {K}}_x)\) (and hence \(L_{d-1}{({\mathcal {F}},P_K)}\)) is expected to be positive, and the lower bound in Lemma 3.6 yields the dimension \(d{({\mathcal {F}},P_K)}\). It is also easy to visualize why there is no persistent \((d-1)\)-sphere in \({\mathcal {K}}_x(t)\) if \(x\notin {\mathrm{Cent}}_1\); see panel (b) of Fig. 7. This motivates the following definition.

Definition 4.2

A regular pair \({({\mathcal {F}},P_K)}\) is said to be complete if its type 1 central region \( {\mathrm{Cent}}_1\) is non-empty.

The following theorem makes the above intuition more precise.

Theorem 4.3

Let \({({\mathcal {F}},P_K)}\) be a regular pair, where \({\mathcal {F}} = \{f_i\}_{i\in [m]} \) and each \(f_i:K\rightarrow {\mathbb {R}}\) is differentiable. If \({({\mathcal {F}},P_K)}\) is complete, then the lower bound in Lemma 3.6 is indeed the dimension of the regular pair, i.e. \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}= d{({\mathcal {F}},P_K)}\).

The proof of Theorem 4.3 is given in Sect. 4.1. Notably, if the regular pair is not complete (i.e. no point of the kind illustrated in panel (a) of Fig. 7 exists), then the desired persistent \((d-1)\)-sphere cannot be formed in a small neighborhood of any point in K, and one cannot be certain whether \(L_{d-1}{({\mathcal {F}},P_K)}\) is positive or, equivalently, whether \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}=d{({\mathcal {F}},P_K)}\). Immediate from Theorem 4.3 and Corollary 3.10 is the following

Theorem 4.4

Let \({({\mathcal {F}},P_K)}\) be a regular pair satisfying the conditions in Theorem 3.8, where \({\mathcal {F}} = \{f_i\}_{i\in [m]} \) and each \(f_i:K\rightarrow {\mathbb {R}}\) is differentiable. If \({({\mathcal {F}},P_K)}\) is a complete regular pair with dimension \(d=d{({\mathcal {F}},P_K)}\), and matrices \(M_n\in {\mathcal {M}}_{m,n}^o \) are sampled from \({({\mathcal {F}},P_K)}\), then for every \(\varepsilon \in \left( 0,L_{d-1}{({\mathcal {F}},P_K)}\right) \),

$$\begin{aligned} \lim _{n\rightarrow \infty } \Pr \left[ {\hat{d}}_{{\mathrm{low}}}(M_n,\varepsilon ) = d{({\mathcal {F}},P_K)}\right] = 1. \end{aligned}$$
(21)

In other words, \({\hat{d}}(\varepsilon ):{\mathcal {M}}_{m,n}^o\rightarrow {\mathbb {N}}\) defined by \({\hat{d}}(\varepsilon )(M_n){\mathop {=}\limits ^{\mathrm {def}}}{\hat{d}}_{{\mathrm{low}}}(M_n,\varepsilon )\) is an asymptotically consistent estimator in the class of complete regular pairs.

Proof

By Theorem 4.3, \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}= d{({\mathcal {F}},P_K)}\). Moreover, by Corollary 3.10,

$$\begin{aligned} \lim _{n\rightarrow \infty } \Pr \left[ {\hat{d}}_{{\mathrm{low}}}(M_n,\varepsilon ) = d_{{\mathrm{low}}}{({\mathcal {F}},P_K)}\right] = 1. \end{aligned}$$

Thus, the result follows. \(\square \)

4.1 Proof of Theorem 4.3

Recall the following notation from Sect. 3. Let \({({\mathcal {F}},P_K)}= (\{f_i\}_{i\in [m]},P_K)\) be a regular pair; for any \(t\in [0,1]\), we denote \(K^{(i)}(t) {\mathop {=}\limits ^{\mathrm {def}}}f_i^{-1}(-\infty ,\lambda _i(t))\), where \(\lambda _i(t) \) is a monotone-increasing function that satisfies \(P_K(f_i^{-1}(-\infty ,\lambda _i(t))) = t\). For any \(x\in K\), we also denote

$$\begin{aligned} T_i(x) {\mathop {=}\limits ^{\mathrm {def}}}P_K(f_i^{-1}(-\infty ,f_i(x))). \end{aligned}$$

Theorem 4.3 follows from the following key lemma.

Lemma 4.5

Let \((\{f_i\}_{i=1}^m, P_K)\) be a complete regular pair. Suppose \(x_0\in {\mathrm{Cent}}_1\), then there exists \(\varepsilon >0\) such that

$$\begin{aligned} {\mathrm{nerve}}\left( \left\{ K^{(i)}(T_i(x_0)-t)\right\} _{i\in [m]}\right) \sim S^{d-1},\ \forall \ t\in [0, \varepsilon ). \end{aligned}$$
(22)

Proof of Theorem 4.3

By Lemma 3.6, \(d_{{\text {low}}}{({\mathcal {F}},P_K)}\le d{({\mathcal {F}},P_K)}\). We therefore only need to prove \(L_{d-1}({\mathcal {F}},P_K)>0\). Let \(x_0\in {\mathrm{Cent}}_1\). By Lemma 4.5, \(l_{\max }(d-1,x_0)>0\) (see Definition 3.5). Therefore, \(L_{d-1}({\mathcal {F}},P_K)>0\), completing the proof. \(\square \)

To prove Lemma 4.5, we need

Lemma 4.6

For \(t_1,\ldots , t_m\in (0,1)\) and \(\sigma \subseteq [m]\), if \(\bigcap _{i\in \sigma } K^{(i)}(t_i)\ne \varnothing \), then there exists \(\epsilon >0\) such that \(\bigcap _{i\in \sigma } K^{(i)}(t_i-\epsilon )\ne \varnothing \). In addition, by monotonicity of \(K^{(i)}(t)\), we also have \(\bigcap _{i\in \sigma } K^{(i)}(t_i-\eta )\ne \varnothing \) for all \(0<\eta \le \epsilon \).

Proof

Let \(\epsilon _n\) be a sequence with \(\epsilon _n\searrow 0\). Let us first prove that \(K^{(i)}(t_i-\epsilon _n)\nearrow K^{(i)}(t_i)\); equivalently, \(K^{(i)}(t_i) = \bigcup _{n=1}^\infty K^{(i)}(t_i-\epsilon _n)\). For any n, \(K^{(i)}(t_i-\epsilon _n)\subseteq K^{(i)}(t_i)\) by definition. Therefore, \(\bigcup _{n=1}^\infty K^{(i)}(t_i-\epsilon _n)\subseteq K^{(i)}(t_i)\). For the other inclusion, assume \(x\in K^{(i)}(t_i)=f_i^{-1}(-\infty ,\lambda _i(t_i))\). Then \(f_i(x)<\lambda _i(t_i)\). By Lemma 2.5, \(\lambda _i\) is continuous and strictly increasing. Hence, there exists n such that \(\lambda _i(t_i-\epsilon _n)>f_i(x)\); in other words, \(x\in K^{(i)}(t_i-\epsilon _n)\). Therefore, \(x\in \bigcup _{n=1}^\infty K^{(i)}(t_i-\epsilon _n)\), proving the claim.

Now we have, as \(n\nearrow \infty \), \(K^{(i)}(t_i-\epsilon _n)\nearrow K^{(i)}(t_i)\). Thus \(\bigcap _{i\in \sigma } K^{(i)}(t_i-\epsilon _n)\nearrow \bigcap _{i\in \sigma } K^{(i)}(t_i) \). Since \(\bigcap _{i\in \sigma } K^{(i)}(t_i)\ne \varnothing \), there must exist n such that \(\bigcap _{i\in \sigma } K^{(i)}(t_i-\epsilon _n)\ne \varnothing \). Taking \(\epsilon = \epsilon _n\), the result follows. \(\square \)

Proof of Lemma 4.5

For each \(i\in [m]\), we denote by

$$\begin{aligned} \varOmega _i = K^{(i)}(T_i(x_0))=\left\{ x \in K \, \vert \, f_i(x)<f_i(x_0) \right\} \end{aligned}$$

the appropriate open convex sublevel set of \(f_i\). Note that \( K^{(i)}(T_i(x_0)-t)\subseteq \varOmega _i\) for any \(t \ge 0\), and the nerve on the left-hand side of (22) is a subcomplex of \({\mathrm{nerve}}\left( \left\{ \varOmega _i\right\} _{i\in [m]}\right) \).

For each non-empty \(\sigma \in {\mathrm{nerve}}\left( \left\{ \varOmega _i\right\} _{i\in [m]}\right) \), the subset \(\bigcap _{i\in \sigma }\varOmega _i \) is open and non-empty and hence has a nonzero \(P_K\) measure. Thus, by Lemma 4.6, there exists \(\varepsilon _\sigma >0\), such that \( \bigcap _{i\in \sigma } K^{(i)}(T_i(x_0)-t) \) is non-empty for any \(t\in [0, \varepsilon _\sigma )\). Choosing \(\varepsilon \) to be the minimum of all such \(\varepsilon _\sigma \) thus guarantees that

$$\begin{aligned} {\mathrm{nerve}}\left( \left\{ K^{(i)}(T_i(x_0)-t)\right\} _{i\in [m]}\right) = {\mathrm{nerve}}\left( \left\{ \varOmega _i\right\} _{i\in [m]}\right) \ \forall \ t\in [0, \varepsilon ). \end{aligned}$$

It thus suffices to prove (22) for \(t=0\). Since each \(\varOmega _i\) is open and convex, by the nerve lemmaFootnote 14 it is enough to show that \(\bigcup _{i\in [m]} \varOmega _i \sim S^{d-1}\). Moreover, since \(x_0\) lies on the boundary of each \(\varOmega _i \), the union \( \{ x_0\} \cup \bigcup _{i\in [m]} \varOmega _i \) is star-shaped. Therefore, it suffices to prove that there exists \(\eta >0\) such that

$$\begin{aligned} B_\eta (x_0)\setminus \{x_0\}\subseteq \bigcup _{i\in [m]} \varOmega _i, \end{aligned}$$
(23)

where \(B_\eta (x_0) = \{x\in {\mathbb {R}}^d: \Vert x-x_0\Vert <\eta \}\).

Suppose no such \(\eta >0\) exists. Then, for every \(n\in {\mathbb {N}}\), there exists a unit vector \(v_n\in S^{d-1}=\{ x\in {\mathbb {R}}^d \, \vert \, \Vert {x} \Vert =1\}\) such that \(x_n=x_0+\frac{1}{n}\cdot v_n\notin \bigcup _{i\in [m]} \varOmega _i\). By compactness of \(S^{d-1}\), there is an infinite subsequence \(\{ v_{n_j}\}\) that converges to a particular \(v_*\in S^{d-1}\). Since all \(f_i\) are differentiable, Taylor’s theorem gives

$$\begin{aligned} n_j \left( {f_i(x_{n_j}) -f_i(x_0)} \right) =\langle \nabla f_i(x_0), v_{n_j}\rangle +O(1/n_j ). \end{aligned}$$
(24)

Since \(x_{n_j} \notin \varOmega _i \) for all i, \(f_i(x_{n_j})\ge f_i(x_0)\). Taking \(\liminf _{j\rightarrow \infty }\) on both sides of Eq. (24), we conclude that \( \langle \nabla f_i(x_0), v_*\rangle \ge 0\) for all \(i\in [m] \). Since \(-v_*\in {\mathrm{cone}}(\{\nabla f_i(x_0)\}_{i\in [m]}) = {\mathbb {R}}^d\), choosing appropriate nonnegative coefficients in (20) yields \(0\le \langle -v_*, v_*\rangle \) and hence \(v_* = 0\), a contradiction. Therefore the inclusion (23) holds for some \(\eta >0\). \(\square \)

5 Testing the completeness of \({({\mathcal {F}},P_K)}\) from sampled data

Theorem 4.3 establishes that completeness of \({({\mathcal {F}},P_K)}\) implies \(d_{\mathrm{low}}{({\mathcal {F}},P_K)}= d{({\mathcal {F}},P_K)}\), and thus the data dimension \(d{({\mathcal {F}},P_K)}\) can be inferred from sampled data. Unfortunately, completeness cannot be directly tested from sampled data, since the gradient information is not directly accessible from discrete samples.

To address this difficulty, we consider a different notion of central region, \({\mathrm{Cent}}_0\subseteq K\), which, under a natural assumption (Definition 5.2 below), is indistinguishable from \({\mathrm{Cent}}_1\) with respect to the probability measure \(P_K\) (Lemma 5.3). We also establish that the probability measure of \({\mathrm{Cent}}_0\) can be approximated from sampled data (Theorem 5.6). This makes it possible to test completeness of a regular pair from (discrete) sampled data.

Definition 5.1

Let \(({\mathcal {F}}, P_K) \) be a regular pair, the subset

$$\begin{aligned} {\mathrm{Cent}}_0{\mathop {=}\limits ^{\mathrm {def}}}\left\{ x\in K\, \, \vert \, \bigcap _{i\in [m]} f_i^{-1}(-\infty ,f_i(x)) = \varnothing \right\} , \end{aligned}$$

is called the type 0 central region of \(({\mathcal {F}}, P_K)\).

A point x belongs to \({\mathrm{Cent}}_0\) if the appropriate open sublevel sets of \(\{f_i\}\) have an empty intersection. If the functions \(f_i\) are differentiable and quasi-convex and \(\nabla f_i(x)\ne 0\), then each sublevel set is contained in the open half-space:

$$\begin{aligned} f_i^{-1}(-\infty ,f_i(x))\subseteq H_i^- {\mathop {=}\limits ^{\mathrm {def}}}\{v+x:\langle v,\nabla f_i(x)\rangle <0\}. \end{aligned}$$
(25)

If \(x \in {\mathrm{Cent}}_1\), then \(\{\nabla f_i(x)\}_{i\in [m]}\) positively span the whole Euclidean space; hence the open half-spaces \(H_i^-\) must have empty intersection: \( \bigcap _{i\in [m]} H_i^-=\varnothing \). Thus, by the inclusion (25), we conclude \(x\in {\mathrm{Cent}}_0\). We have thus established the inclusion

$$\begin{aligned} {\mathrm{Cent}}_1\subseteq {\mathrm{Cent}}_0. \end{aligned}$$

Note that the reverse of the above containment is not necessarily true. For example, if for some \(x\in K\) the gradients \(\{\nabla f_i(x)\}_{i\in [m]}\) positively span a proper linear subspaceFootnote 15 of \({\mathbb {R}}^d\), then \(x\in {\mathrm{Cent}}_0\) but \(x\notin {\mathrm{Cent}}_1\). However, a “generic” random choice of m vectors in \({\mathbb {R}}^d\) is unlikely (albeit not impossible) to positively span exactly a proper linear subspace, motivating the following

Definition 5.2

A set of vectors \(V = \{v_1,\ldots , v_m\}\subseteq {\mathbb {R}}^d\) is said to be in general direction if, for every \(\sigma \subseteq [m]\) with \(|\sigma |\le d\), the set of vectors \(\{v_i\}_{i\in \sigma }\) is linearly independent. A collection of differentiable functions \({\mathcal {F}} = \{f_i:K\rightarrow {\mathbb {R}}\}_{i\in [m]}\) is said to be in general position if for (Lebesgue) almost every x in K, the vectors \(\{\nabla f_i(x)\}_{i\in [m]}\) are in general direction.
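For a finite set of vectors, the general-direction condition can be verified directly by a rank test on every subset of size at most d. The brute-force sketch below is only an illustration and is practical only for small m and d.

```python
import numpy as np
from itertools import combinations

def in_general_direction(V):
    """V: (m, d) array of vectors. Returns True iff every subset of at most
    d of the rows is linearly independent, as in Definition 5.2."""
    m, d = V.shape
    for size in range(1, min(m, d) + 1):
        for idx in combinations(range(m), size):
            if np.linalg.matrix_rank(V[list(idx), :]) < size:
                return False
    return True
```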

In other words, functions in general position are those whose gradients are unlikely to positively span a proper subspace. Assuming that the functions are in general position serves our purpose of replacing \({\mathrm{Cent}}_1\) with \({\mathrm{Cent}}_0\), as described in the following

Lemma 5.3

Let \(({\mathcal {F}}, P_K)\) be a regular pair, where each function in \( {\mathcal {F}}\) is differentiable. Assume that \({\mathcal {F}}\) is in general position, then

$$\begin{aligned} P_K({\mathrm{Cent}}_1\setminus {\mathrm{Cent}}_0) =P_K({\mathrm{Cent}}_0\setminus {\mathrm{Cent}}_1) = 0. \end{aligned}$$
(26)

The proof of Lemma 5.3 is given in Sect. 5.1. In the next lemma, we prove that \({\mathrm{Cent}}_1\) is always an open subset of K; hence, completeness of a regular pair \({({\mathcal {F}},P_K)}\) is equivalent to \(P_K({\mathrm{Cent}}_1)>0\). Consequently, Lemma 5.3 ensures that completeness of a regular pair in general position is equivalent to \(P_K({\mathrm{Cent}}_0)>0\).

Lemma 5.4

Let \(\{f_i:K\rightarrow {\mathbb {R}}\}_{i\in [m]}\) be a collection of quasi-convex \(C^1\) functions, where \(K\subseteq {\mathbb {R}}^d\) is open and convex. Then the set \({\mathrm{Cent}}_1=\big \{x\in K: {\mathrm{cone}}(\{\nabla f_i(x)\}_{i\in [m]})={\mathbb {R}}^d\big \}\) is open. In particular, \({\mathrm{Cent}}_1\ne \varnothing \) is equivalent to \(P_K({\mathrm{Cent}}_1)>0\).

Proof

Define a function \(h:K\times S^{d-1}\rightarrow {\mathbb {R}}\) by \(h(x,u) = \max _{i\in [m]} \langle u,\nabla f_i(x)\rangle \). Since each \(f_i\) is \(C^1\), the functions \((x,u)\mapsto \langle u,\nabla f_i(x)\rangle \) are continuous and hence h is also continuous. For \(x\in K\), we define \(\rho (x)=\min _{u\in S^{d-1}} h(x,u)\). Let us prove that, for \(x\in K\), \(\rho (x)>0\) if and only if \({\mathrm{cone}}(\{\nabla f_i(x)\}_{i\in [m]}) = {\mathbb {R}}^d\).

For one direction, let \(x\in K\) satisfy \(\rho (x)>0\), or, equivalently, \(\max _{i\in [m]} \langle u,\nabla f_i(x)\rangle >0\), for all \(u\in S^{d-1}\). If \(0\in {\mathrm{bd}}({\mathrm{conv}}(\{\nabla f_i(x)\}_{i\in [m]}))\), then the nonzero vector v pointing outward of \({\mathrm{conv}}(\{\nabla f_i(x)\}_{i\in [m]})\) and orthogonal to the hyperface containing 0 will make \(\max _{i\in [m]} \langle v,\nabla f_i(x)\rangle = 0\), a contradiction. If \(0\notin {\mathrm{conv}}(\{\nabla f_i(x)\}_{i\in [m]})\), then taking \(v = -{\mathrm{argmin}}_{z\in {\mathrm{conv}}(\{\nabla f_i(x)\}_{i\in [m]})} \langle z, z\rangle \) will make \(\max _{i\in [m]} \langle v,\nabla f_i(x)\rangle \!<\!0\), also a contradiction. Therefore, \(0\!\in \!{\mathrm{int}}({\mathrm{conv}}(\{\nabla f_i(x)\}_{i\in [m]}))\) and hence \({\mathrm{cone}}(\{\nabla f_i(x)\}_{i\in [m]}) = {\mathbb {R}}^d\). For the other direction, let \(x\in K\) satisfy \({\mathrm{cone}}(\{\nabla f_i(x)\}_{i\in [m]}) = {\mathbb {R}}^d\). To prove \(\rho (x)>0\), since \(u\mapsto \max _{i\in [m]} \langle u,\nabla f_i(x)\rangle \) is continuous and \(S^{d-1}\) is compact, it suffices to prove that, for all \(u\in S^{d-1}\), \(\max _{i\in [m]} \langle u, \nabla f_i(x)\rangle >0\). Given \(u\in S^{d-1}\), since \({\mathrm{cone}}(\{\nabla f_i(x)\}_{i\in [m]}) = {\mathbb {R}}^d\), \(u = \sum _{i\in [m]} r_i\cdot \nabla f_i(x)\), for some \(r_i\ge 0\). If \(\langle u,\nabla f_i(x)\rangle \le 0\) for all \(i\in [m]\), then \(\langle u,u\rangle = \sum _{i\in [m]} r_i\langle u,\nabla f_i(x)\rangle \le 0\), a contradiction. Thus, the other direction is proved and, for \(x\in K\), \(\rho (x)>0\) if and only if \({\mathrm{cone}}(\{\nabla f_i(x)\}_{i\in [m]}) = {\mathbb {R}}^d\).

Now let \(x_0\in K\) be such that \({\mathrm{cone}}(\{\nabla f_i(x_0)\}_{i\in [m]})={\mathbb {R}}^d\). By the claim above, this is equivalent to \(\rho (x_0)>0\). We want to prove that there exists \(\epsilon >0\) such that, for all \(x\in B(x_0,\epsilon )\), \({\mathrm{cone}}(\{\nabla f_i(x)\}_{i\in [m]})={\mathbb {R}}^d\), or equivalently, \(\rho (x)>0\). Suppose not. Then there exists a sequence \((x_n, u_n)\in K\times S^{d-1}\) such that \(x_n\rightarrow x_0\) and \(h(x_n, u_n)\le 0\) for all n. By compactness of \(S^{d-1}\), there is a subsequence \(u_{n_j}\rightarrow u_0\), and by continuity of h, \(h(x_0, u_0)\le 0\). However, \(h(x_0, u_0)\ge \min _{u\in S^{d-1}} h(x_0,u) = \rho (x_0)>0\), a contradiction. Thus the proof is complete. \(\square \)

We have now established that for a regular pair in general position, completeness is equivalent to \(P_K({\mathrm{Cent}}_0)>0\). A natural question arises: assuming that the considered regular pair is in general position, can one know/test whether \(P_K({\mathrm{Cent}}_0)\) is positive or not, solely from the data matrix M? To answer the question, we propose the following natural discretization of \({\mathrm{Cent}}_0\).

Definition 5.5

For a matrix \(M\in {\mathcal {M}}_{m,n}^o\), the set

$$\begin{aligned} {\widehat{{\mathrm{Cent}}}}_0(M) {\mathop {=}\limits ^{\mathrm {def}}}\left\{ a\in [n]:\bigcap _{i\in [m]}\left\{ b\in [n]: M_{ib} <M_{i a}\right\} = \varnothing \right\} \end{aligned}$$

is called the discretized central region.

If a matrix \(M\in {\mathcal {M}}_{m,n}^o\) is sampled from a regular pair, then for each \(a\in [n]\), the set \( \{b\in [n]: M_{ib} <M_{i a} \} \) is a discretization of \(f_i^{-1}(-\infty ,f_i(x_a))\), and \({\widehat{{\mathrm{Cent}}}}_0(M)\) can be thought of as an approximation of \({\mathrm{Cent}}_0\). The following theorem confirms this intuition.

Theorem 5.6

Let \(M_n\in {\mathcal {M}}_{m,n}^o\) be sampled from a regular pair, then \(\frac{1}{n}{\#({\widehat{{\mathrm{Cent}}}}_0(M_n))} \) converges to \(P_K({\mathrm{Cent}}_0)\) in probability:

$$\begin{aligned} \text { for all } \epsilon>0,\quad \lim _{n\rightarrow \infty } \Pr \left[ \left| \frac{1}{n}{\#({\widehat{{\mathrm{Cent}}}}_0(M_n))}-P_K({\mathrm{Cent}}_0)\right| >\epsilon \right] = 0. \end{aligned}$$

The proof of Theorem 5.6 involves technicalities used in the proof of the Interleaving Convergence Theorem (Theorem 2.10) and is given in Sect. 7.3 of the Appendix. Theorem 5.6 establishes that \(\frac{1}{n}{\#({\widehat{{\mathrm{Cent}}}}_0(M_n))}\) serves as an approximation of \(P_K({\mathrm{Cent}}_0)\), and thus enables one to test whether \(P_K({\mathrm{Cent}}_0)>0\). By Lemma 5.3, this provides a way to test the completeness of the underlying regular pair \({({\mathcal {F}},P_K)}\).
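Definition 5.5 and the estimator of Theorem 5.6 translate directly into code. The sketch below is our own illustration (not the paper's implementation): it computes \({\widehat{{\mathrm{Cent}}}}_0(M)\) from a data matrix and returns the fraction \(\frac{1}{n}{\#({\widehat{{\mathrm{Cent}}}}_0(M))}\) used to test whether \(P_K({\mathrm{Cent}}_0)>0\).

```python
import numpy as np

def discretized_central_region(M):
    """M: (m, n) data matrix with pairwise distinct entries in each row.
    Returns the column indices a for which the discrete sublevel sets
    {b : M[i, b] < M[i, a]}, i = 1..m, have empty common intersection."""
    m, n = M.shape
    central = []
    for a in range(n):
        below = M < M[:, [a]]            # below[i, b] iff M[i, b] < M[i, a]
        if not below.all(axis=0).any():  # no column lies below a in every row
            central.append(a)
    return central

def estimate_PK_Cent0(M):
    """The estimator (1/n) * #Cent0_hat(M) of P_K(Cent_0)."""
    return len(discretized_central_region(M)) / M.shape[1]
```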

5.1 Proof of Lemma 5.3

First we prove the first part of Lemma 5.3.

Lemma 5.7

Let \(({\mathcal {F}},P_K) = (\{f_i\}_{i\in [m]},P_K)\) be a regular pair, where each function in \( {\mathcal {F}}\) is differentiable. Assume that \({\mathcal {F}}\) is in general position, then \(P_K({\mathrm{Cent}}_1\setminus {\mathrm{Cent}}_0)=0\).

Proof

Let \(K^\prime =\{ x\in K\, \vert \, \nabla f_i(x)= 0 \text { for some } i\in [m] \} \) denote the union of the sets of critical points of the functions in \({\mathcal {F}}\). Since \({\mathcal {F}}\) is in general position, \(K'\) has Lebesgue measure zero. Assume \(x_0\in {\mathrm{Cent}}_1\setminus K^\prime \); then

$$\begin{aligned} \forall u\in {\mathbb {R}}^d : \exists \ c_1,\dots ,c_m \ge 0 \text { such that } u=\sum _{i=1}^m c_i \nabla f_i(x_0). \end{aligned}$$
(27)

It can be easily shown, see e.g. Theorem 3.2.3 in Cambini and Martein (2009), that if f is differentiable and quasi-convex on an open convex K with \(\nabla f(x_0)\ne 0\), then \(f(x)<f(x_0)\) implies \(\left\langle \nabla f(x_0),x-x_0\right\rangle <0\). Thus

$$\begin{aligned} \bigcap _{i\in [m]} f_i^{-1}(-\infty ,f_i(x_0))\subseteq \bigcap _{i\in [m]} \left\{ x \,\vert \, \left\langle \nabla f_i(x_0),x-x_0 \right\rangle <0\right\} =\varnothing , \end{aligned}$$

where the last equality follows from (27), as one can choose \(u=x-x_0\). This implies \(x_0\in {\mathrm{Cent}}_0\). Therefore, \({\mathrm{Cent}}_1\setminus K'\subseteq {\mathrm{Cent}}_0\) and \(P_K({\mathrm{Cent}}_1\setminus {\mathrm{Cent}}_0)\le P_K(K') = 0\). \(\square \)

To prove the second half of Lemma 5.3, we first recall that a convex cone \({\mathcal {C}} \subseteq {\mathbb {R}}^d\) is called flat if there exists \(w\ne 0\) such that both \(w\in {\mathcal {C}}\) and \(-w\in {\mathcal {C}} \); otherwise, it is called salient. If a convex cone \({\mathcal {C}}\) is closed and salient, then there existsFootnote 16 \(w\in {\mathbb {R}}^d\) such that \(\langle u, w\rangle <0\) for all non-zero \( u\in {\mathcal {C}}\).

Lemma 5.8

Let \(({\mathcal {F}},P_K) = (\{f_i\}_{i\in [m]},P_K)\) be a regular pair, where each \(f_i\) is differentiable, then

$$\begin{aligned} {\mathrm{Cent}}_0\setminus {\mathrm{Cent}}_1 \subseteq \left\{ x\in K:{\mathrm{cone}}\left( \left\{ \nabla f_i(x)\right\} _{i\in [m]}\right) \text { is flat but not } {\mathbb {R}}^d\right\} . \end{aligned}$$
(28)

Proof

Let \(x_0\in {\mathrm{Cent}}_0\setminus {\mathrm{Cent}}_1\). Denote

$$\begin{aligned} {\mathcal {C}}_0 {\mathop {=}\limits ^{\mathrm {def}}}{\mathrm{cone}}\left( \left\{ \nabla f_i(x_0)\right\} _{i\in [m]}\right) . \end{aligned}$$

Since \(x_0\notin {\mathrm{Cent}}_1\), \({\mathcal {C}}_0 \ne {\mathbb {R}}^d\). Thus, it suffices to prove that the cone \({\mathcal {C}}_0\) is flat. Suppose the cone \({\mathcal {C}}_0 \) is not flat. Then, since \({\mathcal {C}}_0\) is finitely generated and hence closed, there exists w such that \(\langle u, w\rangle <0\) for all non-zero \( u\in {\mathcal {C}}_0\). In particular, \(\langle \nabla f_i(x_0), w\rangle <0\) for all \(i\in [m]\). Let us show that for every \(i\in [m]\) there exists \(\alpha _i>0\) such that \(x_0+\alpha _i w\in f_i^{-1}(-\infty ,f_i(x_0))\). Suppose not. Then there exists \(i\in [m]\) such that \(f_i(x_0+\alpha w)\ge f_i(x_0)\) for all \(\alpha >0\), and we have

$$\begin{aligned} \langle \nabla f_i(x_0), w\rangle = \liminf _{\alpha \rightarrow 0^+} \frac{f_i(x_0+\alpha w)-f_i(x_0)}{\alpha }\ge 0, \end{aligned}$$

which is a contradiction. Thus, such positive \(\alpha _i\)’s exist, and we obtain that

$$\begin{aligned} x_0+ \left( \min _{i\in [m]} \alpha _i \right) w \in \bigcap _{i\in [m]} f_i^{-1}(-\infty ,f_i(x_0)). \end{aligned}$$

This contradicts the assumption that \(x_0\in {\mathrm{Cent}}_0\). Therefore the cone \({\mathcal {C}}_0\) is flat. \(\square \)

It can be shown that the inclusion in (28) is in fact an equality. However, since we do not need the equality here, we omit its proof. To finish the proof of Lemma 5.3, we use the following

Lemma 5.9

Let \(V = \{v_1,\ldots , v_m\}\subset {\mathbb {R}}^d\) be a set of vectors in general direction, then \({\mathrm{cone}}(V)={\mathbb {R}}^d\) or \({\mathrm{cone}}(V)\) is salient.

To prove Lemma 5.9, we use the following lemma.

Lemma 5.10

(see e.g. Theorem 2.5 in Regis 2016) Let \(V = \{v_1,\ldots , v_m\}\) be a set of non-zero vectors in \({\mathbb {R}}^d\), then the following two statements are equivalent:

  (i) \({\mathrm{cone}}(V)={\mathrm{span}}(V)\);

  (ii) For each \(i\in [m]\), \(-v_i\in {\mathrm{cone}}(V\setminus \{v_i\})\).

Proof of Lemma 5.9

For \(m\le d\), the vectors \(\{v_1,\ldots , v_m\}\) are linearly independent, since V is in general direction. Suppose there exists \(w\in {\mathbb {R}}^d \) such that \(w, -w\in {\mathrm{cone}}(V)\). Then there exist \(a_i, b_i\ge 0\) with \( w= \sum _{i=1}^m a_iv_i=-\sum _{i=1}^m b_iv_i\). Since the vectors \(\{v_1,\ldots , v_m\}\) are linearly independent, \(a_i+b_i=0\) for all \(i\in [m]\), and thus \(w=0\). Therefore \({\mathrm{cone}}(V)\) is salient.

For \(m>d\), we argue by induction on the size of V. Suppose the result holds for every set of \(m\ge d\) vectors in general direction, and let \(V = \{v_1,\ldots , v_{m+1}\}\) be a set of \(m+1\) vectors in general direction. Since any d vectors in V form a basis of \({\mathbb {R}}^d\), \({\mathrm{span}}(V) = {\mathbb {R}}^d\). Suppose the result is false for V; equivalently, \({\mathrm{cone}}(V)\ne {\mathbb {R}}^d\) and \({\mathrm{cone}}(V)\) is flat. By Lemma 5.10, there exists \(j\in [m+1]\) such that

$$\begin{aligned} -v_j\notin {\mathrm{cone}}(V\setminus \{v_j\}). \end{aligned}$$
(29)

Since \({\mathrm{cone}}(V)\) is flat, there exists a nonzero \(w\in {\mathbb {R}}^d\) such that \(w, -w\in {\mathrm{cone}}(V)\), and thus \(w= \sum _{i=1}^{m+1} a_iv_i=-\sum _{i=1}^{m+1} b_iv_i \) with \(a_i, b_i\ge 0\) for all i. Let us prove that \(a_j +b_j>0\). If \(a_j+b_j = 0\), then \(a_j = b_j = 0\); thus \(w, -w\in {\mathrm{cone}}(V\setminus \{v_j\})\) and \({\mathrm{cone}}(V\setminus \{v_j\})\) is not salient. Since \(|V\setminus \{v_j\}|=m\), the induction hypothesis gives \({\mathrm{cone}}(V\setminus \{v_j\})={\mathbb {R}}^d\). However, \({\mathrm{cone}}(V)\ne {\mathbb {R}}^d\) and hence \({\mathrm{cone}}(V\setminus \{v_j\})\ne {\mathbb {R}}^d\), a contradiction. Therefore \(a_j+b_j>0\), and since \(\sum _{i=1}^{m+1}(a_i+b_i)v_i = 0\), we conclude that

$$\begin{aligned} -v_j = \sum _{i\in [m+1]\setminus \{j\}} \left( \frac{a_i+b_i}{a_j+b_j}\right) v_i\in {\mathrm{cone}}(V\setminus \{v_j\}), \end{aligned}$$

contradicting (29). Therefore, the result holds for every set V in general direction with \(|V| = m+1\). This completes the proof by induction. \(\square \)

We now finish the proof of Lemma 5.3.

Proof of Lemma 5.3

The first half of Lemma 5.3 is proved in Lemma 5.7. For the second half, we combine Lemmas 5.8 and 5.9 to obtain

$$\begin{aligned} {\mathrm{Cent}}_0\setminus {\mathrm{Cent}}_1\subseteq \left\{ x\in K:\left\{ \nabla f_i\left( x\right) \right\} _{i\in [m]}\text { is not in general direction}\right\} . \end{aligned}$$
(30)

Since \(\{f_i\}_{i\in [m]}\) is in general position, the right hand side of (30) has measure zero, completing the proof. \(\square \)

6 Discussion

Motivated by neuroscience applications, we introduced a convex sensing problem that is sufficiently constrained under the natural assumptions of a regular pair. Using the formalism of Dowker complexes, we showed how the relevant combinatorial information of a convex sensing problem can be used to estimate the embedding dimension \(d{({\mathcal {F}},P_K)}\) of the data. We established (Theorem 4.4) conditions that guarantee that the estimate \({\hat{d}}_{{\mathrm{low}}}(M_n,\varepsilon ) \) is asymptotically consistent, i.e. converges to the embedding dimension in the limit of large sampling. We also introduced a method for testing the completeness of the regular pair that underlies the discrete data of the convex sensing problem (Theorem 5.6).

These approaches work reasonably well on synthetic data, as illustrated in Sect. 3.5. While these methods already allow for practical applications, one major “usability issue” still remains unresolved. The asymptotic nature of these theorems does not provide a guarantee that the existing finite sampling of the data is sufficiently large to produce an accurate enough estimate. In other words, practical applications of these results will greatly benefit from knowing the rate of convergence for the appropriate convergence theorems.

Our results can be thought of as a first step toward non-linear dimensionality reduction in the context of convex sensing data. Once the Euclidean embedding dimension \(d=d{({\mathcal {F}},P_K)}\) is correctly inferred, one might also ask for the actual point cloud \(X_n\subset {\mathbb {R}}^d \) as well as the quasi-convex functions \(f_i:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\). Intuitively, the combinatorial information in the sequences S(M) yields all the convex-hull information allowed by the problem. It thus seems possible that the construction of the geometric realization may be similar to that used in the proof of Lemma 1.3. We believe that “convex sensing” is a promising approach for non-linear dimensionality reduction in scientific applications.