1 Introduction

This paper studies statistical properties of graph-based methods in unsupervised learning, and specifically the use of Cheeger cuts for clustering. When data points are samples from some underlying distribution on a continuum domain and graphs are proximity graphs (e.g. \(\varepsilon \) or k-NN graphs), most of the existing literature establishes asymptotic consistency of these methods without providing any rates of convergence. In this paper, we obtain new estimates for the graph total variation functional on data clouds that, when combined with recent quantitative isoperimetric inequalities on manifolds, allow us, for the first time, to provide high probability convergence rates for both the Cheeger constant (minima) and corresponding Cheeger cuts (minimizers) on graphs built from random data clouds.

The Cheeger cut methodology belongs to a broad family of graph-based methods in data analysis. These are learning procedures that typically rely on the optimization of objective functions that depend on a similarity graph \(G=({\mathcal {M}}_n,w)\) on the observed data set \({\mathcal {M}}_n= \{ x_1, \dots , x_n\}\) and whose role is to capture geometric information from \({\mathcal {M}}_n\). Here \(w=(w_{ij})_{i,j=1}^n\) represents a weight matrix quantifying the level of similarity between different data points. Our focus is on unsupervised learning, where the only information on the data available is the graph G itself and no labeling information is present. In such a scenario, a central goal of learning methods is to use the similarity graph G to construct a meaningful summarized representation of the data set into clusters. Heuristically, a cluster is a subset of \({\mathcal {M}}_n\) whose elements are more similar among themselves than when compared with those in other groups. This heuristic can be made more precise by describing clusters as groups of points that are well-connected among themselves and less so with elements from other groups. A concrete mathematical way to use the similarity graph G to capture this intuitive notion is to look for subsets A for which the cut functional

$$\begin{aligned} \mathrm {Cut}(A):= \sum _{x_i \in A, x_j \in {\mathcal {M}}_n \setminus A} w_{ij}, \quad A \subseteq {\mathcal {M}}_n \end{aligned}$$

is small. As can be seen from the above formula, \(\mathrm {Cut}\) penalizes subsets of \({\mathcal {M}}_n\) that have big interactions with their complements. An alternative name usually given to this functional is graph perimeter, but here we will save this name to refer to an appropriately rescaled version of \(\mathrm {Cut}\).
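To make the definition concrete, the following is a minimal Python sketch of the cut functional. The toy graph (two triangles joined by a single edge) and the names `cut`, `cut_triangle` and `cut_single` are hypothetical and serve only as illustration.

```python
from typing import Sequence, Set


def cut(w: Sequence[Sequence[float]], A: Set[int]) -> float:
    """Cut(A): total weight of the edges between A and its complement."""
    n = len(w)
    return sum(w[i][j] for i in A for j in range(n) if j not in A)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    w[i][j] = w[j][i] = 1.0  # symmetric unit weights

cut_triangle = cut(w, {0, 1, 2})  # only the bridge 2-3 crosses the partition
cut_single = cut(w, {0})          # both edges incident to vertex 0 are cut
```

In this example the well-connected group `{0, 1, 2}` has a smaller cut than the single vertex `{0}`, in line with the heuristic description of clusters above.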

The cut functional has been studied for decades in mathematics and computer science, e.g. [2, 3, 8, 10, 35, 51, 55, 59, 71, 74, 76, 81, 83]. For example, a celebrated result in optimization relates the \(\mathrm {Cut}\) minimization problem, subject to hard membership constraints, with a maximum flow problem, a linear program, via duality theory [35]. In the context of clustering, it is worth noticing that, while direct minimization of \(\mathrm {Cut}\) is reasonable as it penalizes the size of the interface between sets, optimization of \(\mathrm {Cut}\) on its own is not able to rule out partitions into groups that are highly asymmetric in terms of size, and thus in order to avoid undesirable partitions, the cut functional must be modified appropriately. For this purpose extra balancing terms are added to the objective function in either additive or multiplicative form so as to penalize “volume” asymmetry. Some prototypical examples of optimization problems in the family of balanced cuts are

$$\begin{aligned}&\min _{A\subseteq {\mathcal {M}}_n} \frac{\mathrm {Cut}(A)}{\min \{\mathrm {vol}_{{\mathcal {M}}_n}(A), \mathrm {vol}_{{\mathcal {M}}_n}(A^c) \}}, \end{aligned}$$
(1.1)
$$\begin{aligned}&\min _{A \subseteq {\mathcal {M}}_n } \frac{\mathrm {Cut}(A)}{\mathrm {vol}_{{\mathcal {M}}_n}(A) \cdot \mathrm {vol}_{{\mathcal {M}}_n}(A^c) } \end{aligned}$$
(1.2)

or

$$\begin{aligned} \min _{A \subseteq {\mathcal {M}}_n } \mathrm {Cut}(A) + \gamma \left( \mathrm {vol}_{{\mathcal {M}}_n}(A)^2 + \mathrm {vol}_{{\mathcal {M}}_n}(A^c)^2 \right) . \end{aligned}$$
(1.3)

The problems (1.1) and (1.2) are known as Cheeger cut and ratio cut optimization problems respectively (see [22, 81]). Problem (1.3) is known to be equivalent to the problem of graph modularity clustering (see [6, 56]); in all the above formulas \(\mathrm {vol}_{{\mathcal {M}}_n}(\cdot )\) stands for the number of elements in a given set divided by n. We observe that, in essence, all the above are \(\mathrm {Cut}\) minimization problems (i.e. graph perimeter minimization) with some form of soft volume constraint. We also remark that in (1.1),(1.2) and (1.3) the target is a two-way partitioning of the data, meaning that the desired partition consists of exactly two sets, namely A and \(A^c\), but it is possible to formulate analogous optimization problems in the multi-way clustering setting (e.g. [6, 46, 56]). In the sequel we focus on the two-way clustering setting.
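The effect of the balancing terms can be seen on a small example. The sketch below is a brute-force illustration only (it enumerates all nonempty proper subsets, which is exponential in n); the toy graph, the pendant weight 0.6 and all function names are assumptions made for this example.

```python
from itertools import combinations


def cut(w, A):
    """Cut(A): total weight between A and its complement."""
    n = len(w)
    return sum(w[i][j] for i in A for j in range(n) if j not in A)


def cheeger_objective(w, A):
    """Objective in (1.1), with vol(A) = |A| / n."""
    n = len(w)
    return cut(w, A) / min(len(A) / n, (n - len(A)) / n)


def ratio_objective(w, A):
    """Objective in (1.2)."""
    n = len(w)
    return cut(w, A) / ((len(A) / n) * ((n - len(A)) / n))


def proper_subsets(n):
    """All nonempty proper subsets of {0, ..., n-1}."""
    for k in range(1, n):
        for c in combinations(range(n), k):
            yield frozenset(c)


def minimizer(objective, w):
    """First subset attaining the minimum of the given objective."""
    return min(proper_subsets(len(w)), key=lambda A: objective(w, A))


# Hypothetical graph: two triangles joined by the edge 2-3, plus a pendant
# vertex 6 attached to vertex 5 by a weaker edge of weight 0.6.
n = 7
w = [[0.0] * n for _ in range(n)]
for i, j, wij in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0),
                  (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0), (5, 6, 0.6)]:
    w[i][j] = w[j][i] = wij

best_plain = minimizer(cut, w)                  # isolates the pendant vertex
best_cheeger = minimizer(cheeger_objective, w)  # splits along the bridge 2-3
best_ratio = minimizer(ratio_objective, w)      # also splits along the bridge
```

Here unconstrained minimization of the cut separates a single vertex from the rest, whereas both balanced objectives select the partition along the bridge between the two triangles.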

Although each of (1.1), (1.2) and (1.3) is a quite natural objective for clustering due to its geometric character, graph partitioning problems are, in their full generality, known to be NP-hard [58]. Accordingly, in the past two decades a lot of the theoretical effort in the graph-based learning communities focused on analyzing relaxations of these partitioning methods. For example, in the case of the ratio cut functional (1.2), up to a multiplicative constant, the energy associated with \(A \subset {\mathcal {M}}_n\) can be rewritten as

$$\begin{aligned} \frac{\sum _{i,j}w_{ij}|u(x_i) - u(x_j)|^2}{\sum _{i=1}^n | u(x_i) - \frac{1}{n}\sum _{j=1}^n u(x_j)|^2}, \end{aligned}$$
(1.4)

where \(u = \mathbbm {1}_{A}\) (assuming weights \(w_{ij}\) are symmetric). One relaxes the problem by allowing u to be an arbitrary function \(u: {\mathcal {M}}_n \rightarrow {\mathbb {R}}\). Upon minimizing this relaxed functional, one recovers the variational description for the first non-trivial eigenpair (i.e. the Fiedler eigenpair) of the unnormalized graph Laplacian. This relaxed problem is convenient from a computational point of view as it only requires an eigensolver for a positive semidefinite matrix, namely the graph Laplacian. One can subsequently find an appropriate sublevel set via thresholding the Fiedler eigenvector in order to cluster the data into two sets. This relaxation procedure for constructing graph partitions can be motivated by graph theoretical results such as Cheeger’s inequality in its graph version [62] which, just as in its continuum counterpart due to Cheeger [20], relates the Cheeger constant of the graph (i.e. the minimum value of the Cheeger cut) with the first non-trivial eigenvalue of the graph Laplacian. In addition, higher-order eigenmodes of the graph Laplacian are useful for multi-way clustering: this idea lies at the heart of the celebrated spectral clustering algorithm [67]. The spectrum of the graph Laplacian can also be used to extract multiscale geometric information from the original data set \({\mathcal {M}}_n\) by means of appropriate diffusion maps [27, 28, 69].
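Returning to (1.4), the claim that it agrees with the ratio cut energy at indicator functions can be checked directly. The following short verification assumes symmetric weights and uses the shorthand \(a := \mathrm {vol}_{{\mathcal {M}}_n}(A)\), introduced only for this computation. For \(u = \mathbbm {1}_{A}\),

$$\begin{aligned} \sum _{i,j}w_{ij}|u(x_i) - u(x_j)|^2&= 2\, \mathrm {Cut}(A), \\ \sum _{i=1}^n \Big | u(x_i) - \frac{1}{n}\sum _{j=1}^n u(x_j)\Big |^2&= n a (1-a)^2 + n (1-a) a^2 = n\, \mathrm {vol}_{{\mathcal {M}}_n}(A) \, \mathrm {vol}_{{\mathcal {M}}_n}(A^c), \end{aligned}$$

so that the quotient in (1.4) equals \(\frac{2}{n}\) times the objective in (1.2), which is the multiplicative constant referred to above.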

The general utility of these Laplacian-based methods can be seen by the wide body of literature analyzing their statistical properties in supervised, semi-supervised and unsupervised settings. Many of these works aim to provide specific connections between these graph-based methodologies with analogous geometric and analytical notions on manifolds and other objects at the continuum level; this goal is summarized with the term manifold learning. A typical model assumption of analysis in this field is to think of points \({\mathcal {M}}_n=\{ x_1, \dots , x_n \}\) as samples from some distribution supported on a low dimensional manifold \({\mathcal {M}}\) embedded in an ambient space \({\mathbb {R}}^d\), and to consider weights w that are determined by the proximity of points in \({\mathbb {R}}^d\). Namely, high weights are given to pairs of points that are close to each other. A popular construction involves the choice of a connectivity parameter \(\varepsilon >0\), defining weights by \(w_{ij}= \eta \left( \frac{|x_i-x_j|}{\varepsilon } \right) \), where \(\eta \) is a non-increasing function decaying fast enough to zero. Over the past two decades many theoretical results have been obtained for the asymptotic consistency of graph Laplacians towards continuum limits. These include pointwise consistency [13, 16, 53, 73], and stronger spectral consistency [4, 12, 14, 39, 43, 82] which were more suitable for the proposed learning methods. Subsequently, these analytical tools have facilitated a deeper understanding of various statistical methods [15, 32, 33, 40, 41].

In addition to this theoretical development of graph Laplacian methods, in the past decade there has been a renewed interest among researchers in analyzing the original un-relaxed cut functional and its use for data clustering. On the one hand, this was driven by significant theoretical activity in the image processing communities, where total variation functionals provide natural energies for describing sharp edges (see [18] for a broad overview). On the other hand, new algorithmic improvements for total variation minimization made the original family of balanced cut optimization problems more accessible [9, 10, 19, 54, 55, 76]. This renewed interest also motivated further theoretical analysis from a statistical point of view. Analyzing balanced cut type problems, such as (1.1) and (1.2), is difficult because the associated optimization problems are highly non-convex, with an objective function that depends non-trivially on random data. One approach to analyzing large sample properties of the Cheeger cut problem is given in [2, 66]. These works consider the minimization of the Cheeger cut functional on an \(\varepsilon \)-graph, but only over subsets of \({\mathcal {M}}_n\) that are restrictions to the data set of sufficiently regular subsets of the underlying manifold \({\mathcal {M}}\). In this constrained setting, the classical approaches of statistical learning that rely on the computation of VC dimensions or other capacity measures were used to deduce convergence rates of the “restricted” discrete Cheeger constant towards a corresponding counterpart at the manifold level. To our knowledge these are the only works which obtain convergence rates for Cheeger constants based upon continuum sampling. However, in many practical settings the manifold \({\mathcal {M}}\) is unknown, and hence the constraints imposed in [2] may be difficult to guarantee in applications.

Variational approaches to analyzing cuts on graphs in the large data limit were, to our knowledge, first introduced in [79]. There, the authors established a link between discrete and continuum Cheeger cuts via \(\Gamma \)-convergence type arguments for a Ginzburg–Landau approximation of the cut functional defined on a graph. The analysis, however, required the data points to form a regular lattice, and in particular, it was not clear how to extend the ideas to data sampled from a distribution or more irregular point arrays. The work [42] introduced a framework, built upon both optimal transportation and variational methods, which rigorously connects the Cut functional on graphs (built from random data) and continuum total variation. These tools provide an almost optimal condition on the scaling of \(\varepsilon \) (i.e. the connectivity length scale of the proximity graph) in terms of n for the consistency to hold. In order to link the discrete and continuum energies and their minimizers, the authors introduced the so-called \(\mathrm {TL}^{p}\) distance. The key idea with these spaces is to consider couplings between measures and functions, i.e. \((\mu ,f)\in \mathrm {TL}^{p}\) if \(\mu \) is a probability measure and \(f\in \mathrm {L}^{p}(\mu )\). A metric is then defined between couplings \((\mu ,f)\), \((\nu ,g)\) that is equivalent to the Wasserstein distance between \((\mathrm {Id}\times f)_{\#}\mu \) and \((\mathrm {Id}\times g)_{\#}\nu \). With this topology in hand the authors provide a link between graph and continuum total variations via \(\Gamma \)-convergence (e.g. [7, 30]), a tool developed precisely in order to describe the convergence of variational problems. This analytical framework has been applied directly to the Cheeger (and other) cut problems in [44, 46], but without rates. It has also been applied to the Ginzburg–Landau approximation to extend the results in [79] to the random data setting in [29, 78]. 
In order to avoid degeneracy of a graph total variation type functional, the connectivity radius \(\varepsilon \) cannot scale to zero too quickly. Originally this required the connectivity radius \(\varepsilon \) to be much greater than the \(\infty \)-Wasserstein distance between the empirical measure \(\nu _n=\frac{1}{n}\sum _{i=1}^n \delta _{x_i}\) and the data generating measure \(\nu \) (i.e. \(\varepsilon \gg d_{\mathrm {W}^{\infty }}(\nu _n,\nu )\)). In dimensions \(d\geqq 3\) the connectivity threshold of the graph is of the same order as \(d_{\mathrm {W}^{\infty }}(\nu _n,\nu )\), and for \(d=2\) there is a logarithmic correction factor, which left a small gap between the two [45, 68]. In [65] the exponent for the two dimensional case was improved, and as of right now, consistency of Cheeger constants and cuts is known to hold provided the graph connectivity is asymptotically above the connectivity threshold. All of these results, however, come without any quantitative rates of convergence.

In summary, the works that have studied the convergence of graph Cheeger constants and corresponding Cheeger cuts towards continuum counterparts have done so without providing high probability convergence rates, while the only works that provide some convergence rates for Cheeger constants (and not for their associated optimal cuts) do so by modifying the original problem in a manner that is not fully satisfactory from the point of view of applications. Thus, the question of finding high probability convergence rates for graph Cheeger constants and cuts on proximity graphs, to the authors’ knowledge, has not been addressed anywhere in the literature. Our goal is to provide a first work in this direction.

In this paper we present high probability rates for the convergence of both Cheeger constants (i.e. minima) and corresponding Cheeger cuts (i.e. minimizers) towards their analogous continuum counterparts. Our approach will strongly highlight how analytical ideas can help provide answers to problems that have been elusive using traditional tools from statistics or statistical learning theory. Our contribution will begin by establishing new estimates on the graph total variation functional on proximity graphs. These estimates will allow us to obtain high probability convergence rates for Cheeger constants. Our second contribution will be to connect the problem of convergence of minimizers (i.e. cuts) to recent results in geometric measure theory related to quantitative isoperimetric inequalities. Quantitative isoperimetric inequalities are a family of relations that hold at a continuum level (i.e. in \({\mathbb {R}}^d\) or on generic manifolds). Heuristically, they capture a type of “strong local convexity” of the volume constrained perimeter minimization problem (at the continuum level) near minimizers. One of the morals we hope to convey in this paper is that in the geometric setting, where the continuum variational problem often possesses significant structure, it is often possible to deduce strong statistical properties of data analysis procedures, which are defined at the discrete, finite sample level. We firmly believe that the bridge between data analysis and mathematical analysis is both necessary and promising.

Finally, it is worth saying that we do not claim optimality of our convergence rates, and in fact we believe that further improvement to our results is possible.

1.1 Outline

The rest of the paper is organized as follows: in Section 2 we present our problem setup and state our main results precisely. In particular, Sections 2.1 and 2.2 describe the discrete and continuum problems that we aim to connect in this paper. In Section 2.3.1 we present our first main result (Theorem 2.2) stating the high probability convergence rates of Cheeger constants (i.e. convergence of minima). In Section 2.3.2 we take a small detour and present a short discussion on isoperimetric problems, and discuss the connection of these problems with the problem of establishing convergence rates for Cheeger cuts (i.e. convergence of minimizers). Section 2.3.3 contains our convergence rates for Cheeger cuts (Theorem 2.15). Section 2.4 expands the discussion on isoperimetric stability and provides pointers to the literature.

Section 3 contains a series of analytical results on non-local total variation energies. We wrap up the paper in Section 4 where we present the proofs of our main results.

2 Problem Setup and Main Results

Throughout the paper \({\mathcal {M}}\) will denote a smooth, connected, orientable, compact, Riemannian manifold of dimension m, without boundary, embedded in \({\mathbb {R}}^d\). In particular, the Riemannian metric tensor on \({\mathcal {M}}\) is the one inherited from \({\mathbb {R}}^d\). This will allow us to relate the geodesic distance on \({\mathcal {M}}\) with the Euclidean metric for points on \({\mathcal {M}}\) that are close enough. We use \(\mathrm {vol}_{\mathcal {M}}\) to represent \({\mathcal {M}}\)’s volume form, and without loss of generality assume that the manifold \({\mathcal {M}}\) is normalized so that \(\mathrm {vol}_{\mathcal {M}}({\mathcal {M}})=1\), and in particular \(\nu :=\mathrm {vol}_{{\mathcal {M}}}\) is a probability measure (the uniform distribution) on \({\mathcal {M}}\). We let \(i_{\mathcal {M}}>0\) be the injectivity radius of \({\mathcal {M}}\), which defines the maximum radius of an m-dimensional ball centered at the origin of an arbitrary tangent plane \({\mathcal {T}}_x{\mathcal {M}}\) for which the exponential map \(\exp _x: {\mathcal {T}}_x{\mathcal {M}}\rightarrow {\mathcal {M}}\) is a diffeomorphism. Other notions from differential geometry that are used in the paper are discussed in Chapters 1-3 in [31], and are introduced as needed.

Throughout the paper we will use letters like \(v , {\tilde{v}}\) to represent tangent vectors on \({\mathcal {M}}\). We will use \(u,{\tilde{u}}\) to represent discrete functions from \({\mathcal {M}}_n\) into \({\mathbb {R}}\), and use \(f,{\tilde{f}}\) to represent functions from \({\mathcal {M}}\) into \({\mathbb {R}}\). For two sets we write \(A \Delta B = (A \backslash B) \cup (B \backslash A)\) to denote their symmetric difference.

2.1 Discrete Set-up

We will use \({\mathcal {M}}_n=\{x_i\}_{i=1}^n\) to represent the discrete approximation of the manifold \({\mathcal {M}}\) via sampling from a distribution supported on \({\mathcal {M}}\). We use \(\nu _n = \frac{1}{n}\sum _{i=1}^n \delta _{x_i}\) to represent the associated empirical measure. The graph \(G_n\) is defined to be the set of vertices \({\mathcal {M}}_n\) with edge weights \(\{w_{ij}\}_{i,j=1}^n\), representing the similarities between nodes \(x_i\) and \(x_j\). Here we consider weights of the form

$$\begin{aligned} w_{ij} := \eta \left( \frac{|x_i-x_j|}{\varepsilon } \right) , \end{aligned}$$

where \(\varepsilon >0\) is a small parameter representing a length scale on which we consider points to be neighbors. We will assume for simplicity that the kernel \(\eta : {\mathbb {R}}^+ \rightarrow {\mathbb {R}}^+\) takes the form

$$\begin{aligned} \eta (t) = \mathbbm {1}_{\{t\leqq 1\}} = \left\{ \begin{array}{ll} 1 &{} \text {if } t\leqq 1 \\ 0 &{} \text {else} \end{array} \right. , \end{aligned}$$

although the analysis presented here can be extended to other choices of decreasing kernel with only minor modifications needed. We will, later on, rescale these edge weights in order to obtain appropriate asymptotic limits.
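As an illustration of this construction, the weight matrix for the indicator kernel can be assembled as follows. This is a minimal sketch; the function names and the three sample points are hypothetical.

```python
import math


def eta(t):
    """Indicator kernel: 1 if t <= 1, and 0 otherwise."""
    return 1.0 if t <= 1.0 else 0.0


def epsilon_graph_weights(points, eps):
    """Weight matrix w_ij = eta(|x_i - x_j| / eps) for points in R^d."""
    return [[eta(math.dist(p, q) / eps) for q in points] for p in points]


# Hypothetical sample: three points on a line in R^2.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
w = epsilon_graph_weights(points, eps=1.5)
# Only the first two points are within distance eps of each other.
```

Note that the formula assigns \(w_{ii}=\eta (0)=1\) on the diagonal; this has no effect on the cut or on the graph total variation, since the corresponding differences vanish.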

We will assume for simplicity that the points \(\{x_i\}_{i=1}^n\) are sampled from the distribution \(\nu \) with density given by \(\rho \equiv 1\) (with respect to the natural volume form \(\mathrm {vol}_{\mathcal {M}}\)). The extension to smooth, non-degenerate \(\rho \) does not cause any significant technical changes, but burdens the notation: we only consider the uniform case for clarity.

Given two arbitrary functions \(u, {\tilde{u}} : {\mathcal {M}}_n \rightarrow {\mathbb {R}}\) we define their inner product as

$$\begin{aligned} \langle u, {\tilde{u}} \rangle _{\mathrm {L}^{2}({\mathcal {M}}_n)}:= \frac{1}{n}\sum _{i=1}^n u(x_i) {\tilde{u}}(x_i), \end{aligned}$$

as well as the \(\mathrm {L}^{2}({\mathcal {M}}_n)\) norm

$$\begin{aligned} \Vert u \Vert _{\mathrm {L}^{2}({\mathcal {M}}_n)} := \sqrt{\langle u, u\rangle _{\mathrm {L}^{2}({\mathcal {M}}_n)}} \end{aligned}$$

and the \(\mathrm {L}^{1}({\mathcal {M}}_n)\) norm

$$\begin{aligned} \Vert u \Vert _{\mathrm {L}^{1}({\mathcal {M}}_n)} := \frac{1}{n}\sum _{i=1}^n |u(x_i)|. \end{aligned}$$

For a function \(u: {\mathcal {M}}_n \rightarrow {\mathbb {R}}\), we also define the graph total variation seminorm as

$$\begin{aligned} \mathrm {GTV}_{n,\varepsilon }(u):= \frac{1}{n^2\varepsilon ^{m+1}} \sum _{i=1}^n \sum _{j=1}^n w_{ij}| u(x_i) - u(x_j)|. \end{aligned}$$
(2.1)

We notice that when u is an indicator function of a subset \(E_n\) of \({\mathcal {M}}_n\) we get the following:

$$\begin{aligned} \mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{E_n}) = \frac{2}{n^2\varepsilon ^{m+1}} \mathrm {Cut}(E_n). \end{aligned}$$

In the remainder we will use \({\mathcal {C}}_{n,\varepsilon }\) to denote the rescaled graph Cheeger constant

$$\begin{aligned} {\mathcal {C}}_{n,\varepsilon }:= \min _{E_n \subseteq {\mathcal {M}}_n} \frac{\mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{E_n})}{ \min \{ \nu _n(E_n), 1 - \nu _n(E_n) \} }, \end{aligned}$$
(2.2)

and use \(A_n^*\) or \(E_n^*\) to denote an arbitrary minimizer.
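The definitions (2.1) and (2.2) and the identity relating \(\mathrm {GTV}_{n,\varepsilon }\) to \(\mathrm {Cut}\) can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the data are n = 10 equally spaced (not random) points on a circle of length one, so that m = 1, and the minimization is a brute-force search over nonempty proper subsets, which is only feasible for very small n. All names are hypothetical.

```python
import math
from itertools import combinations


def epsilon_weights(points, eps):
    """w_ij = 1 if |x_i - x_j| <= eps, and 0 otherwise."""
    return [[1.0 if math.dist(p, q) <= eps else 0.0 for q in points]
            for p in points]


def gtv(w, u, eps, m):
    """Graph total variation (2.1) of the vector u = (u(x_1), ..., u(x_n))."""
    n = len(w)
    total = sum(w[i][j] * abs(u[i] - u[j]) for i in range(n) for j in range(n))
    return total / (n ** 2 * eps ** (m + 1))


def cut(w, E):
    """Cut(E): total weight between E and its complement."""
    n = len(w)
    return sum(w[i][j] for i in E for j in range(n) if j not in E)


def indicator(E, n):
    return [1.0 if i in E else 0.0 for i in range(n)]


def graph_cheeger(w, eps, m):
    """Brute-force evaluation of (2.2) over nonempty proper subsets."""
    n = len(w)
    best_val, best_set = math.inf, None
    for k in range(1, n):
        for c in combinations(range(n), k):
            E = set(c)
            val = gtv(w, indicator(E, n), eps, m) / (min(k, n - k) / n)
            if val < best_val:
                best_val, best_set = val, E
    return best_val, best_set


# Hypothetical data: equally spaced points on a circle of length 1 in R^2.
n, m, eps = 10, 1, 0.15
r = 1 / (2 * math.pi)
points = [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
          for i in range(n)]
w = epsilon_weights(points, eps)  # eps = 0.15 links nearest neighbours only

E = {0, 1, 2}
lhs = gtv(w, indicator(E, n), eps, m)
rhs = 2 * cut(w, E) / (n ** 2 * eps ** (m + 1))  # identity below (2.1)
cheeger_value, cheeger_set = graph_cheeger(w, eps, m)
```

With these choices the graph is a cycle, and the search returns an arc containing half of the points, as one would expect by symmetry.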

2.2 Continuum Framework

We consider the classical spaces \(\mathrm {L}^{1}({\mathcal {M}})\) and \(\mathrm {L}^{2}({\mathcal {M}})\) with norms

$$\begin{aligned} \Vert f \Vert _{\mathrm {L}^{1}({\mathcal {M}})}:= & {} \int _{{\mathcal {M}}} |f(x) | \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x), \qquad \\ \Vert f \Vert _{\mathrm {L}^{2}({\mathcal {M}})}:= & {} \left( \int _{{\mathcal {M}}} |f(x) |^2 \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x)\right) ^{1/2}. \end{aligned}$$

For a given \(f \in \mathrm {L}^{1}({\mathcal {M}})\) we define its total variation seminorm as

$$\begin{aligned} \mathrm {TV}(f):= \sup \left\{ \langle \mathrm {div}(V) , f \rangle _{\mathrm {L}^{2}({\mathcal {M}})} \, : \, V \in {\mathfrak {X}}({\mathcal {M}}), \quad |V(x)|_x \leqq 1, \quad \forall x \in {\mathcal {M}}\right\} , \end{aligned}$$
(2.3)

and say that \(f \in \mathrm {BV}({\mathcal {M}})\) if \(\mathrm {TV}(f) < \infty \). In the above, \({\mathfrak {X}}({\mathcal {M}})\) denotes the set of all smooth vector fields on \({\mathcal {M}}\), \(\mathrm {div}\) is the divergence operator (on \({\mathcal {M}}\)) mapping smooth vector fields into real valued smooth functions on \({\mathcal {M}}\), and \(|\cdot |_x\) is the Riemannian norm in the tangent space at \(x\in {\mathcal {M}}\). We define the perimeter of a measurable \(E \subset {\mathcal {M}}\) via

$$\begin{aligned} {\mathcal {P}}(E) := \mathrm {TV}(\mathbbm {1}_E). \end{aligned}$$

We notice that the expression (2.3) reduces to

$$\begin{aligned} \mathrm {TV}(f)= \int _{{\mathcal {M}}} |\nabla f (x)|_x \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) , \end{aligned}$$
(2.4)

when f is a smooth function (i.e. \(f \in \mathrm {C}^{\infty }({\mathcal {M}})\)), where here \(\nabla f\) is the gradient (with respect to \({\mathcal {M}}\)’s Riemannian metric) of f. Likewise, the perimeter of a subset \(E \subseteq {\mathcal {M}}\) with smooth boundary \(\partial E\) can be written as

$$\begin{aligned} {\mathcal {P}}(E) = \int _{\partial E} \mathrm {d}{\mathcal {H}}^{m-1}(x), \end{aligned}$$

where \({\mathcal {H}}^{m-1}\) is the \(m-1\) dimensional Hausdorff measure on \({\mathcal {M}}\).

We recall that the coarea formula allows us to write the total variation of a \(\mathrm {BV}\) function in terms of the perimeter of its level sets. Namely, for a given \(f \in \mathrm {BV}({\mathcal {M}})\) we have

$$\begin{aligned} \mathrm {TV}(f) = \int _{-\infty }^{\infty } {\mathcal {P}}(\{ x \in {\mathcal {M}}\, : \, f(x) \leqq t \} ) \, \mathrm {d}t. \end{aligned}$$
(2.5)

In the remainder we will use \({\mathcal {C}}_{\mathcal {M}}\) to denote the Cheeger constant

$$\begin{aligned} {\mathcal {C}}_{\mathcal {M}}:= \min _{E \subseteq {\mathcal {M}}} \frac{{\mathcal {P}}(E)}{ \min \{ \nu (E), 1 - \nu (E) \} }, \end{aligned}$$
(2.6)

and use \(A^*\) or \(E^*\) to denote an arbitrary minimizer.

2.3 Main Results

For ease of reference we summarize the assumptions on the manifold \({\mathcal {M}}\) and the length scale \(\varepsilon \).

Assumption 2.1

Let \({\mathcal {M}}\) be a smooth, connected, orientable, compact, Riemannian manifold of dimension m, without boundary, embedded in \({\mathbb {R}}^d\). We assume that \(\varepsilon \leqq \varepsilon _0\) where \(\varepsilon _0>0\) is some sufficiently small parameter and, in particular, is smaller than the injectivity radius \(i_{\mathcal {M}}\).

We have two main results: the first is a rate of convergence for the Cheeger constants (Theorem 2.2) and the second is a rate of convergence for Cheeger cuts (Theorem 2.15). We give the result for the convergence of Cheeger constants in the next section. The convergence of Cheeger cuts requires more notation and assumptions than we have introduced thus far so we include in Section 2.3.2 a detour into isoperimetric stability before stating our rate of convergence for Cheeger cuts in Section 2.3.3.

2.3.1 Convergence Rates for Cheeger Constants

Our first main result establishes a quantitative convergence rate of the graph Cheeger constant (2.2) towards the Cheeger constant on the manifold \({\mathcal {M}}\) (2.6). The precise theorem is as follows:

Theorem 2.2

(Cheeger constants) Let \({\mathcal {M}}\) and \(\varepsilon \) satisfy Assumption 2.1. Then, there exist constants (that may depend on \({\mathcal {M}}\)) \(\theta _0,\zeta _0,C_1,C_2,C,c,c'>0\) such that for any \(\delta ,\theta ,\zeta >0\), with \(n\zeta \varepsilon ^{\frac{m+1}{2}}\geqq c\), \(\delta \leqq \frac{\varepsilon }{4}\), \( c' \log (n)/n \leqq \theta ^2 \delta ^m \), \(\theta \leqq \theta _0\) and \(\zeta \leqq \zeta _0\), we have that, with probability at least \(1- n \exp (-cn\theta ^2\delta ^m)-C \exp \left( -cn\zeta \min \{\varepsilon ^{\frac{m+1}{2}},\varepsilon \zeta \}\right) \),

  (i) Upper bound:

    $$\begin{aligned} {\mathcal {C}}_{n,\varepsilon } \leqq \sigma _\eta {\mathcal {C}}_{\mathcal {M}}+ C_1\left( \varepsilon ^2 + \zeta \right) . \end{aligned}$$

  (ii) Lower bound:

    $$\begin{aligned} \sigma _\eta {\mathcal {C}}_{\mathcal {M}}\leqq {\mathcal {C}}_{n,\varepsilon } + C_2 \left( \root 3 \of {\varepsilon } + \frac{\delta }{\varepsilon } + \theta + \zeta \right) , \end{aligned}$$

where \(\sigma _\eta \) is the “surface tension”

$$\begin{aligned} \sigma _\eta := \int _{{\mathbb {R}}^m}\eta (|z|)|z_1|\, \mathrm {d}z = \int _{B(0,1)} |z_1| \, \mathrm {d}z, \end{aligned}$$
(2.7)

and in the above, \(z_1\) stands for the first coordinate of a given vector \(z \in {\mathbb {R}}^m\).
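For the indicator kernel considered here, the surface tension can be evaluated in closed form. The following computation is included as a worked example; it uses the notation \(\omega _{m-1}\) for the volume of the unit ball in \({\mathbb {R}}^{m-1}\) (with \(\omega _0=1\)), which is introduced only for this purpose. Slicing \(B(0,1)\) along the first coordinate, the slice at height \(z_1=t\) is an \((m-1)\)-dimensional ball of radius \(\sqrt{1-t^2}\), and hence

$$\begin{aligned} \sigma _\eta = \int _{-1}^{1} |t| \, \omega _{m-1} (1-t^2)^{\frac{m-1}{2}} \, \mathrm {d}t = 2\omega _{m-1} \int _0^1 t (1-t^2)^{\frac{m-1}{2}} \, \mathrm {d}t = \frac{2\omega _{m-1}}{m+1}. \end{aligned}$$

In particular, \(\sigma _\eta =1\) for \(m=1\), \(\sigma _\eta =\frac{4}{3}\) for \(m=2\), and \(\sigma _\eta = \frac{\pi }{2}\) for \(m=3\).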

The parameter \(\varepsilon \) has already been defined as the length scale determining when two data points should be considered neighbors. The parameters \(\delta \) and \(\theta \) control how fast the empirical measure \(\nu _n\) is converging. More precisely, we consider a smooth approximation \({\widetilde{\nu }}_n\in {\mathcal {P}}({\mathcal {M}})\) of \(\nu \) whose density \({\widetilde{\rho }}_n\) satisfies \({\widetilde{\rho }}_n(x)\in [1-\delta -\theta ,1+\delta +\theta ]\) for every \(x\in {\mathcal {M}}\), and such that \(d_{\mathrm {W}^{\infty }}(\nu _n,{\widetilde{\nu }}_n)\leqq \delta \). Such a \({\widetilde{\nu }}_n\) exists with probability at least \(1-ne^{-cn\delta ^m\theta ^2}\). The reason for using the intermediary probability measure \({\widetilde{\nu }}_n\) is to get a better probability bound when \(m=2\); in particular, for \(m>2\) we could use the bound \(d_{\mathrm {W}^{\infty }}(\nu _n,\nu )\leqq \delta \) with probability at least \(1-Cn^{-\alpha }\) for any \(\alpha >1\) [39]. When \(m=2\) there is an additional logarithmic correction factor which makes the rates suboptimal (and it would be rather cumbersome to state a separate rate for \(m=2\) throughout). However, the introduction of the intermediary \({\widetilde{\nu }}_n\) avoids these problems and we recover optimal (up to constants) probability bounds.

The final parameter \(\zeta \) comes from a concentration inequality for U-statistics. Our result is approximately that for each function \(u:{\mathcal {M}}\rightarrow {\mathbb {R}}\) the difference between \(\mathrm {GTV}_{n,\varepsilon }(u)\) and \({\mathbb {E}}[\mathrm {GTV}_{n,\varepsilon }(u)]\) can be bounded by \(\zeta \) with high probability (where the “high probability” depends on \(\zeta \)).

Remark 2.3

Of course, one can choose the various parameters \(\delta ,\theta ,\zeta \) in order to obtain concrete convergence rates with high probability as a function of the number of data points n. In particular, if we neglect log terms and assume \(m\geqq 4\), we may set the parameters as follows (by optimizing first the lower bound, and then subsequently the upper bound):

$$\begin{aligned} \delta&= n^{-k_\delta },\qquad&k_\delta&= \frac{2}{1+2m} \\ \theta&= n^{-k_\theta },\qquad&k_\theta&= \frac{1}{2(1+2m)} \\ \varepsilon&= n^{-k_\varepsilon },\qquad&k_\varepsilon&= \frac{3}{2(1+2m)} \\ \zeta&= n^{-k_\zeta },\qquad&k_\zeta&= \frac{3}{1+2m}. \\ \end{aligned}$$

This gives, for the lower bound, a rate of \(n^{-\frac{1}{2+4m}}\) and for the upper bound a rate of \(n^{-\frac{6}{2+4m}}\). We notice that with these choices \(\delta \ll \varepsilon \) and \(n\zeta \varepsilon ^{\frac{m+1}{2}}\gg 1\) as required, and that the arguments of the exponentials in the probability estimates are all bounded away from zero (and hence could be made large by using appropriate logarithmic corrections). Hence, after neglecting these logarithmic terms, we obtain convergence rates for the Cheeger constants of order \(n^{-\frac{1}{2+4m}}\).
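The exponent bookkeeping in this remark can be verified with exact rational arithmetic. The sketch below is only a consistency check of the stated exponents against the error terms and probability estimates of Theorem 2.2; it ignores constants and logarithmic factors, and the function names are hypothetical.

```python
from fractions import Fraction as F


def exponents(m):
    """Exponents k such that each parameter equals n^{-k}."""
    d = 1 + 2 * m
    return {"delta": F(2, d), "theta": F(1, 2 * d),
            "eps": F(3, 2 * d), "zeta": F(3, d)}


def rates(m):
    """Decay exponents of the lower and upper bound error terms.

    Each error term is n^{-exponent}; the smallest exponent dominates.
    """
    k = exponents(m)
    lower = min(k["eps"] / 3,           # cube root of eps
                k["delta"] - k["eps"],  # delta / eps
                k["theta"],
                k["zeta"])
    upper = min(2 * k["eps"], k["zeta"])  # eps^2 + zeta
    return lower, upper


def probability_exponents(m):
    """Powers of n appearing inside the exponentials of the probability bound."""
    k = exponents(m)
    a = 1 - 2 * k["theta"] - m * k["delta"]     # n * theta^2 * delta^m
    b = 1 - k["zeta"] - F(m + 1, 2) * k["eps"]  # n * zeta * eps^{(m+1)/2}
    c = 1 - 2 * k["zeta"] - k["eps"]            # n * zeta * (eps * zeta)
    return a, b, c


lower_4, upper_4 = rates(4)                 # exponents 1/18 and 6/18 for m = 4
a_4, b_4, c_4 = probability_exponents(4)    # 0, 1/4 and 1/6 for m = 4
```

Under this reading, the last exponent equals \(\frac{4m-13}{2(1+2m)}\), which is negative for \(m=3\); this appears to be the reason for the restriction \(m\geqq 4\) in the remark.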

We also notice that as long as \(\left( \frac{\log (n)}{n}\right) ^{1/m}\ll \varepsilon \), then the convergence of \({\mathcal {C}}_{n,\varepsilon }\) towards \(\sigma _\eta {\mathcal {C}}_{\mathcal {M}}\) as \(\varepsilon \rightarrow 0\) and \(n \rightarrow \infty \) is guaranteed.

Remark 2.4

We recall that the optimization problem in (1.4) is a relaxation of the closely related ratio cut problem (1.2). In particular, the variational problems coincide when one restricts the minimization in (1.4) to be over functions of the form \(u=\mathbbm {1}_A\). By the Courant-Fischer-Weyl min-max principle the minimization of (1.4) over \(u:{\mathcal {M}}_n\rightarrow {\mathbb {R}}\) is equivalent to finding the Fiedler eigenpair of the graph Laplacian; in particular the minimum is the first non-zero eigenvalue and the minimizer is the corresponding eigenvector. In [39] the authors use similar techniques to show that the eigenvalue converges with rate (ignoring logarithms) \(n^{-\frac{1}{2m}}\), which is better by approximately a squared factor (for large m and ignoring logarithms the rate for the Cheeger constant is \(n^{-\frac{1}{4m}}\)). To go between discrete and continuum both this paper and [39] use a smooth interpolating operator; in particular, a piecewise constant interpolation followed by mollification. The reason for the difference in rates is that in [39] the authors can choose the mollifying kernel based on the choice of interaction potential \(\eta \), which allows tighter control over the Dirichlet energy of the continuum approximation. Here, there is not a natural choice of mollifying kernel and hence we must introduce an extra length scale at which we can control the total variation of the continuum approximation. We also point out that the scaling in [39] is itself suboptimal and has recently been improved to \(n^{-\frac{1}{m+4}}\) using different techniques [14].

As we will show in Section 4.1 the upper bound in Theorem 2.2 is obtained by comparing the graph total variation \(\mathrm {GTV}_{n,\varepsilon }\) and the total variation \(\mathrm {TV}\) of a fixed BV function \(f : {\mathcal {M}}\rightarrow {\mathbb {R}}\), namely, a minimizer \(f=\mathbbm {1}_{E^*}\) for the continuum Cheeger problem. The probabilistic estimates behind this comparison rely on general concentration inequalities for U-statistics from [47]. It will become clear from our analysis that the error estimates for the upper bound are much tighter than the ones for the lower bound, where a simple pointwise convergence estimate (i.e. fixing a function on \({\mathcal {M}}\)) does not suffice. To be able to prove the lower bound, one of the main technical tools that we introduce is the construction of an interpolation map that relates subsets of \({\mathcal {M}}_n\) with subsets of \({\mathcal {M}}\), in a way that makes it possible to keep track of the error of approximation in a quantitative form. This map is introduced in Section 4.2. Section 3 will lay the groundwork for this construction, and in particular we will prove several results relating the total variation functional \(\mathrm {TV}\) and the non-local TV seminorms \(\mathrm {TV}_h\) defined by

$$\begin{aligned} \mathrm {TV}_{h}(f):= \frac{1}{h^{m+1}}\int _{\mathcal {M}}\int _{\mathcal {M}}|f(x) - f(y)| \eta \left( \frac{d_{\mathcal {M}}(x,y)}{h} \right) \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y), \quad f \in \mathrm {L}^{1}({\mathcal {M}}), \end{aligned}$$
(2.8)

where here and in the remainder, \(d_{\mathcal {M}}(x,y)\) denotes the geodesic distance between the points \(x,y \in {\mathcal {M}}\). It is worth remarking that there are some connections between the interpolation map that we define here and the one used in [12] when analyzing Dirichlet energies. We will discuss more on this in Section 3.
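As a purely illustrative sketch of the definition (2.8), the following code evaluates \(\mathrm {TV}_h\) by a midpoint rule in the simplest possible setting. All choices here are assumptions made for the example and are not taken from our analysis: \({\mathcal {M}}\) is a circle of length one (so \(m=1\)), \(\eta \) is the indicator of \([0,1]\), and f is the indicator of a half arc.

```python
def nonlocal_tv_circle(indicator, h, n_grid=500):
    """Midpoint-rule approximation of TV_h in (2.8) on a circle of length 1 (m = 1).

    Assumptions for illustration only: eta is the indicator of [0, 1] and
    d_M is arc length on the circle, parametrized by [0, 1).
    """
    m = 1
    xs = [(i + 0.5) / n_grid for i in range(n_grid)]
    fs = [indicator(x) for x in xs]
    cell = 1.0 / n_grid                      # length of one quadrature cell
    total = 0.0
    for i, x in enumerate(xs):
        for j, y in enumerate(xs):
            gap = abs(x - y)
            dist = min(gap, 1.0 - gap)       # geodesic (arc-length) distance
            if dist <= h:                    # eta(dist / h) = 1 on [0, 1]
                total += abs(fs[i] - fs[j])
    return total * cell * cell / h ** (m + 1)

def half_arc(x):
    """Indicator of the arc [0, 1/2)."""
    return 1.0 if x < 0.5 else 0.0

value = nonlocal_tv_circle(half_arc, h=0.1)
print(value)
```

For this f the boundary consists of two points, and the computed value is close to 2 for small h, up to discretization error. This matches the heuristic that the rescaling by \(h^{m+1}\) makes \(\mathrm {TV}_h\) comparable to a multiple of the perimeter.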

2.3.2 Convergence Rates for Cheeger Cuts: a Detour into Isoperimetric Stability

A significant part of this work is devoted to proving convergence rates of the minimal values of graph based TV energies towards the minimal values of TV energies. If our objective functions were a-priori strongly convex, one could immediately infer quantitative convergence rates for minimizers. For example, in the case of the Rudin-Osher-Fatemi [70] type problems, one seeks to minimize a functional given by \(J(u) = \mathrm {TV}(u) + \Vert u-u_0\Vert _{\mathrm {L}^{2}}^2\); here the strong convexity of the \(\mathrm {L}^{2}\) penalty could then be used to infer rates of convergence in \(\mathrm {L}^{2}\) using only information about the value of J. The same principle holds in the case of trend filtering models; arguments regarding how our tools apply in those settings to those problems will be given in future works.

However, for Cheeger cut problems it is not obvious in what sense the TV semi-norm is strongly convex (at least locally around minimizers). Fortunately, a significant body of recent work studying isoperimetric stability provides tools for addressing this issue. We will begin by describing the type of results from this work, and then give an overview of developments in this field. To begin, we must first give some geometric definitions.

Definition 2.5

We define the isoperimetric profile of our manifold \({\mathcal {M}}\) via

$$\begin{aligned} {\mathbb {I}}(\vartheta ) = \inf _{E \subseteq {\mathcal {M}}}\left\{ {\mathcal {P}}(E) : \mathrm {vol}_{\mathcal {M}}(E) = \vartheta \right\} . \end{aligned}$$

We define the Fraenkel asymmetry of a set via

$$\begin{aligned} \alpha (E) = \inf _{E^*}\left\{ \mathrm {vol}_{\mathcal {M}}(E \Delta E^*) : \mathrm {vol}_{\mathcal {M}}(E^*)= \mathrm {vol}_{\mathcal {M}}(E), {\mathcal {P}}(E^*) = {\mathbb {I}}(\mathrm {vol}_{\mathcal {M}}(E^*))\right\} . \end{aligned}$$

In the case of the continuum Cheeger problem, any minimizer is a mass-constrained perimeter minimizer, and hence the Cheeger problem reduces to minimizing \(\frac{{\mathbb {I}}(\vartheta )}{\min (\vartheta ,1-\vartheta )}\) over \(\vartheta \in (0,1)\). The Fraenkel asymmetry measures the distance between a given set and the family of mass-constrained perimeter minimizers. This turns out to be the natural \(\mathrm {L}^{1}\) type distance for this problem, because in many manifolds with symmetries there exists a continuous family of mass-constrained perimeter minimizers. One example is given by the translates of a ball in \({\mathbb {R}}^d\). An analogous example on a compact manifold is given in Example 2.9.
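The reduction to a one-dimensional minimization over \(\vartheta \) can be sketched numerically when the isoperimetric profile is known in closed form. As an assumption for this example only, we take the unit-area square flat torus, for which the classical profile is \({\mathbb {I}}(\vartheta ) = \min (2\sqrt{\pi t}, 2)\) with \(t = \min (\vartheta ,1-\vartheta )\): discs are optimal for small mass and strips for intermediate mass. The function names are ours.

```python
import math

def profile_flat_torus(theta: float) -> float:
    """Assumed isoperimetric profile of the unit-area square flat torus:
    discs for small mass, strips for intermediate mass, complements of discs."""
    t = min(theta, 1.0 - theta)              # the profile is symmetric about 1/2
    return min(2.0 * math.sqrt(math.pi * t), 2.0)

def cheeger_ratio(theta: float) -> float:
    """The quotient I(theta) / min(theta, 1 - theta) from the text."""
    return profile_flat_torus(theta) / min(theta, 1.0 - theta)

grid = [k / 1000 for k in range(1, 1000)]    # theta in (0, 1)
theta_star = min(grid, key=cheeger_ratio)
print(theta_star, cheeger_ratio(theta_star))
```

On this grid the minimum is attained at \(\vartheta = 1/2\) with value 4, corresponding to a strip bounded by two closed geodesics of length one. As in Example 2.9, translating the strip produces a continuous family of minimizers.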

Having defined the energy and distance of interest, we now define variations of the boundary of a set (an in-depth description of these objects can be found in, e.g., [61]).

Definition 2.6

Given \(E \subset {\mathcal {M}}\) with smooth boundary \(\partial E\), we let \({\mathcal {D}}_{E}\) be the family of all smooth \(\phi : {\mathcal {M}}\times [-1,1] \rightarrow {\mathcal {M}}\) so that \(\phi (x,0) = x\) and \(\mathrm {vol}_{\mathcal {M}}(\phi (E,t)) = \mathrm {vol}_{\mathcal {M}}(E)\) for all t. This is the family of smooth diffeomorphisms which preserve the mass of E. We let \(v_\phi = \frac{\mathrm {d}}{\mathrm {d}t} \phi (\cdot ,0)\) be the initial velocity of the diffeomorphism \(\phi \).

We let

$$\begin{aligned} {\mathcal {B}}_\delta (\partial E) = \left\{ f \in \mathrm {C}^{2,\alpha }(\partial E;{\mathbb {R}}) : |f|_{\mathrm {C}^{2,\alpha }} < \delta ,\quad \int _{\partial E} f \, \mathrm {d}\sigma _{g,E} = 0\right\} , \end{aligned}$$

where \(\mathrm {d}\sigma _{g,E}\) is the volume form that \(\partial E\) inherits from \({\mathcal {M}}\).

We let \(\varvec{n}_E\) be a vector field which restricted to \(\partial E\) is an outward unit normal vector field, and which is (in a \(\delta \) tube around \(\partial E\)) a geodesic vector field. For \(f \in {\mathcal {B}}_\delta (\partial E)\), we let \(E + f = \phi (E,1)\) for \(\frac{\mathrm {d}}{\mathrm {d}t} \phi = \varvec{n}_{E} {\tilde{f}}\), where \({\tilde{f}}\) is a smooth extension of f. In other words, \(E + f\) is the set obtained by flowing the boundary points of E a distance f along geodesics which are normal to E.

We write \(D{\mathcal {P}}(E)\) and \(D^2 {\mathcal {P}}(E)\) to represent the first and second variations of the functional \({\mathcal {P}}(E + f)\) for \(f \in {\mathcal {B}}_\delta (\partial E)\).

We remark that the previous definitions have used the assumed regularity on \(\partial E\), in particular when choosing \(\delta \) so that \(\varvec{n}_{E}\) will be a geodesic field. The regularity of mass constrained perimeter minimizers is a well-studied topic. In fact, the boundary of such a minimizer is known to be analytic at any point in the reduced boundary (namely where a weak normal vector can be defined); this is related to the fact that the boundary will have constant mean curvature, which can be converted into an equation of elliptic type. Furthermore, the set of points which are not in the reduced boundary for such minimizers, henceforth called singular points, is known to have dimension at most \(m-8\). Details of this theory may be found in [61], and further discussion is given later in this section.

Having stated this, we follow [21] and assume that \({\mathcal {M}}\) is such that the boundary of a mass constrained perimeter minimizer \(E^*\) has no singular points, meaning that it is \(\mathrm {C}^{\infty }\) (which only requires us to assume that it is \(\mathrm {C}^{3}\)); see Assumptions 2.14 below. We also notice that we have restricted to mean zero variations in the definition of \({\mathcal {B}}_\delta (\partial E^*)\) in order to preserve the volume of regions along variations.

In any case, the previous definition describes variations of a set in terms of deformations by smooth vector fields. Using these definitions of variations, one can easily establish necessary conditions for a set to be a local minimizer of the perimeter. In particular, we will be interested in the following necessary second-order conditions for local perimeter minimizers.

Definition 2.7

A smooth minimizer \(E^*\) of the mass constrained isoperimetric problem is called strict if there exists \(c>0\) such that for any \(f \in {\mathcal {B}}_\delta (\partial E^*)\)

$$\begin{aligned} D^2 {\mathcal {P}}(E^*) [f,f] \geqq c\Vert f \Vert _{\mathrm {W}^{1,2}(\partial E^*)}^2. \end{aligned}$$

A mass constrained perimeter minimizer \(E^*\) is called integrable if for any \(f \in {\mathcal {B}}_\delta (\partial E^*)\) with \(D^2 {\mathcal {P}}(E^*)[f,f] = 0\) there exists a diffeomorphism \(\phi \) satisfying \(\frac{\mathrm {d}}{\mathrm {d}t} \phi (\cdot ,0) = f\) and \(\phi (x,0) = x\) which satisfies \(\mathrm {vol}_{\mathcal {M}}(\phi (E^*,t)) = \mathrm {vol}_{\mathcal {M}}(E^*)\) such that \(\phi (E^*,t)\) is a critical point of the mass-constrained perimeter for all \(t \in (-1,1)\).

The definition of strict minimality can be seen as providing a type of Poincaré inequality for variations of the boundary of \(E^*\); the importance of such Poincaré inequalities is further demonstrated in Example 2.12. Indeed, strict minimality is stronger than being an isolated local minimizer.

For the sake of illustration, we now give two examples of perimeter minimizers which are, respectively, strict and integrable, followed by an example where the assumption fails.

Example 2.8

Suppose that \({\mathcal {M}}\) is the ellipsoid shown in Figure 1a. Then the Cheeger set will be a strict mass-constrained perimeter minimizer for mass equal to \(\mathrm {vol}_{\mathcal {M}}({\mathcal {M}})/2\).

Example 2.9

Suppose that \({\mathcal {M}}\) is a two dimensional torus embedded in \({\mathbb {R}}^3\), as shown in Figure 1b. The Cheeger set shown is a mass-constrained perimeter minimizer for mass equal to \(v=\mathrm {vol}_{\mathcal {M}}({\mathcal {M}})/2\). In this case the minimizer is not strict, but it is integrable, as there is a family of mass-constrained perimeter minimizers given by rotating the set around the torus.

We note that the one-parameter family of mass-constrained perimeter minimizers in this case is associated with an underlying symmetry of the manifold. In the Euclidean case (i.e. \({\mathcal {M}}= {\mathbb {R}}^d\)), the translational symmetry always has to be accounted for in isoperimetric problems, and indeed provides significant motive to study integrable minimizers.

Example 2.10

Suppose that we modify the previous torus example so that the radius of the circles varies \(\pi \)-periodically in \(\theta \) according to some function \(f(\theta )\). The Cheeger set, which corresponds to the global mass-constrained perimeter minimizer for volume equal to \(\mathrm {vol}_{\mathcal {M}}({\mathcal {M}})/2\), should partition the manifold at local minimizers of f. However, if the second derivative of f is zero at that local minimizer, then such a mass-constrained perimeter minimizer should fail to be strict in the sense given above. This example relies on the \(\pi \)-symmetry of f in order to handle the mass-constrained variations; a degenerate well is not sufficient by itself.

Clearly not all manifolds have perimeter minimizers which are either strict or integrable, but the situation in Example 2.10 where the assumption fails seems somewhat degenerate. It is possible that many manifolds of interest will have Cheeger sets which satisfy these properties.

Fig. 1

Examples of Cheeger sets that are (a) a strict mass-constrained perimeter minimizer and (b) an integrable mass-constrained perimeter minimizer

With these definitions in hand, we now state the isoperimetric stability result that we will utilize in this paper.

Proposition 2.11

[21, Lemma 3.4] Suppose that \(E^*\subset {\mathcal {M}}\) is a global, mass-constrained perimeter minimizer with mass \(\vartheta \) and is both i) smooth (meaning it has empty singular set) and ii) either a strict perimeter minimizer or integrable. Then, there exist constants \(c,\delta >0\) such that for any measurable \(E \subseteq {\mathcal {M}}\) with \(\mathrm {vol}_{\mathcal {M}}(E) = \vartheta \) and \(\mathrm {vol}_{\mathcal {M}}(E \Delta E^*) < \delta \) one has

$$\begin{aligned} {\mathcal {P}}(E) - {\mathcal {P}}(E^*) \geqq c \alpha (E)^2. \end{aligned}$$
(2.9)

One heuristic way of describing this result, in the spirit of the viewpoint given in [84], is that the mass-constrained perimeter functional is strongly convex with respect to the \(\mathrm {L}^{1}\) norm. We will utilize this result to show that any set which has small “perimeter deficit”, i.e. which has perimeter close to a mass-constrained minimizer, must be close to a mass-constrained perimeter minimizer in the sense of Fraenkel distance, which is an \(\mathrm {L}^{1}\) type distance, with a clearly quantified estimate. It is worth noting that the result in [21] is significantly more general than the one given here. Indeed, in their work they also prove that for analytic manifolds there always exists some exponent \(\gamma _{\mathcal {M}}\), so that the previous proposition holds without assuming strict or integrable minimizers, under the modification that we replace 2 with \(\gamma _{\mathcal {M}}\). The analysis in our paper would continue to work in this more general setting, at the cost of now having an exponent that depends upon the manifold. For concreteness of presentation, we have opted to state the result in terms of a single exponent (i.e. 2) which does not depend upon the manifold \({\mathcal {M}}\), but at the cost of making additional assumptions on the minimizing Cheeger set.

Example 2.12

To better understand how one derives the estimates, it is useful to consider a toy example. Consider a domain \(S = (-1,1) \times (-M/2,M/2)\), with M large, and let us consider the isoperimetric problem with mass constrained to be equal to M. In that case one perimeter minimizer is given by the set \(E^* = \{(x,y) \in S \, : \,y<0, \quad x \in (-1,1)\}\). Now, consider a set \(E = \{ (x,y) \in S \, : \, y \leqq g(x), \quad x\in (-1,1)\}\) for some smooth g, which has mean zero and is compactly supported on \((-1,1)\). We can then write

$$\begin{aligned} {\mathcal {P}}(E;S) - {\mathcal {P}}(E^*;S) = \int _{-1}^1 \sqrt{1 + (g')^2} -1 \,\mathrm {d}x. \end{aligned}$$

The goal is to show that this quantity is greater than \(c(\int |g|\,\mathrm {d}x)^2\).

Restricting to the situation where \(g'\) is relatively small (which is the critical case for proving the inequality), and using a Taylor expansion, we find the bound:

$$\begin{aligned} {\mathcal {P}}(E) -{\mathcal {P}}(E^*) \geqq c \int _{-1}^1 (g')^2 \,\mathrm {d}x. \end{aligned}$$

In turn, using Poincaré’s and Hölder’s inequalities,

$$\begin{aligned} {\mathcal {P}}(E) - {\mathcal {P}}(E^*) \geqq c\Vert g\Vert _{\mathrm {L}^{2}}^2 \geqq c\Vert g\Vert _{\mathrm {L}^{1}}^2 = c\mathrm {vol}_{\mathcal {M}}(E \Delta E^*)^2. \end{aligned}$$

The extension of the previous example to graphs of smooth functions, centered around smooth surfaces is entirely analogous. The biggest technical challenge in establishing results such as Proposition 2.11 is that one a-priori cannot reduce to the case where E is locally expressed as the graph of a smooth function, see further discussion at the end of the section.
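The chain of inequalities in Example 2.12 can be checked numerically on a specific profile. The sketch below is only a sanity check under assumptions of our own choosing: we take \(g(x) = a\,x(1-x^2)^2\), which has mean zero and vanishes together with its derivative at \(x=\pm 1\) (it is not compactly supported in the open interval), and we compare the perimeter deficit with \(\Vert g\Vert _{\mathrm {L}^{1}}^2\) using a midpoint rule.

```python
import math

def deficit_and_l1(a: float, n_cells: int = 20000):
    """Midpoint-rule values of the perimeter deficit and of ||g||_{L^1}
    for the hypothetical profile g(x) = a * x * (1 - x^2)^2 on (-1, 1)."""
    dx = 2.0 / n_cells
    deficit, l1 = 0.0, 0.0
    for k in range(n_cells):
        x = -1.0 + (k + 0.5) * dx
        g = a * x * (1.0 - x * x) ** 2
        dg = a * (1.0 - x * x) * (1.0 - 5.0 * x * x)   # derivative of g
        deficit += (math.sqrt(1.0 + dg * dg) - 1.0) * dx
        l1 += abs(g) * dx
    return deficit, l1

ratios = {}
for a in (0.01, 0.1, 1.0):
    deficit, l1 = deficit_and_l1(a)
    ratios[a] = deficit / l1 ** 2        # lower bound for the constant c along this family
    print(a, ratios[a])
```

Along this one-parameter family the quotient of the deficit by \(\Vert g\Vert _{\mathrm {L}^{1}}^2\) stays bounded away from zero as \(a \rightarrow 0\), which is the content of the toy inequality. It says nothing about sets that are not graphs, which is where the real difficulty lies.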

In this work, we will be concerned with global minimizers of the continuum Cheeger energy, and hence with global mass-constrained perimeter minimizers. Thus we will rely on the following, slightly modified version of the previous proposition:

Proposition 2.13

Fix a mass \(0< \vartheta < 1\). Suppose that any global, mass-constrained perimeter minimizer with mass \(\vartheta \) is both i) smooth (meaning it has empty singular set) and ii) either a strict perimeter minimizer or integrable. Then, there exists a \(c>0\) so that for any measurable \(E \subseteq {\mathcal {M}}\) with \(\mathrm {vol}_{\mathcal {M}}(E) = \vartheta \) we have

$$\begin{aligned} {\mathcal {P}}(E) - {\mathbb {I}}(\vartheta ) \geqq c\alpha (E)^2. \end{aligned}$$

Proof

Suppose, for the sake of contradiction, that no such c exists. Then there exists a sequence of sets \(E_k\) satisfying \(\mathrm {vol}_{\mathcal {M}}(E_k)=\vartheta \) so that

$$\begin{aligned} \frac{{\mathcal {P}}(E_k) - {\mathbb {I}}(\vartheta )}{\alpha (E_k)^2} \rightarrow 0. \end{aligned}$$

By \(\mathrm {L}^{1}\) compactness of sets of finite perimeter, after taking a subsequence (not relabeled) we have that there exists a set \({\tilde{E}}\) satisfying \(\mathrm {vol}_{\mathcal {M}}(E_k \Delta {\tilde{E}}) \rightarrow 0\) and \({\mathbb {I}}(\vartheta ) = \liminf _k {\mathcal {P}}(E_k) \geqq {\mathcal {P}}({\tilde{E}}) \geqq {\mathbb {I}}(\vartheta )\). This implies that \({\tilde{E}}\) is a mass-constrained perimeter minimizer, and hence satisfies the assumptions of Proposition 2.11. But this then implies that, for k large enough, \(\frac{{\mathcal {P}}(E_k) - {\mathbb {I}}(\vartheta )}{\alpha (E_k)^2}> c>0\), where c is chosen as in that proposition. This is a contradiction, and concludes the proof.\(\quad \square \)

2.3.3 Convergence Rates for Cheeger Cuts: Results

With tools from the previous section in hand, we now state a final main result of our work. Before we do so, we state in detail the technical assumptions that we make on minimizers of the continuum Cheeger problem (discussion of these assumptions is given after the statement of the theorem).

Assumption 2.14

(Assumptions on Cheeger sets of \({\mathcal {M}}\))

i) (Local strong convexity of isoperimetric profile) First, we define \(\Theta = {{\,\mathrm{arg\,min}\,}} \frac{{\mathbb {I}}(\vartheta )}{\min (\vartheta ,1-\vartheta )}\). We assume that for every \(\vartheta \in \Theta \) there exists a \(C>0\) and an \(\eta \) so that for any \(\vartheta ' \in (\vartheta -\eta ,\vartheta +\eta )\) we have that the function \(g(\vartheta ) = \frac{{\mathbb {I}}(\vartheta )}{\min (\vartheta ,1-\vartheta )}\) satisfies \(g(\vartheta ') \geqq g(\vartheta ) + C(\vartheta -\vartheta ')^2\).

ii) Second, we assume that all of the minimizers of the mass constrained isoperimetric problem with masses in \(\Theta \) satisfy all of the assumptions in Proposition 2.13, meaning that they are at least \(\mathrm {C}^{3}\) and are either strict or integrable.

As in the case of the assumptions in Proposition 2.11, it is likely possible to weaken Assumption 2.14 i) as long as one is willing to permit the exponent in the results to depend upon the manifold. In particular, if the function \(\frac{{\mathbb {I}}(\vartheta )}{\min (\vartheta ,1-\vartheta )}\) can be bounded from below near its minimizers using a different exponent, then all of the analysis that we give here would continue to apply using that exponent. We conjecture that for any analytic manifold Assumption 2.14 i) holds as long as one is willing to replace 2 with a manifold dependent exponent. Again, as in the case of Proposition 2.11, we have opted to work in terms of a concrete (but natural), manifold independent exponent, at the cost of making additional assumptions on the manifold.

Using these assumptions we will then establish our main result regarding the convergence of Cheeger sets.

Theorem 2.15

(Asymptotic consistency of Cheeger cuts, with rates) Let \({\mathcal {M}}\) and \(\varepsilon \) satisfy Assumptions 2.1 and \({\mathcal {M}}\) satisfy Assumptions 2.14. Then, there exist constants (that may depend on \({\mathcal {M}}\)) \(\theta _0,\zeta _0,C_1,C_2,C,c,c'>0\) such that for any \(\delta ,\theta ,\zeta >0\), with \(n\zeta \varepsilon ^{\frac{m+1}{2}}\geqq c\), \(\delta \leqq \frac{\varepsilon }{4}\), \( c' \log (n)/n \leqq \theta ^2 \delta ^m \), \(\theta \leqq \theta _0\) and \(\zeta \leqq \zeta _0\), we have that, with probability at least \(1- n \exp (-cn\theta ^2\delta ^m)-C \exp \left( -cn\zeta \varepsilon ^{\frac{m+1}{2}}\right) \), for any minimizer \(E_n^*\) of the discrete Cheeger energy there exists a minimizer \(E^*\) of the continuum Cheeger problem and a map \(T_n\) from \({\mathcal {M}}\) to \({\mathcal {M}}_n\) which satisfy

$$\begin{aligned} \Vert \mathbbm {1}_{E_n^*} \circ T_n - \mathbbm {1}_{E^*} \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq C\kappa ^{\frac{m-1}{4m}}, \qquad \Vert T_n - \mathrm {Id}\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} \leqq \delta , \end{aligned}$$
(2.10)

where

$$\begin{aligned} \kappa := \root 6 \of {\varepsilon } + \frac{\delta }{\varepsilon } + \theta + \zeta . \end{aligned}$$

Remark 2.16

Again, here we are permitted to make choices of the various parameters in order to determine convergence rates as a function of n. In particular, if we pick our parameters as in Remark 2.3 we get that, with high probability and after neglecting logarithms, \(\kappa = n^{-\frac{1}{4(1+2m)}}\). This in turn gives an \(\mathrm {L}^{1}\) convergence rate of the Cheeger cuts of order \(n^{-\frac{(m-1)}{16m(1+2m)}}\).
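The arithmetic behind this remark is easy to reproduce. The following sketch, with function names of our own choosing, composes the exponent of \(\kappa \) with the power \(\frac{m-1}{4m}\) from (2.10) and compares the result with \(\frac{1}{32m}\) for large m.

```python
from fractions import Fraction

def kappa_exponent(m: int) -> Fraction:
    """Exponent a with kappa = n^{-a}, for the parameter choice of this remark."""
    return Fraction(1, 4 * (1 + 2 * m))

def cut_rate_exponent(m: int) -> Fraction:
    """Exponent b with kappa^{(m-1)/(4m)} = n^{-b}."""
    return kappa_exponent(m) * Fraction(m - 1, 4 * m)

for m in (2, 3, 10, 100):
    b = cut_rate_exponent(m)
    # The last column is b divided by 1/(32 m); it tends to 1 as m grows.
    print(m, b, float(b * 32 * m))
```

For instance, the exponent is \(\frac{1}{160}\) for \(m=2\) and \(\frac{1}{168}\) for \(m=3\), and it behaves like \(\frac{1}{32m}\) when m is large.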

Remark 2.17

As for our rates established for the Cheeger constant, we believe that the rate of convergence for Cheeger cuts that we achieve is not optimal. If, as in Remark 2.4, we compare with the eigenvalue problem (1.4) then [39] establishes a convergence rate of \(n^{-\frac{1}{4m}}\) (ignoring logarithms) for eigenvectors of the graph Laplacian, to be compared with an approximate rate of \(n^{-\frac{1}{32m}}\) we establish in Theorem 2.15. In particular, we have likely given up something in terms of rates in several ways when applying the mass-constrained isoperimetric stability results (see Remark 4.8). We do note that the stability estimates for isoperimetric inequalities are, in terms of their exponent, known to be sharp. In any case, demonstrating sharpness of convergence rates, given the highly non-linear nature of the optimization problem that we analyze, would require completely different tools.

Remark 2.18

A different type of convergence for minimizers can be deduced from Theorem 2.15, namely, for \(E^*\) in (2.10) with smooth boundary we can prove that (with probability at least \(1- n \exp (-cn\theta ^2\delta ^m)-C \exp \left( -cn\zeta \varepsilon ^{\frac{m+1}{2}}\right) \)):

$$\begin{aligned} \nu _n( E^* \Delta E_n^* ) \leqq C \kappa ^{\frac{m-1}{4m}} + C \delta . \end{aligned}$$

We show in Remark 4.7 how this inequality can be obtained from the analysis we develop for our proof of Theorem 2.15.

We make a few remarks about the statement of Theorem 2.15. The first condition in Assumption 2.14 guarantees that minimizers of the Cheeger problem are in some sense strict minimizers, at least with respect to variations in volume. We remark that this condition, along with well-known asymptotics [64] for the isoperimetric profile near zero and compactness of the perimeter functional, implies that \(\Theta \) contains at most finitely many elements. The assumption that minimizers have smooth boundary is natural and there are reasons to believe that this holds for all minimizers in certain classes of manifolds (see the discussion below).

We notice that Theorem 2.15 only proves closeness to some Cheeger set \(E^*\): the case of the torus in Example 2.9 makes it readily clear why this is necessary. We also remark that although the theorem was stated in terms of the global minimizer of the discrete energy, we could prove analogous bounds for approximate minimizers of the discrete Cheeger energy. More concretely, if \(J_n(E_n):= \frac{\mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{E_n})}{\min \{\nu _n(E_n),1-\nu _n(E_n)\}}\) and we knew that \(J_n(E_n) - \inf J_n < \gamma _n \rightarrow 0\), then we could provide an analogous estimate with a right hand side that depends upon \(\gamma _n\).
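To illustrate the discrete energy \(J_n\) and the notion of approximate minimizers, the following brute-force sketch enumerates all subsets of a very small data set. Everything specific here is an assumption for illustration: the points are equally spaced on the unit circle rather than random, \(\eta \) is the indicator of \([0,1]\), the normalization \(\frac{1}{n^2\varepsilon ^{m+1}}\sum _{i,j} w_{ij}|u_i-u_j|\) of the graph total variation is assumed, and \(\nu _n\) is taken to be the empirical measure. Exhaustive search is of course only feasible for tiny n.

```python
import itertools
import math

n, eps, m = 12, 0.6, 1
pts = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)) for i in range(n)]
# w_ij = eta(|x_i - x_j| / eps) with eta the indicator of [0, 1] (assumed kernel)
w = [[1.0 if i != j and math.dist(pts[i], pts[j]) <= eps else 0.0 for j in range(n)]
     for i in range(n)]

def discrete_energy(subset):
    """J_n(E_n) under the assumed normalization of the graph total variation."""
    u = [1.0 if i in subset else 0.0 for i in range(n)]
    gtv = sum(w[i][j] * abs(u[i] - u[j]) for i in range(n) for j in range(n))
    gtv /= n ** 2 * eps ** (m + 1)
    mass = len(subset) / n                   # empirical measure nu_n(E_n)
    return gtv / min(mass, 1.0 - mass)

# All nonempty proper subsets of the data set.
candidates = [frozenset(c) for k in range(1, n) for c in itertools.combinations(range(n), k)]
energies = {c: discrete_energy(c) for c in candidates}
j_min = min(energies.values())

gamma = 0.05                                 # tolerance for approximate minimizers
near_minimizers = [c for c, e in energies.items() if e - j_min < gamma]
print(j_min, len(near_minimizers), sorted({len(c) for c in near_minimizers}))
```

In this toy configuration the exact minimizers are the half arcs, while the approximate minimizers with tolerance \(\gamma \) also include arcs containing five or seven points. This is the type of set to which an estimate depending on \(\gamma _n\) would apply.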

In this paper we have focused exclusively on proofs for the Cheeger problem. However, the techniques are extendable, mutatis mutandis, to other regularized cut problems such as the ratio cut in (1.2) and graph modularity clustering in (1.3). In particular, condition i) in Assumptions 2.14 would need to be appropriately adjusted to each cut problem.

2.4 Discussion of Literature for Isoperimetric Stability Problems

There is a long history of the study of the stability of isoperimetric problems. Early works established these types of inequalities in the case of smooth sets in \({\mathbb {R}}^2\) [5], or for smooth sets in \({\mathbb {R}}^d\) [52]. Another important early work was [84], which proposed the principle that any set which is stable in terms of second variation of the (mass-constrained) perimeter should be a local minimizer of the mass-constrained isoperimetric problem in a local, \(\mathrm {L}^{1}\) sense. However, their argument did not provide quantified estimates of this minimality. In the last ten years, many works have sought to quantify this relationship. The first of these works, [37] established the quantified relationship in the case of the ball in \({\mathbb {R}}^d\). A crucial element of their proof was the observation that by symmetrizing the set of interest (in the sense of a Steiner symmetrization) one can reduce the perimeter of the set without increasing its Fraenkel asymmetry by too much. After symmetrizing several times, one can reduce to the type of one-dimensional estimate that we presented in Example 2.12. A later, elegant alternative proof for the ball was obtained by using techniques from optimal transportation [34].

Isoperimetric problems in \({\mathbb {R}}^d\) turn out to be central to a number of problems in “hard analysis”. In particular, these estimates have been used to provide quantitative estimates of the stability of Sobolev [24], Pólya-Szegő [23] and other functional inequalities [38].

The extension of the stability results for isoperimetric problems on \({\mathbb {R}}^d\) to more general settings is the focus of current research. These problems are often quite challenging, because the loss of explicit minimizers and symmetries renders many of the technical tools unusable. However, an alternative approach to stability for the classical isoperimetric problem is given in [25]. There they study the quotient \(\frac{{\mathcal {P}}(E)-{\mathcal {P}}(B)}{\alpha (E)^2}\), and demonstrate that for a fixed value of the denominator, the minimizer of the numerator is very well behaved (i.e. regular, and locally expressible as the graph of a function). The procedure of selecting minimizers of this quotient, and then studying their properties is known as a selection principle, and allows one to reduce the study of isoperimetric stability to the study of relatively simple sets. Technically, this reduction is accomplished by studying penalized variational problems, and using the regularity theory of minimizers of penalized isoperimetric problems [77]. Using this reduction, one can then establish the quantitative isoperimetric inequality using standard techniques [36], which are very similar to those we described in Example 2.12. The recent work [21], which we rely on in this paper, extends the techniques in [25] and [36] to the setting of Riemannian manifolds.

Very recently, [26] used ideas similar to those in the second part of this paper within the context of discrete-to-continuum bounds for crystallization energies. In particular, they study an (anisotropic) analog of the perimeter on certain classes of periodic lattices. In their work they use stability of the limiting anisotropic isoperimetric problem, proven in [34] and analogous to [21] in our setting, in order to bound a certain distance between discrete configurations and the continuum Wulff set in terms of a sort of energy difference. This is directly analogous to the work that we do in Section 4.3, but in the context of a different energy and with a different motivation.

Central to many of these works is the question of regularity of minimizers for isoperimetric problems. The study of this problem is very classical. Necessary conditions for the mass-constrained isoperimetric problem in \({\mathbb {R}}^d\) require that, if solutions are smooth, they must be surfaces of constant mean curvature which are \(\mathrm {C}^{\infty }\) up to a set of dimension at most \(d-8\) [48, 50]. There are examples of surfaces which possess constant mean curvature everywhere except on a set of dimension \(d-8\) [72]. In the context of isoperimetric problems on convex domains there are recent conjectures, which posit that such surfaces cannot be minimizers of the isoperimetric problem [57]. This conjecture is informed by earlier work in the context of convex domains [75], which strongly suggests that there are topological obstructions which prevent the example in [72] from being a global perimeter minimizer. Of course convex domains are in a sense much simpler than the manifolds we consider in this paper, for which, to our knowledge, very little is known about whether global minimizers to isoperimetric problems can be guaranteed to be regular. This then means that we can only be certain that the assumptions in Theorem 2.15 are met, in general, in the case where \({\mathcal {M}}\) is of small dimension.

In summary, there is a rich mathematical theory studying the minimizers of isoperimetric problems, and the stability of such minimizers. Our paper draws on this theory to provide new quantitative tools for the asymptotic consistency of statistical problems based upon geometric optimization.

3 Quantitative Estimates for Non-Local Operators

In this section we establish some results concerning the non-local TV seminorm (2.8). These results are extensions to the manifold case of somewhat well known results in the flat Euclidean case. The non-local TV seminorm is analogous to the non-local Dirichlet energy studied in [12]. In form, these two functionals differ only in the powers used for the integrand \(|f(x) - f(y)|\) and in their rescaling factors, but their properties are markedly different. The study of non-local Dirichlet energies was a fundamental piece in [12, 39] to obtain quantitative rates for the spectral convergence of graph Laplacians.

To fix some ideas, we first introduce a collection of definitions from differential geometry that will provide us with the right language to formalize our arguments. For the most part the notions introduced below will suffice for proving most of our estimates: the only exception is Proposition 3.6 where more tools are needed; these will be presented in the Appendix.

For a given \(x \in {\mathcal {M}}\) we use \(\exp _x\) to denote the exponential map at x. This is a diffeomorphism between the set \(B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}\) and \(B_{\mathcal {M}}(x,h)\) as long as \(h \leqq i_{\mathcal {M}}\) (where we recall \(i_{\mathcal {M}}\) is \({\mathcal {M}}\)’s injectivity radius), and is characterized as follows: for \(v\in B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}\), the curve \(t \in [0,1] \mapsto \exp _x(tv)\) describes the constant speed geodesic emanating from x with initial velocity v. We use \(\mathrm {d}(\exp _x)_v\) to denote the differential of \(\exp _x\) at \(v \in B(0, h) \subseteq {\mathcal {T}}_x{\mathcal {M}}\), which maps tangent vectors to \({\mathcal {T}}_x{\mathcal {M}}\) at v to tangent vectors to \({\mathcal {M}}\) at \(\exp _x(v)\). Implicit in this definition of the exponential map is the choice of connection or covariant derivative, which here we take to be the Levi-Civita connection; recall that the metric on \({\mathcal {M}}\) is the one inherited from the ambient space \({\mathbb {R}}^d\). We use \(J_x(v)\) to denote the Jacobian of the exponential map at the point v. The Jacobian describes the Riemannian volume form of the manifold \({\mathcal {M}}\) (which we will denote by \(\mathrm {vol}_{\mathcal {M}}\)) in the local coordinates of the exponential map (a.k.a. normal coordinates). That is, we can write integrals of the form

$$\begin{aligned} \int _{B_{\mathcal {M}}(x, h)} \zeta (y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y), \end{aligned}$$

(for small enough h) as

$$\begin{aligned} \int _{B(0, h) \subseteq {\mathcal {T}}_x{\mathcal {M}}} \zeta (\exp _x(v))J_x(v) \, \mathrm {d}v. \end{aligned}$$

According to [11, 12], the Jacobian \(J_x\) satisfies the following bounds

$$\begin{aligned} 1- C |v|^2 \leqq J_x(v) \leqq 1+ C |v|^2, \end{aligned}$$
(3.1)

for all \(v\in B(0, h) \subseteq {\mathcal {T}}_x{\mathcal {M}}\), where h is sufficiently small, and in particular smaller than \(i_{\mathcal {M}}\). The constant C can be written as \(C = cm K\), where c is a universal constant, m is the dimension of \({\mathcal {M}}\), and K is an upper bound on the absolute value of all sectional curvatures of \({\mathcal {M}}\) at all points within the ball \(B_{\mathcal {M}}(x,h)\). Since we are assuming \({\mathcal {M}}\) to be compact, this constant can be picked to uniformly bound the discrepancy between the volume form (in normal coordinates) and the uniform measure. In the remainder we will write \(h \leqq h_{\mathcal {M}}\) to indicate that h is small enough (smaller than a fixed quantity that only depends on \({\mathcal {M}}\)). We define the tangent bundle by

$$\begin{aligned} {\mathcal {T}}{\mathcal {M}}:= \left\{ (x,v) \, : \, x \in {\mathcal {M}}, \quad v \in {\mathcal {T}}_x{\mathcal {M}}\right\} . \end{aligned}$$
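To illustrate the Jacobian bound (3.1) in a concrete case, consider the unit sphere \(S^m\subseteq {\mathbb {R}}^{m+1}\), where the Jacobian of the exponential map has the classical closed form \(J_x(v) = (\sin |v|/|v|)^{m-1}\). The following Python sketch is only a numerical sanity check under that assumption; the function names and the sampling scheme are illustrative choices and are not taken from the paper. It computes the largest value of \(|J_x(v)-1|/|v|^2\) over sampled radii \(0<|v|\leqq h\), which can be compared with \((m-1)/6\), a constant of the form cmK with \(K=1\).

```python
import math


def sphere_jacobian(r, m):
    """Jacobian of exp_x on the unit sphere S^m at a tangent vector of length r.

    Uses the closed form (sin r / r)^(m-1); this formula is specific to the
    round unit sphere and is an assumption of this illustration.
    """
    if r == 0.0:
        return 1.0
    return (math.sin(r) / r) ** (m - 1)


def max_jacobian_defect(m, h, samples=200):
    """Largest sampled value of |J - 1| / r^2 over radii 0 < r <= h."""
    worst = 0.0
    for k in range(1, samples + 1):
        r = h * k / samples
        worst = max(worst, abs(sphere_jacobian(r, m) - 1.0) / r**2)
    return worst
```

For instance, `max_jacobian_defect(3, 0.5)` stays below \(2/6\), in agreement with the elementary inequality \(\sin r/r \geqq 1 - r^2/6\) combined with Bernoulli's inequality.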

Recall that we wrote the non-local \(\mathrm {TV}\) seminorm \(\mathrm {TV}_h\) as a double integral over \(x\in {\mathcal {M}}\) and \(y\in {\mathcal {M}}\), where x is close to y, see (2.8). In Euclidean settings we can use a change of variables and integrate over x and \(v=\frac{x-y}{h}\). With this change of coordinates the non-local functional converges easily to a local functional. On a manifold we cannot immediately make the same transformation, as we need to be careful about where v lives. However, this is really just a technical detail and (formally) by viewing v as a tangent vector in \({\mathcal {T}}_x{\mathcal {M}}\) we can again view the double integral as an integral over \(x\in {\mathcal {M}}\) and \(v\in {\mathcal {T}}_x{\mathcal {M}}\). This leads us to require the volume form on \({\mathcal {T}}{\mathcal {M}}\), i.e. \(\mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(x,v)\). We give some background on how one rigorously defines \(\mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}\) in the Appendix, where it is needed for a more technical computation, but within the main body it is enough to understand that \(\int _{{\mathcal {T}}{\mathcal {M}}}\zeta (\xi )\,\mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(\xi ) \) can be written as

$$\begin{aligned} \int _{{\mathcal {T}}{\mathcal {M}}}\zeta (x,v)\,\mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(x,v) = \int _{\mathcal {M}}\int _{B(0, h_{\mathcal {M}}) \subseteq {\mathcal {T}}_x{\mathcal {M}}} \zeta (x,v)\, \mathrm {d}v \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \end{aligned}$$

for any \(\zeta :{\mathcal {T}}{\mathcal {M}}\rightarrow {\mathbb {R}}\) with the property that \(\zeta (x,\cdot )\) has compact support in \(B(0,h_{\mathcal {M}})\) for every \(x\in {\mathcal {M}}\).

We will use the geodesic flow \(\Phi = \{ \Phi _s \}_{s \in [0,1]}\) which takes an arbitrary point

$$\begin{aligned} (x,v) \in {\mathcal {B}} := \{ (x,v)\, : \, x \in {\mathcal {M}}, \quad v\in {\mathcal {T}}_x {\mathcal {M}}, \quad |v|_x < i_{\mathcal {M}}\}, \end{aligned}$$

into the point

$$\begin{aligned} \Phi _s(x,v):= (\Phi ^1_s(x,v), \Phi _s^2(x,v)), \end{aligned}$$

where

$$\begin{aligned} \Phi _s^1(x,v):= \exp _x(sv), \quad \Phi _s^2(x,v):= \mathrm {d}(\exp _x)_{sv}(v). \end{aligned}$$

It can be checked that for every \(s\in [0,1]\), \(\Phi _s\) is a diffeomorphism of \({\mathcal {B}}\) into itself. Moreover, the geodesic flow leaves \(\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}\) invariant: this is the content of Liouville’s theorem (see Chapter 3 in [31]).
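As a concrete instance of the geodesic flow, on the unit sphere \(S^2 \subseteq {\mathbb {R}}^3\) the geodesic with initial data \((x,v)\) is a great circle, so \(\Phi _s\) can be written in closed form. The sketch below assumes this explicit model (the helper names are illustrative and not part of the paper) and can be used to check numerically that the flow preserves the length of the velocity and that flowing for time \(h+h'\) agrees with flowing for time h and then for time \(h'\).

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def geodesic_flow(x, v, s):
    """Geodesic flow Phi_s(x, v) on the unit sphere, for v tangent at x.

    Returns (Phi^1_s, Phi^2_s) = (exp_x(s v), velocity of t -> exp_x(t v) at t = s),
    using the great-circle parametrization of geodesics.
    """
    r = norm(v)
    if r == 0.0:
        return tuple(x), tuple(v)
    c, sn = math.cos(s * r), math.sin(s * r)
    position = tuple(c * xi + sn * vi / r for xi, vi in zip(x, v))
    velocity = tuple(-r * sn * xi + c * vi for xi, vi in zip(x, v))
    return position, velocity
```

For example, with `x = (0.0, 0.0, 1.0)` and `v = (0.3, 0.4, 0.0)`, composing `geodesic_flow(x, v, 0.3)` with a further flow of time `0.5` reproduces `geodesic_flow(x, v, 0.8)` up to rounding error.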

In the sequel we may use \(\xi =(x,v)\) to represent a generic point in the tangent bundle \({\mathcal {T}}{\mathcal {M}}\) and abuse notation slightly to write things like \( g(\xi )=g(x)\) whenever g is a real valued function on \({\mathcal {M}}\). We will also use \(\mathrm {d}g\) (i.e. the differential of g), which is the 1-form that when acting on a tangent vector v returns the directional derivative of g in the direction v, and will write things like \(\mathrm {d}g(\xi )\) to denote the directional derivative of g at the point x in the direction v. If \(g\in \mathrm {C}^{1}({\mathcal {M}})\), then we understand g to be differentiable in the sense of Fréchet; in particular, at every \(x\in {\mathcal {M}}\) there exists \(\nabla g(x)\in {\mathcal {T}}_x{\mathcal {M}}\) such that \(\mathrm {d}g(v)(x) = \langle \nabla g(x), v\rangle _x\) for all \(v\in {\mathcal {T}}_x{\mathcal {M}}\).

With all the above definitions in hand, we are ready to state and prove our first auxiliary results.

Proposition 3.1

There is a constant C such that for all \(f \in \mathrm {BV}({\mathcal {M}})\) (i.e. \(f \in \mathrm {L}^{1}({\mathcal {M}})\) and \(\mathrm {TV}(f) < \infty \)) and all \(0<h\leqq h_{\mathcal {M}}\) we have

$$\begin{aligned} \mathrm {TV}_{h}(f) \leqq (1+ Ch^2)\sigma _\eta \mathrm {TV}(f), \end{aligned}$$

where \(\sigma _\eta \) is the surface tension defined in (2.7).

Proof

Using the density of smooth functions in \(\mathrm {L}^{1}({\mathcal {M}})\) and an approximation result for the \(\mathrm {TV}\) seminorm like that in Theorem 13.9 in [60] (see also Theorem 2.4 in [1]), it is enough to show the result for \(f \in \mathrm {C}^{\infty }({\mathcal {M}})\). Let \(x\in {\mathcal {M}}\) and \(y \in B_{\mathcal {M}}(x,h)\). Then, there is a unique \(v \in B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}\) such that \(\exp _x(v)=y\). By the fundamental theorem of calculus we can write

$$\begin{aligned} f(y) - f(x) = f(\exp _x(v))- f(x)= \int _{0}^1 \frac{\mathrm {d}}{\mathrm {d}t}f(\exp _x(tv))\,\mathrm {d}t \end{aligned}$$

and so

$$\begin{aligned} \begin{aligned}&\int _{B_{\mathcal {M}}(x, h )} |f(y)- f(x)| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \\&\quad = \int _{B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}} |f(\exp _x(v))- f(x)| J_x(v) \, \mathrm {d}v \\&\quad \leqq (1+ C h^2) \int _{B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}} \int _{0}^1 |\mathrm {d}f(\Phi _t(x,v))| \, \mathrm {d}t \, \mathrm {d}v, \end{aligned} \end{aligned}$$

where we have used (3.1) to bound the Jacobian. It follows that

$$\begin{aligned} \begin{aligned}&\int _{\mathcal {M}}\int _{B_{\mathcal {M}}(x, h)} |f(y)- f(x)| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \\&\quad \leqq (1+ Ch^2) \int _{0}^1 \int _{{\mathcal {B}}_h } |\mathrm {d}f(\Phi _t(\xi ))| \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(\xi ) \, \mathrm {d}t \end{aligned} \end{aligned}$$

where

$$\begin{aligned} {\mathcal {B}}_h := \{ (x,v) \in {\mathcal {T}}{\mathcal {M}}\, : \, x \in {\mathcal {M}}, \quad v\in B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}\}. \end{aligned}$$

From the fact that the geodesic flow leaves \(\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}\) invariant, and the fact that \(\Phi _t({\mathcal {B}}_h) = {\mathcal {B}}_h\), it follows after a change of variables that for all \(t \in (0,1)\)

$$\begin{aligned}&\int _{{\mathcal {B}}_h } |\mathrm {d}f(\Phi _t(\xi ))| \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(\xi )\\&\quad = \int _{{\mathcal {B}}_h} |\mathrm {d}f(\xi )| \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(\xi ) \\&\quad = \int _{\mathcal {M}}\int _{B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}} |\langle \nabla f(x),v\rangle _x| \, \mathrm {d}v \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \\&\quad = \int _{{\mathcal {M}}} |\nabla f(x)|_x \int _{B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}} \left| \left\langle \frac{\nabla f(x)}{|\nabla f(x)|_x},v \right\rangle _x\right| \, \mathrm {d}v \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \\&\quad = \int _{\mathcal {M}}\sigma _\eta |\nabla f(x)|_x \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x), \end{aligned}$$

where in the last line we have used the radial symmetry of the integrand and the definition of \(\sigma _\eta \) in (2.7). From the above it follows that

$$\begin{aligned}&\int _{\mathcal {M}}\int _{B_{\mathcal {M}}(x,h)} |f(y)- f(x)| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x)\\&\quad \leqq (1+Ch^2)\sigma _\eta \int _{{\mathcal {M}}}|\nabla f(x)|_x \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x). \end{aligned}$$

Using (2.4) (and the fact that f is smooth) we deduce the desired inequality. \(\square \)

The next result is a partial converse to the previous one, but it is restricted to sufficiently smooth functions.

Proposition 3.2

(Non-local vs local for smooth functions) Let \(0 < h\leqq h_{\mathcal {M}}\) and let f be a \(\mathrm {C}^{1,1}({\mathcal {M}})\) function. Then,

$$\begin{aligned} \sigma _\eta \mathrm {TV}(f) \leqq (1+Ch^2)\mathrm {TV}_h(f) + C\Vert f \Vert _{\mathrm {C}^{1,1}({\mathcal {M}})} h, \end{aligned}$$

where C is independent of f and h.

Proof

Let \(f\in \mathrm {C}^{1,1}({\mathcal {M}})\). For a fixed \(x\in {\mathcal {M}}\) we can Taylor expand f around x and write

$$\begin{aligned} f(y) = f(x) + \langle \nabla f(x),\exp _x^{-1}(y) \rangle _x + R_x(y), \quad y\in B_{\mathcal {M}}(x,h) \end{aligned}$$

where the remainder \(R_x(y)\) satisfies

$$\begin{aligned} \sup _{y \in B_{\mathcal {M}}(x,h)} |R_{x}(y)| \leqq C\Vert f \Vert _{\mathrm {C}^{1,1}({\mathcal {M}}) }h^2 \end{aligned}$$

for a constant C that only depends on \({\mathcal {M}}\) (and in particular does not depend on f). It follows that

$$\begin{aligned}&\frac{1}{h^{m+1}} \int _{\mathcal {M}}|f(x) - f(y)| \eta \left( \frac{d_{\mathcal {M}}(x,y)}{h}\right) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) + C h \Vert f \Vert _{\mathrm {C}^{1,1}({\mathcal {M}})} \\&\quad \geqq \frac{1}{h^{m+1}} \int _{\mathcal {M}}\left| \left\langle \nabla f(x),\exp _x^{-1}(y) \right\rangle _x \right| \eta \left( \frac{d_{\mathcal {M}}(x,y)}{h} \right) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y). \end{aligned}$$

The term on the right hand side of the above expression can be written as

$$\begin{aligned}&\frac{|\nabla f(x)|_x}{h^{m+1}}\int _{B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}} \left| \left\langle \frac{\nabla f(x)}{|\nabla f(x)|_x},v\right\rangle _x \right| \eta \left( \frac{|v|}{h}\right) J_x(v) \, \mathrm {d}v \\&\quad \geqq (1-Ch^2)\frac{|\nabla f(x)|_x}{h^{m+1}}\int _{B(0,h) \subseteq {\mathcal {T}}_x{\mathcal {M}}} \left| \left\langle \frac{\nabla f(x)}{|\nabla f(x)|_x},v\right\rangle _x \right| \eta \left( \frac{|v|}{h}\right) \, \mathrm {d}v \\&\quad = \sigma _\eta (1- Ch^2) |\nabla f(x) |_x, \end{aligned}$$

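where in order to go from the first to the second line we have used (3.1), and where in the last line we have used the radial symmetry of the integrand and the definition of \(\sigma _\eta \) in (2.7). Integrating over x and dividing by \(1-Ch^2\) (note that \(\frac{1}{1-Ch^2}\leqq 1+2Ch^2\) whenever \(Ch^2\leqq \frac{1}{2}\)) gives the desired inequality, after enlarging C if necessary. \(\quad \square \)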
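The two comparisons between \(\mathrm {TV}_h\) and \(\sigma _\eta \mathrm {TV}\) can be checked numerically in the simplest possible setting. The sketch below takes \({\mathcal {M}}\) to be the unit circle (\(m=1\), \(J_x\equiv 1\)) and \(\eta = \mathbbm {1}_{[0,1]}\), for which the last step of the proof above gives \(\sigma _\eta = \int _{-1}^{1}|z|\,\mathrm {d}z = 1\). These choices, the quadrature rule, and the function names are assumptions made for illustration only.

```python
import math


def nonlocal_tv_circle(f, h, nx=400, ns=200):
    """Midpoint-rule approximation of TV_h(f) on the unit circle (m = 1).

    Assumes eta is the indicator of [0, 1], so the kernel eta(d(x, y) / h)
    selects the points y = x + s with |s| <= h; the prefactor is 1 / h^(m+1).
    """
    dx = 2.0 * math.pi / nx
    ds = 2.0 * h / ns
    total = 0.0
    for i in range(nx):
        x = (i + 0.5) * dx
        fx = f(x)
        for j in range(ns):
            s = -h + (j + 0.5) * ds
            total += abs(f(x + s) - fx)
    return total * dx * ds / h**2


def local_tv_circle(df, nx=4000):
    """Midpoint-rule approximation of TV(f) = integral of |f'| for smooth f."""
    dx = 2.0 * math.pi / nx
    return sum(abs(df((i + 0.5) * dx)) for i in range(nx)) * dx
```

With \(f=\sin \) and \(h=0.1\), one compares `nonlocal_tv_circle(math.sin, 0.1)` with `local_tv_circle(math.cos)`. A direct computation gives the exact values \(32(1-\cos (h/2))/h^2 \approx 3.9992\) and 4, so the non-local energy sits just below the local one, with a gap of order \(h^2\).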

After considering the relationship between local and non-local energies we now present some results that relate non-local energies at different length-scales. First we prove a subadditivity property.

Lemma 3.3

(Subadditivity) Let A be a Borel subset of \({\mathcal {M}}\) and let \(g_A: (0,h_{\mathcal {M}}) \rightarrow {\mathbb {R}}\) be the function given by

$$\begin{aligned} g_A(h):= \int _{{\mathcal {T}}{\mathcal {M}}} \eta (|v|_x)\mathbbm {1}_A(x) \mathbbm {1}_{A^c}(\Phi _h^1(x,v)) \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(x,v). \end{aligned}$$

Then, for every \(h,h'\) with \(h+h' < h_{\mathcal {M}}\) we have

$$\begin{aligned} g_A(h+h') \leqq g_A(h) + g_A(h'). \end{aligned}$$

In particular, if \(N \in {\mathbb {N}}\) is such that Nh is smaller than \(h_{\mathcal {M}}\) we can iterate the previous inequality to obtain

$$\begin{aligned} g_A(Nh) \leqq Ng_A(h). \end{aligned}$$

Proof

First of all notice that for every \((x,v) \in {\mathcal {T}}{\mathcal {M}}\) we have

$$\begin{aligned}&\mathbbm {1}_{A}(x)\mathbbm {1}_{A^c}(\Phi ^1_{h+h'}(x, v))\\&\quad \leqq \mathbbm {1}_{A}(\Phi ^1_{h}(x,v))\mathbbm {1}_{A^c}(\Phi _{h+h'}^1(x,v))+ \mathbbm {1}_{A}(x)\mathbbm {1}_{A^c}(\Phi ^1_{h}(x,v)). \end{aligned}$$

Indeed, if the left hand side is equal to one, necessarily one of the two terms on the right hand side must be equal to one. From this we conclude that

$$\begin{aligned} \begin{aligned} g_A(h+h')&\leqq \int _{{\mathcal {T}}{\mathcal {M}}} \eta (|v|_x) \mathbbm {1}_{A}( \Phi ^1_h(x,v)) \mathbbm {1}_{A^c}( \Phi ^1_{h+h'}(x,v)) \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(x,v) \\&\quad + \int _{{\mathcal {T}}{\mathcal {M}}} \eta (|v|_x) \mathbbm {1}_A(x) \mathbbm {1}_{A^c}( \Phi ^1_h(x,v)) \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(x,v) \\&= g_A(h) +\int _{{\mathcal {T}}{\mathcal {M}}} \eta (|\Phi ^2_h(x,v)|_{\Phi ^1_h(x,v)}) \mathbbm {1}_{A}( \Phi ^1_{h}(x,v)) \\&\qquad \times \mathbbm {1}_{A^c}\left( \Phi ^1_{h'}\left( \Phi ^1_{h}(x,v), \Phi ^2_h(x,v) \right) \right) \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(x,v), \end{aligned} \end{aligned}$$

where the last equality follows from the fact that \(|v|_x =|\Phi ^2_h(x,v)|_{\Phi ^1_h(x,v)}\) (because the speed is constant along geodesics) and the fact that \(\Phi ^1_{h+h'}(x,v)=\Phi ^1_{h'}\left( \Phi ^1_{h}(x,v), \Phi ^2_h(x,v) \right) \) (which essentially says that moving along the geodesic starting at x in the direction v for \(h+h'\) units of time is the same as moving for h units of time and then continuing for an extra \(h'\) units of time). To conclude, we consider the change of variables

$$\begin{aligned} (x,v) \mapsto (x',v') =\Phi _{h}(x,v) \end{aligned}$$

and rewrite the last integral as

$$\begin{aligned} \int _{{\mathcal {T}}{\mathcal {M}}} \eta (|v'|_{x'}) \mathbbm {1}_{A}(x')\mathbbm {1}_{A^c}( \Phi ^1_{h'}(x',v')) \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}(x',v')= g_A(h'), \end{aligned}$$

using the fact that the geodesic flow leaves the volume form \(\mathrm {vol}_{{\mathcal {T}}{\mathcal {M}}}\) invariant according to Liouville’s theorem. This proves the result. \(\quad \square \)
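The subadditivity of \(g_A\) can be observed directly on a toy example. The sketch below assumes that \({\mathcal {M}}\) is the unit circle, that A is an arc of length `arc_length`, and that \(\eta = \mathbbm {1}_{[0,1]}\); a tangent vector at x is then a signed number v and \(\Phi ^1_h(x,v) = x + hv\) modulo \(2\pi \). All names and the quadrature rule are illustrative assumptions.

```python
import math


def g_arc(h, arc_length, nx=300, nv=400):
    """Midpoint-rule approximation of g_A(h) on the unit circle.

    A is the arc [0, arc_length] and eta is the indicator of [0, 1], so the
    integrand equals 1 exactly when x lies in A, |v| <= 1, and the endpoint
    x + h * v (mod 2*pi) lies in the complement of A.
    """
    two_pi = 2.0 * math.pi
    dx = arc_length / nx          # x ranges over A only, because of 1_A(x)
    dv = 2.0 / nv                 # v ranges over [-1, 1], the support of eta
    total = 0.0
    for i in range(nx):
        x = (i + 0.5) * dx
        for j in range(nv):
            v = -1.0 + (j + 0.5) * dv
            y = (x + h * v) % two_pi
            if y > arc_length:    # endpoint of the geodesic lies in A^c
                total += 1.0
    return total * dx * dv
```

For an arc of length 0.3, a direct computation gives \(g_A(0.2) = 0.2\) and \(g_A(0.4) = 0.375\), so that \(g_A(0.4) \leqq 2 g_A(0.2)\) with strict inequality: once h exceeds the length of the arc, some geodesics leave A through the far endpoint and the count saturates.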

We can now use the previous lemma to prove that the non-local TV seminorm with a small length-scale dominates the ones with larger length-scale.

Proposition 3.4

There is a constant \(C>0\) such that for every \(f \in \mathrm {L}^{1}({\mathcal {M}})\) and every h, a with \(0< h \leqq a < \frac{h_{{\mathcal {M}}}}{2}\) we have

$$\begin{aligned} \mathrm {TV}_{a}(f) \leqq C \mathrm {TV}_{h}(f). \end{aligned}$$

Proof

We first restrict our attention to the case \(f= \mathbbm {1}_A\) for some Borel subset A of \({\mathcal {M}}\). Notice that for arbitrary \(h \leqq h_{\mathcal {M}}\) we can rewrite \(\mathrm {TV}_{h}(\mathbbm {1}_A)\) as:

$$\begin{aligned} \begin{aligned}&\mathrm {TV}_h(\mathbbm {1}_A) \\&\quad := \frac{1}{h^{m+1}}\int _{{\mathcal {M}}}\int _{\mathcal {M}}\eta \left( \frac{d_{\mathcal {M}}(x,y)}{h} \right) |\mathbbm {1}_A(x) - \mathbbm {1}_{A}(y)| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \\&\quad = \frac{2}{h^{m+1}}\int _{{\mathcal {M}}}\int _{\mathcal {M}}\eta \left( \frac{d_{\mathcal {M}}(x,y)}{h} \right) \mathbbm {1}_A(x) \mathbbm {1}_{A^c}(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \\&\quad = \frac{2}{h^{m+1}} \int _{{\mathcal {M}}}\int _{B(0,h)\subseteq {\mathcal {T}}_x{\mathcal {M}}} \eta \left( \frac{|v|_x}{h} \right) \mathbbm {1}_{A}(x)\mathbbm {1}_{A^c}(\exp _x(v)) J_x(v) \, \mathrm {d}v \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \\&\quad = \frac{2}{h} \int _{{\mathcal {M}}}\int _{B(0,1)\subseteq {\mathcal {T}}_x{\mathcal {M}}}\eta \left( |v|_x \right) \mathbbm {1}_{A}(x)\mathbbm {1}_{A^c}(\exp _x(h v)) J_x( h v) \, \mathrm {d}v \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x). \end{aligned} \end{aligned}$$

Now, from (3.1) we know that for all \(x \in {\mathcal {M}}\) and \(v \in {\mathcal {T}}_x{\mathcal {M}}\) with \(|v| \leqq 1\) we have

$$\begin{aligned} 1-Ch^2 \leqq J_{x}(h v) \leqq 1+ C h^2 . \end{aligned}$$

From this it follows that

$$\begin{aligned} \begin{aligned}&(1- Ch^2)\frac{2}{h} \int _{{\mathcal {M}}}\int _{B(0,1)\subseteq {\mathcal {T}}_x{\mathcal {M}}}\eta \left( |v|_x\right) \mathbbm {1}_{A}(x)\mathbbm {1}_{A^c}(\exp _x(h v)) \, \mathrm {d}v \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \leqq \mathrm {TV}_h(\mathbbm {1}_A) \\&\quad \leqq (1+ Ch^2)\frac{2}{h} \int _{{\mathcal {M}}}\int _{B(0,1)\subseteq {\mathcal {T}}_x{\mathcal {M}}} \eta \left( |v|_x\right) \mathbbm {1}_{A}(x)\mathbbm {1}_{A^c}(\exp _x(h v)) \, \mathrm {d}v \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \end{aligned} \end{aligned}$$

which is the same as (using the notation from Lemma 3.3)

$$\begin{aligned} (1- Ch^2) \frac{2g_A(h)}{h} \leqq \mathrm {TV}_h(\mathbbm {1}_A) \leqq (1+ Ch^2) \frac{2g_A(h)}{h}. \end{aligned}$$
(3.2)

Let us now take a such that \(h \leqq a < \frac{h_{\mathcal {M}}}{2}\), and let \(N\in {\mathbb {N}}\) be such that

$$\begin{aligned} (N-1) h < a \leqq Nh. \end{aligned}$$

Since \(Nh = (N-1)h + h < a + h \leqq 2a < h_{\mathcal {M}}\), we can apply the upper bound in (3.2) at scale Nh to get

$$\begin{aligned} \mathrm {TV}_{Nh}(\mathbbm {1}_A) \leqq \frac{(1+CN^2h^2) 2g_A(Nh)}{Nh} \leqq \frac{(1+CN^2h^2) 2 g_A(h)}{h}, \end{aligned}$$

by Lemma 3.3. It now follows (using the lower bound for \(\mathrm {TV}_h(\mathbbm {1}_A)\) in (3.2)) that

$$\begin{aligned} \mathrm {TV}_{Nh}(\mathbbm {1}_A) \leqq \frac{(1+ CN^2h^2)}{1-Ch^2}\mathrm {TV}_h(\mathbbm {1}_A). \end{aligned}$$

By Taylor’s theorem we have \(\frac{1}{1-Ch^2}=1+Ch^2+O(h^4)\) and so for h sufficiently small we have \(\frac{1}{1-Ch^2}\leqq 1+2Ch^2\), which we again write as \(1+Ch^2\) after enlarging the constant C. Moreover,

$$\begin{aligned} Nh = (N-1)h + h \leqq 2a. \end{aligned}$$

So,

$$\begin{aligned} \mathrm {TV}_{Nh}(\mathbbm {1}_A) \leqq (1+Ca^2)(1+ Ch^2)\mathrm {TV}_h(\mathbbm {1}_A). \end{aligned}$$

Finally, notice that, from the choice of N, it follows that

$$\begin{aligned} \mathrm {TV}_{a}(\mathbbm {1}_A)&\leqq \left( \frac{Nh}{a}\right) ^{m+1} \mathrm {TV}_{Nh}(\mathbbm {1}_A) \\&\leqq \left( \frac{N}{N-1}\right) ^{m+1} \mathrm {TV}_{Nh}(\mathbbm {1}_A) \\&\leqq \left( \frac{N}{N-1}\right) ^{m+1}(1+Ca^2)(1 + Ch^2) \mathrm {TV}_{h}(\mathbbm {1}_A). \end{aligned}$$

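If \(N=1\) then \(a=h\) and there is nothing to prove. If \(N\geqq 2\) then \(\frac{N}{N-1}\leqq 2\), and since h and a are bounded by \(h_{\mathcal {M}}\), the right hand side is bounded by a constant (depending only on \({\mathcal {M}}\)) times \(\mathrm {TV}_h(\mathbbm {1}_A)\).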

Now that we have proved the desired inequality for functions of the form \(f=\mathbbm {1}_A\) it is straightforward to extend it to general \(f\in \mathrm {L}^{1}({\mathcal {M}})\) by means of the coarea formula for the non-local total variation. Indeed, using a layer cake representation, one can show that for every \(f\in \mathrm {L}^{1}({\mathcal {M}})\) and for every h one has:

$$\begin{aligned} \mathrm {TV}_h(f)= \int _{-\infty }^\infty \mathrm {TV}_{h}(\mathbbm {1}_{\{ f \leqq t \}}) \, \mathrm {d}t; \end{aligned}$$

see, e.g., [80] for the case of Euclidean domains. From this it follows that

$$\begin{aligned} \mathrm {TV}_{a}(f) = \int _{-\infty }^\infty \mathrm {TV}_{a}(\mathbbm {1}_{\{ f \leqq t \}}) \, \mathrm {d}t \leqq C \int _{-\infty }^\infty \mathrm {TV}_{h}(\mathbbm {1}_{\{ f \leqq t \}}) \, \mathrm {d}t = C \mathrm {TV}_h(f). \end{aligned}$$

\(\square \)
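The choice of the integer N in the proof above is simply \(N = \lceil a/h \rceil \). The following sketch (with a hypothetical helper name, and using exact rational arithmetic to avoid rounding issues when a is a multiple of h) computes N and can be used to confirm the two facts used in the proof, namely \((N-1)h < a \leqq Nh\) and \(Nh \leqq 2a\).

```python
import math
from fractions import Fraction


def covering_steps(h, a):
    """Return the integer N with (N - 1) * h < a <= N * h, for 0 < h <= a.

    Inputs are converted to Fraction so that the ceiling is computed exactly.
    """
    h, a = Fraction(h), Fraction(a)
    if not 0 < h <= a:
        raise ValueError("expected 0 < h <= a")
    return math.ceil(a / h)
```

In particular \(Nh/a \leqq 2\), which is the source of the factor \(2^{m+1}\) hidden in the constant C of Proposition 3.4.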

In what follows we consider \(\phi :[0,\infty ) \rightarrow [0,\infty )\) a smooth function with compact support for which

$$\begin{aligned} \phi (t) \leqq C \eta (t), \quad \forall t \geqq 0 \end{aligned}$$

for some \(C>0\), and for which

$$\begin{aligned} \int _{{\mathbb {R}}^m}\phi (|x|)\, \mathrm {d}x =1. \end{aligned}$$

We use \(\phi _a\) to denote the rescaled version

$$\begin{aligned} \phi _a(t):= \frac{1}{a^m} \phi \left( \frac{t}{a}\right) , \end{aligned}$$

and use it to define the smoothing operator

$$\begin{aligned}&\Lambda _a : \mathrm {L}^{1}({\mathcal {M}}) \rightarrow \mathrm {C}^{\infty }({\mathcal {M}})\nonumber \\&\Lambda _a f(x) = \frac{1}{\tau _a(x)}\int _{B_{\mathcal {M}}(x,a)}\phi _a(d_{\mathcal {M}}(x,y)) f(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y), \end{aligned}$$
(3.3)

where \(\tau _a(x)\) is the normalization constant

$$\begin{aligned} \tau _a(x) := \int _{B_{\mathcal {M}}(x,a)}\phi _a(d_{\mathcal {M}}(x,y)) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y). \end{aligned}$$

The parameter \(a>0\) is a free parameter that we will choose later on.

Here we are attempting to mimic the construction of smoothing operators in [12] in the context of Dirichlet energies, i.e. the construction of an operator \(\Lambda :\mathrm {L}^{2}({\mathcal {M}}) \rightarrow \mathrm {C}^{\infty }({\mathcal {M}}) \) for which a tight relationship between non-local and local Dirichlet energies can be obtained. In our setting this amounts to finding an operator \(\Lambda : \mathrm {L}^{1}({\mathcal {M}}) \rightarrow \mathrm {C}^{\infty }({\mathcal {M}})\) satisfying, roughly speaking,

$$\begin{aligned} \sigma _\eta \mathrm {TV}(\Lambda _a f) \leqq (1+o(1))\mathrm {TV}_h(f) , \quad \forall f \in \mathrm {L}^{\infty }({\mathcal {M}}) \end{aligned}$$
(3.4)

as well as

$$\begin{aligned} \Vert \Lambda _a f - f \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq o(1)\mathrm {TV}_h(f) \end{aligned}$$
(3.5)

where \(\sigma _\eta \) is the surface tension in (2.7). In [12], the analogous statement is obtained by selecting the smoothing operator as a convolution operator with respect to a conveniently chosen kernel and with a bandwidth of the same order as the connectivity length-scale \(h>0\). The structure of the \(\mathrm {L}^{2}\)-type Dirichlet seminorms makes the selection of this special kernel possible, but a kernel with similar properties does not seem to be easy to find in the \(\mathrm {TV}\) case. As we will see below, we are forced to define our smoothing operator as a convolution operator whose kernel has a bandwidth \(a>0\) that is much larger than the length-scale h used to define the non-local TV seminorm, and more involved computations are needed. The subadditivity property proved in Lemma 3.3, combined with Proposition 3.6 below, allows us to show that when \(h \ll a\) we can still get the desired relations (3.4) and (3.5), at the cost of losing some orders in the convergence rates.

We start by presenting some elementary properties of the smoothing operator \(\Lambda _a\).

Proposition 3.5

For every \(0<h\leqq a<\frac{h_{\mathcal {M}}}{2}\) and \(k\in {\mathbb {N}}\) there exists C (which may depend on k and \({\mathcal {M}}\) but is independent of a and h) such that

  i) \(\Vert \Lambda _af\Vert _{\mathrm {C}^{k}({\mathcal {M}})} \leqq Ca^{-k} \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}\) for all \(f\in \mathrm {L}^{\infty }({\mathcal {M}})\).

  ii) \(\Vert f- \Lambda _a f\Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq Ca \mathrm {TV}_h(f)\) for all \(f \in \mathrm {L}^{1}({\mathcal {M}})\).

Proof

For the first inequality we only treat the case \(k=1\); the other cases are obtained similarly (and are standard in the flat Euclidean case). The gradient of \(\Lambda _af\) is computed as

$$\begin{aligned} \nabla \Lambda _a f(x)&= - \frac{1}{\tau _a(x)a^{m+1}} \int _{{\mathcal {M}}} \phi '\left( \frac{d_{\mathcal {M}}(x,y)}{a} \right) \frac{\exp _x^{-1}(y)}{d_{\mathcal {M}}(x,y)} f(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \\&+ \frac{\Lambda _a f(x)}{\tau _a(x) a^{m+1}} \int _{{\mathcal {M}}} \phi '\left( \frac{d_{\mathcal {M}}(x,y)}{a} \right) \frac{\exp _x^{-1}(y)}{d_{\mathcal {M}}(x,y)} \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y). \end{aligned}$$

Taking the norm on both sides and using the triangle inequality we see that

$$\begin{aligned} |\nabla \Lambda _a f(x)|_x \leqq \frac{2\Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\tau _a(x)a^{m+1}} \int _{{\mathcal {M}}} \left| \phi '\left( \frac{d_{\mathcal {M}}(x,y)}{a} \right) \right| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y). \end{aligned}$$

The right hand side can in turn be bounded by \(Ca^{-1}\Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}\). This follows from the fact that \(\tau _a(x)\) can be written as

$$\begin{aligned} \tau _a(x)&= \frac{1}{a^m} \int _{B(0,a)}\phi \left( \frac{|v|_x}{a}\right) J_x(v) \, \mathrm {d}v \\&\geqq (1-Ca^2)\frac{1}{a^m} \int _{B(0,a)}\phi \left( \frac{|v|_x}{a}\right) \, \mathrm {d}v= (1-Ca^2), \end{aligned}$$
(3.6)

and also

$$\begin{aligned} \frac{1}{a^m} \int _{{\mathcal {M}}} \left| \phi '\left( \frac{d_{\mathcal {M}}(x,y)}{a} \right) \right| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y)&= \frac{1}{a^m}\int _{B(0,a)} \left| \phi '\left( \frac{|v|_x}{a}\right) \right| J_x(v) \, \mathrm {d}v \\&\leqq \frac{1+Ca^2}{a^m}\int _{B(0,a)}\left| \phi '\left( \frac{|v|_x}{a} \right) \right| \, \mathrm {d}v \\&\leqq C(1+Ca^2), \end{aligned}$$

thanks to the bounds on the Jacobian (3.1).

For the second inequality we can write

$$\begin{aligned} \Lambda _a f (x) - f(x)&= \frac{1}{\tau _a(x)}\int _{{\mathcal {M}}}\phi _a(d_{\mathcal {M}}(x,y))f(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) - f(x) \\&= \frac{1}{\tau _a(x)} \int _{{\mathcal {M}}}\phi _a(d_{\mathcal {M}}(x,y))(f(y)-f(x)) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y), \end{aligned}$$

from where it follows that

$$\begin{aligned} \begin{aligned}&\Vert \Lambda _a f - f \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \\&\quad \leqq (1+ Ca^2) \int _{{\mathcal {M}}}\int _{{\mathcal {M}}}\phi _a(d_{\mathcal {M}}(x,y))|f(y)-f(x)| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \\&\quad \leqq \frac{C}{a^m}\int _{{\mathcal {M}}}\int _{{\mathcal {M}}}\eta \left( \frac{d_{\mathcal {M}}(x,y)}{a}\right) |f(y)-f(x)| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \\&\quad = C a\mathrm {TV}_a(f) \leqq Ca \mathrm {TV}_h(f), \end{aligned} \end{aligned}$$

where for the first inequality we have used the lower bound on \(\tau _a\) in (3.6), for the second inequality we have used the fact that \(\phi \) was chosen to be dominated by \(\eta \), and in the last inequality we have used Proposition 3.4. \(\quad \square \)
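To see how the smoothing operator (3.3) behaves in practice, the following sketch implements a discrete analogue of \(\Lambda _a\) on a uniform grid of the unit circle, where \(J_x \equiv 1\) and \(\tau _a\) is constant. The bump profile, the grid, and all function names are assumptions made for illustration; in particular, the profile is left unnormalized, since dividing by the discrete \(\tau _a\) takes care of the normalization.

```python
import math


def bump(t):
    """Smooth profile supported in [0, 1); plays the role of phi up to scaling."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def smooth_on_circle(values, a):
    """Discrete analogue of Lambda_a applied to samples on a uniform circle grid."""
    n = len(values)
    dx = 2.0 * math.pi / n
    reach = int(a / dx)                       # grid offsets k with |k * dx| <= a
    offsets = range(-reach, reach + 1)
    weights = [bump(k * dx / a) for k in offsets]
    tau = sum(weights)                        # discrete normalization constant
    return [
        sum(w * values[(i + k) % n] for w, k in zip(weights, offsets)) / tau
        for i in range(n)
    ]


def l1_distance(u, v):
    """Riemann-sum approximation of the L^1 distance on the unit circle."""
    dx = 2.0 * math.pi / len(u)
    return sum(abs(p - q) for p, q in zip(u, v)) * dx
```

Applied to the indicator of an arc that is long compared with a, the \(\mathrm {L}^{1}\) distance between f and its smoothed version roughly doubles when a is doubled, which is consistent with the factor a in item ii) of Proposition 3.5. Constants are left unchanged by the operator, as expected from the normalization by \(\tau _a\).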

The next result is an important technical piece that we use in the sequel. In the flat Euclidean case, the proof is quite elementary and involves only a simple change of variables. However, in the curved manifold setting, more involved computations are needed. Three technical facts from differential geometry that are used in the proof are discussed in detail in the Appendix.

Proposition 3.6

(Monotonicity by convolution) There are constants \(C_1, C_2\) such that for all \(0< h \leqq a \leqq \frac{h_{\mathcal {M}}}{2}\) and for all \(f\in \mathrm {L}^{\infty }({\mathcal {M}})\) we have

$$\begin{aligned} \mathrm {TV}_{{\widetilde{h}}} (\Lambda _a f) \leqq (1+ C_1a)\mathrm {TV}_h(f) + C_1 a \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}, \end{aligned}$$

where \({\widetilde{h}}:=h(1- C_2a)\).

Proof

Let \({\widetilde{h}}:= h(1-Ca)\) for some constant C that will be chosen later. We start by estimating \(\mathrm {TV}_{{{\widetilde{h}}}}(\Lambda _a f)\), using the triangle inequality to bound it by the sum of

$$\begin{aligned} A_1&:= \frac{1}{{\widetilde{h}}^{m+1}} \int _{{\mathcal {M}}}\int _{{\mathcal {M}}} \left| \left( \frac{1}{\tau _a(x)}- \frac{1}{\tau _a(y)} \right) {{\widetilde{\Lambda }}}_af(x) \right| \eta \left( \frac{d_{\mathcal {M}}(x,y)}{{\widetilde{h}}}\right) \\&\quad \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x)\mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(y) \end{aligned}$$

and

$$\begin{aligned} A_2&:= \frac{1}{{\widetilde{h}}^{m+1}} \int _{{\mathcal {M}}}\int _{{\mathcal {M}}} \left| \frac{1}{\tau _a(y)}( {\widetilde{\Lambda }}_af(x) - {\widetilde{\Lambda }}_af(y)) \right| \eta \left( \frac{d_{\mathcal {M}}(x,y)}{{\widetilde{h}}}\right) \\&\quad \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x)\mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(y), \end{aligned}$$

where

$$\begin{aligned} {\widetilde{\Lambda }}_a f(x):= \int _{{\mathcal {M}}} \phi _a(d_{\mathcal {M}}(x,z))f(z)\mathrm {d}\mathrm {vol}_{\mathcal {M}}(z). \end{aligned}$$

Let us first bound the term \(A_1\). Since we can bound \(|{{\widetilde{\Lambda }}}_a f(x)|\) uniformly by \(C \Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}\) we just need to find a bound for

$$\begin{aligned}&\frac{1}{{\widetilde{h}}^{m+1}} \int _{{\mathcal {M}}}\int _{{\mathcal {M}}} \left| \frac{1}{\tau _a(x)}- \frac{1}{\tau _a(y)} \right| \eta \left( \frac{d_{\mathcal {M}}(x,y)}{{\widetilde{h}}}\right) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(y) \\&\quad = \mathrm {TV}_{{\widetilde{h}}}( \tau _a^{-1}). \end{aligned}$$

Now, using Proposition 3.1 we know \(\mathrm {TV}_{{\widetilde{h}}}(\tau _a^{-1})\) is smaller than \(C\mathrm {TV}(\tau _a^{-1})\) (using also the smallness of h), which can be written as \(C\mathrm {TV}(\tau _a^{-1})= C \int _{{\mathcal {M}}} |\nabla \tau _a^{-1}(x)| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \) given that \(\tau _a^{-1}\) is smooth (because \(\phi \) is smooth and because \(\tau _a\) is bounded away from zero as shown in (3.6)). Thus we just need to compute an estimate for \(|\nabla \tau _a^{-1}(x)|\); this has already been done in [12], but here we reproduce the argument for completeness. Indeed,

$$\begin{aligned}&\nabla \tau _a^{-1}(x) = -\frac{1}{\tau _a(x)^2} \nabla \tau _a(x) \\&\quad = \frac{1}{a^{m+1}\tau _a(x)^2} \int _{{\mathcal {M}}}\phi '\left( \frac{d_{\mathcal {M}}(x,z)}{a}\right) \frac{\exp _x^{-1}(z)}{d_{\mathcal {M}}(x,z)}\mathrm {d}\mathrm {vol}_{\mathcal {M}}(z), \end{aligned}$$

and in turn, the above integral can be written as

$$\begin{aligned} \int _{B(0,a)} \phi '(|v|_x/a)\frac{v}{|v|_x}J_x(v) \, \mathrm {d}v = \int _{B(0,a)} \phi '(|v|_x/a)\frac{v}{|v|_x}(J_x(v)-1) \, \mathrm {d}v, \end{aligned}$$

where the last equality is due to radial symmetry: the map \(v \mapsto \phi '(|v|_x/a)\frac{v}{|v|_x}\) is odd, so its integral over \(B(0,a)\) vanishes. Using the estimates on the Jacobian (3.1), the norm of the above expression can be bounded by \(Ca^{m+2}\) and thus \(|\nabla \tau _a^{-1}(x)|\leqq C a\). Putting all estimates together we conclude that \(A_1 \leqq C\Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} a\).

Now we focus on estimating \(A_2\). First of all, given the lower bounds for \(\tau _a\) in (3.6) we can focus on finding bounds for \(\mathrm {TV}_{{\widetilde{h}}}({\widetilde{\Lambda }}_a f)\). In other words, thanks to (3.6) we already have

$$\begin{aligned} A_2 \leqq (1+Ca^2) \mathrm {TV}_{{\widetilde{h}}}({\widetilde{\Lambda }}_a f ), \end{aligned}$$

and so, we just need to bound \(\mathrm {TV}_{{\widetilde{h}}}({\widetilde{\Lambda }}_a f )\). To achieve this, we start by noticing that for any given \(x,y\in {\mathcal {M}}\) satisfying \(d_{\mathcal {M}}(x,y)\leqq {\widetilde{h}}\) we can write

$$\begin{aligned}&{\widetilde{\Lambda }}_af(x) =\frac{1}{a^m} \int _{B(0,a) \subseteq {\mathcal {T}}_x{\mathcal {M}}}\phi (|v|_x/a)f(\exp _x(v))J_x(v)\,\mathrm {d}v,\\&{\widetilde{\Lambda }}_af(y) =\frac{1}{a^m} \int _{B(0,a) \subseteq {\mathcal {T}}_y{\mathcal {M}}}\phi (|v'|_y/a)f(\exp _y(v'))J_y(v')\, \mathrm {d}v'. \end{aligned}$$

We attempt to write this last expression in terms of an integral over tangent vectors at x. For that purpose we consider \(PT_{x,y}: {\mathcal {T}}_x{\mathcal {M}}\rightarrow {\mathcal {T}}_y{\mathcal {M}}\) the parallel transport from x to y along a constant speed geodesic connecting x and y. By definition of parallel transport, the map \(PT_{x,y}\) is an isometry between \({\mathcal {T}}_x{\mathcal {M}}\) and \({\mathcal {T}}_y {\mathcal {M}}\) and hence its Jacobian is equal to one. Therefore, we can write

$$\begin{aligned} {\widetilde{\Lambda }}_af(y) =\frac{1}{a^m} \int _{B(0,a) \subseteq {\mathcal {T}}_x{\mathcal {M}}}\phi (|v|_x/a)f(\exp _y(PT_{x,y}(v)))J_y(PT_{x,y}(v))\, \mathrm {d}v. \end{aligned}$$

The above expression allows us to write the difference \({{\widetilde{\Lambda }}}_af(x)-{{\widetilde{\Lambda }}}_af(y)\) in terms of a single integral

$$\begin{aligned}&\frac{1}{a^m} \int _{B(0,a) \subseteq {\mathcal {T}}_x{\mathcal {M}}}\phi (|v|_x/a)\\&\quad \left( f(\exp _x(v))J_x(v) -f(\exp _y(PT_{x,y}(v)))J_y(PT_{x,y}(v))\right) \, \mathrm {d}v. \end{aligned}$$

Now, as we show in the Appendix, there is a constant C such that

$$\begin{aligned} |J_x(v) - J_y(PT_{x,y}(v)) | \leqq C ah \end{aligned}$$
(3.7)

for all \(x \in {\mathcal {M}}\), all \(y \in B_{\mathcal {M}}(x,h)\) and all \(v\in {\mathcal {T}}_x {\mathcal {M}}\) with \(|v|_x\leqq a\). As a consequence, we can use the triangle inequality to bound \(\mathrm {TV}_{{\widetilde{h}}}({\widetilde{\Lambda }}_af)\) by the sum of:

$$\begin{aligned} A_{2,1}:= C a\Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} \end{aligned}$$

and

$$\begin{aligned}&A_{2,2}:=\frac{1}{{\widetilde{h}}}\int _{{\mathcal {M}}}\int _{{\mathcal {M}}}\int _{{\mathcal {T}}_x {\mathcal {M}}} \eta _{{\widetilde{h}}}\left( d_{\mathcal {M}}(x,y)\right) \phi _a(|v|_x)\\&\quad \times | F(x, \exp _{x}^{-1}(y),v) | J_x(v) \, \mathrm {d}v \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x), \end{aligned}$$

where

$$\begin{aligned} F(x,w,v):= f(\exp _x(v)) - f (\exp _{\exp _x(w)}(PT_{x, \exp _x(w)}(v) )). \end{aligned}$$

\(A_{2,2}\) can be rewritten as

$$\begin{aligned} A_{2,2}&= \frac{1}{{\tilde{h}}} \int _{\mathcal {M}}\int _{{\mathcal {T}}_x {\mathcal {M}}} \int _{{\mathcal {T}}_x {\mathcal {M}}} \eta _{{\tilde{h}}} \left( |w|_x \right) \\&\quad \phi _a(|v|_x) |F(x,w,v)| J_x(w) J_x(v) \, \mathrm {d}v \, \mathrm {d}w \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x)\\&\quad \leqq \frac{(1+ Ca^2+ Ch^2)}{{\tilde{h}}} \int _{\mathcal {M}}\int _{{\mathcal {T}}_x {\mathcal {M}}} \int _{{\mathcal {T}}_x {\mathcal {M}}} \eta _{{\tilde{h}}} \left( |w|_x \right) \\&\quad \phi _a(|v|_x) |F(x,w,v)| \, \mathrm {d}v \, \mathrm {d}w \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x), \end{aligned}$$

where the inequality is due to the upper bound on \(J_x\) from (3.1). It will be convenient to rewrite the term on the right hand side of the inequality as a single integral

$$\begin{aligned} \frac{(1+ Ca^2 + Ch^2)}{{\tilde{h}}} \int _{{\mathcal {T}}^2 {\mathcal {M}}} \eta _{{\tilde{h}}} \left( |w|_x \right) \phi _a(|v|_x) |F(x,w,v)| \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}^2 {\mathcal {M}}}(x,w,v), \nonumber \\ \end{aligned}$$
(3.8)

over the fiber bundle

$$\begin{aligned} {\mathcal {T}}^2{\mathcal {M}}:= \{ (x,v_1,v_2) \, : \, x \in {\mathcal {M}}, \quad v_1, v_2 \in {\mathcal {T}}_x{\mathcal {M}}\}, \end{aligned}$$

whose volume form \(\mathrm {vol}_{{\mathcal {T}}^2{\mathcal {M}}}\) can be understood as \(\mathrm {d}\mathrm {vol}_{{\mathcal {T}}^2{\mathcal {M}}}(x,v_1,v_2) = \mathrm {d}v_1 \, \mathrm {d}v_2 \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x)\) (we refer to the Appendix for a rigorous formulation of the volume form on \({\mathcal {T}}^2{\mathcal {M}}\)). We also note that the integrand in (3.8) is zero outside the set

$$\begin{aligned} {\mathcal {B}}^2_{{\widetilde{h}},a}:= \{ (x,v_1,v_2) \in {\mathcal {T}}^2 {\mathcal {M}}\, :\, |v_1|_x\leqq {\tilde{h}} ,\quad |v_2|_x \leqq a \}. \end{aligned}$$

Next, we define a transformation \({\widetilde{\Psi }}\) of \((x,v_1,v_2)\) which will allow us to simplify the expression in (3.8). We let

$$\begin{aligned} {\widetilde{\Psi }}: (x,w,v) \longmapsto ({\tilde{x}} , {\tilde{w}} , {\tilde{v}}) \end{aligned}$$

be the map

$$\begin{aligned} {\tilde{x}}:= \Phi _1^1(x,v), \quad {\tilde{v}} := \Phi _1^2(x,v), \quad {\tilde{w}}:= \exp _{{\tilde{x}}}^{-1}\left( \exp _{\exp _x(w)}(PT_{x, \exp _x(w)}(v) ) \right) , \nonumber \\ \end{aligned}$$
(3.9)

defined for all \((x,w,v) \in {\mathcal {B}}^2_{c_1, c_2}\); here, \(c_1, c_2\) are order one quantities that are sufficiently small so as to guarantee that \({\widetilde{\Psi }}\) is a diffeomorphism. It is straightforward to verify that the image of \({\widetilde{\Psi }}\) is contained in \({\mathcal {B}}^2_{\tilde{c}_1, c_2}\) for some order one quantity \({\tilde{c}}_1\). In the Appendix we show that

$$\begin{aligned} \left| |w|_x -|{\tilde{w}}|_{{\tilde{x}}} \right| \leqq Cah \end{aligned}$$
(3.10)

for all \((x,w,v) \in {\mathcal {B}}^2_{ {\widetilde{h}},a}\). In particular, for such a triple we have \(|{\tilde{w}}|_{{\tilde{x}}} \leqq {\tilde{h}} + Cah = h \); this is where the constant C in the definition \({\tilde{h}}: = h(1-Ca) \) is chosen. Consequently, for all \((x,w,v) \in {\mathcal {T}}^2 {\mathcal {M}}\) we have

$$\begin{aligned} \eta \left( \frac{|w|_x}{{\tilde{h}}} \right) \phi _a(|v|_x ) \leqq \eta \left( \frac{|{\tilde{w}}|_{{\tilde{x}}}}{h} \right) \phi _a(|v|_x ). \end{aligned}$$

Furthermore, from the definition of \({\tilde{v}}\) it follows that

$$\begin{aligned} |v|_{x} = |{\tilde{v}}|_{{\tilde{x}}}. \end{aligned}$$

Thus, for all \((x,w,v)\) we have

$$\begin{aligned} \eta \left( \frac{|w|_x}{{\tilde{h}}} \right) \phi _a(|v|_x ) \leqq \eta \left( \frac{|{\tilde{w}}|_{{\tilde{x}}}}{h} \right) \phi _a(|{\tilde{v}}|_{{\tilde{x}}} ). \end{aligned}$$

Now, from the above inequality and the fact that

$$\begin{aligned} F(x,w,v)= f({\tilde{x}})- f( \exp _{{\tilde{x}}}({\tilde{w}}) ) \end{aligned}$$

we can then upper bound (3.8) by

$$\begin{aligned} (1+Ca) \frac{1}{h} \int _{{\mathcal {T}}^2 {\mathcal {M}}} \eta _h(|{\tilde{w}}|_{{\tilde{x}}}) \phi _a(|{\tilde{v}} |_{{\tilde{x}}})| f({\tilde{x}}) - f(\exp _{{\tilde{x}}} ({\tilde{w}})) | \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}^2 {\mathcal {M}}}(x,w,v), \end{aligned}$$

where we have also used the smallness of a to expand \((1-Ca)^{m+1}\), and have used the fact that \(h \leqq a\). We can then change variables to rewrite the above expression as

$$\begin{aligned}&( 1+Ca) \frac{1}{h} \int _{{\mathcal {T}}^2 {\mathcal {M}}} \eta _h(|{\tilde{w}}|_{{\tilde{x}}}) \phi _a(|{\tilde{v}} |_{{\tilde{x}}})|f({\tilde{x}}) \nonumber \\&\quad - f(\exp _{{\tilde{x}}} ({\tilde{w}})) | \left| \frac{\partial {\widetilde{\Psi }}^{-1}(\tilde{x},\tilde{w},\tilde{v}) }{\partial (\tilde{x},\tilde{w},\tilde{v})} \right| \,\mathrm {d}\mathrm {vol}_{{\mathcal {T}}^2 {\mathcal {M}}}(\tilde{x},\tilde{w},\tilde{v}), \end{aligned}$$
(3.11)

where \(\left| \frac{\partial {\widetilde{\Psi }}^{-1}(\tilde{x},\tilde{w},\tilde{v}) }{\partial (\tilde{x},\tilde{w},\tilde{v})} \right| \) is the Jacobian of the transformation \({\widetilde{\Psi }}^{-1}\). As we prove in the Appendix, the Jacobian satisfies

$$\begin{aligned} \left| \left| \frac{\partial {\widetilde{\Psi }}^{-1}(\tilde{x},\tilde{w},\tilde{v}) }{\partial (\tilde{x},\tilde{w},\tilde{v})} \right| - 1 \right| \leqq Ca \end{aligned}$$
(3.12)

for all \((\tilde{x},\tilde{w},\tilde{v}) \in {\widetilde{\Psi }}({\mathcal {B}}_{\tilde{h},a}^2)\).

From (3.12) it follows that the term in (3.11) can be bounded by

$$\begin{aligned} ( 1+Ca) \frac{1}{h} \int _{{\mathcal {T}}^2 {\mathcal {M}}} \eta _h(|\tilde{w}|_{\tilde{x}}) \phi _a(|\tilde{v} |_{\tilde{x}})| f(\tilde{x}) - f( \exp _{\tilde{x}}(\tilde{w})) | \, \mathrm {d}\mathrm {vol}_{{\mathcal {T}}^2 {\mathcal {M}}}(\tilde{x},\tilde{w},\tilde{v}). \end{aligned}$$

This expression can be written as

$$\begin{aligned}&(1+Ca) \frac{1}{h} \int _{{\mathcal {M}}}\int _{{\mathcal {T}}_{\tilde{x}}{\mathcal {M}}} \int _{{\mathcal {T}}_{\tilde{x}}{\mathcal {M}}} \eta _h(|\tilde{w}|_{\tilde{x}}) \phi _a(|\tilde{v}|_{\tilde{x}})| f(\tilde{x}) \\&\qquad - f( \exp _{\tilde{x}}(\tilde{w})) | \, \mathrm {d}\tilde{v}\, \mathrm {d}\tilde{w} \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(\tilde{x}). \end{aligned}$$

Multiplying and dividing the integrand by \(J_{\tilde{x}}(\tilde{w})\), and using the bounds on the Jacobian in (3.1) and the upper bound \(\tau _a(\tilde{x})\leqq 1+Ca^2\) (derived analogously to (3.6)), we deduce that the above expression is bounded above by

$$\begin{aligned}&(1+Ca) \frac{1}{h} \int _{{\mathcal {M}}}\int _{{\mathcal {T}}_{\tilde{x}}{\mathcal {M}}} \int _{{\mathcal {T}}_{\tilde{x}}{\mathcal {M}}} \eta _h(|\tilde{w}|_{\tilde{x}}) \phi _a(|\tilde{v}|_{\tilde{x}})| f(\tilde{x})\\&\qquad - f( \exp _{\tilde{x}}(\tilde{w})) |J_{\tilde{x}}(\tilde{w}) \, \mathrm {d}\tilde{v} \, \mathrm {d}\tilde{w} \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(\tilde{x}). \end{aligned}$$

Upon integration on \(\tilde{v}\), we recognize that the above term is precisely

$$\begin{aligned} (1+Ca) \mathrm {TV}_h(f). \end{aligned}$$

Putting all the estimates together, we deduce that:

$$\begin{aligned} \mathrm {TV}_{{{\widetilde{h}}}} (\Lambda _a f) \leqq (1+ Ca)\mathrm {TV}_h(f) + C a \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} \end{aligned}$$

as we wanted to show. \(\quad \square \)

Combining all propositions above we obtain the following.

Corollary 3.7

There exists a constant \(C>0\), such that for all \(0< h\leqq a \leqq \frac{h_{\mathcal {M}}}{2}\), we have for all \(f \in \mathrm {L}^{\infty }({\mathcal {M}})\),

  i) \(\sigma _\eta \mathrm {TV}(\Lambda _a f) \leqq (1+ C(h^2 + a))\mathrm {TV}_h(f) + C \left( \frac{h}{a^2} + a \right) \Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}\);

  ii) \(\Vert \Lambda _a f - f \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq C a \mathrm {TV}_h(f)\).

Proof

From Proposition 3.5 we know that \(\Lambda _a f \in \mathrm {C}^{2}({\mathcal {M}})\) for every \(f\in \mathrm {L}^{\infty }({\mathcal {M}})\) and moreover

$$\begin{aligned} \Vert \Lambda _a f \Vert _{\mathrm {C}^{2}({\mathcal {M}})} \leqq \frac{C}{a^2}\Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}. \end{aligned}$$

Thus, by Proposition 3.2, it follows that

$$\begin{aligned} \sigma _\eta \mathrm {TV}(\Lambda _a f) \leqq (1+ C{\widetilde{h}}^2) \mathrm {TV}_{{\widetilde{h}}}(\Lambda _a f) + \frac{C h}{a^2} \Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}, \end{aligned}$$

where \({\widetilde{h}}= h(1-Ca)\). Using Proposition 3.6 we deduce

$$\begin{aligned} \sigma _\eta \mathrm {TV}(\Lambda _a f) \leqq (1+ C(h^2 + a))\mathrm {TV}_h(f) + C \left( \frac{h}{a^2} + a \right) \Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} \end{aligned}$$

as we wanted to show. The second inequality is just ii) in Proposition 3.5. \(\quad \square \)
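To get a feeling for the error term in i), one can ask which smoothing scale a balances the two contributions \(h/a^2\) and a. The sketch below is an illustration only, not a choice made in the text: it drops the constants C, ignores the constraint \(h\leqq a \leqq h_{\mathcal {M}}/2\), and uses the hypothetical helper names `error_term` and `balanced_a`. Setting the derivative \(1 - 2h/a^3\) to zero gives \(a = (2h)^{1/3}\), for which the term equals \(\frac{3}{2}(2h)^{1/3}\).

```python
import numpy as np

def error_term(h, a):
    """The coefficient h / a^2 + a multiplying the sup norm in
    Corollary 3.7 i), with the constants C dropped (assumption)."""
    return h / a**2 + a

def balanced_a(h):
    """Minimizer of a -> h / a^2 + a over a > 0,
    obtained from the first-order condition 1 - 2h / a^3 = 0."""
    return (2.0 * h) ** (1.0 / 3.0)

h = 1e-3
a_star = balanced_a(h)

# Cross-check the closed form against a brute-force grid search.
grid = np.linspace(h, 1.0, 200001)
a_grid = grid[np.argmin(error_term(h, grid))]
best_value = error_term(h, a_star)
```

Under these simplifications the additive error in i) is of order \(h^{1/3}\) at the balanced scale.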

In the remainder we make use of the following two lemmas.

Lemma 3.8

For every \(C_0>0\) there exist \(\beta _0>0\) and \(h_{\mathcal {M}}>0\) such that for every \(0<h<h_{\mathcal {M}}\), and for every \(A\subseteq {\mathcal {M}}\) for which

$$\begin{aligned} \frac{\mathrm {TV}_{h}(\mathbbm {1}_A)}{\min \{ \mathrm {vol}_{\mathcal {M}}(A),\mathrm {vol}_{\mathcal {M}}(A^c)\}}\leqq C_0, \end{aligned}$$

we have

$$\begin{aligned} \min \{ \mathrm {vol}_{\mathcal {M}}(A),\mathrm {vol}_{\mathcal {M}}(A^c) \} \geqq \beta _0. \end{aligned}$$

Proof

Let us fix \(C_0>0\). For the sake of contradiction suppose the result is not true. In that case we would be able to build a sequence \(\{h_k \}_{k \in {\mathbb {N}}}\) with \(h_k \rightarrow 0\) as \(k \rightarrow \infty \) and a sequence of sets \(\{A_k \}_{k \in {\mathbb {N}}}\) satisfying

$$\begin{aligned} \frac{\mathrm {TV}_{h_k}(\mathbbm {1}_{A_k})}{\min \{ \mathrm {vol}_{\mathcal {M}}(A_k),\mathrm {vol}_{\mathcal {M}}(A_k^c) \}}\leqq C_0, \quad \min \{ \mathrm {vol}_{\mathcal {M}}(A_k),\mathrm {vol}_{\mathcal {M}}(A_k^c) \} \leqq \frac{1}{k}. \end{aligned}$$

Without loss of generality we can assume \(\min \{ \mathrm {vol}_{\mathcal {M}}(A_k),\mathrm {vol}_{\mathcal {M}}(A_k^c) \}= \mathrm {vol}_{\mathcal {M}}(A_k)\) (for otherwise we can take complements). Let us now introduce rescaled functions

$$\begin{aligned} {\widetilde{f}}_k := \alpha _k \mathbbm {1}_{A_k}, \quad \alpha _k := \frac{1}{\mathrm {vol}_{\mathcal {M}}(A_k)}, \quad k \in {\mathbb {N}}. \end{aligned}$$

This sequence satisfies:

  i) \(\Vert {\widetilde{f}}_k \Vert _{\mathrm {L}^{1}({\mathcal {M}})}=1\), for all \(k \in {\mathbb {N}}\),

  ii) \(\mathrm {TV}_{h_k}({\widetilde{f}}_k)\leqq C_0\), for all \(k \in {\mathbb {N}}\).

Thanks to the above properties we can use the compactness result from [42, Lemma 4.4] to conclude that there exists \({\widetilde{f}} \in \mathrm {L}^{1}({\mathcal {M}})\) such that (up to a subsequence, not relabeled)

$$\begin{aligned} {\widetilde{f}}_k \rightarrow _{\mathrm {L}^{1}({\mathcal {M}})} {\widetilde{f}}. \end{aligned}$$

We note that, although the result in [42] is stated in the flat Euclidean case, it suffices for our purposes: one can work in local coordinates and use the compactness and smoothness of \({\mathcal {M}}\), together with a gluing argument, to extend the local result to a global one.

Now, observe that the function \({\widetilde{f}}\) must satisfy \(\Vert {\widetilde{f}} \Vert _{\mathrm {L}^{1}({\mathcal {M}})}=1\). Moreover, since the functions \({\widetilde{f}}_{k}\) are of the form \({\widetilde{f}}_k = \alpha _k \mathbbm {1}_{A_k}\), necessarily the function \({\widetilde{f}}\) has the form \({\widetilde{f}}= \alpha \mathbbm {1}_{A}\) for a positive real number \(\alpha \). It should also hold that

$$\begin{aligned} \mathrm {vol}_{\mathcal {M}}(A_k) \rightarrow \mathrm {vol}_{\mathcal {M}}(A), \quad k \rightarrow \infty . \end{aligned}$$

From this and the fact that \(\alpha \mathrm {vol}_{\mathcal {M}}(A) =1\) we conclude that

$$\begin{aligned} \alpha = \frac{1}{\mathrm {vol}_{\mathcal {M}}(A)}= \lim _{k \rightarrow \infty } \frac{1}{\mathrm {vol}_{\mathcal {M}}(A_k)}. \end{aligned}$$

This however contradicts the fact that \(\frac{1}{\mathrm {vol}_{\mathcal {M}}(A_k)} \geqq k\) for all \(k \in {\mathbb {N}}\). \(\quad \square \)

The next lemma states that the Cheeger constant \({\mathcal {C}}_{\mathcal {M}}\) can also be written as the minimum value of an optimization problem over BV functions taking values in [0, 1]. The proof is standard; we present it for the convenience of the reader and refer to Section 2 in [22] for a proof of an analogous result in the graph setting.

Lemma 3.9

The Cheeger constant \({\mathcal {C}}_{\mathcal {M}}\) admits the representation

$$\begin{aligned} {\mathcal {C}}_{\mathcal {M}}= \min _{f:{\mathcal {M}}\rightarrow [0,1]} \frac{\mathrm {TV}(f)}{\Vert f- m_1(f)\Vert _{\mathrm {L}^{1}({\mathcal {M}})}} \end{aligned}$$

where in the above, \(m_1(f)\) is a median of f, i.e., any number c in [0, 1] where the minimum

$$\begin{aligned} \min _{c\in [0,1]} \Vert f -c \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \end{aligned}$$
(3.13)

is achieved.
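The characterization of \(m_1(f)\) as a minimizer of (3.13) can be illustrated on a discretized function. The sketch below is a toy check under stated assumptions: the manifold is replaced by 501 cells of equal volume, f is sampled at random in [0, 1], and the helper names `l1_distance_to_constant` and `weighted_median` are hypothetical.

```python
import numpy as np

def l1_distance_to_constant(f_vals, weights, c):
    """Discrete analogue of the L^1 norm of f - c, with cell volumes `weights`."""
    return float(np.sum(weights * np.abs(f_vals - c)))

def weighted_median(f_vals, weights):
    """Smallest value c such that the cells with f <= c carry
    at least half of the total volume."""
    order = np.argsort(f_vals)
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(f_vals[order][idx])

rng = np.random.default_rng(0)
f_vals = rng.uniform(0.0, 1.0, size=501)   # toy function with values in [0, 1]
weights = np.full(501, 1.0 / 501)          # equal cell volumes, total volume 1
m1 = weighted_median(f_vals, weights)

# Compare against every constant on a fine grid of [0, 1].
candidates = np.linspace(0.0, 1.0, 1001)
best_on_grid = min(l1_distance_to_constant(f_vals, weights, c) for c in candidates)
value_at_median = l1_distance_to_constant(f_vals, weights, m1)
```

No constant on the grid does better than the median, in line with the definition of \(m_1(f)\) through (3.13).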

Proof

A simple computation shows that for a BV set \(E \subseteq {\mathcal {M}}\) one has

$$\begin{aligned} \Vert \mathbbm {1}_{E} -m_1(\mathbbm {1}_E) \Vert _{\mathrm {L}^{1}({\mathcal {M}})} = \min \{ \mathrm {vol}_{\mathcal {M}}(E), \mathrm {vol}_{\mathcal {M}}(E^c) \} \end{aligned}$$

and thus it is clear that

$$\begin{aligned} {\mathcal {C}}_{\mathcal {M}}\geqq \min _{f:{\mathcal {M}}\rightarrow [0,1]} \frac{\mathrm {TV}(f)}{\Vert f- m_1(f)\Vert _{\mathrm {L}^{1}({\mathcal {M}})}}. \end{aligned}$$

Conversely, for a BV function \(f: {\mathcal {M}}\rightarrow [0,1]\) we let \(g:= f -m_1(f)\). We claim

$$\begin{aligned} \mathrm {vol}_{\mathcal {M}}(\{g\leqq t\})&\leqq \mathrm {vol}_{\mathcal {M}}(\{g>t\})& \forall t\leqq 0 \\ \mathrm {vol}_{\mathcal {M}}(\{g\leqq t\})&\geqq \mathrm {vol}_{\mathcal {M}}(\{g>t\})& \forall t\geqq 0. \end{aligned}$$

Since the left-hand side is nondecreasing in t and the right-hand side is nonincreasing in t, it is enough to show that

$$\begin{aligned} \mathrm {vol}_{\mathcal {M}}(\{g\leqq 0\}) = \mathrm {vol}_{\mathcal {M}}(\{g>0\}). \end{aligned}$$

This is just the optimality condition for (3.13) so it holds by definition of \(m_1(f)\). Now,

$$\begin{aligned} \begin{aligned} \mathrm {TV}(f)= \mathrm {TV}(g)&= \int _{-\infty }^{\infty }\mathrm {TV}( \mathbbm {1}_{ \{ g\leqq t \}}) \, \mathrm {d}t \\&\geqq {\mathcal {C}}_{\mathcal {M}}\left( \int _{-\infty }^0 \mathrm {vol}_{{\mathcal {M}}}( \{ g \leqq t \}) \mathrm {d}t + \int _{0}^\infty \mathrm {vol}_{{\mathcal {M}}}( \{ g > t \}) \, \mathrm {d}t \right) \\&= {\mathcal {C}}_{\mathcal {M}}\Vert g\Vert _{\mathrm {L}^{1}({\mathcal {M}})}. \end{aligned} \end{aligned}$$

Hence,

$$\begin{aligned} \frac{\mathrm {TV}(f)}{\Vert f- m_1(f) \Vert _{\mathrm {L}^{1}({\mathcal {M}})}} \geqq {\mathcal {C}}_{\mathcal {M}}. \end{aligned}$$

From the fact that f was arbitrary we obtain the desired inequality. \(\quad \square \)
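The coarea formula used in the proof above has an elementary discrete analogue that can be verified directly. The following sketch is an assumption-laden illustration, not the setting of the lemma: the manifold is replaced by a path graph, \(\mathrm {TV}\) by the sum of jumps between neighbouring nodes, and the helper names `tv_path` and `coarea_integral` are hypothetical.

```python
import numpy as np

def tv_path(f_vals):
    """Total variation on a path graph: sum of jumps between neighbours."""
    return float(np.sum(np.abs(np.diff(f_vals))))

def coarea_integral(g_vals):
    """Integrate t -> TV(indicator of {g <= t}) over the real line.
    The integrand is piecewise constant between consecutive values of g
    and vanishes outside [min g, max g), so the integral is a finite sum."""
    levels = np.unique(g_vals)                 # sorted distinct values of g
    total = 0.0
    for lo, hi in zip(levels[:-1], levels[1:]):
        t = 0.5 * (lo + hi)                    # any t in (lo, hi) gives the same level set
        indicator = (g_vals <= t).astype(float)
        total += (hi - lo) * tv_path(indicator)
    return total

rng = np.random.default_rng(4)
f_vals = rng.uniform(0.0, 1.0, size=60)
g_vals = f_vals - np.median(f_vals)            # g = f - m_1(f), as in the proof
tv_direct = tv_path(f_vals)
tv_coarea = coarea_integral(g_vals)
```

The two numbers agree up to rounding, mirroring the identity \(\mathrm {TV}(f)= \mathrm {TV}(g) = \int \mathrm {TV}( \mathbbm {1}_{ \{ g\leqq t \}}) \, \mathrm {d}t\).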

4 Proofs of the Main Results

Sections 4.1 and 4.2 combine to prove Theorem 2.2. In Section 4.3 we prove Theorem 2.15.

4.1 Upper Bound

In this section we quantify the relationship

$$\begin{aligned} {\mathcal {C}}_{n,\varepsilon } \lesssim \sigma _\eta {\mathcal {C}}_{\mathcal {M}}. \end{aligned}$$

For this purpose we discuss pointwise estimates for the approximation of \(\mathrm {TV}(f)\) with \(\mathrm {GTV}_{n,\varepsilon }(f)\) for a fixed BV function \(f : {\mathcal {M}}\rightarrow {\mathbb {R}}\). We begin by stating an estimate for \(|\mathrm {GTV}_{n,\varepsilon }(f)- {\mathbb {E}}(\mathrm {GTV}_{n,\varepsilon }(f))|\).

Proposition 4.1

There is a constant \(C>0\) such that for all \(0<\varepsilon <\varepsilon _0\) (i.e. \(\varepsilon \) small enough), all \(0<\zeta < \zeta _0\) (all \(\zeta \) small enough) and all \(f\in \mathrm {BV}({\mathcal {M}})\cap \mathrm {L}^{\infty }({\mathcal {M}})\) satisfying \(n\zeta \varepsilon ^{\frac{m+1}{2}}\geqq \left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) \geqq \sqrt{\zeta }\), we have:

$$\begin{aligned}&{{\,\mathrm{{\mathbb {P}}}\,}}\left( \left| \frac{n}{n-1}\mathrm {GTV}_{n,\varepsilon }(f) - {\widetilde{\mathrm {TV}}}_\varepsilon (f) \right| \geqq ( 1+ \frac{1}{R(f)} ) \zeta \right) \\&\quad \leqq C \exp \left( -\frac{Cn{\zeta } \min \{\varepsilon ^{\frac{m+1}{2}},\varepsilon \zeta \}}{(R(f))^2}\right) , \end{aligned}$$

where

$$\begin{aligned} {\widetilde{\mathrm {TV}}}_{\varepsilon }(f):= \frac{1}{\varepsilon ^{m+1}}\int _{\mathcal {M}}\int _{\mathcal {M}}|f(x) - f(y)| \, \eta \left( \frac{|x-y|}{\varepsilon } \right) \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y), \quad f \in \mathrm {L}^{1}({\mathcal {M}}), \end{aligned}$$

and

$$\begin{aligned} R(f):= \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} + \mathrm {TV}(f). \end{aligned}$$

In addition, with probability at least \(1-C \exp \left( -\frac{Cn {\zeta } \min \{ \varepsilon ^{\frac{m+1}{2}},\varepsilon \zeta \}}{(R(f))^2}\right) \) we have that

$$\begin{aligned} \mathrm {GTV}_{n,\varepsilon }(f) \leqq \sigma _\eta (1+C\varepsilon ^2) \mathrm {TV}(f) + \left( 1+\frac{1}{R(f)}\right) {\zeta }. \end{aligned}$$

Proof

When f is an indicator function, these types of estimates have been previously proved in [44]. We follow the same proof, highlighting changes that occur due to considering general BV functions.

Since the variables \(x_1, \dots , x_n\) are i.i.d. samples, \(\mathrm {GTV}_{n,\varepsilon }(f)\) is a U-statistic [47] of order two as it can be written as

$$\begin{aligned} \mathrm {GTV}_{n,\varepsilon }(f) = \frac{1}{n^2}\sum _{i\ne j} \phi (x_i,x_j),\qquad \phi (x,y):= \frac{1}{\varepsilon ^{m+1}}\eta \left( \frac{|x-y|}{\varepsilon }\right) |f(x)-f(y)|. \end{aligned}$$
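For concreteness, the double sum above can be evaluated directly on a point cloud. The sketch below is a minimal implementation of the displayed formula under stated assumptions: the kernel \(\eta \) defaults to the indicator of [0, 1] (the text allows more general kernels), distances are Euclidean in the ambient space, and the function name `graph_total_variation` is hypothetical.

```python
import numpy as np

def graph_total_variation(points, f_vals, eps, m,
                          eta=lambda r: (r <= 1.0).astype(float)):
    """(1 / n^2) * sum over i != j of phi(x_i, x_j), where
    phi(x, y) = eps^{-(m+1)} * eta(|x - y| / eps) * |f(x) - f(y)|.
    `m` is the intrinsic dimension; the default eta is an assumption."""
    points = np.asarray(points, dtype=float)
    f_vals = np.asarray(f_vals, dtype=float)
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)          # Euclidean distances |x_i - x_j|
    kernel = eta(dist / eps) / eps ** (m + 1)
    np.fill_diagonal(kernel, 0.0)                  # exclude the terms with i == j
    jumps = np.abs(f_vals[:, None] - f_vals[None, :])
    return float(np.sum(kernel * jumps) / n**2)

# Three points on a line (m = 1); only the pair (0, 0.5) is within distance eps.
points = np.array([[0.0], [0.5], [2.0]])
f_vals = np.array([0.0, 1.0, 1.0])
gtv = graph_total_variation(points, f_vals, eps=1.0, m=1)
```

In this toy example the only contributing pair is counted twice in the sum over \(i\ne j\), so the value is 2/9.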

Notice that

$$\begin{aligned} {\mathbb {E}}(\mathrm {GTV}_{n,\varepsilon }(f)) = \frac{n-1}{n} {\widetilde{\mathrm {TV}}}_{\varepsilon }(f). \end{aligned}$$

We remark that the only difference between \({\widetilde{\mathrm {TV}}}_\varepsilon (f)\) and \(\mathrm {TV}_\varepsilon (f)\) defined in (2.8) is the fact that in the former we use the Euclidean distance to determine the proximity of points whereas for the latter we use the geodesic distance. We will first provide concentration inequalities for \({\widetilde{\mathrm {TV}}}_\varepsilon (f)\), and then relate back to \(\mathrm {TV}_\varepsilon (f)\).
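The size of the discrepancy between the two energies can be seen in a simple example. The sketch below makes several assumptions that are not in the text: \({\mathcal {M}}\) is the unit circle in \({\mathbb {R}}^2\) with (unnormalized) arc-length measure, \(\eta \) is the indicator of [0, 1], f is the indicator of a half circle, and the integrals are approximated by a midpoint rule; the helper name `nonlocal_tv_circle` is hypothetical. In the continuum, each of the two boundary points contributes \(\varepsilon ^2\) to the geodesic double integral, so the geodesic energy equals 2, while the Euclidean one equals \(2\,(2\arcsin (\varepsilon /2)/\varepsilon )^2 = 2 + O(\varepsilon ^2)\).

```python
import numpy as np

def nonlocal_tv_circle(f_vals, theta, eps, use_chord):
    """Midpoint-rule approximation of
    eps^{-2} * double integral of eta(dist / eps) * |f(x) - f(y)|
    on the unit circle (m = 1), with eta the indicator of [0, 1]."""
    gap = np.abs(theta[:, None] - theta[None, :])
    geodesic = np.minimum(gap, 2.0 * np.pi - gap)      # arc-length distance
    # Euclidean (chord) distance between two points of the unit circle.
    dist = 2.0 * np.sin(geodesic / 2.0) if use_chord else geodesic
    kernel = (dist <= eps).astype(float)
    jumps = np.abs(f_vals[:, None] - f_vals[None, :])
    cell = (2.0 * np.pi / len(theta)) ** 2             # area of one quadrature cell
    return float(np.sum(kernel * jumps) * cell / eps**2)

n_grid = 1000
theta = 2.0 * np.pi * (np.arange(n_grid) + 0.5) / n_grid
f_vals = (theta < np.pi).astype(float)   # half circle: two boundary points
eps = 0.5
tv_geodesic = nonlocal_tv_circle(f_vals, theta, eps, use_chord=False)
tv_chord = nonlocal_tv_circle(f_vals, theta, eps, use_chord=True)
tv_chord_exact = 2.0 * (2.0 * np.arcsin(eps / 2.0) / eps) ** 2
```

The Euclidean version is slightly larger than the geodesic one, because chords are shorter than arcs and therefore more pairs fall within distance \(\varepsilon \).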

Using the Hoeffding decomposition of U-statistics, we will write

$$\begin{aligned} \frac{n}{n-1}\mathrm {GTV}_{n,\varepsilon }(f) - {\widetilde{\mathrm {TV}}}_\varepsilon (f)= 2U_{1} + U_2 \end{aligned}$$

where

$$\begin{aligned} U_1:= \frac{1}{n}\sum _{i=1}^n g_{1}(x_i) \end{aligned}$$

and

$$\begin{aligned} U_2:= \frac{2}{n(n-1)}\sum _{i=1}^n \sum _{j>i}g_{2}(x_i, x_j). \end{aligned}$$

\(U_1\) and \(U_2\) are canonical U-statistics of order one and two respectively. In the above,

$$\begin{aligned} g_1(x):= & {} {\overline{\phi }}(x) -{\widetilde{\mathrm {TV}}}_\varepsilon (f) \\ g_2(x,y):= & {} \phi (x,y) - {\overline{\phi }}(x) - {\overline{\phi }}(y)+ {\widetilde{\mathrm {TV}}}_\varepsilon (f) \\ {\overline{\phi }}(x):= & {} \int _{{\mathcal {M}}}\phi (x,y) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(y). \end{aligned}$$
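The decomposition \(\frac{n}{n-1}\mathrm {GTV}_{n,\varepsilon }(f) - {\widetilde{\mathrm {TV}}}_\varepsilon (f)= 2U_{1} + U_2\) is an algebraic identity and can be checked numerically. The sketch below works in a toy setting chosen so that all expectations are exact finite averages: the sampling distribution is uniform on 40 points of [0, 1] (an assumption replacing the volume measure on \({\mathcal {M}}\)), \(\eta \) is the indicator of [0, 1], and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
support = rng.uniform(0.0, 1.0, size=40)    # toy sample space: 40 equally likely points
f_supp = (support > 0.5).astype(float)
eps, m = 0.3, 1

# Kernel phi on the support and its projections under the uniform law.
dist = np.abs(support[:, None] - support[None, :])
phi = (dist <= eps) * np.abs(f_supp[:, None] - f_supp[None, :]) / eps ** (m + 1)
phi_bar = phi.mean(axis=1)                  # plays the role of the integral of phi(x, .)
theta = phi_bar.mean()                      # plays the role of TV-tilde_eps(f)
g1 = phi_bar - theta
g2 = phi - phi_bar[:, None] - phi_bar[None, :] + theta

# An i.i.d. sample, stored as indices into the support.
n = 25
idx = rng.integers(0, len(support), size=n)
phi_s = phi[np.ix_(idx, idx)]
g2_s = g2[np.ix_(idx, idx)]
off_diagonal = ~np.eye(n, dtype=bool)

gtv = phi_s[off_diagonal].sum() / n**2                   # (1 / n^2) sum over i != j
u1 = g1[idx].mean()                                      # U_1
u2 = 2.0 * np.triu(g2_s, k=1).sum() / (n * (n - 1))      # U_2, sum over i < j
lhs = n / (n - 1) * gtv - theta
rhs = 2.0 * u1 + u2
```

The two sides agree up to rounding, and the averages of \(g_1\) and of \(g_2(x,\cdot )\) vanish, which is the canonical (degenerate) property required by the concentration inequalities.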

We can then make use of the concentration inequalities for canonical U-statistics discussed in Section 3 of [47], which imply that, for all \(t>0\),

$$\begin{aligned} {{\,\mathrm{{\mathbb {P}}}\,}}\left( |U_1| \geqq \frac{t}{n} \right) \leqq K \exp \left( \frac{- t^2}{K(t A_1 + B_1^2)} \right) , \end{aligned}$$

and, for all \(t>\frac{B_2}{K}\),

$$\begin{aligned} {{\,\mathrm{{\mathbb {P}}}\,}}\left( |U_2| \geqq \frac{t}{n(n-1)} \right) \leqq K \exp \left( -\frac{1}{K} \min \left\{ \left( \frac{t}{A_2} \right) ^{1/2}, \frac{t}{B_2}, \left( \frac{t}{C_2} \right) ^{2/3} \right\} \right) , \end{aligned}$$

for a universal constant \(K>0\). In the above,

$$\begin{aligned} A_1:= \Vert g_1 \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}, \quad B_1:= \sqrt{n}\Vert g_1 \Vert _{\mathrm {L}^{2}({\mathcal {M}})}, \end{aligned}$$

and

$$\begin{aligned} A_2:= & {} \Vert g_2 \Vert _{\mathrm {L}^{\infty }({\mathcal {M}}\times {\mathcal {M}})}, \quad \\ B_2:= & {} n \Vert g_2 \Vert _{\mathrm {L}^{2}({\mathcal {M}}\times {\mathcal {M}})}, \quad \\ (C_2)^2:= & {} n \left\Vert \int _{{\mathcal {M}}}g_2^2(\cdot ,y) \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \right\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}. \end{aligned}$$

We remark that the first inequality controls the order one U-statistic using precisely Bernstein’s inequality.

We seek to control all the above quantities using estimates as in [44], with the modification that we allow f to be a function of bounded variation. For the \(\mathrm {L}^{\infty }\) estimates, by recalling the definition of \(\phi ,{\bar{\phi }}\), we see that

$$\begin{aligned} A_1 \leqq \frac{C \Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\varepsilon },\qquad A_2 \leqq \frac{C\Vert f \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\varepsilon ^{m+1}}. \end{aligned}$$

On the other hand, in estimating the squared terms, one needs to estimate various integrals of \(\phi ^2\) and \({\bar{\phi }}^2\). In the case of \(B_2\), after some straightforward expansions we obtain

$$\begin{aligned} B_2^2= & {} n^2 \left( \int _{\mathcal {M}}\int _{\mathcal {M}}\phi (x,y)^2 \,\mathrm {d}\mathrm {vol}_{\mathcal {M}}(x) \,\mathrm {d}\mathrm {vol}_{\mathcal {M}}(y) \right. \\&\left. + ({\widetilde{\mathrm {TV}}}_\varepsilon (f))^2 -2 \int _{\mathcal {M}}{\bar{\phi }}^2(x)\, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(x)\right) . \end{aligned}$$

Using Jensen’s inequality, and then bounding \(\phi ^2 \leqq \frac{2\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\varepsilon ^{m+1}}\phi \), we can write, for \(\varepsilon \leqq \varepsilon _0\),

$$\begin{aligned} B_2^2 \leqq Cn^2\left( \frac{\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}{\widetilde{\mathrm {TV}}}_\varepsilon (f)}{\varepsilon ^{m+1}} + ({\widetilde{\mathrm {TV}}}_\varepsilon (f))^2\right) \leqq Cn^2 \frac{(\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} +{\widetilde{\mathrm {TV}}}_\varepsilon (f))^2 }{\varepsilon ^{m+1}} \end{aligned}$$

Similarly, by expanding \(g_2\), we can bound, for \(\varepsilon \leqq \varepsilon _0\),

$$\begin{aligned} C_2^2&\leqq 2n \left( \left\| \int _{\mathcal {M}}\phi (\cdot ,y)^2 \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}(y)\right\| _{\mathrm {L}^{\infty }({\mathcal {M}})} + \left\| \int _{\mathcal {M}}{\bar{\phi }}(y)^2\,\mathrm {d}\mathrm {vol}_{\mathcal {M}}(y)\right\| _{\mathrm {L}^{\infty }({\mathcal {M}})}\right. \\&\left. \quad + \left\| {\bar{\phi }}^2\right\| _{\mathrm {L}^{\infty }({\mathcal {M}})} + {\widetilde{\mathrm {TV}}}_\varepsilon (f)^2 \right) \\&\leqq Cn \left( \frac{\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}^2}{\varepsilon ^{m+2}} + \frac{\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}{\widetilde{\mathrm {TV}}}_\varepsilon (f)}{\varepsilon ^{m+1}} + {\widetilde{\mathrm {TV}}}_\varepsilon (f)^2 \right) \\&\quad \leqq Cn \frac{(\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} +{\widetilde{\mathrm {TV}}}_\varepsilon (f))^2 }{\varepsilon ^{m+2}}. \end{aligned}$$

To bound \(B_1\) we follow [44] and first compute

$$\begin{aligned}&\int _{{\mathcal {M}}} \bar{\phi }^2(x) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&\quad = \frac{1}{\varepsilon ^{2(m+1)}} \int _{{\mathcal {M}}} \left( \int _{{\mathcal {M}}} \eta \left( \frac{|x-y|}{\varepsilon }\right) |f(x) - f(y)| \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(y) \right) ^2 \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&\quad \leqq \frac{1}{\varepsilon ^{2(m+1)}} \int _{{\mathcal {M}}} \left( \int _{{\mathcal {M}}} \eta \left( \frac{d_{{\mathcal {M}}}(x,y)}{C\varepsilon }\right) |f(x) - f(y)| \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(y) \right) ^2 \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&\qquad \text {since } C|x-y|\geqq d_{{\mathcal {M}}}(x,y) \\&\quad = \frac{C}{\varepsilon ^2} \int _{{\mathcal {M}}} \left( \int _{B(0,1)\subseteq {\mathcal {T}}_x{\mathcal {M}}} \eta (|w|_x) |f(x)-f(\exp _x(C\varepsilon w))| J_x(C\varepsilon w) \, \mathrm {d}w \right) ^2 \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x). \end{aligned}$$

Now, since

$$\begin{aligned}&\int _{B(0,1)\subseteq {\mathcal {T}}_x{\mathcal {M}}} \eta (|w|_x) |f(x)-f(\exp _x(C\varepsilon w))| J_x(C\varepsilon w) \, \mathrm {d}w \\&\quad \leqq 2\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} \int _{B(0,1)\subseteq {\mathcal {T}}_x{\mathcal {M}}} \eta (|w|_x) (1+C|\varepsilon w|^2) \, \mathrm {d}w \\&\quad \leqq C\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} , \end{aligned}$$

we have that

$$\begin{aligned} \int _{{\mathcal {M}}} \bar{\phi }^2(x) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x)&\leqq \frac{C\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\varepsilon ^2} \int _{{\mathcal {M}}} \int _{B(0,1)\subseteq {\mathcal {T}}_x{\mathcal {M}}} \eta (|w|_x) |f(x)\\&\quad -f(\exp _x(C\varepsilon w))| J_x(C\varepsilon w) \, \mathrm {d}w \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&= \frac{C\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\varepsilon ^{2+m}} \int _{{\mathcal {M}}} \int _{{\mathcal {M}}} \eta \left( \frac{d_{{\mathcal {M}}}(x,y)}{C\varepsilon }\right) |f(x)\\&\quad -f(y)| \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(y) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&= \frac{C\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\varepsilon } \mathrm {TV}_{C\varepsilon }(f) \\&\leqq \frac{C\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\varepsilon } \mathrm {TV}(f) \end{aligned}$$

by Proposition 3.1. Hence expanding again we obtain

$$\begin{aligned} B_1^2 \leqq \frac{Cn}{\varepsilon }\left( \mathrm {TV}(f)\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} + {\widetilde{\mathrm {TV}}}_\varepsilon (f)^2\right) . \end{aligned}$$

Now we provide a way to compare \(\mathrm {TV}(f)\) and \({\widetilde{\mathrm {TV}}}_\varepsilon (f)\). First, we recall (see [39, Proposition 2]) that

$$\begin{aligned} d_{\mathcal {M}}(x,y) \leqq |x-y| + C|x-y|^3. \end{aligned}$$

In turn, after applying Proposition 3.1 we obtain

$$\begin{aligned} {\widetilde{\mathrm {TV}}}_{\varepsilon }(f)\leqq & {} \left( 1+C\varepsilon ^2 \right) ^{m+1} \mathrm {TV}_{\varepsilon (1+ C\varepsilon ^2)}(f) \nonumber \\\leqq & {} (1+ C\varepsilon ^2)\mathrm {TV}_{\varepsilon (1+ C\varepsilon ^2)}(f) \leqq (1+C \varepsilon ^2)\sigma _\eta \mathrm {TV}(f), \end{aligned}$$
(4.1)

where in the above the constant C changes from inequality to inequality. We obtain:

$$\begin{aligned} B_1^2 \leqq \frac{Cn}{\varepsilon } \left( \mathrm {TV}(f)\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} +\mathrm {TV}(f)^2\right) \leqq \frac{Cn}{\varepsilon }\left( \mathrm {TV}(f)+\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}\right) ^2, \end{aligned}$$

and

$$\begin{aligned} B_2^2&\leqq \frac{Cn^2}{\varepsilon ^{m+1}}\left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) ^2 \\ C_2^2&\leqq \frac{Cn}{\varepsilon ^{m+2}}\left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) ^2. \end{aligned}$$

Hence, for \(t>0\), we find that

$$\begin{aligned} {{\,\mathrm{{\mathbb {P}}}\,}}\left( |U_1| \geqq \frac{t}{n} \right) \leqq C\exp \left( \frac{- Ct^2}{t \frac{\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}}{\varepsilon } + n \frac{(\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} +\mathrm {TV}(f))^2 }{\varepsilon }} \right) , \end{aligned}$$
(4.2)

and, for \(t\geqq \frac{Cn}{\varepsilon ^{\frac{m+1}{2}}}\left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) \),

$$\begin{aligned} {{\,\mathrm{{\mathbb {P}}}\,}}\left( |U_2| \geqq \frac{t}{n(n-1)} \right) \leqq C e^{-CF_{t,\varepsilon ,n}(f)}, \end{aligned}$$
(4.3)

where

$$\begin{aligned}&F_{t,\varepsilon ,n}(f) = \min \left\{ \left( \frac{t\varepsilon ^{m+1}}{\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}} \right) ^{1/2}, \frac{t\varepsilon ^{\frac{m+1}{2}}}{n(\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} +\mathrm {TV}(f))}, \right. \\&\quad \left. \left( \frac{t\varepsilon ^{\frac{m+2}{2}}}{ (\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} +\mathrm {TV}(f)) \sqrt{n}} \right) ^{2/3} \right\} . \end{aligned}$$

Choosing \(t=\frac{n(n-1)\zeta }{\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)}\) in (4.3), and using the bound \(\sqrt{\zeta }\leqq \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\) (which is the same as \( \zeta ^{1/3}\leqq \left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) ^{2/3}\)), we can infer that

$$\begin{aligned} F_{t,\varepsilon ,n}(f) \geqq \frac{Cn\zeta \varepsilon ^{\frac{m+1}{2}}}{\left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) ^2}; \end{aligned}$$

indeed, from the first term in the minimum defining \(F_{t,\varepsilon , n}(f)\) we get a factor \(\sqrt{\zeta }\), which can be bounded from below by \(\zeta \left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) ^{-1}\); from the third term, on the other hand, we get a factor \(\zeta ^{2/3}\), which is bounded from below by \(\zeta \left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) ^{-2/3}\), and we use the smallness of \(\varepsilon \) to conclude that \(\varepsilon ^{(m+2)/3} \gtrsim \varepsilon ^{(m+1)/2}\).
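The lower bound on \(F_{t,\varepsilon ,n}(f)\) can be sanity-checked numerically for \(t=\frac{n(n-1)\zeta }{R(f)}\), the value corresponding to the event \(|U_2|\geqq \zeta /R(f)\). The sketch below draws random parameters satisfying \(\sqrt{\zeta }\leqq R(f)\), \(\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}\leqq R(f)\), \(\varepsilon \leqq 1\) and \(m\geqq 1\), and records the ratio of \(F_{t,\varepsilon ,n}(f)\) to \(n\zeta \varepsilon ^{\frac{m+1}{2}}/R(f)^2\). The parameter ranges, the restriction \(n \geqq 10\) and the threshold 1/2 used as the constant are assumptions made for this illustration.

```python
import numpy as np

def F(t, eps, n, m, sup_norm, R):
    """Minimum of the three terms in the exponent of (4.3);
    R stands for sup_norm + TV(f)."""
    term1 = np.sqrt(t * eps ** (m + 1) / sup_norm)
    term2 = t * eps ** ((m + 1) / 2) / (n * R)
    term3 = (t * eps ** ((m + 2) / 2) / (R * np.sqrt(n))) ** (2.0 / 3.0)
    return min(term1, term2, term3)

rng = np.random.default_rng(2)
ratios = []
for _ in range(1000):
    m = int(rng.integers(1, 6))                  # intrinsic dimension m >= 1
    n = int(rng.integers(10, 10**6))             # sample size (assumed n >= 10)
    eps = rng.uniform(1e-3, 1.0)                 # eps <= 1
    R = rng.uniform(0.1, 10.0)
    sup_norm = rng.uniform(0.01, 1.0) * R        # the sup norm is at most R
    zeta = rng.uniform(1e-6, 1.0) * R**2         # enforces sqrt(zeta) <= R
    t = n * (n - 1) * zeta / R
    target = n * zeta * eps ** ((m + 1) / 2) / R**2
    ratios.append(F(t, eps, n, m, sup_norm, R) / target)
worst_ratio = min(ratios)
```

In all draws the ratio stays above 1/2, consistent with the claimed bound for a suitable constant C.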

Hence, choosing \(t=\frac{n\zeta }{2}\) in (4.2) and \(t=\frac{n(n-1)\zeta }{\left( \Vert f\Vert _{\mathrm {L}^{\infty }}+\mathrm {TV}(f)\right) }\) in (4.3) implies

$$\begin{aligned}&{{\,\mathrm{{\mathbb {P}}}\,}}\left( \left| \frac{n}{n-1}\mathrm {GTV}_{n,\varepsilon }(f) - {\widetilde{\mathrm {TV}}}_\varepsilon (f)\right| \geqq (1+\frac{1}{R(f)})\zeta \right) \\&\quad = {{\,\mathrm{{\mathbb {P}}}\,}}\left( |2U_1+U_2|\geqq (1+\frac{1}{R(f)}) \zeta \right) \\&\quad \leqq {{\,\mathrm{{\mathbb {P}}}\,}}\left( |U_1|\geqq \frac{\zeta }{2} \text { or } |U_2|\geqq \frac{\zeta }{R(f)}\right) \\&\quad \leqq {{\,\mathrm{{\mathbb {P}}}\,}}\left( |U_1|\geqq \frac{\zeta }{2}\right) + {{\,\mathrm{{\mathbb {P}}}\,}}\left( |U_2|\geqq \frac{\zeta }{R(f)}\right) \\&\quad \leqq C\exp \left( -\frac{Cn\zeta ^2\varepsilon }{\left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) ^2}\right) \\&\qquad + C\exp \left( -\frac{Cn\zeta \varepsilon ^{\frac{m+1}{2}}}{\left( \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f)\right) ^2}\right) . \end{aligned}$$

Using Equation (4.1) then implies that with probability at least \(1-C \exp \left( -\frac{Cn\zeta \min \{\varepsilon ^{\frac{m+1}{2}},\varepsilon \zeta \}}{(\Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} + \mathrm {TV}(f))^2}\right) \) we have that

$$\begin{aligned} \mathrm {GTV}_{n,\varepsilon }(f) \leqq \sigma _\eta (1+C\varepsilon ^2) \mathrm {TV}(f) + (1+\frac{1}{R(f)})\zeta . \end{aligned}$$
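For completeness, the union bound in the probability estimate above and the passage to a single exponential can be sketched as follows. The shorthand \(X\), p and q is introduced only for this sketch, and the generic constant C is allowed to change from line to line.

$$\begin{aligned}&|U_1|< \frac{\zeta }{2} \ \text { and } \ |U_2| < \frac{\zeta }{R(f)} \ \Longrightarrow \ |2U_1+U_2| \leqq 2|U_1| + |U_2| < \left( 1+\frac{1}{R(f)}\right) \zeta , \\&e^{-p} + e^{-q} \leqq 2 e^{-\min \{p,q\}}, \qquad p = \frac{Cn\zeta ^2\varepsilon }{X^2}, \quad q = \frac{Cn\zeta \varepsilon ^{\frac{m+1}{2}}}{X^2}, \quad X := \Vert f\Vert _{\mathrm {L}^{\infty }({\mathcal {M}})}+\mathrm {TV}(f), \\&\min \{p,q\} = \frac{Cn\zeta \min \{\varepsilon ^{\frac{m+1}{2}},\varepsilon \zeta \}}{X^2}. \end{aligned}$$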

\(\square \)

Remark 4.2

It is possible to get quantitative estimates for the bias of the estimator \(\mathrm {GTV}_{n,\varepsilon }(f)\) under smoothness assumptions on the function f. In [2] a similar estimate has been carried out for functions of the form \(f= \mathbbm {1}_A\) where A is a \(\mathrm {BV}\) set, provided that the boundary of A is smooth (see, e.g., [2, Lemma 6]). An improved version of these bias estimates appears in [44] (at least in the flat Euclidean setting). For smooth functions f a simple Taylor expansion allows us to obtain error estimates for the difference \(| \mathrm {TV}(f) - {\mathbb {E}}(\mathrm {GTV}_{n,\varepsilon }(f))|\). However, for a general \(\mathrm {BV}\) function for which we may not have information on its regularity, these estimates cannot be used. Fortunately, thanks to Proposition 3.1 we do not need an estimate on the bias of the estimator.

Remark 4.3

It is worth highlighting that the constants C in the statement of Proposition 4.1 depend only on the dimension of the manifold \({\mathcal {M}}\), on the reach of \({\mathcal {M}}\) (the largest number R such that any point closer than R to the manifold has a unique nearest point on the manifold), which is used to measure the discrepancy between Euclidean and geodesic distances in \({\mathcal {M}}\), on an upper bound on the absolute value of sectional curvatures of \({\mathcal {M}}\), and on the injectivity radius of \({\mathcal {M}}\). These geometric quantities determine how small \(\varepsilon \) must be for all the comparisons between non-local and local total variation energies discussed in Section 3 to hold.

Proof of upper bound in Theorem 2.2

For simpler notation in this proof, for \(A \subset {\mathcal {M}}\) we write \(\mathrm {Bal}(A) = \min \{ \mathrm {vol}_{\mathcal {M}}(A),1 - \mathrm {vol}_{\mathcal {M}}(A)\}\) and \(\mathrm {Bal}_n(A) = \min \{\nu _n(A), \nu _n(A^c)\}\). For any fixed measurable set A one can use Hoeffding’s inequality to show that, with probability at least \(1-2\exp (-2t^2 (\mathrm {Bal}(A))^2 n)\), \(|\mathrm {Bal}(A) - \mathrm {Bal}_n(A)| \leqq t \mathrm {Bal}(A)\).
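The Hoeffding step can be sketched as follows. We assume, as the statement above suggests, that the data points \(x_i\) are i.i.d. samples from the normalized volume measure, so that \(\nu _n(A) = \frac{1}{n}\sum _{i=1}^n \mathbbm {1}_A(x_i)\) has mean \(\mathrm {vol}_{\mathcal {M}}(A)\) and \(\nu _n(A^c) = 1-\nu _n(A)\); the auxiliary symbol s is not part of the original notation.

$$\begin{aligned}&{\mathbb {P}}\left( |\nu _n(A) - \mathrm {vol}_{\mathcal {M}}(A)| \geqq s \right) \leqq 2\exp (-2ns^2) \qquad \text {for all } s>0, \\&|\mathrm {Bal}(A) - \mathrm {Bal}_n(A)| = \left| \min \{\mathrm {vol}_{\mathcal {M}}(A), 1-\mathrm {vol}_{\mathcal {M}}(A)\} - \min \{\nu _n(A), 1-\nu _n(A)\} \right| \leqq |\mathrm {vol}_{\mathcal {M}}(A) - \nu _n(A)|. \end{aligned}$$

Taking \(s = t\, \mathrm {Bal}(A)\) in the first line and combining it with the second line gives the stated probability bound.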

We now let \(A=A^*\) be a solution to the continuum Cheeger problem and set \(f= \mathbbm {1}_{A^*}\) to be plugged into Proposition 4.1; existence of solutions to the Cheeger problem on \({\mathcal {M}}\) can be guaranteed using standard tools in the calculus of variations. Moreover, as the manifold \({\mathcal {M}}\) is connected, compact, and smooth we necessarily have \(\mathrm {Bal}(A^*) > 0\). By the definition of \({\mathcal {C}}_{n,\varepsilon }\), and Proposition 4.1, with probability at least \(1-2\exp (-2t^2 (\mathrm {Bal}(A^*))^2 n)-C \exp \left( -\frac{n}{R(f)}\zeta \min \{\varepsilon ^{\frac{m+1}{2}},\varepsilon \zeta \}\right) \) we have

$$\begin{aligned} {\mathcal {C}}_{n,\varepsilon }&\leqq \frac{\mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A^*})}{\mathrm {Bal}_n(A^*)} \nonumber \\&\leqq \frac{(1 + C\varepsilon ^2) \sigma _\eta \mathrm {TV}(\mathbbm {1}_{A^*}) + (1+ \frac{1}{R(f)}) \zeta }{\mathrm {Bal}_n(A^*)}\nonumber \\&\leqq (1+2t) \left( \frac{(1 + C\varepsilon ^2) \sigma _\eta \mathrm {TV}(\mathbbm {1}_{A^*}) + (1+ \frac{1}{R(f)}) \zeta }{\mathrm {Bal}(A^*)}\right) \nonumber \\&\leqq \frac{\sigma _\eta \mathrm {TV}(\mathbbm {1}_{A^*})}{\mathrm {Bal}(A^*)} + C\left( \varepsilon ^2 + \zeta + t \right) \nonumber \\&= \sigma _\eta {\mathcal {C}}_{\mathcal {M}}+ C\left( \varepsilon ^2 + \zeta + t\right) , \end{aligned}$$
(4.4)

provided \(0<t<1/2\). We can then select \(t=\varepsilon ^{\frac{m+1}{4}}\zeta ^{\frac{1}{2}}\) and obtain the desired result. We notice that the constant C appearing in front of \((\varepsilon ^2 +\zeta +t)\) depends on the geometric quantities mentioned in Remark 4.3 as well as on the total variation and balance term of a minimizer \(A^*\) of the continuum problem (i.e. a quantity that depends on the manifold \({\mathcal {M}}\)). \(\quad \square \)
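For the reader's convenience, the factor \((1+2t)\) in (4.4) and the effect of the choice of t can be unpacked as follows; this is only a sketch of the elementary algebra behind the proof above.

$$\begin{aligned}&|\mathrm {Bal}(A^*) - \mathrm {Bal}_n(A^*)| \leqq t\, \mathrm {Bal}(A^*) \ \Longrightarrow \ \frac{1}{\mathrm {Bal}_n(A^*)} \leqq \frac{1}{(1-t)\mathrm {Bal}(A^*)} \leqq \frac{1+2t}{\mathrm {Bal}(A^*)}, \\&\text {since } (1+2t)(1-t) = 1 + t(1-2t) \geqq 1 \text { for } 0<t \leqq \tfrac{1}{2}; \\&t=\varepsilon ^{\frac{m+1}{4}}\zeta ^{\frac{1}{2}} \ \Longrightarrow \ 2t^2 (\mathrm {Bal}(A^*))^2 n = 2 (\mathrm {Bal}(A^*))^2\, n \zeta \varepsilon ^{\frac{m+1}{2}}. \end{aligned}$$

With this choice the Hoeffding exponent has the same form \(n\zeta \varepsilon ^{\frac{m+1}{2}}\) as one of the two terms in the exponent coming from Proposition 4.1.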

4.2 Lower Bound

In this section we quantify the relationship

$$\begin{aligned} \sigma _\eta {\mathcal {C}}_{\mathcal {M}}\lesssim {\mathcal {C}}_{n,\varepsilon }. \end{aligned}$$

For that purpose we introduce an interpolation operator \({\mathcal {I}}_a\) mapping discrete functions to functions on \({\mathcal {M}}\) while reducing the total variation energy. More concretely, we construct a map

$$\begin{aligned} {\mathcal {I}}_a : \mathrm {L}^{1}(\nu _n) \rightarrow \mathrm {C}^{\infty }({\mathcal {M}}) \end{aligned}$$

with the crucial property that for all \(u \in \mathrm {L}^{1}(\nu _n)\) we have (roughly speaking)

$$\begin{aligned} \sigma _\eta \mathrm {TV}( {\mathcal {I}}_a u) \lesssim \mathrm {GTV}_{n,\varepsilon }(u) \end{aligned}$$

and for which \(u \approx {\mathcal {I}}_a(u)\). Here \(\sigma _\eta \) is as defined in (2.7). These estimates hold with very high probability, as described in more detail in Proposition 4.5 below. The interpolation operator takes the form

$$\begin{aligned} {\mathcal {I}}_a : \mathrm {L}^{1}(\nu _n) \rightarrow \mathrm {C}^{\infty }({\mathcal {M}}) \\ {\mathcal {I}}_a u = \Lambda _a (u\circ T_n) , \end{aligned}$$

where \(T_n: {\mathcal {M}}\rightarrow {\mathcal {M}}_n\) is a suitable transport map as defined in the following proposition, and \(\Lambda _a\) is the smoothing operator defined in (3.3). The proof of the next proposition can be found in [14, Proposition 2.11].

Proposition 4.4

Let \({\mathcal {M}}\) be a smooth, connected, orientable, compact manifold of dimension m embedded in \({\mathbb {R}}^d\). Then, there exist constants (that may depend on \({\mathcal {M}}\)) \(\theta _0,\delta _0,C,c,c'>0\) such that if \(\delta \leqq \delta _0\), \( c' \log (n)/n \leqq \theta ^2 \delta ^m \) and \(\theta _0\geqq \theta >0\) we have, with probability at least \(1-ne^{-cn\theta ^2\delta ^m}\), that there exists a probability measure \({{\widetilde{\nu }}}_n\) with density function \({{\widetilde{\rho }}} _n : {\mathcal {M}}\rightarrow {\mathbb {R}}\) and a transport map \(T_n\) between \({\tilde{\nu }}_n\) and \(\nu _n\) (written \(T_{n \sharp }{{\tilde{\nu }}}_n = \nu _n\)) such that

$$\begin{aligned} \sup _{x \in {\mathcal {M}}} d_{\mathcal {M}}(x, T_n(x)) \leqq {\delta } , \end{aligned}$$
(4.5)

and such that

$$\begin{aligned} \Vert 1 - {\widetilde{\rho }}_n \Vert _{\mathrm {L}^{\infty }({\mathcal {M}})} \leqq C \left( \theta + \delta \right) . \end{aligned}$$
(4.6)

Proposition 4.5

Let \(\Lambda _a\) be as in (3.3) for \(a>0\) small enough and for given \(\theta ,\delta \) define \({\mathcal {I}}_a : \mathrm {L}^{2}(\nu _n) \rightarrow \mathrm {C}^{\infty }_c({\mathcal {M}})\) by

$$\begin{aligned} {\mathcal {I}}_a(u) := \Lambda _a (u \circ T_n) \end{aligned}$$
(4.7)

where \(T_n\) is an optimal transport map as in Proposition 4.4. Then, with probability at least \(1- n \exp \left( - C n \theta ^2 {\delta }^{m} \right) \), for every \(u: {\mathcal {M}}_n \rightarrow {\mathbb {R}}\) we have

  1. i)

    \(\Vert {\mathcal {I}}_a(u)\Vert _{\mathrm {C}^{k}({\mathcal {M}})} \leqq Ca^{-k} \Vert u\Vert _{{\mathrm {L}^{\infty }(\nu _n)}}\).

  2. ii)

    \(\Vert u\circ T_n- {\mathcal {I}}_a(u)\Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq Ca\mathrm {GTV}_{n,\varepsilon }(u)\).

  3. iii)

    \(\sigma _\eta \mathrm {TV}({\mathcal {I}}_a u ) \leqq (1+ C(\varepsilon ^2 +a + \frac{{\delta }}{\varepsilon } + \theta ) ) \mathrm {GTV}_{n,\varepsilon }(u) + C\left( \frac{\varepsilon }{a^2}+a \right) \Vert u \Vert _{\mathrm {L}^{\infty }(\nu _n)}\).

Proof

i) is a direct consequence of Proposition 3.5. To show ii) and iii) we let

$$\begin{aligned} h:= \varepsilon - 2 {\delta } >0 \end{aligned}$$

and \({\widetilde{\nu }}_n\) be as in Proposition 4.4. From the fact that \(T_{n \sharp } {\widetilde{\nu }}_n=\nu _n\) we can write for an arbitrary \(u: {\mathcal {M}}_n\rightarrow {\mathbb {R}}\)

$$\begin{aligned} \mathrm {GTV}_{n,\varepsilon }(u)= & {} \frac{1}{\varepsilon } \int _{{\mathcal {M}}}\int _{{\mathcal {M}}} \eta _\varepsilon \left( |T_n(x)-T_n(y)|\right) |u(T_n(x)) \\&- u(T_n(y))| {\widetilde{\rho }}_n(y) {\widetilde{\rho }}_n(x)\,\mathrm {d}\mathrm {vol}_{\mathcal {M}}(y)\,\mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x). \end{aligned}$$

Now we notice that if \(x,y \in {\mathcal {M}}\) are such that \(d_{\mathcal {M}}(x,y)\leqq h\) then necessarily \(|T_n(x) - T_n(y)| \leqq \varepsilon \) (because \(|x- T_n(x)|\), \(|x-y|\) are bounded by \(\delta \), \(d_{\mathcal {M}}(x,y)\) respectively). This means that for any \(x,y \in {\mathcal {M}}\) it holds

$$\begin{aligned} \eta \left( \frac{d_{{\mathcal {M}}}(x,y)}{h} \right) \leqq \eta \left( \frac{|T_n(x)-T_n(y)|}{\varepsilon } \right) , \end{aligned}$$

and thus

$$\begin{aligned} \mathrm {GTV}_{n,\varepsilon }(u)\geqq & {} \frac{h^m}{\varepsilon ^{m+1}}\int _{\mathcal {M}}\int _{\mathcal {M}}\eta _h\left( d_{\mathcal {M}}(x,y)\right) |u \circ T_n(x) \\&- u \circ T_n(y)|{\widetilde{\rho }}_n(x){{\widetilde{\rho }}}_n(y)\,\mathrm {d}\mathrm {vol}_{\mathcal {M}}(y)\,\mathrm {d}\mathrm {vol}_{\mathcal {M}}(x). \end{aligned}$$

We now use the second inequality in Proposition 4.4 to conclude that

$$\begin{aligned} \mathrm {GTV}_{n,\varepsilon }(u)\geqq \left( 1- \frac{2{\delta }}{\varepsilon } \right) ^{m+1}\left( 1 - C(\theta +{\delta })\right) \mathrm {TV}_{h}(u \circ T_n). \end{aligned}$$

Using Corollary 3.7 then proves ii).
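The elementary ingredients of the argument above can be sketched as follows. The sketch assumes the normalizations \(\eta _s(r) = s^{-m}\eta (r/s)\) and \(\mathrm {TV}_h(v) = \frac{1}{h}\int _{\mathcal {M}}\int _{\mathcal {M}}\eta _h(d_{\mathcal {M}}(x,y))|v(x)-v(y)|\,\mathrm {d}\mathrm {vol}_{\mathcal {M}}(y)\,\mathrm {d}\mathrm {vol}_{\mathcal {M}}(x)\), which are read off from the displays above rather than quoted from Section 3, and it takes \(\theta ,\delta \) small enough that \(1-C(\theta +\delta )\geqq 0\).

$$\begin{aligned}&|T_n(x)-T_n(y)| \leqq |T_n(x)-x| + |x-y| + |y-T_n(y)| \leqq \delta + d_{\mathcal {M}}(x,y) + \delta \leqq h + 2\delta = \varepsilon , \\&\frac{h^m}{\varepsilon ^{m+1}} = \frac{1}{h}\left( \frac{h}{\varepsilon }\right) ^{m+1} = \frac{1}{h}\left( 1-\frac{2\delta }{\varepsilon }\right) ^{m+1}, \\&{\widetilde{\rho }}_n(x){\widetilde{\rho }}_n(y) \geqq \left( 1-C(\theta +\delta )\right) ^2 \geqq 1-2C(\theta +\delta ). \end{aligned}$$

The first line uses that Euclidean distances are bounded by geodesic distances together with (4.5); the last line uses (4.6), and the factor 2 is absorbed into the generic constant C.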

Again using Corollary 3.7 and the fact that \(h \leqq \varepsilon \) we conclude that

$$\begin{aligned} \sigma _\eta \mathrm {TV}\left( \Lambda _a (u \circ T_n) \right)&\leqq (1+ C(\varepsilon ^2 + a))\left( 1- \frac{2{\delta }}{\varepsilon } \right) ^{-(m+1)}\\&\quad \times \left( 1 - C(\theta +{\delta })\right) ^{-1}\mathrm {GTV}_{n, \varepsilon }(u) \\&\quad + C\left( \frac{\varepsilon }{a^2}+a \right) \Vert u \Vert _{\mathrm {L}^{\infty }(\nu _n)}. \end{aligned}$$

This expression can be simplified using the relative smallness assumptions on \(\varepsilon ,\theta ,\delta \), and in particular we can write

$$\begin{aligned}&\sigma _\eta \mathrm {TV}(\Lambda _a (u \circ T_n)) \\&\quad \leqq (1+ C(\varepsilon ^2 +a + \frac{{\delta }}{\varepsilon } + \theta +{\delta } ) ) \mathrm {GTV}_{n,\varepsilon }(u) + C\left( \frac{\varepsilon }{a^2}+a \right) \Vert u \Vert _{\mathrm {L}^{\infty }(\nu _n)}. \end{aligned}$$

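Since \(\delta \leqq \frac{\delta }{\varepsilon }\) whenever \(\varepsilon \leqq 1\), the additional term \(\delta \) can be absorbed into \(\frac{\delta }{\varepsilon }\), and this is precisely iii). \(\quad \square \)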

Proof of the lower bound in Theorem 2.2

We are now ready to obtain a lower bound for \({\mathcal {C}}_{n,\varepsilon }\). To achieve this let \(A_n^*\) be a solution to the graph Cheeger cut problem, so that in particular

$$\begin{aligned} {\mathcal {C}}_{n,\varepsilon }= \frac{\mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*}) }{\min \{ \nu _n(A_n^*), \nu _n(A_n^{*c}) \}} , \end{aligned}$$

and let \(\mathbbm {1}_{A}:=\mathbbm {1}_{A_n^*}\circ T_n\).

As a first step we obtain an a-priori lower bound on \(\min \{\nu _n(A_n^*), \nu _n(A_n^{*c})\}\). First of all notice that \({\widetilde{\nu }}_n(A)=\nu _n(A_n^*)\) as well as \({\widetilde{\nu }}_n(A^c)= \nu _n (A_n^{*c})\), where \({\widetilde{\nu }}_n\) is the measure in Proposition 4.4. This is simply because \(T_n\) is a transport map between \({\widetilde{\nu }}_n\) and \(\nu _n\). On the other hand, notice that by (4.6) and the smallness assumptions on \({\delta }\) and \(\theta \) it follows that

$$\begin{aligned} \frac{1}{2}\min \{ \nu (A) , \nu (A^c)\}\leqq \min \{ {\widetilde{\nu }}_n(A) , {\widetilde{\nu }}_n(A^c) \} \leqq (1+C(\delta +\theta ))\min \{ \nu (A) , \nu (A^c)\}. \end{aligned}$$

Now, as in the first part of the proof of Proposition 4.5 (with the choice \(\delta =\frac{\varepsilon }{4}\), so that \(h=\frac{\varepsilon }{2}\)) we have

$$\begin{aligned} C\mathrm {GTV}_{n, \varepsilon }(\mathbbm {1}_{A_n^*}) \geqq \mathrm {TV}_{\varepsilon /2}(\mathbbm {1}_A), \end{aligned}$$

for a constant C that only depends on dimension. By the upper bound estimates we know that \({\mathcal {C}}_{n,\varepsilon /2} \leqq \sigma _\eta {\mathcal {C}}_{\mathcal {M}}+ C=: \frac{C_0}{2}\) and so we may use Lemma 3.8 to conclude that

$$\begin{aligned} \frac{\beta _0}{2} \leqq \frac{1}{2} \min \{\nu (A), \nu (A^c)\} \leqq \min \{ {\nu }_n(A_n^*) ,{\nu }_n(A_n^{*c})\}, \end{aligned}$$

for some fixed constant \(\beta _0>0\).

Let us now observe that for all \(c \in [0,1]\), by Proposition 4.5(ii)

$$\begin{aligned}&\left| \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*} -c \Vert _{\mathrm {L}^{1}({\mathcal {M}})} - \Vert \mathbbm {1}_{A} -c \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \right| \\&\quad \leqq \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*} - \mathbbm {1}_{A_n^*}\circ T_n \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq C a \mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*}), \end{aligned}$$

for a fixed constant C. From this uniform estimate it follows that

$$\begin{aligned}&\left| \min _{c \in [0,1]}\Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*} -c \Vert _{\mathrm {L}^{1}({\mathcal {M}})} - \min _{c\in [0,1]}\Vert \mathbbm {1}_{A} -c \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \right| \\&\quad \leqq \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*} - \mathbbm {1}_{A_n^*}\circ T_n \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq C a \mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*}). \end{aligned}$$
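Two elementary facts are used here and in the chain of inequalities that follows. In the sketch below, F, G, \(c_G\) and \(\kappa '\) are auxiliary symbols, with \(c_G\) a minimizer of G over [0, 1], and \(\nu \) denotes the normalized volume measure on \({\mathcal {M}}\) as in the displays above.

$$\begin{aligned}&\sup _{c\in [0,1]} |F(c)-G(c)| \leqq \kappa ' \ \Longrightarrow \ \min _{c\in [0,1]} F(c) \leqq F(c_G) \leqq G(c_G) + \kappa ' = \min _{c\in [0,1]} G(c) + \kappa ' \quad \text {(and symmetrically in } F, G\text {)}, \\&\Vert \mathbbm {1}_{A} -c \Vert _{\mathrm {L}^{1}({\mathcal {M}})} = (1-c)\,\nu (A) + c\,\nu (A^c) \quad \text {for } c\in [0,1], \qquad \text {so} \qquad \min _{c\in [0,1]}\Vert \mathbbm {1}_{A} -c \Vert _{\mathrm {L}^{1}({\mathcal {M}})} = \min \{\nu (A),\nu (A^c)\}. \end{aligned}$$

The second identity holds because the expression is affine in c and is therefore minimized at an endpoint of the interval.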

Using the above inequalities and iii) in Proposition 4.5 we conclude that

$$\begin{aligned}&\frac{\sigma _\eta \mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{ \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*}- m_1({\mathcal {I}}_a \mathbbm {1}_{A_n^*})\Vert _{\mathrm {L}^{1}({\mathcal {M}})}}\\&\quad \leqq \frac{\left( 1+C\left( \varepsilon ^2+a+\frac{\delta }{\varepsilon }+\theta \right) \right) \mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*}) + C\left( \frac{\varepsilon }{a^2}+a\right) }{\min _{c\in [0,1]} \Vert \mathbbm {1}_A -c\Vert _{\mathrm {L}^{1}({\mathcal {M}})}- Ca\mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*})} \\&\quad = \frac{\left( 1+C\left( \varepsilon ^2+a+\frac{\delta }{\varepsilon }+\theta \right) \right) \mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*}) + C\left( \frac{\varepsilon }{a^2}+a\right) }{\min \{\nu (A),\nu (A^c)\}- Ca\mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*})} \\&\quad \leqq \frac{\left( 1+C\left( \varepsilon ^2+a+\frac{\delta }{\varepsilon }+\theta \right) \right) \mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*}) + C\left( \frac{\varepsilon }{a^2}+a\right) }{\left( 1-C(\delta +\theta )\right) \min \{\nu _n(A_n^*),\nu _n(A_n^{*c})\}- Ca\mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*})} \end{aligned}$$

We divide each of the terms in the numerator and denominator of the right hand side of the above expression by \(\min \{\nu _n(A_n^*), \nu _n(A_n^{*c}) \}\) and use the a-priori lower bound on this term to obtain

$$\begin{aligned} \frac{\sigma _\eta \mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{ \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*}- m_1({\mathcal {I}}_a \mathbbm {1}_{A_n^*})\Vert _{\mathrm {L}^{1}({\mathcal {M}})}}\leqq \frac{(1+ C( \varepsilon ^2 +a + \frac{{\delta }}{\varepsilon } + \theta )) {\mathcal {C}}_{n,\varepsilon } + C(\frac{\varepsilon }{a^2} + a) }{1- Ca {\mathcal {C}}_{n,\varepsilon } - C(\delta +\theta )}. \end{aligned}$$

Then, we use the upper bound on \({\mathcal {C}}_{n,\varepsilon }\) that we found in Section 4.1 to conclude that:

$$\begin{aligned} \frac{\sigma _\eta \mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{ \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*}- m_1({\mathcal {I}}_a \mathbbm {1}_{A_n^*})\Vert _{\mathrm {L}^{1}({\mathcal {M}})}}\leqq & {} {\mathcal {C}}_{n,\varepsilon } + C {\mathcal {C}}_{\mathcal {M}}\left( \varepsilon ^2 +a + \frac{{\delta }}{\varepsilon } + \theta \right) \nonumber \\&+ C \left( \frac{\varepsilon }{a^2} +a + \theta + \delta +\zeta \right) . \end{aligned}$$
(4.8)

This estimate holds with probability at least \(1-Cne^{-cn\theta ^2\delta ^m}-Ce^{-cn\zeta \min \{\varepsilon ^{\frac{m+1}{2}},\varepsilon \zeta \}}\). Thanks to Lemma 3.9, we know that the left-hand side of (4.8) is bounded below by \(\sigma _\eta {\mathcal {C}}_{\mathcal {M}}\). Choosing \(a=\root 3 \of {\varepsilon }\) we have

$$\begin{aligned} \sigma _\eta {\mathcal {C}}_{\mathcal {M}}\leqq & {} \frac{\sigma _\eta \mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{ \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*}- m_1({\mathcal {I}}_a \mathbbm {1}_{A_n^*})\Vert _{\mathrm {L}^{1}({\mathcal {M}})}} \nonumber \\\leqq & {} {\mathcal {C}}_{n,\varepsilon } + C({\mathcal {C}}_{\mathcal {M}}+1)\left( \root 3 \of {\varepsilon } +\frac{\delta }{\varepsilon } + \theta + \zeta \right) \end{aligned}$$
(4.9)

as required. \(\quad \square \)
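The passage from (4.8) to (4.9) can be sketched as follows, assuming \(0<\varepsilon \leqq 1\). The choice \(a=\root 3 \of {\varepsilon }\) balances the two competing terms \(\frac{\varepsilon }{a^2}\) and a, and the remaining terms are absorbed using \(\varepsilon ^2 \leqq \root 3 \of {\varepsilon }\) and \(\delta \leqq \frac{\delta }{\varepsilon }\).

$$\begin{aligned}&a=\root 3 \of {\varepsilon } \ \Longrightarrow \ \frac{\varepsilon }{a^2} = \varepsilon ^{1-\frac{2}{3}} = \root 3 \of {\varepsilon } = a, \\&C {\mathcal {C}}_{\mathcal {M}}\left( \varepsilon ^2 +a + \frac{\delta }{\varepsilon } + \theta \right) + C \left( \frac{\varepsilon }{a^2} +a + \theta + \delta +\zeta \right) \\&\quad \leqq C {\mathcal {C}}_{\mathcal {M}}\left( 2\root 3 \of {\varepsilon } + \frac{\delta }{\varepsilon } + \theta \right) + C \left( 2\root 3 \of {\varepsilon } + \theta + \frac{\delta }{\varepsilon } +\zeta \right) \\&\quad \leqq 2C({\mathcal {C}}_{\mathcal {M}}+1)\left( \root 3 \of {\varepsilon } +\frac{\delta }{\varepsilon } + \theta + \zeta \right) . \end{aligned}$$

The factor 2 is then absorbed into the generic constant C.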

4.3 Proof of Theorem 2.15

Having established convergence rates for Cheeger constants in Theorem 2.2, we now turn to proving convergence estimates for the Cheeger cuts \(A_n^*\) towards continuum Cheeger sets \(A^*\). The first task will be to construct subsets of \({\mathcal {M}}\) which have the correct volume and for which we can control the perimeter. This requires two steps: first, constructing a subset \({\tilde{A}}\) of \({\mathcal {M}}\) with smooth boundary which approximates \({\mathcal {I}}_a \mathbbm {1}_{A_n^*}\) (which is the continuum object on which we can control the TV norm), and second, constructing a set \({\hat{A}}\) which adjusts the volume of \({\tilde{A}}\). These two steps are necessary because the stability results that are available in this context, that is the context of Proposition 2.13, are concerned with mass-constrained perimeter minimizers. The next lemma addresses this second step.

Lemma 4.6

Suppose that \({\mathcal {M}}\) satisfies Assumptions 2.1. Then for any \(B>0\) and \(\delta \in (0,1)\) there exists a \(C_{\delta ,B}>0\) so that for any \({\tilde{A}} \subset {\mathcal {M}}\) with \({\mathcal {P}}({\tilde{A}}) \leqq B\) and \(\delta \leqq \mathrm {vol}_{\mathcal {M}}({\tilde{A}}) \leqq 1-\delta \), and any \(\vartheta \) satisfying \(\delta< \mathrm {vol}_{\mathcal {M}}({\tilde{A}}) + \vartheta < 1-\delta \), there exists a set \({\hat{A}}\) satisfying

$$\begin{aligned} \mathrm {vol}_{\mathcal {M}}({\hat{A}})&= \mathrm {vol}_{\mathcal {M}}({\tilde{A}}) + \vartheta , \qquad \mathrm {vol}_{\mathcal {M}}({\hat{A}} \Delta {\tilde{A}}) = |\vartheta |, \\ {\mathcal {P}}({\hat{A}})&\leqq {\mathcal {P}}({\tilde{A}}) + C_{\delta , B} |\vartheta | ^{\frac{m-1}{m}}, \end{aligned}$$

where the constant \(C_{\delta , B}\) only depends upon the values of B and \(\delta \) (and not, for example, on \(\tilde{A}\)).

Proof

We restrict our attention to \(\vartheta >0\), as the other case is analogous by considering \({\tilde{A}}^c\). We also notice that, by considering \({\tilde{A}} \cup B_{\mathcal {M}}(x,r)\) for an appropriate r, the conclusion is immediate for \(\vartheta \) bounded away from zero. Hence we only need to focus our attention on \(\vartheta \) in a neighborhood of zero.

To begin, we claim that for any choice of \(\delta ,B\) and for any \(\kappa \in (0,1)\) there exists an \(R_0\) so that for any set E satisfying \({\mathcal {P}}(E) \leqq B\) and \(\delta \leqq \mathrm {vol}_{\mathcal {M}}(E) \leqq 1-\delta \) there exists a point x such that \(\mathrm {vol}_{\mathcal {M}}(E \cap B_{\mathcal {M}}(x,r)) \leqq \kappa \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))\) for all \(r \leqq R_0\). Suppose that the claim were false. Then we would have a sequence of sets \(E_n\), which satisfy the volume and perimeter bounds, \(\delta \leqq \mathrm {vol}_{\mathcal {M}}(E_n) \leqq 1-\delta \) and \({\mathcal {P}}(E_n)\leqq B\), and which satisfy

$$\begin{aligned} \liminf _n \inf _{x \in {\mathcal {M}}} \sup _{r \leqq 1/n} \frac{\mathrm {vol}_{\mathcal {M}}(E_n \cap B_{\mathcal {M}}(x,r))}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))}\geqq \kappa . \end{aligned}$$
(4.10)

By BV compactness, we may assume that the \(E_n\) converge to some set \(E^*\) in \(\mathrm {L}^{1}({\mathcal {M}})\), and that \(E^*\) satisfies the same volume and perimeter bounds. By the Lebesgue differentiation theorem

$$\begin{aligned} \frac{\mathrm {vol}_{\mathcal {M}}(E^*\cap B_{\mathcal {M}}(x,r))}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))} \rightarrow \mathbbm {1}_{E^*}(x) \qquad \text {as } r\rightarrow 0 \end{aligned}$$

for almost every \(x\in {\mathcal {M}}\). By Egorov’s theorem we can find a subset \({\mathcal {M}}^\prime \subseteq {\mathcal {M}}\) such that the convergence is uniform over \({\mathcal {M}}^\prime \) and where \(\mathrm {vol}_{\mathcal {M}}({\mathcal {M}}^\prime )\geqq 1-\frac{\delta }{2}\). Hence, there exists \(\eta >0\) such that

$$\begin{aligned} \sup _{r<\eta } \frac{\mathrm {vol}_{\mathcal {M}}(E^*\cap B_{\mathcal {M}}(x,r))}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))} < \frac{\kappa }{3} \end{aligned}$$

for all \(x\in {\mathcal {M}}^\prime \setminus E^*\). Note that \(\mathrm {vol}_{\mathcal {M}}({\mathcal {M}}^\prime \setminus E^*) \geqq \frac{\delta }{2}\) and therefore

$$\begin{aligned} \mathrm {vol}_{\mathcal {M}}\left( \left\{ x : \sup _{r< \eta }\frac{\mathrm {vol}_{\mathcal {M}}(E^* \cap B_{\mathcal {M}}(x,r))}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))} < \frac{\kappa }{3} \right\} \right) \geqq \frac{\delta }{2}. \end{aligned}$$
(4.11)

Now, by the weak maximal function inequality, we have that, for all \(\alpha > 0\)

$$\begin{aligned}&\mathrm {vol}_{\mathcal {M}}\left( \left\{ x: \sup _{r < \eta } \frac{1}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))} \int _{B_{\mathcal {M}}(x,r)} |\mathbbm {1}_{E^*} - \mathbbm {1}_{E_n}| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}>\alpha \right\} \right) \\&\quad \leqq \frac{C_{\mathrm {max}}}{\alpha } \Vert \mathbbm {1}_{E^*}-\mathbbm {1}_{E_n}\Vert _{\mathrm {L}^{1}}; \end{aligned}$$

notice that the weak maximal function inequality is a direct consequence of the Vitali covering lemma, which holds for arbitrary separable metric spaces; the constant \(C_{\mathrm {max}}\) can be written in terms of m (the intrinsic dimension of the manifold) and the constant \(C_2\) appearing in (4.12) below. By picking \(\alpha = \frac{\kappa }{3}\) and then picking n large enough that \(\Vert \mathbbm {1}_{E^*}-\mathbbm {1}_{E_n}\Vert _{\mathrm {L}^{1}} < \frac{\delta \kappa }{12 C_{\mathrm {max}}}\), we then have that

$$\begin{aligned}&\mathrm {vol}_{\mathcal {M}}\left( \left\{ x: \sup _{r < \eta } \frac{1}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))} \int _{B_{\mathcal {M}}(x,r)} |\mathbbm {1}_{E^*} - \mathbbm {1}_{E_n}| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}\leqq \frac{\kappa }{3} \right\} \right) \\&\quad \geqq 1-\frac{\delta }{4}. \end{aligned}$$

This inequality, combined with the triangle inequality and Equation (4.11), contradicts Equation (4.10), which proves our claim.
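The contradiction can be sketched in more detail as follows; here \(S_1\) denotes the set appearing in (4.11) and \(S_2\) the set in the last display, both auxiliary names introduced only for this sketch.

$$\begin{aligned}&\frac{C_{\mathrm {max}}}{\alpha } \Vert \mathbbm {1}_{E^*}-\mathbbm {1}_{E_n}\Vert _{\mathrm {L}^{1}}< \frac{3C_{\mathrm {max}}}{\kappa } \cdot \frac{\delta \kappa }{12 C_{\mathrm {max}}} = \frac{\delta }{4}, \qquad \mathrm {vol}_{\mathcal {M}}(S_1\cap S_2) \geqq \frac{\delta }{2} + \left( 1-\frac{\delta }{4}\right) - 1 = \frac{\delta }{4} > 0, \\&\frac{\mathrm {vol}_{\mathcal {M}}(E_n \cap B_{\mathcal {M}}(x,r))}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))} \leqq \frac{\mathrm {vol}_{\mathcal {M}}(E^* \cap B_{\mathcal {M}}(x,r))}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))} + \frac{1}{\mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))} \int _{B_{\mathcal {M}}(x,r)} |\mathbbm {1}_{E^*} - \mathbbm {1}_{E_n}| \, \mathrm {d}\mathrm {vol}_{\mathcal {M}}\\&\quad < \frac{\kappa }{3} + \frac{\kappa }{3} < \kappa \qquad \text {for all } x\in S_1\cap S_2 \text { and all } r<\eta . \end{aligned}$$

Since \(S_1\cap S_2\) is non-empty, for every sufficiently large n with \(1/n<\eta \) the infimum over x of the supremum over \(r\leqq 1/n\) is at most \(\frac{2\kappa }{3}\), which is incompatible with (4.10).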

Now, let \(\kappa = 1/2\) and apply the claim to conclude that for the set \({\tilde{A}}\) we can find a point x so that for all \(r < R_0\) we have that \(\mathrm {vol}_{\mathcal {M}}({\tilde{A}}\cap B_{\mathcal {M}}(x,r)) \leqq 1/2 \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r))\). Using a classical volume estimate from Riemannian geometry (see e.g. [49]), we may deduce that for all \(x\in {\mathcal {M}}\) and all small enough r we have:

$$\begin{aligned} C_1 r^m \leqq \mathrm {vol}_{\mathcal {M}}( B_{\mathcal {M}}(x,r)) \leqq C_2r^m. \end{aligned}$$
(4.12)

We then select \(r^*\) so that \(\mathrm {vol}_{\mathcal {M}}( B_{\mathcal {M}}(x,r^*) \setminus {\tilde{A}}) = \vartheta \), and define \({\hat{A}} := {\tilde{A}} \cup B_{\mathcal {M}}(x,r^*)\). We note that by the choice of the point x, we have that

$$\begin{aligned} \vartheta= & {} \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r^*)\setminus \tilde{A}) = \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r^*)) - \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r^*)\cap \tilde{A}) \\\geqq & {} \frac{1}{2} \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r^*)) \geqq \frac{C_1 r^{*m}}{2}, \end{aligned}$$

and clearly

$$\begin{aligned} \vartheta = \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r^*)\setminus \tilde{A}) \leqq \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r^*)) \leqq C_2 r^{*m}. \end{aligned}$$

Thus,

$$\begin{aligned} \tilde{C}_1 \vartheta \leqq r^{*m} \leqq \tilde{C}_2 \vartheta , \end{aligned}$$

with constants that depend only upon \(B,\delta \), and not upon the particular set \({\tilde{A}}\). Moreover, since for all \(x \in {\mathcal {M}}\) and all small enough r we have

$$\begin{aligned} {\mathcal {P}}(B_{\mathcal {M}}(x,r)) \leqq Cr^{m-1}, \end{aligned}$$
(4.13)

by [63] or [64], the perimeter of \(\hat{A}\) can be bounded by

$$\begin{aligned} {\mathcal {P}}(\hat{A})&\leqq {\mathcal {P}}({\widetilde{A}}) + {\mathcal {P}}(B_{{\mathcal {M}}}(x,r^*)) \\&\leqq {\mathcal {P}}({\widetilde{A}}) + C(r^*)^{m-1}\\&\leqq {\mathcal {P}}({\widetilde{A}}) + C\vartheta ^{\frac{m-1}{m}}. \end{aligned}$$
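The remaining properties of \({\hat{A}}\) and the conversion of the radius bound into a bound in terms of \(\vartheta \) can be sketched as follows; the first perimeter inequality above is the standard subadditivity of the perimeter under unions.

$$\begin{aligned}&\mathrm {vol}_{\mathcal {M}}({\hat{A}}) = \mathrm {vol}_{\mathcal {M}}({\tilde{A}}) + \mathrm {vol}_{\mathcal {M}}(B_{\mathcal {M}}(x,r^*)\setminus {\tilde{A}}) = \mathrm {vol}_{\mathcal {M}}({\tilde{A}}) + \vartheta , \qquad {\hat{A}} \Delta {\tilde{A}} = B_{\mathcal {M}}(x,r^*)\setminus {\tilde{A}}, \\&(r^*)^{m-1} = \left( r^{*m}\right) ^{\frac{m-1}{m}} \leqq \left( \tilde{C}_2 \vartheta \right) ^{\frac{m-1}{m}} = \tilde{C}_2^{\frac{m-1}{m}}\, \vartheta ^{\frac{m-1}{m}}. \end{aligned}$$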

\(\square \)

Proof of Theorem 2.15

We use the notation \(\mathrm {Bal}(A)\) as introduced in the proof of the upper bound in Theorem 2.2.

First notice that for a given discrete minimizer \(A_n^*\) we can build a set \({\tilde{A}} \subseteq {\mathcal {M}}\) with the following properties (for \(0< z < 1/2\)):

  1. i)

    \({\mathcal {P}}({\tilde{A}}) \leqq \frac{\mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{(1-2z)}\).

  2. ii)

    \(\Vert \mathbbm {1}_{A_n^*} \circ T_n - \mathbbm {1}_{{\tilde{A}}}\Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq \frac{Ca}{z}\).

To see this, notice that by the coarea formula, and the fact that \(0 \leqq {\mathcal {I}}_a \mathbbm {1}_{A_n^*} \leqq 1\), we have that

$$\begin{aligned} \mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*}) = \int _0^1 {\mathcal {P}}(\{{\mathcal {I}}_a \mathbbm {1}_{A_n^*} \leqq s\}) \,\mathrm {d}s \geqq \int _z^{1-z} {\mathcal {P}}(\{{\mathcal {I}}_a \mathbbm {1}_{A_n^*} \leqq s\}) \, \mathrm {d}s. \end{aligned}$$

In turn there must exist a \( {\widetilde{z}} \in (z, 1-z)\) so that the set \({\tilde{A}} = \{{\mathcal {I}}_a \mathbbm {1}_{A_n^*} > {\widetilde{z}} \}\) satisfies \({\mathcal {P}}({\tilde{A}}) \leqq \frac{\mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{(1-2z)}\). To obtain the second inequality we use Proposition 4.5, \({\mathcal {I}}_a\mathbbm {1}_{A_n^*}\geqq \tilde{z}\mathbbm {1}_{\tilde{A}}\) and \(1-{\mathcal {I}}_a\mathbbm {1}_{A_n^*} \geqq (1-\tilde{z})\mathbbm {1}_{{\widetilde{A}}^c}\), to estimate

$$\begin{aligned} Ca \mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*})&\geqq \Vert \mathbbm {1}_{A_n^*} \circ T_n - {\mathcal {I}}_a \mathbbm {1}_{A_n^*}\Vert _{\mathrm {L}^{1}({\mathcal {M}})} \\&= \int _{{\mathcal {M}}} \left| \mathbbm {1}_{A_n^*}(T_n(x)) - {\mathcal {I}}_a\mathbbm {1}_{A_n^*}(x)\right| \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&= \int _{T_n^{-1}(A_n^*)} \left( 1-{\mathcal {I}}_a\mathbbm {1}_{A_n^*}(x)\right) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&\quad + \int _{{\mathcal {M}}\setminus T_n^{-1}(A_n^*)} {\mathcal {I}}_a\mathbbm {1}_{A_n^*}(x) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&\geqq (1-\tilde{z}) \int _{T_n^{-1}(A_n^*)} \mathbbm {1}_{{\widetilde{A}}^c}(x) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x)\\&\quad + \tilde{z} \int _{{\mathcal {M}}\setminus T_n^{-1}(A_n^*)} \mathbbm {1}_{{\widetilde{A}}}(x) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \\&\geqq \min ({\widetilde{z}},1-{\widetilde{z}}) \Vert \mathbbm {1}_{A_n^*} \circ T_n - \mathbbm {1}_{{\tilde{A}}}\Vert _{\mathrm {L}^{1}({\mathcal {M}})} \\&\geqq z \Vert \mathbbm {1}_{A_n^*} \circ T_n - \mathbbm {1}_{{\tilde{A}}}\Vert _{\mathrm {L}^{1}({\mathcal {M}})}, \end{aligned}$$

and notice that the upper bound for \({\mathcal {C}}_{n,\varepsilon }\) can be used to bound the left hand side of the above expression by a constant times a.
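The selection of the level \({\widetilde{z}}\) used for property i) can be sketched as an averaging argument; the function g is an auxiliary name, and we use that a set and its complement have the same perimeter.

$$\begin{aligned}&g(s) := {\mathcal {P}}(\{{\mathcal {I}}_a \mathbbm {1}_{A_n^*}> s\}) = {\mathcal {P}}(\{{\mathcal {I}}_a \mathbbm {1}_{A_n^*} \leqq s\}), \qquad \int _z^{1-z} g(s)\,\mathrm {d}s \leqq \mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*}), \\&g(s)> \frac{\mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{1-2z} \ \text { for a.e. } s\in (z,1-z) \ \Longrightarrow \ \int _z^{1-z} g(s)\,\mathrm {d}s > (1-2z)\, \frac{\mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{1-2z} = \mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*}). \end{aligned}$$

The second line contradicts the first, so the set of levels \(s\in (z,1-z)\) with \(g(s) \leqq \frac{\mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{1-2z}\) has positive measure, and any such level can serve as \({\widetilde{z}}\).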

We will use the bounds

$$\begin{aligned}&\left\| {\mathcal {I}}_a\mathbbm {1}_{A_n^*} - m_1({\mathcal {I}}_a\mathbbm {1}_{A_n^*})\right\| _{\mathrm {L}^{1}({\mathcal {M}})} - \mathrm {Bal}(\tilde{A}) \\&\quad = \max \left\{ \min _{c\in [0,1]} \left\| {\mathcal {I}}_a\mathbbm {1}_{A_n^*} - c\right\| _{\mathrm {L}^{1}({\mathcal {M}})} - \underbrace{\mathrm {vol}_{{\mathcal {M}}}({\widetilde{A}})}_{=\Vert \mathbbm {1}_{{\widetilde{A}}}\Vert _{\mathrm {L}^{1}({\mathcal {M}})}}, \min _{d\in [0,1]} \left\| {\mathcal {I}}_a\mathbbm {1}_{A_n^*} - d\right\| _{\mathrm {L}^{1}({\mathcal {M}})} \right. \\&\left. \qquad - \underbrace{\mathrm {vol}_{{\mathcal {M}}}({\widetilde{A}}^c)}_{=\Vert 1-\mathbbm {1}_{{\widetilde{A}}}\Vert _{\mathrm {L}^{1}({\mathcal {M}})}} \right\} \\&\quad \leqq \left\| {\mathcal {I}}_a\mathbbm {1}_{A_n^*} - \mathbbm {1}_{\tilde{A}} \right\| _{\mathrm {L}^{1}({\mathcal {M}})} \\&\quad \leqq \left\| {\mathcal {I}}_a\mathbbm {1}_{A_n^*} - \mathbbm {1}_{A_n^*}\circ T_n \right\| _{\mathrm {L}^{1}({\mathcal {M}})} + \left\| \mathbbm {1}_{A_n^*}\circ T_n - \mathbbm {1}_{\tilde{A}} \right\| _{\mathrm {L}^{1}({\mathcal {M}})} \\&\quad \leqq \frac{Ca}{z} \end{aligned}$$

and

$$\begin{aligned} \mathrm {Bal}(\tilde{A})&= \min \left\{ \mathrm {vol}_{{\mathcal {M}}}(\tilde{A}),\mathrm {vol}_{{\mathcal {M}}}(\tilde{A}^c)\right\} \\&= \min \left\{ \Vert \mathbbm {1}_{\tilde{A}}\Vert _{\mathrm {L}^{1}({\mathcal {M}})}, \Vert 1-\mathbbm {1}_{\tilde{A}}\Vert _{\mathrm {L}^{1}({\mathcal {M}})} \right\} \\&\geqq \min \Bigg \{\Vert \mathbbm {1}_{A_n^*}\circ T_n\Vert _{\mathrm {L}^{1}({\mathcal {M}})} - \underbrace{\Vert \mathbbm {1}_{\tilde{A}}-\mathbbm {1}_{A_n^*}\circ T_n\Vert _{\mathrm {L}^{1}({\mathcal {M}})}}_{\leqq \frac{Ca}{z}\mathrm {GTV}_{n,\varepsilon }(\mathbbm {1}_{A_n^*})\leqq \frac{Ca}{z}}, \\&\Vert 1-\mathbbm {1}_{A_n^*}\circ T_n\Vert _{\mathrm {L}^{1}({\mathcal {M}})} - \Vert \mathbbm {1}_{\tilde{A}}-\mathbbm {1}_{A_n^*}\circ T_n\Vert _{\mathrm {L}^{1}({\mathcal {M}})} \Bigg \} \\&\geqq \min \left\{ \Vert \mathbbm {1}_{A_n^*}\circ T_n\Vert _{\mathrm {L}^{1}({\mathcal {M}})}, \Vert 1-\mathbbm {1}_{A_n^*}\circ T_n\Vert _{\mathrm {L}^{1}({\mathcal {M}})} \right\} - \frac{Ca}{z} \\&= \min \left\{ \int _{{\mathcal {M}}} \mathbbm {1}_{A_n^*}(T_n(x)) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x), \int _{{\mathcal {M}}} \mathbbm {1}_{(A_n^*)^c}(T_n(x)) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \right\} - \frac{Ca}{z} \\&\geqq c\min \left\{ \int _{{\mathcal {M}}} \mathbbm {1}_{A_n^*}(T_n(x)) {\tilde{\rho }}_n(x) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x), \right. \\&\left. \quad \int _{{\mathcal {M}}} \mathbbm {1}_{(A_n^*)^c}(T_n(x)) {\tilde{\rho }}_n(x) \, \mathrm {d}\mathrm {vol}_{{\mathcal {M}}}(x) \right\} - \frac{Ca}{z} \\&= c\min \left\{ \nu _n(A_n^*), \nu _n((A_n^*)^c) \right\} \\&\quad - \frac{Ca}{z}. \end{aligned}$$

Assuming that \(\frac{a}{z}\) is sufficiently small (which by later choices is equivalent to \(\varepsilon \) being sufficiently small), and using the a-priori lower bound on \(\min \{\nu _n(A_n^*), \nu _n(A_n^{*c})\}\) derived in the proof of the lower bound in Theorem 2.2, we can assume that \(\mathrm {Bal}(\tilde{A})\geqq c\) for some constant \(c>0\) independent of all other parameters.

The perimeter estimate \({\mathcal {P}}({\tilde{A}}) \leqq \frac{\mathrm {TV}({\mathcal {I}}_a \mathbbm {1}_{A_n^*})}{(1-2z)}\) then implies, if we further restrict to \(z<\frac{1}{4}\), that

$$\begin{aligned}&\frac{\sigma _\eta \mathrm {TV}(\mathbbm {1}_{{\tilde{A}}})}{\mathrm {Bal}({\tilde{A}})} - \frac{\sigma _\eta \mathrm {TV}({\mathcal {I}}_a(\mathbbm {1}_{A_n^*}))}{ \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*}- m_1({\mathcal {I}}_a \mathbbm {1}_{A_n^*})\Vert _{\mathrm {L}^{1}({\mathcal {M}})}} \\&\quad \leqq \underbrace{\frac{\sigma _\eta \mathrm {TV}({\mathcal {I}}_a(\mathbbm {1}_{A_n^*}))}{ \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*}- m_1({\mathcal {I}}_a \mathbbm {1}_{A_n^*})\Vert _{\mathrm {L}^{1}({\mathcal {M}})}}}_{\leqq C \text { by } (4.9)}\\&\qquad \left( \frac{(1 + 4z)\Vert \Lambda _a \mathbbm {1}_A- m_1(\Lambda _a\mathbbm {1}_{A})\Vert _{\mathrm {L}^{1}({\mathcal {M}})} - \mathrm {Bal}({\tilde{A}})}{\mathrm {Bal}({\tilde{A}})} \right) \\&\quad \leqq C\left( \frac{a}{z} + z \right) \end{aligned}$$

where A is the set satisfying \(\mathbbm {1}_{A_n^*}\circ T_n = \mathbbm {1}_A\). Using equations (4.4) and (4.9), and choosing \(a=\root 3 \of {\varepsilon }\), gives

$$\begin{aligned} 0&\leqq \frac{\sigma _\eta \mathrm {TV}(\mathbbm {1}_{{\tilde{A}}})}{\mathrm {Bal}({\tilde{A}})} - {\mathcal {C}}_{n,\varepsilon } + {\mathcal {C}}_{n,\varepsilon } - \sigma _\eta {\mathcal {C}}_{\mathcal {M}}\\&\leqq \frac{\sigma _\eta \mathrm {TV}(\mathbbm {1}_{{\tilde{A}}})}{\mathrm {Bal}({\tilde{A}})} - \frac{\sigma _\eta \mathrm {TV}({\mathcal {I}}_a(\mathbbm {1}_{A_n^*}))}{ \Vert {\mathcal {I}}_a \mathbbm {1}_{A_n^*}- m_1({\mathcal {I}}_a \mathbbm {1}_{A_n^*})\Vert _{L^1({\mathcal {M}})}} + C\left( \root 3 \of {\varepsilon } + \frac{\delta }{\varepsilon } + \theta + \zeta \right) \\&\leqq C\left( \root 3 \of {\varepsilon } + \frac{\delta }{\varepsilon } + \theta + \frac{\root 3 \of {\varepsilon }}{z} + z + \zeta \right) =: C \kappa \end{aligned}$$
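To see how the auxiliary parameter \(z\) enters the rate, the following short computation (a sketch, not part of the original argument) isolates the \(z\)-dependent part of \(\kappa \). By the arithmetic-geometric mean inequality,

$$\begin{aligned} \frac{\root 3 \of {\varepsilon }}{z} + z \geqq 2\sqrt{\frac{\root 3 \of {\varepsilon }}{z}\cdot z} = 2\root 6 \of {\varepsilon }, \end{aligned}$$

with equality exactly when \(z = \root 6 \of {\varepsilon }\), which is the choice made below. With this choice, and assuming \(\varepsilon \leqq 1\) so that \(\root 3 \of {\varepsilon } \leqq \root 6 \of {\varepsilon }\),

$$\begin{aligned} \kappa&= \root 3 \of {\varepsilon } + 2\root 6 \of {\varepsilon } + \frac{\delta }{\varepsilon } + \theta + \zeta \\&\leqq 3\root 6 \of {\varepsilon } + \frac{\delta }{\varepsilon } + \theta + \zeta . \end{aligned}$$

The restriction \(z<\frac{1}{4}\) used above is then satisfied as soon as \(\varepsilon < 4^{-6}\).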

By Assumption 2.14 and Proposition 2.13 there exists a continuum Cheeger set \(A^*\) such that \(|\mathrm {vol}_{\mathcal {M}}({\tilde{A}}) - \mathrm {vol}_{\mathcal {M}}(A^*)| \leqq C \kappa ^{1/2}\). Moreover, there is a \(\delta _{\mathcal {M}}\) (independent of \(A^*\)) such that \(\delta _{\mathcal {M}}\leqq \mathrm {vol}_{\mathcal {M}}(A^*) \leqq 1-\delta _{\mathcal {M}}\). This follows from well-known asymptotics [64] for the isoperimetric profile near zero and compactness of the perimeter functional, which together imply that solutions to the continuum Cheeger problem cannot have arbitrarily small volume; recall the discussion right after Remark 2.18.

Invoking Lemma 4.6 with \(\delta = \delta _{\mathcal {M}}/2\) and \(B = 2 \sigma _\eta {\mathcal {C}}_{\mathcal {M}}\), and assuming that \(\kappa \) is small enough (which can be guaranteed after setting \(z:= \varepsilon ^{1/6}\) and setting all the other parameters to be small enough), we can then find a set \({\hat{A}}\) so that \(\mathrm {vol}_{\mathcal {M}}({\hat{A}}) = \mathrm {vol}_{\mathcal {M}}(A^*)\), \({\mathcal {P}}({\hat{A}}) - {\mathcal {P}}(A^*) \leqq C \kappa ^{\frac{m-1}{2m}}\) and \(\mathrm {vol}_{\mathcal {M}}({\hat{A}} \Delta {\tilde{A}}) \leqq C\kappa ^{1/2}\). Via Proposition 2.13 we immediately obtain

$$\begin{aligned} \alpha ({\hat{A}}) \leqq C\kappa ^{\frac{m-1}{4m}}, \end{aligned}$$

as desired. Combining these estimates,

$$\begin{aligned} \Vert \mathbbm {1}_{A^*} - \mathbbm {1}_{A_n^*} \circ T_n \Vert _{\mathrm {L}^{1}({\mathcal {M}})}&\leqq \Vert \mathbbm {1}_{A_n^*} \circ T_n - \mathbbm {1}_{{\tilde{A}}} \Vert _{\mathrm {L}^{1}({\mathcal {M}})} + \mathrm {vol}_{\mathcal {M}}({\tilde{A}} ~\Delta ~ {\hat{A}}) + \alpha ({\hat{A}}) \\&\leqq C\left( \root 6 \of {\varepsilon } + \kappa ^{1/2} + \kappa ^{\frac{m-1}{4m}}\right) \\&\leqq C \kappa ^{\frac{m-1}{4m}}. \end{aligned}$$
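The last two inequalities can be checked term by term. The following is a sketch under the assumptions \(\kappa \leqq 1\), \(m\geqq 2\), and the choices \(a = \root 3 \of {\varepsilon }\), \(z = \root 6 \of {\varepsilon }\) made above. The first term is controlled by the bound \(\Vert \mathbbm {1}_{{\tilde{A}}} - \mathbbm {1}_{A_n^*}\circ T_n \Vert _{\mathrm {L}^{1}({\mathcal {M}})} \leqq \frac{Ca}{z}\) from the estimate of \(\mathrm {Bal}({\tilde{A}})\), and

$$\begin{aligned} \frac{a}{z} = \frac{\root 3 \of {\varepsilon }}{\root 6 \of {\varepsilon }} = \root 6 \of {\varepsilon } = z \leqq \kappa , \end{aligned}$$

since \(z\) is one of the nonnegative summands defining \(\kappa \). Because \(0< \frac{m-1}{4m}< \frac{1}{2} < 1\) and \(\kappa \leqq 1\), the powers of \(\kappa \) are ordered as

$$\begin{aligned} \kappa \leqq \kappa ^{1/2} \leqq \kappa ^{\frac{m-1}{4m}}, \end{aligned}$$

so all three terms are absorbed into \(C \kappa ^{\frac{m-1}{4m}}\) after enlarging the constant.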

This concludes the proof. \(\quad \square \)

Remark 4.7

To show the statement in Remark 2.18 first notice that since \(T_{n \sharp } {\widetilde{\nu }}_n =\nu _n \) then

$$\begin{aligned} \nu _n\left( A^* \Delta A_n^* \right) = \Vert \mathbbm {1}_{A_n^*}\circ T_n - \mathbbm {1}_{A^*}\circ T_n \Vert _{\mathrm {L}^{1}({\widetilde{\nu }}_n)} \end{aligned}$$

and in turn by the triangle inequality

$$\begin{aligned} \Vert \mathbbm {1}_{A_n^*}\circ T_n - \mathbbm {1}_{A^*}\circ T_n \Vert _{\mathrm {L}^{1}({\widetilde{\nu }}_n)} \leqq \Vert \mathbbm {1}_{A_n^*}\circ T_n - \mathbbm {1}_{A^*} \Vert _{\mathrm {L}^{1}({\widetilde{\nu }}_n)} + \Vert \mathbbm {1}_{A^*} - \mathbbm {1}_{A^*}\circ T_n \Vert _{\mathrm {L}^{1}({\widetilde{\nu }}_n)}. \end{aligned}$$

The first term is controlled by the bound from Theorem 2.15 (enlarging the constant to account for the change of measure using Proposition 4.4). The second term, on the other hand, can be estimated by the volume of the tubular neighborhood of width \(2\delta \) around \(\partial A^*\) (where \(\delta \) appears in (4.5)) and thus, given the smoothness of \(\partial A^*\), by \(C {\mathcal {P}}(A^*) \delta \).
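Chaining the three displays gives the following combined estimate. This is a sketch that assumes the bound of Theorem 2.15 takes the form \(C \kappa ^{\frac{m-1}{4m}}\) obtained at the end of the proof above, with \(\kappa \) as defined there:

$$\begin{aligned} \nu _n\left( A^* \Delta A_n^* \right)&\leqq \Vert \mathbbm {1}_{A_n^*}\circ T_n - \mathbbm {1}_{A^*} \Vert _{\mathrm {L}^{1}({\widetilde{\nu }}_n)} + \Vert \mathbbm {1}_{A^*} - \mathbbm {1}_{A^*}\circ T_n \Vert _{\mathrm {L}^{1}({\widetilde{\nu }}_n)} \\&\leqq C \kappa ^{\frac{m-1}{4m}} + C {\mathcal {P}}(A^*) \delta . \end{aligned}$$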

Remark 4.8

In the proof of Theorem 2.15 we have relied upon the tools for mass-constrained perimeter minimizers, and fixed the mass separately. This has caused various losses in the estimates. For example, we have halved the exponent twice, once when comparing the mass and once when comparing the mass-constrained minimizers. The factor \(\frac{m-1}{m}\) in the exponent is also an artifact of needing to fix the mass in order to use the isoperimetric stability results. It seems very likely that the \(\kappa ^{\frac{m-1}{4m}}\) should in reality be a \(\kappa ^{1/2}\). Similarly, relying on stability results for sets, as opposed to the relaxed formulation in Lemma 3.9, necessitates the first step in the proof of Theorem 2.15, which introduces the parameter \(z=\root 6 \of {\varepsilon }\) into \(\kappa \), and accordingly decreases the power on \(\varepsilon \) in \(\kappa \). Pursuing such issues, by obtaining improved stability results for total variation minimization problems, is outside the scope of this work.