1 Introduction

Curvature is a fundamental concept in the study of geometric spaces. It is a local parameter whose behavior often controls global phenomena on the manifold. In particular, bounds on the Ricci curvature are known to imply an array of properties, including diameter bounds, control of the spectrum, and sub-Gaussian decay of the heat kernel. If the curvature of a space is bounded from above by a negative value, then the space has a boundary at infinity and shares other universal characteristics of (coarsely) hyperbolic spaces. Unfortunately, most notions of curvature apply only to smooth continuous spaces, such as Riemannian and pseudo-Riemannian manifolds. While there exist some combinatorial notions of curvature [6, 11], none has the same power as its smooth counterpart. We refer to [25] for a general overview of discrete curvatures. The focus of this paper is graph curvature.

In [27,28,29], Yann Ollivier introduced a definition of curvature for general metric spaces as a discretization of the well-known Ricci curvature. Since this definition is applicable to any metric space, it is applicable to graphs in particular. Even though relatively recent, it has already proven to be quite influential and fruitful. In the analysis of networks, Ollivier–Ricci curvature has been used, for example, to identify communities [36], analyze cancer cells [33], assess the fragility of financial networks [34] and the robustness of brain networks [10], and to embed networks for machine learning applications [12]. Ollivier–Ricci curvature has also been analyzed for several types of (random) graphs, including Erdős–Rényi random graphs [23]. Some general bounds for this curvature have also been established based on different graph properties [4, 16, 23]. These and other applications of Ollivier–Ricci curvature have also stimulated general interest in graph curvature, leading to the introduction and study of many other notions of graph curvature [8, 17, 24, 37].

An interesting aspect of Ollivier–Ricci curvature (or any other notion of discrete curvature) is that it creates a bridge between geometry and discrete structures. For example, discrete curvatures play an important role in the field of manifold learning where the discrete objects are data points lying on some manifold, and the task is to learn from the data the properties of the manifold [1].

A related task is that of graph embedding: given a graph, find its embedding in a smooth space such that graph distances between nodes are approximated by distances in the space. Curvature has proven to be important for finding the right space to embed the graph into [12].

In addition to these classical applications, geometry has also proven to be an important and powerful concept for designing latent-space models of random graphs whose properties—such as degree distributions, clustering, distance distributions—closely resemble those of real-world networks [5, 14, 19, 20]. These relations between geometry and network properties inevitably lead to the question whether characteristics of latent geometries of networks can be inferred from discrete properties of graphs that represent these networks. Since curvature is a fundamental characteristic of geometry, it is a natural first candidate for uncovering latent geometry in networks. Hence, a proper notion of graph curvature is needed, a notion that would be known to converge to the true curvature of the geometric space underlying the graph, if it exists.

Quantum gravity is yet another area where convergence of graph curvature is of interest. Here one wants to find a discrete geometry that converges in the continuum limit to the geometry of physical spacetime. To this end, Ollivier–Ricci curvature and its variations have been extensively investigated recently [7, 18, 40].

Despite the interest in Ollivier–Ricci and related curvatures of discrete and combinatorial spaces, the fundamental question of convergence remains largely open: does there exist a discrete notion of curvature that converges, in some limit, to a traditional notion of curvature of smooth spaces?

There are some positive results in this direction. One is for the convergence of an angle-defect-based notion of curvature of smooth triangulations of Riemannian manifolds [6]. Another one is a manifold learning method designed for consistent estimation of Ricci curvature of a submanifold in Euclidean space based on a point cloud sprinkled uniformly onto the submanifold [1]. Perhaps the closest result to ours is the one in [2, 3] where a discrete version of the d’Alembertian operator is defined for causal sets in 2- and 4-dimensional Lorentzian manifolds. This discrete d’Alembertian is then shown to converge to the traditional d’Alembertian in the continuum limit. To the best of our knowledge, there currently exist no general convergence results for truly combinatorial objects in general and random graphs in Riemannian manifolds in particular.

In this paper we study the question of convergence of Ollivier–Ricci curvature of graphs. We consider random geometric graphs whose nodes are a Poisson process in a Riemannian manifold and whose edges are formed only between nodes that lie within a given distance threshold from each other in the manifold. We show that as the size of such graphs tends to infinity, their Ollivier–Ricci curvature recovers the Ricci curvature of the underlying manifold. To the best of our knowledge, this is the first result that relates a discrete notion of curvature of graphs to the continuum version of curvature of their underlying geometry.

The remainder of the paper is structured as follows. In the next Sect. 2 we introduce the basic notations and definitions needed to present our main results. We present these results in Sect. 3. That section ends with some general comments and outlook. We then provide a general overview of the proof strategy in the first half of Sect. 4. The second half of that section contains the proofs of the main results. The final Sect. 5 contains all the remaining details and proofs of intermediate results that are skipped in Sect. 4.

2 Notations and Definitions

2.1 Geometric Graphs

Given a metric space \(({\mathcal {X}},d)\), a countable node set \(X\subseteq {\mathcal {X}}\), and connection radius \(\varepsilon >0\), we define \(G(X,\varepsilon )\) as the graph whose nodes are all the elements in X. An edge between \(x,y\in X\) exists if and only if \(d(x,y)\le \varepsilon \). Since the nodes of G are points in the metric space, we will refer to them using x and y, instead of indices i and j, and write \(x\in G\) if x is a node of G.

We will also use \(G_{xy}\) to denote the indicator of an edge between x and y in G and define \(\mathcal {N}_x\) to be the neighborhood of node x, i.e.,

$$\begin{aligned} \mathcal {N}_x =\{y\in G: G_{xy}=1\}. \end{aligned}$$

Note that \(\mathcal {N}_x=X\cap \mathcal {B}\hspace{0.27771pt}(x;\varepsilon )\), where \(\mathcal {B}\hspace{0.27771pt}(x;\varepsilon )\) denotes the closed ball around \(x\in {\mathcal {X}}\) of radius \(\varepsilon \) with respect to the distance d, but excluding x.
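To fix ideas, the construction of \(G(X,\varepsilon )\) and of the neighborhoods \(\mathcal {N}_x\) can be sketched in a few lines of code. The sketch below is purely illustrative and not part of the formal development; the point set, the metric d, and the radius are placeholders supplied by the caller.

```python
import numpy as np

def geometric_graph(points, dist, eps):
    """Adjacency matrix of G(X, eps): an edge between x and y iff dist(x, y) <= eps."""
    n = len(points)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            adj[i, j] = adj[j, i] = dist(points[i], points[j]) <= eps
    return adj

def neighborhood(adj, i):
    """Indices of the neighborhood N_x of node i (the ball around i, excluding i)."""
    return np.flatnonzero(adj[i])
```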

2.2 Random Geometric Graphs

In this paper we consider graphs that are constructed by randomly placing points in the metric space \(\mathcal {X}\), according to a Poisson process. In order to analyze a notion of curvature on these graphs we need to impose some additional structure on \(\mathcal {X}\). More precisely, we will consider Riemannian manifolds as the spaces on which graphs are constructed. We briefly recall some notions of Riemannian geometry needed for the setup and refer the reader to [15, 30] for more details on the topic.

Formally, a Riemannian manifold is a pair \((\mathcal {M},g)\), where \(\mathcal {M}\) is a smooth manifold and g assigns to each \(x\in \mathcal {M}\) an inner product \(g_x\) on the tangent space \(T_x\mathcal {M}\) at x, varying smoothly in x. This inner product structure induces a distance \(d_\mathcal {M}\) on \(\mathcal {M}\), turning it into a metric space. Since we are mainly interested in metric spaces, we denote a Riemannian manifold by the pair \((\mathcal {M},d_\mathcal {M})\).

Throughout the remainder of this paper we work with Riemannian manifolds that are smooth, connected, and compact. This allows us to integrate over points in the manifold and ensures that for any two points \(x,y\in \mathcal {M}\) there exists a shortest path (geodesic) in \(\mathcal {M}\) connecting x and y, whose length is \(d_\mathcal {M}(x,y)\). For any \(U\subseteq \mathcal {M}\) we write \(\textrm{vol}_\mathcal {M}(U)=\int _U\textrm{d}\textrm{vol}_\mathcal {M}\) to denote the volume of U, where \(\textrm{vol}_\mathcal {M}\) is the Riemannian measure on \(\mathcal {M}\). With this setup we can define a random geometric graph on a Riemannian manifold in a way analogous to the classical random geometric graph in Euclidean space.

Definition 2.1

Let \(({\mathcal {M}},d_\mathcal {M})\) be a smooth, connected, and compact N-dimensional Riemannian manifold. Fix \(\varepsilon >0\) and consider a Poisson process \(\mathcal {P}_n\) on \(\mathcal {M}\) with intensity measure \((n/{\textrm{vol}_\mathcal {M}(\mathcal {M})})\,\textrm{d}\textrm{vol}_\mathcal {M}\). Then we define the random geometric graph \(\mathbb {G}_n(\varepsilon ):=G(\mathcal {P}_n,\varepsilon )\).
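As an illustration of Definition 2.1, the following sketch samples \(\mathcal {P}_n\) and builds \(\mathbb {G}_n(\varepsilon )\) on the flat unit torus, used here only as a stand-in compact manifold with \(\textrm{vol}_\mathcal {M}(\mathcal {M})=1\); on a general manifold the uniform sampler and the distance would have to be replaced by their intrinsic counterparts.

```python
import numpy as np

rng = np.random.default_rng(0)

def torus_dist(x, y):
    """Geodesic distance on the flat unit torus [0,1)^2 (wrap-around metric)."""
    d = np.abs(x - y)
    return float(np.linalg.norm(np.minimum(d, 1.0 - d)))

def sample_poisson_rgg(n, eps):
    """Sample P_n with intensity (n / vol(M)) dvol (here vol(M) = 1) and build G(P_n, eps)."""
    m = rng.poisson(n)          # the number of points is Poisson(n)
    pts = rng.random((m, 2))    # given m, points are i.i.d. uniform w.r.t. the volume measure
    adj = np.zeros((m, m), dtype=bool)
    for i in range(m):
        for j in range(i + 1, m):
            adj[i, j] = adj[j, i] = torus_dist(pts[i], pts[j]) <= eps
    return pts, adj
```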

Remark 1

(conditions on the manifold)    From a technical perspective, we only need the manifold to be smooth. This is because we will be working on shrinking neighborhoods of some fixed point \(x^*\in \mathcal {M}\). For a sufficiently small neighborhood U, we can always construct a volume form that is well defined on U and ensure that every two points in U are connected by a geodesic path. We could then fix a sufficiently small and compact neighborhood \({\mathcal {C}}\) of \(x^*\) and then consider a Poisson process on \({\mathcal {C}}\) with intensity measure \((n/{\textrm{vol}_\mathcal {M}(\mathcal {C})})\,\textrm{d}\textrm{vol}_\mathcal {M}\).

The only difference from the global setup is that we would need to frame everything in terms of sufficiently small neighborhoods and deal with possible boundary issues in our proofs. In the end, since curvature is a local property, these issues would vanish. Still, framing all results in this local setting would add technical layers to the proofs. For convenience, we therefore choose to present everything in terms of global requirements on the manifold.

We shall next introduce a notion of curvature on random geometric graphs. Since curvature is inherently a local property, it makes sense to define curvature on graphs as a property of an edge. For our analysis we will take a more general approach and consider curvature between two fixed nodes in the graph that are connected by a path. We then analyze its behavior as the size of the graph tends to infinity.

For any \(x\in \mathcal {M}\), we write \(\mathbb {G}_n(x,\varepsilon ):=G(X_n,\varepsilon )\), where \(X_n=\{x\}\cup \mathcal {P}_n\). That is, \(\mathbb {G}_n(x,\varepsilon )\) is a random geometric graph with x added to the node set. Similarly, for any pair of points \((x,y)\in \mathcal {M}\) we write \(\mathbb {G}_n(x,y,\varepsilon ):=G(X_n^\prime ,\varepsilon )\), with \(X_n^\prime =\{x,y\}\cup \mathcal {P}_n\). We refer to both \(\mathbb {G}_n(x,\varepsilon )\) and \(\mathbb {G}_n(x,y,\varepsilon )\) as rooted random graphs.

2.3 Ollivier–Ricci Curvature on Graphs

The definition of Ollivier–Ricci curvature uses the Wasserstein metric (transportation distance), which we shall introduce next. Recall that a coupling between two probability measures \(\mu _1\) and \(\mu _2\) is a joint probability measure \(\mu \) whose marginals are \(\mu _1\) and \(\mu _2\).

Definition 2.2

Let \(\mu _1\) and \(\mu _2\) be probability measures on a metric space \((\mathcal {X},d)\) and let \(\varGamma (\mu _1,\mu _2)\) denote the set of all couplings \(\mu \) between \(\mu _1\) and \(\mu _2\). Then the Wasserstein metric (Kantorovich–Rubinstein distance of order one) is given by

$$\begin{aligned} W_1(\mu _1,\mu _2)=\inf _{\mu \in \varGamma (\mu _1,\mu _2)}\int _{\mathcal {X}\times \mathcal {X}}\!d(x,y)\,d\mu (x,y) \end{aligned}$$
(1)
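For discrete measures, which is the only case needed for graph curvature below, the infimum in (1) is a finite linear program over couplings. The following sketch computes it with an off-the-shelf LP solver; the function name and the dense formulation are illustrative and make no claim to efficiency.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein1(mu1, mu2, cost):
    """W_1 between discrete probability vectors mu1 (length k) and mu2 (length l),
    where cost[i, j] = d(x_i, y_j): minimize sum_ij cost[i, j] * pi[i, j] over all
    couplings pi with row marginals mu1 and column marginals mu2."""
    k, l = len(mu1), len(mu2)
    A_eq = np.zeros((k + l, k * l))
    for i in range(k):
        A_eq[i, i * l:(i + 1) * l] = 1.0      # sum_j pi[i, j] = mu1[i]
    for j in range(l):
        A_eq[k + j, j::l] = 1.0               # sum_i pi[i, j] = mu2[j]
    b_eq = np.concatenate([mu1, mu2])
    res = linprog(np.asarray(cost).reshape(-1), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun
```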

Remark 2

The following property of the Wasserstein distance will prove useful for us. Suppose two probability measures have support on \(U\subset \mathcal {X}\) and there exists a metric \({\tilde{d}}\) on \(\mathcal {X}\) that coincides with d on U. Then the Wasserstein metric \({\tilde{W}}_1(\mu _1,\mu _2)\) associated with metric \(\tilde{d}\) equals the original Wasserstein metric associated with d.

Let G be a graph. The definition of Ollivier–Ricci curvature on graphs relies on two ingredients, a metric on G and a family of probability measures, indexed by the vertices.

Definition 2.3

An Ollivier-triple \({\mathcal {G}}\) is a triple \((G,d_G,{\varvec{m}})\), where G is a graph, \(d_G\) is a metric on G, and \({\varvec{m}}=\{m_x\}_{x\in G}\) is a family of probability measures on G, one for each node \(x\in G\).

Given an Ollivier-triple \({\mathcal {G}}=(G,d_G,{\varvec{m}})\), we write \(W_1^{\mathcal {G}}\) for the Wasserstein metric with respect to the metric space \((G,d_G)\). We then define for any pair of nodes \(x,y\in G\) the associated Ollivier curvature as

$$\begin{aligned} \kappa (x,y;{\mathcal {G}})={\left\{ \begin{array}{ll}\displaystyle 1 - \frac{W_1^{{\mathcal {G}}}(m_x,m_y)}{d_G(x,y)}&{}\text {if }d_G(x,y)<\infty ,\\ 0&{}\text {otherwise.} \end{array}\right. } \end{aligned}$$
(2)
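Combining (1) and (2), the curvature of a node pair in an Ollivier-triple reduces to a single Wasserstein computation. A minimal sketch, reusing the wasserstein1 helper from the previous snippet and representing \(d_G\) as a full distance matrix (with \(\infty \) for disconnected pairs) and \(m_x,m_y\) as probability vectors over the nodes:

```python
import numpy as np

def ollivier_curvature(d_G, m_x, m_y, x, y):
    """kappa(x, y; G) = 1 - W_1(m_x, m_y) / d_G(x, y), and 0 if d_G(x, y) is infinite,
    as in (2)."""
    if not np.isfinite(d_G[x, y]):
        return 0.0
    sx = np.flatnonzero(m_x)                      # support of m_x
    sy = np.flatnonzero(m_y)                      # support of m_y
    cost = d_G[np.ix_(sx, sy)]                    # pairwise distances between the supports
    w1 = wasserstein1(m_x[sx], m_y[sy], cost)     # helper sketched after Definition 2.2
    return 1.0 - w1 / d_G[x, y]
```

In the classical setting of Remark 3 below, \(d_G\) would be the shortest path metric and \(m_x\) the uniform measure on \(\mathcal {N}_x\); the sketch works verbatim for the other combinations of metrics and measures used in this paper.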

Remark 3

  1. 1.

    The concept of Ollivier–Ricci curvature is not restricted to graphs and can be defined on any metric space equipped with a family of probability measures. A specific example is a Riemannian manifold \((\mathcal {M},d_\mathcal {M})\).

  2. 2.

    Note that a family \(\{m_x\}_{x\in G}\) of probability measures on G gives rise to a random walk on the graph, with transition probabilities \(\mathbb {P}\hspace{0.33325pt}(x_{t+1}\in A\,|\,x_t=x)=m_x(A)\). So an Ollivier-triple consists of a graph, a metric, and a random walk on the graph. However, since we only use concepts related to the measures \(m_x\), we refrain from using any random walk terminology.

  3. 3.

    When \(d_G\) is the shortest path metric on G and \({\varvec{m}}\) corresponds to the uniform probability measures on the neighborhoods \(\mathcal {N}_x\), i.e., \(m_x(y)=G_{xy}/|\mathcal {N}_x|\), we are in the classic setting for Ollivier–Ricci curvature on graphs [16, 26, 31]. In this work, however, we shall use different combinations of metrics on graphs and probability measures to obtain our results. This is why we define Ollivier–Ricci curvature on graphs in a more general way.

  4. 4.

    The reason we set \(\kappa \hspace{0.30548pt}(x,y;{\mathcal {G}})=0\) if the nodes are not in the same connected component is that we work with random graphs, and this convention ensures that \(\kappa \hspace{0.30548pt}(x,y;{\mathcal {G}})\) is a real-valued random variable.

2.4 Curvature in Riemannian Manifolds

Our main results relate the standard Ricci curvature of a manifold to the Ollivier–Ricci curvature of the random geometric graph constructed on this manifold. For this, we briefly recall the definition of the Ricci curvature, see [15, 30].

In general, the curvature of a geometric space is a local measure of how “different” a region of the space is from flat Euclidean space. Notions of curvature in Riemannian geometry are governed by the Riemannian curvature tensor R. Given an N-dimensional Riemannian manifold \(({\mathcal {M}},d_\mathcal {M})\), a point \(x\in {\mathcal {M}}\), and two vectors \(\textbf{v},\textbf{w}\in T_x{\mathcal {M}}\) (the tangent space at x), the Riemannian curvature tensor with respect to \(\textbf{v}\), \(\textbf{w}\) is a linear map \(R(\textbf{v},\textbf{w}):T_x\mathcal {M}\rightarrow T_x\mathcal {M}\), written as \(\textbf{u}\mapsto R(\textbf{v},\textbf{w})\textbf{u}\) and defined in terms of the Levi-Civita connection on the tangent bundle. It quantifies to what extent the manifold \(\mathcal {M}\) fails to be isometric to flat Euclidean space.

In this paper we use the notion of curvature called Ricci curvature. For two vectors \(\textbf{v}\) and \(\textbf{w}\), the Ricci curvature \({\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{w})\) is defined, in terms of the Riemannian curvature tensor, as the trace of the linear map

$$\begin{aligned} \textbf{u}\mapsto R(\textbf{u},\textbf{v})\textbf{w},\quad \ \textbf{u}\in T_x{\mathcal {M}}. \end{aligned}$$

Given a point \(x\in {\mathcal {M}}\) and a unit vector \(\textbf{v}\in T_x{\mathcal {M}}\), we often refer to \({\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\) as the Ricci curvature of x with respect to \(\textbf{v}\).

This Ricci curvature is related to another notion of curvature, called sectional curvature, which is defined as

$$\begin{aligned} K(\textbf{v},\textbf{w})=\frac{\langle R(\textbf{v},\textbf{w})\textbf{v},\textbf{w}\rangle }{\langle \textbf{v},\textbf{v}\rangle \langle \textbf{w},\textbf{w}\rangle -\langle \textbf{v},\textbf{w}\rangle ^2}, \end{aligned}$$

where \(\langle \,{\cdot },\,{\cdot }\,\rangle \) denotes the inner product on the tangent space. One can show that \({\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\) is obtained by averaging the sectional curvature \(K(\textbf{v},\textbf{w})\) over all unit vectors \(\textbf{w}\in T_x\mathcal {M}\).

In the remainder of this paper we work with the Ricci curvature at a point x with respect to some tangent vector \(\textbf{v}\). We note that a detailed understanding of curvature on Riemannian manifolds is not needed to follow the results or the proofs.

3 Main Results

Here we state our results regarding the convergence of Ollivier–Ricci curvature of random geometric graphs on Riemannian manifolds. We note that if the manifold dimension is \(N=1\), then there is nothing to prove, so that we always assume that \(N\ge 2\).

We mainly consider two different distances on the graphs, leading to two different but related results. Although we consider different distances on graphs, we shall always consider uniform measures on balls of a certain radius. We shall clearly distinguish between the connection radius of the graph and the radius used for the uniform measures:

connection radius: \(\varepsilon _n\)

measure radius: \(\delta _n\)

The former is the connectivity distance threshold: if the distance between a pair of nodes in the manifold is below this threshold, then these nodes are connected by an edge in the graph. The latter radius is the radius of the ball (either in the graph or in the manifold) over which the uniform probability measure is distributed.

Let \(G_n=\mathbb {G}_n(\varepsilon _n)\) be a random geometric graph on \(\mathcal {M}\) and \(d_G\) a distance on \(G_n\). Then, for a node \(x\in G_n\), we define the graph ball of radius \(\lambda \) around x as

$$\begin{aligned} \mathcal {B}_G(x;\lambda ):=\{y\in G_n\setminus \{x\}:d_G(x,y)\le \lambda \}. \end{aligned}$$

Note that \(\mathcal {B}_G(x;\lambda )\) depends on the definition of the graph distance \(d_G\). For our results we consider Ollivier-triples \({\mathcal {G}}_n=(G_n,d_G,{\varvec{m}}^G)\), where \({\varvec{m}}^G\) are the uniform measures on \(\mathcal {B}_G(x;\delta _n)\), i.e.,

$$\begin{aligned} m_x^G(y)={\left\{ \begin{array}{ll} \displaystyle \frac{1}{|\mathcal {B}_G(x;\delta _n)|} &{}\text {if }y\in \mathcal {B}_G(x;\delta _n),\\ 0&{}\text {else.} \end{array}\right. } \end{aligned}$$
(3)

We reiterate that if \(\varepsilon _n=\delta _n\) and the graph metric \(d_G\) is the shortest path distance, then we are in the classical setting of Ollivier–Ricci curvature on graphs as considered in the past literature [16, 26, 31].

3.1 Graphs with Manifold Weighted Distance

Let \(G_n=\mathbb {G}_n(x^*,\varepsilon _n)\) be a random rooted graph on \(\mathcal {M}\). Then we define the manifold weighted graph distance \(d_G^w\) as the weighted shortest-path distance on \(G_n\) where each edge (u, v) is assigned weight \(d_\mathcal {M}(u,v)\), the distance between the nodes on the manifold. Analogously to \(\mathcal {B}_G(x;\lambda )\), we denote by \({\mathcal {B}}_G^w(x;\lambda )\) the graph ball of radius \(\lambda \) with respect to \(d_G^w\), and we let \({\varvec{m}}^{G,w}=(m_x^{G,w})_{x\in G}\) denote the uniform measures on the balls \({\mathcal {B}}_G^w(x;\delta _n)\). Finally, given a point \(x\in \mathcal {M}\) and a vector \(\textbf{v}\in T_x\mathcal {M}\), we say that another point \(y\in \mathcal {M}\) is at distance \(\delta \) in the direction of \(\textbf{v}\) if \(d_\mathcal {M}(x,y)=\delta \) and y lies on the geodesic starting at x in the direction of \(\textbf{v}\).
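The following is a sketch of how \(d_G^w\) and the measures \(m_x^{G,w}\) can be computed, assuming that the adjacency matrix and the pairwise manifold distances of the nodes are available; the use of scipy's Dijkstra routine is an implementation choice, not part of the definition.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

def weighted_graph_distance(adj, manifold_dist):
    """d_G^w: shortest-path distance with each edge (u, v) weighted by d_M(u, v).
    Non-edges get weight 0, which the dense csgraph format treats as 'no edge'."""
    weights = np.where(adj, manifold_dist, 0.0)
    return dijkstra(weights, directed=False)

def ball_measure(d_graph, x, delta):
    """Uniform measure m_x^{G,w} on the graph ball B_G^w(x; delta), excluding x itself.
    Assumes the ball is non-empty."""
    ball = d_graph[x] <= delta
    ball[x] = False
    m = ball.astype(float)
    return m / m.sum()
```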

Our first result shows that for certain combinations of connection radius \(\varepsilon _n\) and measure radius \(\delta _n\), the Ollivier–Ricci curvature on \(G_n\) converges to the Ricci curvature.

Theorem 1

Let \(N\ge 2\), \((\mathcal {M},d_\mathcal {M})\) be a smooth, connected, and compact N-dimensional Riemannian manifold, \(x^*\in {\mathcal {M}}\), and \(\textbf{v}\) a unit tangent vector at \(x^*\). Furthermore, let \(\varepsilon _n=\varTheta \hspace{0.33325pt}((\log n)^an^{-\alpha })\), \(\delta _n=\varTheta \hspace{0.33325pt}((\log n)^bn^{-\beta })\) (as \(n\rightarrow \infty \)), where the constants satisfy

$$\begin{aligned} 0<\beta \le \alpha ,\quad \ \ \alpha + 2\beta \le \frac{1}{N}, \end{aligned}$$

and \(a\le b\) if \(\alpha =\beta \) and \(\min {\{a,a+2b\}}>2/N\) if \(\alpha +2\beta =1/N\). Let \(y_n^*\in \mathcal {M}\) be at distance \(\delta _n\) from \(x^*\) in the direction of \(\textbf{v}\), and let \(G_n=\mathbb {G}_n(x^*,y_n^*,\varepsilon _n)\) be rooted random graphs on \(\mathcal {M}\). Then for the Ollivier-triple \({\mathcal {G}}_n^w=(G_n,d_{G_n}^w,{\varvec{m}}^{G,w})\), it holds

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {E}\left[ \biggl |\frac{2(N+2)\kappa (x^*,y_n^*;{\mathcal {G}}_n^w)}{\delta _n^2}-{\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\biggr |\right] =0. \end{aligned}$$

Theorem 1 relates two different quantities. The first is the Ollivier–Ricci curvature in the graph between the node \(x^*\) and another node \(y_n^*\) that is at distance \(\delta _n\) from \(x^*\) in the direction of vector \(\textbf{v}\). The second is the Ricci curvature of the manifold at \(x^*\) in the \(\textbf{v}\)-direction. The theorem says that if we properly rescale the former, it converges in expectation to the latter.

Remark 4

  1. 1.

    Note that Theorem 1 states that \(\delta _n^{-2}2(N+2)\hspace{0.7222pt}\kappa (x^*,y_n^*;{\mathcal {G}}_n^w)\) converges in the \(L^1\) sense to \({\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\). In particular, this implies the concentration result

    $$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {P}\left( \biggl |\frac{2(N+2)\hspace{0.54993pt}\kappa (x^*,y_n^*;{\mathcal {G}}_n^w)}{\delta _n^2}-{\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\biggr |\ge \eta \right) =0,\ \ \quad \text {for all }\eta >0. \end{aligned}$$
  2. 2.

    Since \(\varepsilon _n,\delta _n\rightarrow 0\), both the connectivity and measure neighborhoods of \(x^*\) become smaller as n grows. Indeed, curvature is a local property, so that measuring it more accurately requires smaller regions.

  3. 3.

    While the connectivity neighborhood of \(x^*\) is shrinking, the expected number of \(x^*\)’s neighbors lying in it is growing with n. To see this, note that for large enough n the volume of the ball \(\mathcal {B}_\mathcal {M}(x;\varepsilon _n)\) around \(x\in \mathcal {M}\) can be approximated by that of the N-dimensional Euclidean ball. Hence, for any \(x\in \mathbb {G}_n(x^*,y_n^*,\varepsilon _n)\), as \(n\rightarrow \infty \),

    $$\begin{aligned} \mathbb {E}\hspace{0.33325pt}[|\mathcal {N}_x|]=\frac{n\,{\text {vol}}_\mathcal {M}(\mathcal {B}_\mathcal {M}(x;\varepsilon _n))}{{\text {vol}}_\mathcal {M}(\mathcal {M})}=\varTheta (n\varepsilon _n^N)=\varTheta \bigl ((\log n)^{aN}n^{1-\alpha N}\bigr ). \end{aligned}$$

    The conditions of the theorem imply that \(\alpha \le \alpha +2\beta \le 1/N\), so that \(1-\alpha N\ge 0\) and the average degree diverges faster than logarithmically. In fact, the conditions of Theorem 1 imply that the average degree always diverges faster than \((\log n)^2\).

If we consider the classic setting where the connection and measure radii are the same, \(\varepsilon _n=\delta _n\), then the following result is a direct consequence of Theorem 1.

Corollary 1

Let \(N\ge 2\), \((\mathcal {M},d_\mathcal {M})\) be a smooth, connected, and compact N-dimensional Riemannian manifold, \(x^*\in {\mathcal {M}}\), and \(\textbf{v}\) a unit tangent vector at \(x^*\). Furthermore, let \(\delta _n=\varTheta \hspace{0.33325pt}((\log n)^bn^{-\beta })\), with \(\beta \le 1/(3N)\) and \(b>2/N\) whenever \(\beta =1/(3N)\). Let \(y_n^*\in \mathcal {M}\) be at distance \(\delta _n\) from \(x^*\) in the direction of \(\textbf{v}\) and \(G_n=\mathbb {G}_n(x^*,y_n^*,\delta _n)\) be rooted random graphs on \(\mathcal {M}\). Then for the Ollivier-triple \({\mathcal {G}}_n^w=(G_n,d_{G_n}^w,{\varvec{m}}^{G,w})\), it holds

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {E}\left[ \biggl |\frac{2(N+2) \kappa (x^*,y_n^*;{\mathcal {G}}_n^w)}{\delta _n^2}-{\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\biggr |\right] =0. \end{aligned}$$

While the conditions in this corollary imply that the average degree in \(\mathbb {G}_n(x^*,y_n^*,\delta _n)\) diverges faster than \(n^{2/3}\), Theorem 1 covers graphs whose average degree can be almost as small as \((\log n)^2\). The crucial ingredient for establishing curvature convergence in such much sparser graphs is to consider different connection and measure radii and to let the connection radius decrease at a faster rate than the measure radius, i.e., \(\varepsilon _n\ll \delta _n\).

Remark 5

(extreme cases for convergence of curvature)    Corollary 1 covers one set of extreme cases for the combination of \(a,b,\alpha \), and \(\beta \) from Theorem 1, where we take \(\beta \) to be as large as possible. This means that we compute the curvature using uniform probability measures on a set of nodes that is as small as possible. For the true extreme case, let \(\eta >0\) be arbitrarily small and define \(\beta =(1-\eta )/(3N)\) and \(b=(2+\eta )/N\). Then, to calculate the curvature, we need to compute the Wasserstein metric between uniform probability measures on neighborhoods that contain

$$\begin{aligned} \varTheta (n\varepsilon _n^N)=\varTheta (n\delta _n^N)=\varTheta \bigl ((\log n)^{2+\eta }n^{(2+\eta )/3}\bigr ), \end{aligned}$$

nodes. The consequence, however, is that our graphs have an average degree diverging at the same rate: \((\log n)^{2+\eta }n^{(2+\eta )/3}\).

In order to get graphs whose average degree diverges as slowly as possible, we need to consider another extreme case. Again let \(\eta >0\) be arbitrarily small. Now we define

$$\begin{aligned} a=\frac{2+\eta }{N},\ \quad \alpha =\frac{1-\eta }{N},\quad \ b=a,\ \quad \beta =\frac{\eta }{2N}. \end{aligned}$$

For these choices we have that \(\alpha +2\beta =1/N\) and \(\min {\{a,a+2b\}}=a>2/N\) so that the result from Theorem 1 holds. In this case, the average degree scales as

$$\begin{aligned} \varTheta (n\varepsilon _n^N)=\varTheta \hspace{0.33325pt}((\log n)^{Na}n ^{1-N\alpha })=\varTheta \hspace{0.33325pt}((\log n)^{2+\eta }n^{\eta }), \end{aligned}$$

which is almost logarithmic. However, we now need to compute the Wasserstein metric with respect to the uniform measure on a number of nodes that scales as

$$\begin{aligned} \varTheta (n\delta _n^N)=\varTheta \hspace{0.33325pt}((\log n)^{2+\eta }n^{1-\eta /2}). \end{aligned}$$

That is, in order to compute curvature on graphs with almost logarithmic average degree, we need to consider the uniform probability measure on almost the entire graph.

3.2 Graphs with Hop Count Distance

In the previous section we considered the Ollivier–Ricci curvature of graphs on Riemannian manifolds with graph edges weighted by manifold distances. These weights encode a lot of information about the metric structure of the manifold, so it is perhaps not surprising that manifold curvature can be recovered from graph curvature with this information at hand. The natural question is then whether it is possible to prove convergence of Ollivier–Ricci curvature based on the shortest path distance \(d_G^s\) in unweighted graphs. It turns out that this can be done under slightly more restrictive conditions on the connection and measure radii.

For this we define, for any random geometric graph \(G_n=\mathbb {G}_n(\varepsilon _n)\), the rescaled shortest path distance \(d_G^*(x,y)=\varepsilon _nd_G^s(x,y)\). Similarly to the previous setting, we let \(\mathcal {B}_G^*(x;\delta _n)\) denote the ball of radius \(\delta _n\) around \(x\in G_n\) with respect to the metric \(d_G^*\) and define the measures

$$\begin{aligned} m^{G,*}_x(y)={\left\{ \begin{array}{ll}\displaystyle \frac{1}{|\mathcal {B}_G^*(x;\delta _n)|} &{}\text {if } y\in \mathcal {B}_G^*(x;\delta _n),\\ 0 &{}\text {else}.\end{array}\right. } \end{aligned}$$
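Compared with the previous setting, the only change is the metric. A short sketch of \(d_G^*\), again assuming the adjacency matrix of \(G_n\) is given; the measures \(m^{G,*}\) are then obtained exactly as in the earlier sketch, with \(d_G^*\) in place of \(d_G^w\).

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def hopcount_distance(adj, eps):
    """d_G^*(x, y) = eps * (number of hops on a shortest path from x to y in G)."""
    hops = shortest_path(adj.astype(float), method='D', directed=False, unweighted=True)
    return eps * hops
```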

Theorem 2

Let \((\mathcal {M}, d_\mathcal {M})\) be a smooth, connected, and compact 2-dimensional Riemannian manifold, \(x^*\in {\mathcal {M}}\), and \(\textbf{v}\) a unit tangent vector at \(x^*\). Furthermore, let \(\varepsilon _n=\varTheta \hspace{0.33325pt}((\log n)^an^{-\alpha })\), \(\delta _n=\varTheta \hspace{0.33325pt}((\log n)^bn^{-\beta })\), where the constants satisfy

$$\begin{aligned} 0<\beta \le \frac{1}{9}\ \ \quad \text {and}\ \ \quad 3\beta \le \alpha \le \frac{1-3\beta }{2}, \end{aligned}$$

and \(a<3b\) if \(\alpha =3\beta \) and \(2a+3b>1\) if \(\alpha =(1-3\beta )/2\). Let \(y_n^*\in \mathcal {M}\) be at distance \(\delta _n\) from \(x^*\) in the direction of \(\textbf{v}\) and \(G_n=\mathbb {G}_n(x^*,y_n^*,\varepsilon _n)\) be rooted random graphs on \(\mathcal {M}\). Then for the Ollivier-triple \({\mathcal {G}}_n^*=(G_n,d_{G_n}^*,{\varvec{m}}^{G,*})\), it holds

$$\begin{aligned} \lim _{n\rightarrow \infty }\,\mathbb {E}\left[ \biggl |\frac{2(N+2)\kappa (x^*,y_n^*;{\mathcal {G}}_n^*)}{\delta _n^2}-{\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\biggr |\right] =0. \end{aligned}$$

Remark 6

  1. 1.

    Note that unlike Theorem 1, here we do not use any information on the distances between nodes on the manifold. This is because the distance \(d_G^*\) is simply the shortest path distance on the graph \(G_n\) rescaled by the connection radius \(\varepsilon _n\).

  2. 2.

    Observe that the theorem allows us to select an \(\alpha \) that is arbitrarily close to 1/2. In particular,

    $$\begin{aligned} \mathbb {E}\hspace{0.33325pt}[|\mathcal {N}_x|]=\varTheta (n\varepsilon _n^2)=\varTheta \hspace{0.33325pt}((\log n)^{2a}n^{1-2\alpha })\le \varTheta \hspace{0.33325pt}((\log n)^{2a}n^{6\beta }). \end{aligned}$$

    Hence by selecting a small \(\beta \) we have a discrete notion of curvature that converges on graphs with almost logarithmic average degree, without using any information on the manifold.

  3. 3.

    Theorem 2 currently works only for 2-dimensional manifolds. This is because the proof relies on results for the stretch (the ratio \(d_G/d_\mathcal {M}\)) of random geometric graphs in 2-dimensional Euclidean space [9]. Our proof techniques, however, immediately allow the results to be extended to higher dimensions, once similar stretch results for those spaces are obtained.

3.3 Summary, Comments, Caveats, and Outlook

In summary, we have proven that upon proper rescaling, the Ollivier–Ricci curvature of random geometric graphs on a Riemannian manifold converges to the Ricci curvature of the underlying manifold.

Our first result, Theorem 1, establishes convergence of Ollivier–Ricci curvature for a wide range of connectivity and measure radii. In particular, it contains as a corollary the classical setting where both radii are the same, Corollary 1. The theorem does, however, require knowledge of pairwise distances between connected nodes in the manifold.

Our second result, Theorem 2, relaxes this requirement and establishes the same convergence without any knowledge of distances in the manifold. This does come at the price of slightly more restrictive conditions on the possible connection and measure radii. Still, as for the first result, the convergence holds all the way up to graphs whose average degree grows very slowly (almost logarithmically).

To the best of our knowledge, these are the first rigorous results on the convergence of a discrete notion of curvature of random combinatorial objects to a traditional continuum notion of curvature of smooth space.

While the classical setting for Ollivier–Ricci graph curvature uses probability measures (random walks) on balls of the same radius as the graph connection radius, in this paper we allow the radii to be different. This is an important generalization. In particular, we find that in order for the curvature to converge on graphs with almost logarithmic average degree, we need the probability measure radius to be much larger than the connection radius. This is intuitively expected because in order to “feel” any curvature in graphs with such a low density, we really need to consider large “mesoscopic” neighborhoods in them since otherwise all we could see is local “microscopic” Euclidean flatness. It would be interesting to see how this more general approach would generalize known results for the classical setting of Ollivier–Ricci curvature of graph families that have been investigated in the past, such as trees or Erdős–Rényi random graphs [4, 16].

In our recent numeric experiments [13], we have seen that in manifold-distance-weighted random geometric graphs, the Ollivier–Ricci curvature convergence holds even for graphs with constant average degree. Unfortunately, the proof techniques presented in this paper do not allow for a direct generalization to this setting. Therefore, other techniques are needed to (dis)confirm the convergence of Ollivier–Ricci curvature of graphs with constant average degree. We note that one definitely cannot expect Ollivier–Ricci curvature to converge in all possible graph sparsity settings. For example, we definitely need the giant component to exist to talk about any curvature convergence.

For the task of learning latent geometry in networks, our results can still be improved, particularly by removing the requirement to know the connection radius. When presented with a truly unweighted realization of a random geometric graph, this radius first needs to be estimated. It would thus be interesting to see whether convergence still holds if the true value of the connection radius is replaced with a consistent estimator, e.g., one based on the average degree. Here we expect the speed of curvature convergence (if any) to depend on the speed of estimator convergence in a possibly nontrivial way.

Finally, now that we have seen that Ollivier–Ricci curvature of random combinatorial discretizations of smooth spaces converges to their Ricci curvature, it would be interesting to investigate whether such convergence also holds for other popular notions of discrete curvature. Forman–Ricci curvature [37] appears to be a good next candidate for such investigation.

4 Proof Overview

Our main results in Theorems 1 and 2 follow from a more general result on the convergence of Ollivier–Ricci curvature in graphs whose edges carry weights assigned according to some scheme. For this general result it is not important what these weights are or how they are assigned. What is important is that the graph distance \(d_G\) between node pairs is a good approximation of the manifold distance \(d_\mathcal {M}\) between the corresponding pairs of points. Here by graph distance we mean any metric on the vertex set of the graph. To quantify how good this approximation is, we introduce the following definition.

Definition 4.1

Let \((\mathcal {M},d_\mathcal {M})\) be an N-dimensional Riemannian manifold and \(G_n=\mathbb {G}_n(x^*,\varepsilon _n)\) a rooted random graph on \(\mathcal {M}\). A graph distance \(d_G\) on \(G_n\) is said to be a \(\delta _n\)-good approximation of \(d_\mathcal {M}\) if \(d_\mathcal {M}\le d_G\) and the following holds (as \(n\rightarrow \infty \)): there exists a \(Q>3\) and \(\xi _n=o(\delta _n)\) such that with probability \(1-o(\delta _n^3)\),

$$\begin{aligned} |d_\mathcal {M}(u,v) - d_G(u,v)|\le d_\mathcal {M}(u,v)\hspace{0.7222pt}\xi _n^2+\xi _n^3, \end{aligned}$$
(4)

holds for all \(u,v\in \mathcal {B}_\mathcal {M}(x^*;Q \delta _n) \cap G_n\).

Remark 7

(asymptotic expressions)    Most of our results will deal with asymptotic relations, e.g. \(\xi _n=o(\delta _n)\). Unless stated otherwise, these asymptotic relations will always be understood as \(n\rightarrow \infty \).

There are several examples of shortest weighted path distances that are \(\delta _n\)-good approximations of the manifold distance. In this paper we consider two cases. In the first, each edge (u, v) has weight equal to \(d_\mathcal {M}(u,v)\), while in the second the weight is simply the connection radius \(\varepsilon _n\). An explicit example of the latter case is when the manifold is 2-dimensional and the connection and measure radii are given by \(\varepsilon _n=n^{-1/3}\) and \(\delta _n=n^{-1/9}\log n\), respectively. See Propositions 4 and 5 for more details.

Recall that \(\mathcal {B}_G(x;\delta )\) denotes the set of nodes in the graph that are at graph distance at most \(\delta \) from x, so that

$$\begin{aligned} m_x^G(y) = {\left\{ \begin{array}{ll}\displaystyle \frac{1}{|\mathcal {B}_G(x;\delta _n)|}&{}\text {if }y\in \mathcal {B}_G(x;\delta _n),\\ 0&{}\text {else},\end{array}\right. } \end{aligned}$$

and define

$$\begin{aligned} \lambda _n=(\log n)^{2/N}n^{-1/N}. \end{aligned}$$
(5)

This \(\lambda _n\) will play the role of an additional radius, for extending the graph distance \(d_G\) to the manifold. In short, to define a distance between \(u,v\in \mathcal {M}\), we will connect u and v to all points of the graph within radius \(\lambda _n\) and then use the graph distance. The radius \(\lambda _n\) has been selected such that the expected number of nodes inside any ball \(\mathcal {B}_\mathcal {M}(x;\lambda _n)\) is of the order \(\varTheta \hspace{0.33325pt}((\log n)^2)\). Hence, the probability of observing no node of the graph inside any such ball is \(O(e^{-(\log n)^2})=o(n^{-1})\), which is sufficiently small. More details on the use of \(\lambda _n\) can be found in Sect. 5.1. Our general result is then as follows.

Theorem 3

Let \(N\ge 2\), \((\mathcal {M},d_\mathcal {M})\) be a smooth, connected, and compact N-dimensional Riemannian manifold, \(x^*\in {\mathcal {M}}\), and \(\textbf{v}\) a unit tangent vector at \(x^*\). Furthermore, let \(\varepsilon _n\le \delta _n=o(1)\) be such that \(\lambda _n=o(\varepsilon _n)\) and \(\lambda _n=o(\delta _n^3)\). Let \(y_n^*\in \mathcal {M}\) be at distance \(\delta _n\) from \(x^*\) in the direction of \(\textbf{v}\), \(G_n=\mathbb {G}_n(x^*,y_n^*,\varepsilon _n)\) be rooted random graphs on \(\mathcal {M}\), and \(d_G\) a \(\delta _n\)-good approximation of \(d_\mathcal {M}\). Then, if we consider the Ollivier-triple \({\mathcal {G}}_n=(G_n,d_G,{\varvec{m}}^G)\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {E}\left[ \biggl |\frac{2(N+2)\hspace{0.69992pt}\kappa (x^*,y_n^*;{\mathcal {G}}_n)}{d_G(x^*,y_n^*)^2}-{\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\biggr |\right] =0. \end{aligned}$$

Once we have established this general result, our main results in Theorems 1 and 2 follow if we can show that the considered graph distances are \(\delta _n\)-good approximations.

A key ingredient in the proof of Theorem 3 is the convergence result for Ollivier–Ricci curvature of uniform measures on Riemannian manifolds, proved in the seminal paper on the topic [28]. At a high level, our proof approximates the Ollivier–Ricci curvature defined by the probability measures on the graph by the one defined by the uniform measures on the manifold. Having obtained such an approximation with the required accuracy, we then apply the convergence result from [28].

Since Ollivier–Ricci curvature is defined by the Wasserstein metric on probability measures, our analysis focuses on approximating the Wasserstein metric of discrete probability measures on the graph by the Wasserstein metric of uniform probability measures on the manifold. This is done in three steps: 1) extend the graph distance \(d_G\) to a distance \({\widetilde{d}}_\mathcal {M}\) on the manifold such that the Wasserstein metric \({\widetilde{W}}_1\) with respect to this new distance is a good approximation of the Wasserstein metric \(W_1\) on the manifold, 2) show that the Wasserstein metric between the probability measure \(m_x^G\) on the graph and the discrete probability measure \(m_x^\mathcal {M}\) on the nodes within the ball \(\mathcal {B}_\mathcal {M}(x;\delta _n)\) is sufficiently small, and 3) show that the Wasserstein metric between the uniform measure on \(\mathcal {B}_\mathcal {M}(x;\delta _n)\) and the discrete probability measure \(m_x^\mathcal {M}\) is sufficiently small.

Remark 8

In all cases, sufficiently small means that the error terms are of smaller order than \(\delta _n^3\). This is because the Wasserstein metric is first divided by \(\delta _n\) to obtain the curvature, which is then divided by \(\delta _n^2\) to make it converge to the Ricci curvature.
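To make this bookkeeping explicit, here is the computation behind Remark 8, written out under the simplification \(d_G(x^*,y_n^*)\approx \delta _n\) (which holds up to lower-order corrections since \(d_G\) is a \(\delta _n\)-good approximation):

$$\begin{aligned} \bigl |\kappa (x^*,y_n^*;{\mathcal {G}}_n)-\kappa (x^*,y_n^*)\bigr |\approx \frac{\bigl |W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-W_1\bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |}{\delta _n}=\frac{o(\delta _n^3)}{\delta _n}=o(\delta _n^2), \end{aligned}$$

so that after the final rescaling by \(2(N+2)/\delta _n^2\) the discrepancy between the graph curvature and the manifold curvature is o(1).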

We proceed with explaining all ingredients and the three steps in more detail. We reiterate that unless stated otherwise, we will assume that \(\varepsilon _n\le \delta _n\) are two sequences converging to zero such that \(\lambda _n=o(\varepsilon _n)\) and \(\lambda _n=o(\delta _n^3)\).

4.1 Ollivier Curvature on Riemannian Manifolds

Let \((\mathcal {M},d_\mathcal {M})\) be a smooth, orientable, connected and compact N-dimensional Riemannian manifold. For \(x\in \mathcal {M}\) and \(\delta >0\), we write \(\mathcal {B}_\mathcal {M}(x;\delta )\subseteq \mathcal {M}\) to denote the closed ball of radius \(\delta \) around x, i.e., \(\mathcal {B}_\mathcal {M}(x;\delta )=\{y\in \mathcal {M}:d_\mathcal {M}(x,y)\le \delta \}\). Recall that

$$\begin{aligned} \textrm{vol}_\mathcal {M}(\mathcal {B}_\mathcal {M}(x;\delta )):=\int _{\mathcal {B}_\mathcal {M}(x;\delta )}\!\!\!\textrm{d}\textrm{vol}_\mathcal {M}(y), \end{aligned}$$

denotes the volume of the ball \(\mathcal {B}_\mathcal {M}(x;\delta )\). Now fix \(\delta > 0\) and consider the uniform measure on balls of radius \(\delta \). That is, for \(x \in \mathcal {M}\) we take the probability measure \(\mu _x^\delta \) given by

$$\begin{aligned} \textrm{d}\mu ^\delta _x(y)={\left\{ \begin{array}{ll}\displaystyle \frac{1}{\textrm{vol}_\mathcal {M}(\mathcal {B}_\mathcal {M}(x;\delta ))}\,\textrm{d}\textrm{vol}_\mathcal {M}(y)&{}\text {if }y\in \mathcal {B}_\mathcal {M}(x;\delta ),\\ 0&{}\text {else.}\end{array}\right. } \end{aligned}$$
(6)

We will refer to \(\mu ^\delta _x\) as the uniform \(\delta \)-measure. The following result from [28] shows that for a uniform \(\delta \)-measure on a Riemannian manifold, the Ollivier curvature (properly rescaled) converges to the Ricci curvature as \({\delta \rightarrow 0}\).

Theorem 4

[28, Exam. 7] Let \((\mathcal {M},d_\mathcal {M})\) be a smooth complete N-dimensional Riemannian manifold, \(x\in {\mathcal {M}}\), and \(\textbf{v}\) a unit tangent vector at x. Let \(\delta >0\) and let \(y_\delta \) be the point at distance \(\delta \) from x in the direction of \(\textbf{v}\). Then, if we consider the Ollivier–Ricci curvature \(\kappa \) for the uniform \(\delta \)-measures given by (6),

$$\begin{aligned} \lim _{\delta \rightarrow 0}\frac{2(N+2)}{\delta ^2}\kappa (x,y_\delta )={\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v}). \end{aligned}$$

Remark 9

The result in Theorem 4 clearly exhibits the local nature of curvature as it holds in the limit where the distance \(d_\mathcal {M}(x,y)=\delta \) between the two points goes to zero.

Taking \(\delta = \delta _n\), \(x = x^*\), and \(y = y_n^*\) in the above theorem, we have that the rescaled Ollivier–Ricci curvature associated to the uniform \(\delta _n\)-measures converges to the Ricci curvature as \(n\rightarrow \infty \). The main strategy for proving Theorem 3 is to compare this “uniform” version of the curvature \(\kappa \) on the manifold to the discrete version on the graph. More precisely, we need to prove that

$$\begin{aligned} \mathbb {E}\bigl [\bigl |W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-W_1 \bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |\bigr ]=o(\delta _n^3). \end{aligned}$$
(7)

There are two complicating factors here. First, we have to deal with two Wasserstein metrics defined on two different spaces. Second, we have to compare discrete probability measures with continuous ones. We deal with the different Wasserstein metrics in the next section and with comparing the different measures in Sects. 4.3 and 4.4.

4.2 Extending the Graph Distance to the Manifold

In order to compare the two different Wasserstein metrics in (7) we extend the graph distance \(d_G\) to a distance \({\widetilde{d}}_\mathcal {M}\) defined on a sufficiently large part of \(\mathcal {M}\). In particular, we will consider the ball \(\mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\), with \(Q>3\) from Definition 4.1. The extension is such that for any two nodes \(x,y\in G_n\), \(d_G(x,y)=\widetilde{d}_\mathcal {M}(x,y)\), so that \(W_1^G(m_{x^*}^G,m_{y_n^*}^G)\) can be replaced by the Wasserstein metric associated with \({\widetilde{d}}_\mathcal {M}\).

Recall the definition of \(\lambda _n\) from (5), \(\lambda _n=(\log n)^{2/N}n^{-1/N}\). Denote \(G_n=\mathbb {G}_n(x^*,y_n^*,\delta _n)\) and let \(U\subset {\mathcal {M}}\) be a countable set of points. Then we define the graph \(G_n(U)\) obtained from \(G_n\) by adding the points of U to the vertex set and connecting each \(u\in U\) to any other node \(x\in G_n \setminus U\) for which \(d_{{\mathcal {M}}}(x,u)\le \lambda _n/2\). After this, we assign to each new edge (u, x) the weight \(d_\mathcal {M}(x,u)(1+\xi _n^2)+\xi _n^3\), with \(\xi _n\) from Definition 4.1. We can then extend the graph distance to the manifold by defining \({\widetilde{d}}_{{\mathcal {M}}}(u,v)\) to be the graph distance \(d_G(u,v)\) computed in the extended graph \(G_n(\{u,v\})\) with the added weights. That is, \({\widetilde{d}}_{{\mathcal {M}}}(u,v)\) is the shortest weighted path distance in the extended graph \(G_n(\{u,v\})\), where the original edges keep their weights and the new edges carry the weights specified above.
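The extension just described is algorithmically simple. Here is an illustrative sketch, with \(G_n\) represented by its weighted adjacency matrix (edge weights following whatever scheme defines \(d_G\)) and with the manifold distances from u and v to all graph nodes given as inputs; the new edges carry the weight \(d_\mathcal {M}(x,u)(1+\xi _n^2)+\xi _n^3\) stated above, and the choice of Dijkstra's algorithm is incidental.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

def extended_distance(weighted_adj, dist_to_u, dist_to_v, lam, xi):
    """d~_M(u, v): add u and v as two extra nodes, connect each of them to every graph
    node within manifold distance lam / 2, weight the new edges by d_M*(1+xi^2)+xi^3,
    and return the shortest weighted path distance between the two new nodes."""
    n = weighted_adj.shape[0]
    W = np.zeros((n + 2, n + 2))
    W[:n, :n] = weighted_adj                      # original (weighted) edges are kept
    for extra, d_M in ((n, dist_to_u), (n + 1, dist_to_v)):
        close = d_M <= lam / 2.0                  # graph nodes within lambda_n / 2
        W[extra, :n][close] = d_M[close] * (1 + xi**2) + xi**3
        W[:n, extra][close] = W[extra, :n][close]
    return dijkstra(W, directed=False, indices=n)[n + 1]   # inf if u and v are not linked
```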

Observe that if \(x,y\in G_n\) then \({\widetilde{d}}_{{\mathcal {M}}}(x,y)=d_G(x,y)\) so that the distance on nodes of \(G_n\) does not change and hence \({\widetilde{d}}_{{\mathcal {M}}}\) is a true extension of \(d_G\). In addition, by definition of the graph distance it immediately follows that \({\widetilde{d}}_{{\mathcal {M}}}(u,v)=0\) if and only if \(u=v\). Figure  1 shows an illustration of the extended distance.

Fig. 1  Illustration of the extended graph distance \({\widetilde{d}}_\mathcal {M}\). Here u is connected to node \(x_1\), v to \(x_6\), and the shortest geodesic-weighted path between \(x_1\) and \(x_6\) goes over five edges

It is important to note that this extended distance depends on the random graph \(G_n\). Therefore, it could happen that two added points \(u,v\in U\) are not connected in \(G_n(U)\), i.e., there does not exist a path from u to v in the extended graph. This happens if there are no nodes in \(\mathcal {B}_\mathcal {M}(u;\lambda _n/2)\) or in \(\mathcal {B}_\mathcal {M}(v;\lambda _n/2)\), or if no node pair \((x,y)\in \mathcal {B}_\mathcal {M}(u;\lambda _n/2)\times \mathcal {B}_\mathcal {M}(v;\lambda _n/2)\) is connected by a path in \(G_n\). Therefore, to justify the definition of the extended manifold distance we need to make sure that, with sufficiently high probability, these situations do not occur.

Lemma 1

Let \(G_n={\mathbb {G}}_n(x^*,y_n^*,\delta _n)\) and \(Q>3\) be the constant from Definition 4.1. Then, there exists an event \(\varOmega _n\) satisfying \(\mathbb {P}\left( \varOmega _n\right) \ge 1-o(\delta _n^3)\) such that on this event the following holds:

  • (\(\varOmega 1\)) \((\mathcal {B}_\mathcal {M}(x^*;Q\delta _n),{\widetilde{d}}_\mathcal {M})\) is a metric space and

  • (\(\varOmega 2\)) \({\widetilde{d}}_\mathcal {M}(u,v)=d_\mathcal {M}(u,v)+o(\delta _n^3)\).

The first property ensures that our extended distance is an actual distance. Moreover, by the second property, this extended distance is a good approximation of the true distance on the manifold. Finally, we also note that the first property ensures that \(d_G(x^*,y_n^*)={\widetilde{d}}_\mathcal {M}(x^*,y_n^*)<\infty \), so that the curvature \(\kappa \) between \(x^*\) and \(y_n^*\) is well defined and not forced to be zero. The precise definition of \(\varOmega _n\) is not needed to follow the high-level arguments or the proofs of the main results. For now, we simply refer to \(\varOmega _n\) as the good event. Details on this event can be found in Sect. 5.1.

Let \({\widetilde{W}}_1\) denote the Wasserstein metric with respect to \({\widetilde{d}}_\mathcal {M}\), which is only well defined on the good event \(\varOmega _n\). Since the distance is determined by the graph \(G_n=\mathbb {G}_n(x^*,y_n^*,\delta _n)\), the Wasserstein metric is also a random object. The following proposition states that, on the event \(\varOmega _n\), the difference between the Wasserstein metrics \({\widetilde{W}}_1\) and \(W_1\) is small. The proof is given in Sect. 5.1.

Proposition 1

Let \(G_n={\mathbb {G}}_n(x^*,\varepsilon _n)\) and \(\mu _1,\mu _2\) be two probability measures on \(\mathcal {M}\) with support contained in \(\mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\). Then

$$\begin{aligned} \mathbb {E}\bigl [|{\widetilde{W}}_1(\mu _1,\mu _2)-W_1(\mu _1,\mu _2)|\,|\,\varOmega _n\bigr ]=o(\delta _n^3). \end{aligned}$$

Recall that \({\widetilde{d}}_\mathcal {M}(x,y)=d_G(x,y)\) if \(x,y\in G_n\), and therefore \(W_1^G(m_{x^*}^G,m_{y_n^*}^G)={\widetilde{W}}_1(m_{x^*}^G,m_{y_n^*}^G)\). Hence, since the uniform \(\delta _n\)-measures \(\mu _{x^*}^{\delta _n}\) and \(\mu _{y_n^*}^{\delta _n}\) are probability measures on \(\mathcal {M}\) with support contained in \(\mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\), Proposition 1 implies that on the good event,

$$\begin{aligned} \bigl |W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-W_1 \bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |=\bigl | {\widetilde{W}}_1\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-{\widetilde{W}}_1\bigl (\mu _{x^*} ^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |+o(\delta _n^3) \end{aligned}$$

holds in expectation. This is helpful because both Wasserstein metrics in the expression on the right hand side are now defined on the same space. Therefore, since \({\widetilde{W}}_1\) is a distance, the reverse triangle inequality implies

$$\begin{aligned} \bigl |{\widetilde{W}}_1\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-{\widetilde{W}}_1 \bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |\le {\widetilde{W}}_1 \bigl (m_{x^*}^G,\mu _{x^*}^{\delta _n}\bigr )+{\widetilde{W}}_1\bigl (m_{y_n^*}^G, \mu _{y_n^*}^{\delta _n}\bigr ). \end{aligned}$$

Applying Proposition 1 again we get that

$$\begin{aligned} \bigl |{\widetilde{W}}_1\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-{\widetilde{W}}_1 \bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |\le W_1\bigl (m_{x^*}^G, \mu _{x^*}^{\delta _n}\bigr )+W_1\bigl (m_{y_n^*}^G,\mu _{y_n^*}^{\delta _n}\bigr )+o(\delta _n^3) \end{aligned}$$

holds in expectation, conditioned on the good event. Crucially, the right-hand side no longer involves the extended distance. Hence, it now suffices to show that for any \(x\in \mathcal {B}_\mathcal {M}(x^*;\delta _n)\),

$$\begin{aligned} \mathbb {E}\bigl [W_1\bigl (m_x^G,\mu _x^{\delta _n}\bigr )\,|\,\varOmega _n\bigr ]=o(\delta _n^3). \end{aligned}$$
(8)

4.3 Approximating Probability Measures on Graph Balls

Recall that \(\mathcal {B}_\mathcal {M}(x;\delta _n)\) denotes the closed ball around \(x\in \mathcal {M}\) with radius \(\delta _n\) according to the manifold distance \(d_\mathcal {M}\). The first step in establishing (8) is to move from uniform measures on the graph balls \(\mathcal {B}_G(x;\delta _n)\) to uniform measures on the nodes of the graph that lie in the manifold balls \(\mathcal {B}_\mathcal {M}(x;\delta _n)\). The reason for this is that \(y\in \mathcal {B}_G(x;\delta _n)\) does not necessarily imply that \(y\in \mathcal {B}_\mathcal {M}(x;\delta _n)\), nor vice versa. This creates difficulties when comparing the measures \(m_x^G\) and \(\mu _x^{\delta _n}\).

Let \(G_n=\mathbb {G}_n(x^*,\varepsilon _n)\) be rooted random graphs on \(\mathcal {M}\). Then we define the probability measures \({\varvec{m}}^\mathcal {M}\) on the nodes of \(G_n\) as

$$\begin{aligned} m_x^\mathcal {M}(y)={\left\{ \begin{array}{ll}\displaystyle \frac{1}{|\mathcal {B}_\mathcal {M}(x;\delta _n)\cap G_n|}&{}\text {if }y\in \mathcal {B}_\mathcal {M}(x;\delta _n)\cap G_n,\\ 0&{}\text {else.}\end{array}\right. } \end{aligned}$$
(9)

Although the uniform measures \(m_{x^*}^G\) and \(m_{x^*}^\mathcal {M}\) are not necessarily equal, the Wasserstein metric between them is sufficiently small.

Proposition 2

Let \(G_n=\mathbb {G}_n(x^*,\varepsilon _n)\) be rooted random graphs on \(\mathcal {M}\) with graph distance \(d_G\) that is a \(\delta _n\)-good approximation of \(d_\mathcal {M}\). Let \(x\in \mathcal {B}_\mathcal {M}(x^*;\delta _n)\) and denote by \(m_{x}^G\) the uniform measure on \(\mathcal {B}_G(x;\delta _n)\) and by \(m_{x}^\mathcal {M}\) the uniform measure on \(\mathcal {B}_\mathcal {M}(x;\delta _n)\cap G_n\). Then

$$\begin{aligned} \mathbb {E}\bigl [W_1\bigl (m_{x}^G,m_{x}^\mathcal {M}\bigr )\,|\,\varOmega _n\bigr ]=o(\delta _n^3). \end{aligned}$$

The proof of this result is based on some simple computations regarding Poisson random variables and can be found in Sect. 5.2. Proposition 2 allows us to replace (8) with

$$\begin{aligned} \mathbb {E}\bigl [W_1\bigl (m_x^\mathcal {M},\mu _x^{\delta _n}\bigr )\bigr ]=o(\delta _n^3). \end{aligned}$$
(10)

Note that the only dependence on the graph is now through the number of nodes placed inside the ball \(\mathcal {B}_\mathcal {M}(x;\delta _n)\), which is completely determined by the Poisson process. All dependencies on the actual structure of the graph have been removed. This allows us to compute the Wasserstein metric between \(m_x^\mathcal {M}\) and \(\mu _x^{\delta _n}\).

4.4 Coupling Continuous and Discrete Probability Measures on \(\mathcal {M}\)

Recall that the Wasserstein metric \(W_1(\mu _1,\mu _2)\) takes an infimum over all possible joint distributions (couplings) between the measures \(\mu _1\) and \(\mu _2\). Hence, to show that (10) holds, we need to design an optimal coupling (transport plan) between \(m_x^\mathcal {M}\) and \(\mu _x^{\delta _n}\). The main idea here is to view \(m_x^\mathcal {M}\) as a discrete version of \(\mu _x^{\delta _n}\).

For now, let us assume that we are working in the N-dimensional Euclidean cube \(\mathcal {M}=[0,1]^N\). Given a realization of the Poisson process, a transport plan between \(m_x^\mathcal {M}\) and \(\mu _x^{\delta _n}\) should assign to each measurable set \(A\subseteq \mathcal {B}_\mathcal {M}(x;\delta _n)\) how much of the associated mass \(\mu _x^{\delta _n}(A)\) is transported to each point of the Poisson process. To make it optimal, we should distribute the mass over those points that are closest to A. This problem is actually related to that of finding a minimal matching between points of a Poisson process and points of a grid on \([0,1]^N\), see [22, 35, 39]. Here, minimal means that the largest distance between a point of the Poisson process and its matched grid point is minimized. The idea for the transport plan is as follows:

  • Place a grid on \([0,1]^N\).

  • Find a minimal matching between the Poisson process and the grid.

  • Given a set \(A\subseteq \mathcal {B}_\mathcal {M}(x;\delta _n)\), we take all points of the Poisson process that are matched to grid points inside A and distribute the mass \(\mu _x^{\delta _n}(A)\) equally over those points.

Using known results for minimal matchings, it can then be shown that, under suitable conditions, the Wasserstein metric between \(m_x^\mathcal {M}\) and \(\mu _x^{\delta _n}\) is \(o(\delta _n^3)\).
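A toy version of this transport plan on the unit square, using a minimum-cost assignment (scipy's Hungarian solver) as a computational stand-in for the bottleneck (minimal) matching of [22, 35, 39]; the grid spacing, the point count, and the final diagnostic are illustrative only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

def grid_matching(points):
    """Match each sampled point in [0,1]^2 to a distinct point of a regular grid."""
    m = len(points)
    k = int(np.ceil(np.sqrt(m)))                  # k x k grid with at least m cells
    xs = (np.arange(k) + 0.5) / k
    grid = np.array([(a, b) for a in xs for b in xs])
    cost = np.linalg.norm(points[:, None, :] - grid[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)      # min total cost, not min max distance
    return grid[cols]                             # grid point assigned to each sample point

# Transport plan: the mass mu_x^delta(A) of a cell A is sent to the sample points whose
# matched grid points lie in A, so each unit of mass moves by at most the matching
# distance plus the cell diameter.
points = rng.random((200, 2))
matched = grid_matching(points)
print(np.max(np.linalg.norm(points - matched, axis=1)))   # largest displacement
```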

Finally, we need to extend these results from flat Euclidean space to the ball \(\mathcal {B}_\mathcal {M}(x;Q\delta _n)\) in general \(\mathcal {M}\). For this we use that \(\delta _n\rightarrow 0\) and that small neighborhoods of \(x\in \mathcal {M}\) can be identified, via the exponential map \(\exp _x:T_x\mathcal {M}\rightarrow \mathcal {M}\), with small neighborhoods of the origin in the flat N-dimensional tangent space. We then apply the matching results there and map back. Here we need to tread carefully, since the exponential map does not preserve distances. We thus fix a sufficiently small neighborhood U around the origin of the tangent space at x. Then, for some fixed \(0<\xi < 1\) and large enough n we have

$$\begin{aligned} {\mathcal {B}}_N\biggl (0;\frac{\delta _n}{1+\xi }\biggr )\subseteq \exp _x^{-1}\bigl (\mathcal {B}_\mathcal {M}(x;\delta _n)\bigr )\subseteq {\mathcal {B}}_N\biggl (0;\frac{\delta _n}{1-\xi }\biggr ), \end{aligned}$$

where \(\mathcal {B}_N(0;\delta )\) is the Euclidean ball of radius \(\delta \). This then yields matching upper and lower bounds on the Wasserstein metric on \(\mathcal {M}\) in terms of the Wasserstein metric on the Euclidean space. All details of this approach are provided in Sect. 5.3. In the end we obtain the following result.

Proposition 3

For any point \(x\in \mathcal {M}\),

$$\begin{aligned} \mathbb {E}\bigl [W_1\bigl (m_x^\mathcal {M},\mu _x^{\delta _n}\bigr )\bigr ]=o(\delta _n^3). \end{aligned}$$

4.5 Proof of the Main Results

We now have all ingredients to prove the main results. We start with Theorem 3, where we bound the expression inside the expectation by a sum of several terms and use the above results and the fact that \(d_G\) is a \(\delta _n\)-good approximation to show that each individual term goes to zero.

Proof of Theorem 3

First, we bound the term inside the expectation as follows:

$$\begin{aligned}&\biggl |\frac{2(N+2)\hspace{0.7222pt}\kappa (x^*,y_n^*;{\mathcal {G}}_n)}{d_G(x^*,y_n^*)^2}-{\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\biggr |\\&\qquad \qquad \le \biggl |\frac{2(N+2)\hspace{0.7222pt}\kappa (x^*,y_n^*; {\mathcal {G}}_n)}{d_G(x^*,y_n^*)^2}-\frac{2(N+2)\hspace{0.7222pt}\kappa (x^*,y_n^*)}{\delta _n^2}\biggr |\\&\qquad \qquad \qquad {}+\biggl |\frac{2(N+2)\hspace{0.7222pt}\kappa (x^*,y_n^*)}{\delta _n^2}-{\text {Ric}}\hspace{0.44434pt}(\textbf{v},\textbf{v})\biggr |. \end{aligned}$$

The last term is deterministic and goes to zero by Theorem 4. For the first term we note that the absolute value of each curvature term can be bounded from above by 2. Now let \(C_n\) denote the event that \(x^*\) and \(y_n^*\) are connected. Since this is implied by the good event \(\varOmega _n\), see Lemma 1, it follows that \(C_n^c\subseteq \varOmega _n^c\), where the superscript c denotes the complement of the event. Moreover, on the event \(C_n^c\), \(\kappa (x^*,y_n^*;{\mathcal {G}}_n)=0\) by definition. Finally, since \(d_G\) is a \(\delta _n\)-good approximation it follows that \(\delta _n^2=d_\mathcal {M}(x^*,y_n^*)^2\le d_G(x^*,y_n^*)^2\). Therefore, we have

$$\begin{aligned} \mathbb {E}\left[ \biggl |\frac{2(N+2)\,\kappa (x^*,y_n^*;{\mathcal {G}}_n)}{d_G(x^*,y_n^*)^2}-\frac{2(N+2)\,\kappa (x^*,y_n^*)}{\delta _n^2}\biggr |\,\mathbbm {1}_{\{C_n^c\}}\right] \le \frac{4(N+2)}{\delta _n^2}\,(1-\mathbb {P}(\varOmega _n)) \end{aligned}$$

and

$$\begin{aligned} \mathbb {E}\left[ \biggl |\frac{2(N+2)\,\kappa (x^*,y_n^*;{\mathcal {G}}_n)}{d_G(x^*,y_n^*)^2}-\frac{2(N+2)\,\kappa (x^*,y_n^*)}{\delta _n^2}\biggr |\,\mathbbm {1}_{\{C_n\cap \varOmega _n^c\}}\right] \le \frac{8(N+2)}{\delta _n^2}\,(1-\mathbb {P}(\varOmega _n)). \end{aligned}$$

It then follows that

$$\begin{aligned}&\mathbb {E}\left[ \biggl |\frac{2(N + 2)\hspace{0.7222pt}\kappa (x^*,y_n^*;{\mathcal {G}}_n)}{d_G(x^*,y_n^*)^2}-\frac{2(N+2)\hspace{0.7222pt}\kappa (x^*,y_n^*)}{\delta _n^2}\biggr |\right] \\&\quad \le 2(N+2)\,\mathbb {E}\left[ \biggl |\frac{\kappa (x^*,y_n^*; {\mathcal {G}}_n)}{d_G(x^*,y_n^*)^2}-\frac{\kappa (x^*,y_n^*)}{\delta _n^2}\biggr |\;\Big |\;\varOmega _n\right] +(1-\mathbb {P}\left( \varOmega _n\right) )\frac{12\hspace{0.36667pt}(2+N)}{\delta _n^2}. \end{aligned}$$

By construction of the good event we have \(1-\mathbb {P}\left( \varOmega _n\right) =o(\delta _n^3)\) and thus, the last term in the above bound goes to zero. For the other term we recall that

$$\begin{aligned} \kappa (x^*,y_n^*;{\mathcal {G}}_n)=1-\frac{W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )}{d_G(x^*,y_n^*)}\ \ \quad \text {and}\ \ \quad \kappa (x^*,y_n^*)=1-\frac{W_1 \bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )}{\delta _n}. \end{aligned}$$

Then the expression inside the conditional expectation can be bounded as follows:

$$\begin{aligned}&\biggl |\frac{\kappa (x^*,y_n^*;{\mathcal {G}}_n)}{d_G(x^*,y_n^*)^2} -\frac{\kappa (x^*,y_n^*)}{\delta _n^2}\biggr |\nonumber \\&\qquad \le \biggl |\kappa (x^*,y_n^*;{\mathcal {G}}_n)\biggl (\frac{1}{d_G(x^*,y_n^*)^2} -\frac{1}{\delta _n^2}\biggr )\biggr |+\frac{|\kappa (x^*,y_n^*;{\mathcal {G}}_n)-\kappa (x^*,y_n^*)|}{\delta _n^2}\nonumber \\&\qquad \le 2\frac{|\delta _n^2-d_G(x^*,y_n^*)^2|}{\delta _n^4}+\frac{1}{\delta _n^2} \biggl |\frac{W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )}{d_G(x^*,y_n^*)}-\frac{W_1\bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )}{\delta _n}\biggr |\nonumber \\&\qquad \le 2\frac{|\delta _n^2-d_G(x^*,y_n^*)^2|}{\delta _n^4}+\frac{W_1^G\bigl (m_{x^*} ^G,m_{y_n^*}^G\bigr )|\delta _n-d_G(x^*,y_n^*)|}{\delta _n^4} \end{aligned}$$
(11)
$$\begin{aligned}&\qquad \qquad {}+\frac{\bigl |W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-W_1\bigl (\mu _{x^*}^ {\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |}{\delta _n^3}. \end{aligned}$$
(12)

Next, since \(d_G\) is a \(\delta _n\)-good approximation, we can apply (4) to obtain

$$\begin{aligned} |\delta _n-d_G(x^*,y_n^*)|=|d_\mathcal {M}(x^*,y_n^*)-d_G(x^*,y_n^*) |\le \delta _n\xi _n^2+\xi _n^3=o(\delta _n^3). \end{aligned}$$

Since \(W_1^G(m_{x^*}^G,m_{y_n^*}^G)=O(\delta _n)\) it then follows that the second term in (11) goes to zero. For the first term we have

$$\begin{aligned} |\delta _n^2-d_G(x^*,y_n^*)^2|&\le |\delta _n-d_G(x^*,y_n^*)|(\delta _n+d_G(x^*,y_n^*))\\&\le (\delta _n\xi _n^2+\xi _n^3)(\delta _n+\delta _n(1+\xi _n^2)+\xi _n^3)=o(\delta _n^4), \end{aligned}$$

which implies that this term also goes to zero. We are thus left with (12), for which we have to show that

$$\begin{aligned} \mathbb {E}\bigl [\bigl |W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr ) -W_1\bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |\,\big |\,\varOmega _n\bigr ]=o(\delta _n^3). \end{aligned}$$

We first replace \(W_1(\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n})\) with \({\widetilde{W}}_1(\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n})\) by invoking Proposition 1:

$$\begin{aligned} \mathbb {E}\bigl [\bigl |{\widetilde{W}}_1\bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*} ^{\delta _n}\bigr )-W_1\bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr ) \bigr |\,\big |\,\varOmega _n\bigr ]=o(\delta _n^3). \end{aligned}$$

This then implies

$$\begin{aligned}&\mathbb {E}\bigl [\bigl |W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-W_1\bigl (\mu _{x^*} ^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |\,\big |\,\varOmega _n\bigr ]\\&\qquad \le \mathbb {E}\bigl [\bigl |{\widetilde{W}}_1\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr ) -{\widetilde{W}}_1\bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |\, \big |\,\varOmega _n\bigr ]+o(\delta _n^3). \end{aligned}$$

To show that the first term in the upper bound is also \(o(\delta _n^3)\) we apply the reverse triangle inequality twice to obtain

$$\begin{aligned} \bigl |{\widetilde{W}}_1\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr )-{\widetilde{W}}_1\bigl (\mu _{x^*} ^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |\le {\widetilde{W}}_1\bigl (m_{x^*}^G,\mu _{x^*} ^{\delta _n}\bigr )+{\widetilde{W}}_1\bigl (m_{y_n^*}^G,\mu _{y_n^*}^{\delta _n}\bigr ). \end{aligned}$$

We proceed to show that \(\widetilde{W}_1(m_{x^*}^G,\mu _{x^*}^{\delta _n})=o(\delta _n^3)\) holds in expectation on the event \(\varOmega _n\). The proof for \({\widetilde{W}}_1(m_{y_n^*}^G,\mu _{y_n^*}^{\delta _n})\) is similar. Applying Proposition 1 again we get

$$\begin{aligned} \mathbb {E}\bigl [{\widetilde{W}}_1\bigl (m_{x^*}^G,\mu _{x^*}^{\delta _n}\bigr )\,\big |\, \varOmega _n\bigr ]&\le \mathbb {E}\bigl [W_1\bigl (m_{x^*}^G,\mu _{x^*}^{\delta _n})\,\big |\, \varOmega _n\bigr ]+o(\delta _n^3)\\&\le \mathbb {E}\bigl [W_1\bigl (m_{x^*}^G,m_{x^*}^\mathcal {M}\bigr )\,\big |\,\varOmega _n\bigr ]+\mathbb {E}\bigl [W_1\bigl (m_{x^*}^\mathcal {M},\mu _{x^*}^{\delta _n}\bigr )\bigr ]+o(\delta _n^3). \end{aligned}$$

Since both expectations are \(o(\delta _n^3)\) by, respectively, Propositions 2 and 3, we conclude that

$$\begin{aligned} \mathbb {E}\bigl [\bigl |W_1^G\bigl (m_{x^*}^G,m_{y_n^*}^G\bigr ) -W_1\bigl (\mu _{x^*}^{\delta _n},\mu _{y_n^*}^{\delta _n}\bigr )\bigr |\,\big |\,\varOmega _n\bigr ]=o(\delta _n^3), \end{aligned}$$

which finishes the proof. \(\square \)

Now that we have the general result, Theorems 1 and 2 directly follow from Theorem 3 if we can show that the graph distances that are considered there are \(\delta _n\)-good approximations.

Throughout the remainder of this section we will assume that

$$\begin{aligned} \varepsilon _n=\varTheta \hspace{0.33325pt}((\log n)^an^{-\alpha }),\quad \ \delta _n=\varTheta \hspace{0.33325pt}((\log n)^bn^{-\beta }), \end{aligned}$$

for some \(a,b\in \mathbb {R}\) and \(0\le \alpha ,\beta \le 1\). We shall also assume that \(\varepsilon _n\le \delta _n\). The following results show that for appropriate choices of the constants \(a, b\) and \(\alpha , \beta \), both the manifold-weighted graph distance and the rescaled hopcount distance are \(\delta _n\)-good approximations. The proofs are given in Sects. 5.4 and 5.5, respectively.

Proposition 4

Suppose the constants in \(\varepsilon _n\) and \(\delta _n\) satisfy

$$\begin{aligned} 0<\beta \le \alpha ,\ \quad \alpha +2\beta \le \frac{1}{N} \end{aligned}$$

with \(a\le b\) if \(\alpha =\beta \) and \(a+2b>2/N\) if \(\alpha +2\beta =1/N\). Let \(y_n^*\in \mathcal {M}\) be at distance \(\delta _n\) from \(x^*\) in the direction of \(\textbf{v}\) and \(G_n=\mathbb {G}_n(x^*,y_n^*,\varepsilon _n)\) be rooted random graphs on \(\mathcal {M}\). Then the manifold-weighted graph distance \(d_G^w\) on \(G_n\) is a \(\delta _n\)-good approximation of \(d_\mathcal {M}\).

Proposition 5

Suppose the constants in \(\varepsilon _n\) and \(\delta _n\) satisfy

$$\begin{aligned} 0<\beta \le \frac{1}{9}\ \quad \text {and}\ \quad 3\beta \le \alpha \le \frac{1-3\beta }{2}, \end{aligned}$$

and \(a<3b\) if \(\alpha =3\beta \) and \(2a+3b>1\) if \(\alpha =(1-3\beta )/2\). Let \(y_n^*\in \mathcal {M}\) be at distance \(\delta _n\) from \(x^*\) in the direction of \(\textbf{v}\). Let \(G_n=\mathbb {G}_n(x^*,y_n^*,\varepsilon _n)\) be rooted random graphs on a 2-dimensional Riemannian manifold \(\mathcal {M}\) and denote by \(d_G^s\) the shortest path distance. Then the \(\varepsilon _n\)-weighted graph distance \(d_G^*:=\varepsilon _nd_G^s\) on \(G_n\) is a \(\delta _n\)-good approximation of \(d_\mathcal {M}\).

Observe that the conditions on the constants in Propositions 4 and 5 are exactly the same as in Theorems 1 and 2, respectively. Moreover, these conditions imply that \(\lambda _n=o(\varepsilon _n)\) and \(\lambda _n=o(\delta _n^3)\), with \(\lambda _n\) as defined in (5), as we will now demonstrate.

In Proposition 4 we have \(\beta >0\) and \(\alpha +2\beta \le 1/N\). It then follows that \(\alpha <1/N\), which implies \(\lambda _n=o(\varepsilon _n)\). When \(3\beta <1/N\), which happens whenever one of the inequalities \(3\beta \le \alpha +2\beta \le 1/N\) is strict, we have that \(\lambda _n=o(\delta _n^3)\). When \(3\beta =1/N\) it must be that \(\alpha +2\beta =1/N\) and hence the conditions of Proposition 4 imply that \(3b\ge a+2b>2/N\). From this we deduce that \(\lambda _n/\delta _n^3=\varTheta \hspace{0.33325pt}((\log n)^{2/N-3b})=o(1)\), since \(3b>2/N\).

In Proposition 5, since \(N = 2\), the conditions \(\lambda _n=o(\varepsilon _n)\) and \(\lambda _n=o(\delta _n^3)\) follow if \(\alpha <1/2\) and \(3\beta <1/2\). The first inequality holds since \(\beta >0\) and \(\alpha \le (1-3\beta )/2\), while the second is due to the fact that \(3\beta \le 3/9=1/3\).

We thus conclude that under the conditions in both propositions, the radii satisfy the conditions of Theorem 3. Hence, Theorems 1 and 2 follow from it.
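As a quick numerical sanity check of these exponent conditions, the following Python snippet evaluates \(\lambda _n/\varepsilon _n\) and \(\lambda _n/\delta _n^3\) for one admissible choice of constants in Proposition 5 (\(N=2\), \(\alpha =0.3\), \(\beta =1/12\), \(a=b=0\); these values are chosen purely for illustration).

```python
# A quick numerical check, for one admissible choice of constants in Proposition 5
# (N = 2, alpha = 0.3, beta = 1/12, a = b = 0; chosen purely for illustration),
# that lambda_n = o(eps_n) and lambda_n = o(delta_n^3).
import numpy as np

N, alpha, beta = 2, 0.3, 1.0 / 12
for n in [10**6, 10**9, 10**12]:
    lam = np.log(n) ** (2 / N) * n ** (-1 / N)
    eps, delta = n ** (-alpha), n ** (-beta)
    print(n, lam / eps, lam / delta**3)   # both ratios decrease towards 0
```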

5 Proofs

Here we prove all the intermediate results that we used to prove our main results in the previous section. We start with the proof of Lemma 1 and Proposition 1 in the next Sect. 5.1. In Sect. 5.2 we provide the details for Proposition 2, while the proof of Proposition 3 is given in Sect. 5.3. We end with Sects. 5.4 and 5.5 where we prove Propositions 4 and 5, respectively, leading to the main results of this paper.

Recall that

$$\begin{aligned} \lambda _n=(\log n)^{2/N} n^{-1/N}, \end{aligned}$$

and \(\varepsilon _n\le \delta _n\rightarrow 0\) are such that \(\lambda _n=o(\varepsilon _n)\) and \(\lambda _n=o(\delta _n^3)\).

5.1 Extended Graph Distance

Our first goal is to prove Lemma 1. We start by showing that there exists a radius \(r_n\rightarrow 0\) such that for any finite set of points \(U\subset \mathcal {M}\), each ball \(\mathcal {B}_\mathcal {M}(u;r_n)\), \(u\in U\), will still contain at least one node from the rooted graphs \(G_n={\mathbb {G}}_n(x^*,y_n^*,\varepsilon _n)\). The reason we need \(r_n\) to decrease is that the connection radius \(\varepsilon _n\) also decreases and we want the ball \(\mathcal {B}_\mathcal {M}(u;r_n)\) to be contained inside the connection area of the point u.

Lemma 2

Let \(U\subset \mathcal {M}\) be a finite set of points in \(\mathcal {M}\) such that \(|U|=O(n^c)\), for some \(c>0\), and let \(r_n=\varTheta (\lambda _n)\). Then, for \(G_n={\mathbb {G}}_n(\varepsilon _n)\),

$$\begin{aligned} \mathbb {P}\left( \,\bigcup _{u\in U}\{|\mathcal {B}_\mathcal {M}(u;r_n)\cap G_n|=0\}\right) =o(\delta _n^3), \end{aligned}$$

as \(n\rightarrow \infty \).

Proof

First note that for \(r_n\) small enough the ball \(\mathcal {B}_\mathcal {M}(u;r_n)\) can be mapped diffeomorphically onto the tangent space \(T_u\mathcal {M}\) at u. In particular, for small enough \(r_n\) we have that, as \(n\rightarrow \infty \), \(\textrm{vol}_\mathcal {M}(\mathcal {B}_\mathcal {M}(u;r_n))=\varTheta (r_n^N)=\varTheta (\lambda _n^N)\). Next, since the nodes in \(G_n\) are placed according to a Poisson process with intensity \(n/{\textrm{vol}_\mathcal {M}(\mathcal {M})}\) it follows that

$$\begin{aligned} \mathbb {P}\left( |\mathcal {B}_\mathcal {M}(u;r_n)\cap G_n|=0\right) =\exp {\biggl (-\frac{n\textrm{vol}_\mathcal {M}(\mathcal {B}_\mathcal {M}(u;r_n))}{\textrm{vol}_\mathcal {M}(\mathcal {M})}\biggr )} =e^{-n\varTheta (\lambda _n^N)}=e^{-\varTheta ((\log n)^2)}. \end{aligned}$$

Therefore, by applying the union bound we get

$$\begin{aligned} \mathbb {P}\left( \,\bigcup _{u\in U}\{|\mathcal {B}_\mathcal {M}(u;r_n)\cap G_n|=0\}\right)&\le |U|\max _{u \in U} \mathbb {P}\left( |\mathcal {B}_\mathcal {M}(u;r_n)\cap G_n|=0\right) \\&=e^{-\varTheta ((\log n)^2)+\log {|U|}}\le e^{-\varTheta ((\log n)^2)+c\log n}. \end{aligned}$$

To finish the proof we note that \(e^{-\varTheta ((\log n)^2)+c\log n}=o(\lambda _n)\) which by assumption is \(o(\delta _n^3)\). \(\square \)
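The decay in Lemma 2 can also be checked numerically. The snippet below compares the union bound \(e^{-\varTheta ((\log n)^2)+c\log n}\) with \(\delta _n^3\) for illustrative choices of the hidden constants (the \(\varTheta \)-constant is taken equal to 1, \(c=1\), \(|U|=n\), and \(\delta _n=n^{-1/8}\)); the ratio decreases rapidly with n, as the proof requires.

```python
# A quick numerical sanity check (illustrative constants only: Theta((log n)^2) taken to
# be exactly (log n)^2, |U| = n, delta_n = n^{-1/8}) that the union bound of Lemma 2 is
# o(delta_n^3).
import numpy as np

for n in [10**3, 10**4, 10**5, 10**6]:
    log_n = np.log(n)
    union_bound = np.exp(-log_n**2 + log_n)   # |U| = n, so log|U| = log n
    delta_cubed = (n ** (-1.0 / 8)) ** 3
    print(n, union_bound / delta_cubed)       # ratio tends to 0 as n grows
```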

With this lemma we obtain the following corollary.

Corollary 2

There exists a collection \(\{B_1,\dots ,B_m\}\) of \(m=\varTheta (\lambda _n^{-N})\) balls of radius \(\lambda _n/4\) that cover \(\mathcal {M}\), with the following property. Denote by \(c_1,\dots ,c_m\) their centers and define the event

$$\begin{aligned} C_n=\bigcap _{t=1}^m\,\{|\mathcal {B}_\mathcal {M}(c_t;\lambda _n/4)\cap G_n|\ne 0\}. \end{aligned}$$
(13)

Then \(\mathbb {P}\left( C_n\right) =1-o(\delta _n^3)\).

Proof

The collection is constructed using the standard trick of taking a maximal set of disjoint balls of radius \(\lambda _n/8\) in \(\mathcal {M}\). Denote their centers by \(c_1,\dots ,c_m\). Simple volume comparison, and the compactness of \(\mathcal {M}\), gives \(m=O(\lambda _n^{-N})\). By construction, the balls \(B_i=\mathcal {B}_\mathcal {M}(c_i;\lambda _n/4)\) then cover \(\mathcal {M}\), and hence \(m=\varTheta (\lambda _n^{-N})=\varTheta \hspace{0.33325pt}((\log n)^{-2}n)=O(n)\). The result then follows from Lemma 2. \(\square \)

The event \(C_n\) will play a crucial part in defining the good event \(\varOmega _n\). Let \(D_n\) denote the event on which (4) holds. Then we define the good event as

$$\begin{aligned} \varOmega _n=C_n\cap D_n. \end{aligned}$$
(14)

On this event, \((\mathcal {B}_\mathcal {M}(x^*;Q\delta _n),{\widetilde{d}}_\mathcal {M})\) is a metric space for any constant \(Q>0\) and the extended distance \({\widetilde{d}}_\mathcal {M}\) is a good approximation of the original distance \(d_\mathcal {M}\). Note that we do not need to consider the whole manifold since curvature is a local property.

Lemma 3

Let \(\varOmega _n\) be the event defined in (14) and \(Q>3\) the constant from Definition 4.1. Then on the event \(\varOmega _n\),

  • each pair of points \(u,v\in \mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\) is connected by a path in the extended graph \(G_n(u,v)\), and

  • \((\mathcal {B}_\mathcal {M}(x^*;Q\delta _n),{\widetilde{d}}_\mathcal {M})\) is a metric space.

Fig. 2: Depiction of the covering of the geodesic between u and v by the balls \(B_{t_i}\)

Proof

We first prove the first statement. For this, take any \(u,v\in \mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\) and let \(\gamma (u,v)\) denote the geodesic between u and v. This geodesic will be covered by a subsequence \(B_{t_1},\dots ,B_{t_k}\) of the cover of \(\mathcal {M}\), which we rank in order of appearance moving from u to v. Let \(c_{t_1},\dots ,c_{t_k}\) denote the corresponding centers of these balls, see Fig. 2. On the event \(C_n\) each ball contains a vertex \(x_{t_i}\in G_n\) and since

$$\begin{aligned} d_\mathcal {M}(u,x_{t_1}),d_\mathcal {M}(v,x_{t_k})\le 2\frac{\lambda _n}{4}=\frac{\lambda _n}{2}, \end{aligned}$$

the edges \((u, x_{t_1})\) and \((v,x_{t_k})\) are present in \(G_n(u,v)\). Moreover, since \(d_\mathcal {M}(x_{t_i},x_{t_{i+1}})\) is bounded by four times the radius of the balls, it follows that \(d_\mathcal {M}(x_{t_i},x_{t_{i+1}})\le \lambda _n=o(\varepsilon _n)\) and thus, for n large enough, \(\{x_{t_1},\dots ,x_{t_k}\}\) is a path in \(G_n\). We thus conclude that u and v are connected in \(G_n(u,v)\). Note that because of this property, on the event \(\varOmega _n\), the extended manifold distance \({\widetilde{d}}_\mathcal {M}\) is well defined on \(\mathcal {M}\).

We are left to show that on the event \(\varOmega _n\), the extended manifold distance is a true distance. Note that the only non-trivial part is the triangle inequality. Let \(u,v,z\in \mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\) and consider the graphs \(G^{(1)}=G_n(u,v)\) and \(G^{(2)}=G_n(u,v,z)\). Now observe that the triangle inequality can only be violated if z creates a short-cut, i.e., if the shortest weighted path between u and v in \(G^{(1)}\) is longer than in \(G^{(2)}\). Suppose that this is true, and let \(\pi _1=\{u,\dots ,y_1,z,y_2,\dots ,v\}\) denote this new weighted shortest path in \(G^{(2)}\). Since \(y_1\) and \(y_2\) are connected to z in \(G^{(2)}\) it follows that \(d_\mathcal {M}(z,y_i)\le \lambda _n/2\). However, by the triangle inequality for \(d_\mathcal {M}\), this implies that \(d_\mathcal {M}(y_1,y_2)\le \lambda _n=o(\varepsilon _n)\) and hence, for sufficiently large n, the edge \((y_1, y_2)\) is present in \(G_n\) and thus also in \(G^{(1)}\) and \(G^{(2)}\).

Let \({\hat{\pi }}=\{y_1:=x_0,x_1,\dots ,x_{m-1},y_2:=x_m\}\) denote the shortest weighted path in \(G_n\) between \(y_1\) and \(y_2\), i.e., \(d_G(y_1,y_2)=\sum _{t=1}^mw_{x_{t-1}x_t}\), and take \(\pi _2=\{u,\dots ,y_1,x_1,\dots ,x_{m-1},y_2,\dots ,v\}\). Then \(\pi _2\) is a path between u and v that excludes z. See also Fig. 3. We will show that the total weight of this path is at most that of \(\pi _1\).

For simplicity let us denote by \(\Vert \pi \Vert \) the total weight of a path \(\pi \). Since \(d_G\) is a \(\delta _n\)-good approximation,

$$\begin{aligned} \Vert {\hat{\pi }}\Vert :=\sum _{t=1}^mw_{x_{t-1}x_{t}}=d_G(y_1,y_2)\le d_{\mathcal {M}}(y_1,y_2)(1+\xi _n^2)+\xi _n^3 \end{aligned}$$

holds on the event \(\varOmega _n\). Applying the triangle inequality for \(d_\mathcal {M}\) we get

$$\begin{aligned} \Vert {\hat{\pi }}\Vert&\le d_\mathcal {M}(y_1,z)(1+\xi _n^2) + d_\mathcal {M}(z,y_2)(1+\xi _n^2)+\xi _n^3\\&\le d_\mathcal {M}(y_1,z)(1+\xi _n^2)+d_\mathcal {M}(z,y_2)(1+\xi _n^2)+2\xi _n^3=w_{y_1z}+w_{y_2z}. \end{aligned}$$

This implies that the total weight of the path \(\pi _2\) is at most that of \(\pi _1\) from which we conclude that z cannot create a short-cut and hence \({\widetilde{d}}_\mathcal {M}\) satisfies the triangle inequality. \(\square \)

Fig. 3: Abstract depiction of the weighted shortest path between u and v created by adding z, and the path \(\pi _2\) (shown in blue)

We are now ready to prove Lemma 1.

Proof of Lemma 1

Note that for any two nodes \(u,v\in G_n\) with \(u,v\in \mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\), Lemma 3 implies that u and v are connected by a path in \(G_n\). Hence the only part of Lemma 1 to prove is property (\(\varOmega \)1) there.

Take any \(u,v\in \mathcal {B}_\mathcal {M}(x^*;3\delta _n)\). Then on the event \(\varOmega _n\), by definition of the extended distance \({\widetilde{d}}_\mathcal {M}\), there exist \(x_u,x_v\in G_n\) such that \(d_\mathcal {M}(u,x_u)\le \lambda _n/2\), \(d_\mathcal {M}(v,x_v)\le \lambda _n/2\), and

$$\begin{aligned} {\widetilde{d}}_\mathcal {M}(u,v)&=d_\mathcal {M}(u,x_u)(1+\xi _n^2)+d_\mathcal {M}(v,x_v)(1+\xi _n^2)+2\xi _n^3+d_G(x_u,x_v)\nonumber \\&\le \lambda _n(1+\xi _n^2)+2\xi _n^3+d_G(x_u,x_v). \end{aligned}$$
(15)

Moreover, since \(Q>3\) and \(\lambda _n=o(\delta _n^3)\) we can assume that \(x_u,x_v\in \mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\), for sufficiently large n. Since the approximation (4) holds on the event \(\varOmega _n\), we have

$$\begin{aligned} |d_G(x_u,x_v)-d_\mathcal {M}(u,v)|&\le |d_G(x_u,x_v)-d_\mathcal {M}(x_u,x_v)|+|d_\mathcal {M}(x_u,x_v)-d_\mathcal {M}(u,v)|\nonumber \\&\le |d_G(x_u,x_v)-d_\mathcal {M}(x_u,x_v)|+d_\mathcal {M}(x_u,u)+d_\mathcal {M}(x_v,v)\nonumber \\&\le d_\mathcal {M}(x_u,x_v)\xi _n^2+\xi _n^3+d_\mathcal {M}(x_u,u)+d_\mathcal {M}(x_v,v)\nonumber \\&\le d_\mathcal {M}(x_u,x_v)\xi _n^2+\xi _n^3+\lambda _n. \end{aligned}$$
(16)

Combining (15) and (16) we get

$$\begin{aligned} |\widetilde{d}_\mathcal {M}(u,v) - d_\mathcal {M}(u,v)|&\le |{\widetilde{d}}_\mathcal {M}(u,v)-d_G(x_u,x_v)|+|d_G(x_u,x_v)-d_\mathcal {M}(u,v)|\\&\le \lambda _n(1+\xi _n^2)+2\xi _n^3+|d_G(x_u,x_v)-d_\mathcal {M}(u,v)|\\&\le \lambda _n(1+\xi _n^2)+d_\mathcal {M}(x_u,x_v)\xi _n^2+3\xi _n^3+\lambda _n. \end{aligned}$$

Applying the triangle inequality to the last distance,

$$\begin{aligned} d_\mathcal {M}(x_u,x_v)\le d_\mathcal {M}(u,v)+d_\mathcal {M}(u,x_u)+d_\mathcal {M}(v,x_v)\le d_\mathcal {M}(u,v)+\lambda _n, \end{aligned}$$

we get

$$\begin{aligned} |{\widetilde{d}}_\mathcal {M}(u,v)-d_\mathcal {M}(u,v)|\le d_\mathcal {M}(u,v) \xi _n^2+2\lambda _n(1+\xi _n^2)+3\xi _n^3=o(\delta _n^3). \square \end{aligned}$$

Finally, we need to prove Proposition 1. Since, on the event \(\varOmega _n\),

$$\begin{aligned} |{\widetilde{d}}_\mathcal {M}(u,v)-d_\mathcal {M}(u,v)|\le o(\delta _n^3), \end{aligned}$$

the proof follows immediately from the following elementary result on Wasserstein metrics.

Lemma 4

Let \((\mathcal {X},d)\) and \((\mathcal {X},{\widetilde{d}})\) be two metric spaces such that

$$\begin{aligned} |d(x,y)-{\widetilde{d}}(x,y)|\le K \end{aligned}$$

holds for all \(x,y\in \mathcal {X}\) and some \(K>0\). Denote by \(W_1\) and \({\widetilde{W}}_1\) the Wasserstein metrics associated with d and \({\widetilde{d}}\), respectively. Then for any two probability measures \(\mu _1\) and \(\mu _2\) on \(\mathcal {X}\),

$$\begin{aligned} |{\widetilde{W}}_1(\mu _1,\mu _2)-W_1(\mu _1,\mu _2)|\le K. \end{aligned}$$

Proof

For any coupling \(\mu \) between \(\mu _1\) and \(\mu _2\),

$$\begin{aligned} \int {\widetilde{d}}(x,y)\,\textrm{d}\mu (x,y)\le \int (d(x,y)+K)\,\textrm{d}\mu (x,y)\le \int d(x,y)\,\textrm{d}\mu (x,y)+K \end{aligned}$$

and similarly

$$\begin{aligned} \int d(x,y)\,\textrm{d}\mu (x,y)\ge \int {\widetilde{d}}(x,y)\,\textrm{d}\mu (x,y)-K. \end{aligned}$$

Then it follows that

$$\begin{aligned} {\widetilde{W}}_1(\mu _1,\mu _2)=\inf _\mu \int {\widetilde{d}}(x,y)\,\textrm{d}\mu (x,y)\le \inf _\mu \int d(x,y)\,\textrm{d}\mu (x,y)+K=W_1(\mu _1,\mu _2)+K \end{aligned}$$

and similarly

$$\begin{aligned} W_1(\mu _1,\mu _2)\le {\widetilde{W}}_1(\mu _1,\mu _2)+K, \end{aligned}$$

from which the result follows. \(\square \)
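Lemma 4 is easy to verify numerically for discrete measures. In the sketch below (illustrative sizes only) both measures are uniform on n atoms, so each Wasserstein distance equals the cost of an optimal assignment divided by n. Note that the proof above only uses the pointwise bound \(|d-{\widetilde{d}}|\le K\), not the triangle inequality, so perturbing the cost matrix entrywise suffices for the illustration.

```python
# A small numerical check of Lemma 4 (illustrative sizes): for two uniform measures on
# n atoms, W_1 equals the optimal assignment cost divided by n, so we can compare the
# distances computed from a cost matrix d and from a perturbation d_tilde with
# |d - d_tilde| <= K entrywise.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
n, K = 200, 0.05
x, y = rng.uniform(size=(n, 2)), rng.uniform(size=(n, 2))

d = cdist(x, y)                                                     # costs under d
d_tilde = np.clip(d + rng.uniform(-K, K, size=d.shape), 0.0, None)  # perturbed costs;
# clipping at 0 keeps |d - d_tilde| <= K since d is nonnegative

def w1(cost):
    # optimal assignment cost / n = W_1 between two uniform measures on n atoms
    r, c = linear_sum_assignment(cost)
    return cost[r, c].mean()

print(abs(w1(d) - w1(d_tilde)) <= K)   # True, as Lemma 4 predicts
```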

5.2 Probability Measures on Graphs

In this section we give the proof of Proposition 2. Recall that \(m_x^G\) and \(m_x^\mathcal {M}\) denote the uniform probability measures on the set of nodes in \(\mathcal {B}_G(x;\delta _n)\) and \(\mathcal {B}_\mathcal {M}(x;\delta _n)\), respectively. The goal is then to show that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_x^G,m_x^\mathcal {M})\,|\,\varOmega _n\bigr ]=o(\delta _n^3). \end{aligned}$$

As we mentioned, these two sets are not necessarily contained in each other. Hence, to bound the Wasserstein metric we will work with slightly smaller and larger balls \(B^-\) and \(B^+\) such that

$$\begin{aligned} B^-\cap G_n\subseteq \mathcal {B}_G(x;\delta _n)\quad \text {and}\quad \mathcal {B}_\mathcal {M}(x;\delta _n)\cap G_n\subseteq B^+\cap G_n. \end{aligned}$$

We can then obtain an upper bound by comparing each of \(m_x^G\) and \(m_x^\mathcal {M}\), in the Wasserstein metric, to the uniform probability measure on \(B^+\cap G_n\). This bound can be made \(o(\delta _n^3)\) by carefully selecting the radii of \(B^-\) and \(B^+\).

Before we give the details, we need the following general result concerning Poisson random variables.

Lemma 5

Let \(\alpha _n,\beta _n\rightarrow \infty \) and \(X_n,Y_n\) be two independent Poisson random variables with means \(\alpha _n\) and \(\beta _n\), respectively. Then

$$\begin{aligned} \mathbb {E}\left[ \frac{X_n}{X_n+Y_n}\,\Big |\,X_n+Y_n\ge 1\right] =O\biggl (\frac{\alpha _n}{\alpha _n+\beta _n}\biggr ). \end{aligned}$$

Proof

First, let \(C>\sqrt{2}\) be some large fixed constant. Then we have that (cf. [32, Lemma 2.1])

$$\begin{aligned} \mathbb {P}\bigl (|X_n-\alpha _n|>C\sqrt{\alpha _n\log \alpha _n}\bigr )=O\bigl (\alpha _n^{-C^2/2}\bigr ). \end{aligned}$$

In particular, if we define \(\alpha _n^\pm =\alpha _n\pm C\sqrt{\alpha _n\log \alpha _n}\), then

$$\begin{aligned} \max {\{\mathbb {P}(X_n<\alpha _n^-),\mathbb {P}(X_n>\alpha _n^+)\}}=O\bigl (\alpha _n^{-C^2/2}\bigr ). \end{aligned}$$

Similar results hold for \(Y_n\) with \(\beta _n^\pm \) defined similarly. We start by conditioning on \(X_n\):

$$\begin{aligned}&\mathbb {E}\left[ \frac{X_n}{X_n+Y_n}\,\Big |\,X_n+Y_n\ge 1\right] =\sum _{k=0}^{\infty }\,\mathbb {E}\left[ \frac{k}{k+Y_n}\,\Big |\,Y_n\ge 1\right] \cdot \mathbb {P}\left( X_n=k\right) \\&\qquad \ \ =\sum _{k<\alpha _n^-}\mathbb {E}\left[ \frac{k}{k+Y_n}\,\Big |\,Y_n\ge 1\right] \cdot \mathbb {P}\left( X_n=k\right) \\&\qquad \qquad +\sum _{k\ge \alpha _n^-}\mathbb {E}\left[ \frac{k}{k+Y_n}\,\Big |\,Y_n\ge 1\right] \cdot \mathbb {P}\left( X_n=k\right) :=I_n^{(1)}+I_n^{(2)}. \end{aligned}$$

We will bound each term separately.

First we bound the expectation inside each summation by further conditioning on \(Y_n\):

$$\begin{aligned} \mathbb {E}\left[ \frac{k}{k+Y_n}\,\Big |\,Y_n\ge 1\right]&\le \frac{k}{k+1}\mathbb {P}(1\le Y_n<\beta _n^-)\\&\qquad +\sum _{\beta _n^-\le y\le \beta _n^+}\frac{k}{k+y}\mathbb {P}(Y_n=y)+\frac{k}{k+\beta _n^+}\mathbb {P}(Y_n>\beta _n^+)\\&\le k\left( \frac{1}{k+\beta _n^-}+\frac{\mathbb {P}\bigl (|Y_n-\beta _n|>C \sqrt{\beta _n\log \beta _n}\bigr )}{k+1}\right) \\&\le \frac{k}{k+\beta _n^-}\bigl (1+O\bigl (\beta _n^{1-C^2/2}\bigr )\bigr )=\frac{k}{k+\beta _n^-}(1+o(1)), \end{aligned}$$

because \(C>\sqrt{2}\). We can now bound \(I_n^{(1)}\) as follows:

$$\begin{aligned} I_n^{(1)}\le \frac{\alpha _n^-}{\beta _n^-}\mathbb {P}(X_n<\alpha _n^-) =O\bigl ((\beta _n^{-})^{-1}\alpha _n^{1-C^2/2}\bigr )=O\bigl (\beta _n^{-1}\alpha _n^{1-C^2/2}\bigr ), \end{aligned}$$

where we used that \(\beta _n^-\sim \beta _n\), i.e., \(\beta _n^-/\beta _n\rightarrow 1\). For \(I_n^{(2)}\) we have, using that \(\alpha _n^-\sim \alpha _n\),

$$\begin{aligned} I_n^{(2)}\le (1+o(1))\sum _{k\ge \alpha _n^-}\frac{k}{k+\beta _n^-}\mathbb {P}\left( X_n=k\right) \le O\biggl (\frac{\mathbb {E}\hspace{0.33325pt}[X_n]}{\alpha _n^-+\beta _n^-}\biggr )=O\biggl (\frac{\alpha _n}{\alpha _n+\beta _n}\biggr ), \end{aligned}$$

and thus the result follows since we are free to select \(C>\sqrt{2}\) large enough so that \(I_n^{(1)}\) is of smaller order. \(\square \)
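A short Monte Carlo experiment with illustrative parameters confirms the statement of Lemma 5: the conditional expectation stays of the same order as \(\alpha _n/(\alpha _n+\beta _n)\).

```python
# A quick Monte Carlo illustration of Lemma 5 (illustrative parameters only).
import numpy as np

rng = np.random.default_rng(2)
for alpha, beta in [(50, 500), (200, 2000), (1000, 10000)]:
    x = rng.poisson(alpha, size=200_000)
    y = rng.poisson(beta, size=200_000)
    keep = (x + y) >= 1
    ratio = (x[keep] / (x[keep] + y[keep])).mean()
    print(alpha, beta, ratio, alpha / (alpha + beta))   # same order of magnitude
```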

We are now ready to prove Proposition 2.

Proof of Proposition 2

Let \(\delta _n^\pm =(\delta _n\pm \xi _n^3)/(1\mp \xi _n^2)\) and let \(D_n\) be the event on which approximation (4) of Definition 4.1 holds and recall that \(\varOmega _n\subset D_n\). Therefore, since \(\mathbb {P}\left( \varOmega _n\right) \rightarrow 1\),

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_x^G,m_x^\mathcal {M})\,|\,\varOmega _n\bigr ]\le (1+o(1)) \mathbb {E}\bigl [W_1(m_x^G,m_x^\mathcal {M})\mathbbm {1}_{\{D_n\}}\bigr ], \end{aligned}$$

and so it suffices to look at \(\mathbb {E}\bigl [W_1(m_x^G,m_x^\mathcal {M})\mathbbm {1}_{\{D_n\}}\bigr ]\).

Note that on the event \(D_n\),

$$\begin{aligned} \mathcal {B}_\mathcal {M}(x;\delta _n^-)\cap G_n\subseteq \mathcal {B}_G(x;\delta _n),\mathcal {B}_\mathcal {M}(x;\delta _n)\cap G_n\subseteq \mathcal {B}_\mathcal {M}(x;\delta _n^+)\cap G_n. \end{aligned}$$

Let \(V_n\subseteq \mathcal {M}\) be any neighborhood of x such that \(\textrm{vol}_\mathcal {M}(V_n)=\varTheta (\delta _n^N)\) and

$$\begin{aligned} \mathcal {B}_\mathcal {M}(x;\delta _n^-)\cap G_n\subseteq \mathcal {B}_n\subseteq \mathcal {B}_\mathcal {M}(x;\delta _n^+)\cap G_n, \end{aligned}$$

where \(\mathcal {B}_n=V_n\cap G_n\). Denote by \(m_n\) the uniform probability measure on \(\mathcal {B}_n\) and by \(m_x^+\) the uniform probability measure on \(\mathcal {B}_n^+:=\mathcal {B}_\mathcal {M}(x;\delta _n^+)\cap G_n\). We will prove that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_n,m_x^+)\hspace{0.05554pt}\mathbbm {1}_{\{D_n\}}\bigr ]=o(\delta _n^3). \end{aligned}$$
(17)

Since

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_x^G,m_x^\mathcal {M})\mathbbm {1}_{\{D_n\}}\bigr ]\le \mathbb {E} \bigl [W_1(m_x^G,m_x^+)\mathbbm {1}_{\{D_n\}}\bigr ]+\mathbb {E}\bigl [W_1(m_x^\mathcal {M},m_x^+) \mathbbm {1}_{\{D_n\}}\bigr ], \end{aligned}$$

applying (17) twice, once with \(\mathcal {B}_n=\mathcal {B}_G(x;\delta _n)\) and once with \(\mathcal {B}_n=\mathcal {B}_\mathcal {M}(x;\delta _n)\cap G_n\), will yield the required result.

Let us also write \(\mathcal {B}_n^-:=\mathcal {B}_\mathcal {M}(x;\delta _n^-)\cap G_n\) and denote by \(m_x^-\) the uniform probability measure on \(\mathcal {B}_n^-\). To establish (17) we will show that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_n,m_x^+)\mathbbm {1}_{\{D_n\}}\bigr ] =O\biggl (\frac{(\delta _n^+)^N-(\delta _n^-)^N}{(\delta _n^+)^{N-1}}\biggr ). \end{aligned}$$
(18)

Note that by definition of \(\delta _n^\pm \) we have \((\delta _n^+)^N-(\delta _n^-)^N=O(\xi _n^2\delta _n^N)\). Therefore, if (18) holds,

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_n, m_x^+)\mathbbm {1}_{\{D_n\}}\bigr ]\le O\biggl (\frac{(\delta _n^+)^N -(\delta _n^-)^N}{\delta _n^{N-1}}\biggr )=O(\delta _n\xi _n^2)=o(\delta _n^3), \end{aligned}$$

since \(\xi _n=o(\delta _n)\). To establish (18) we condition on \(|\mathcal {B}_n^-|\):

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_n,m_x^+)\mathbbm {1}_{\{D_n\}}\bigr ]&=\mathbb {E}\bigl [W_1(m_n,m_x^+)\mathbbm {1}_{\{D_n\}}\,|\,|\mathcal {B}_n^-|=0\bigr ] \cdot \mathbb {P}(|\mathcal {B}_n^-|=0)\\&\qquad +\mathbb {E}\bigl [W_1(m_n,m_x^+)\mathbbm {1}_{\{D_n\}}\,|\,|\mathcal {B}_n^ -|\ge 1\bigr ]\cdot \mathbb {P}(|\mathcal {B}_n^-|\ge 1). \end{aligned}$$

For the first term we have

$$\begin{aligned}&\mathbb {E}\bigl [W_1(m_n,m_x^+)\mathbbm {1}_{\{D_n\}}\,|\,|\mathcal {B}_n^-|=0\bigr ] \cdot \mathbb {P}(|\mathcal {B}_n^-|=0)\le 2\delta _n^+\mathbb {P}(|\mathcal {B}_n^-|=0)\\&\qquad =O(\delta _n^-) e^{-n\varTheta ((\delta _n^-)^N)}=O\biggl (\frac{(\delta _n^+)^N-(\delta _n^-)^N}{(\delta _n^+)^{N-1}}\biggr ), \end{aligned}$$

where we used that \(\mathbb {E}\hspace{0.33325pt}[|\mathcal {B}_n^-|]=n\hspace{0.33325pt}{\text {vol}}_\mathcal {M}(\mathcal {B}_\mathcal {M}(x;\delta _n^-))/{\text {vol}}_\mathcal {M}(\mathcal {M})=\varTheta \hspace{0.33325pt}(n(\delta _n^-)^N)\). It now suffices to show that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_n,m_x^+)\mathbbm {1}_{\{D_n\}}\,|\,|\mathcal {B}_n^-|\ge 1\bigr ] =O\biggl (\frac{(\delta _n^+)^N-(\delta _n^-)^N}{(\delta _n^+)^{N-1}}\biggr ). \end{aligned}$$
(19)

We will do this by constructing a specific transport plan (coupling) between the measures \(m_n\) and \(m_x^+\). Define the joint probability mass function on \({\mathcal {B}_n\times \mathcal {B}_n^+}\):

$$\begin{aligned} m(u,v)={\left\{ \begin{array}{ll}\displaystyle \frac{1}{|\mathcal {B}_n^+|}&{}\text {if }u=v,\\ \displaystyle \frac{1}{|\mathcal {B}_n|\cdot |\mathcal {B}_n^+|}&{}\text {if }v\in \mathcal {B}_n^+\setminus \mathcal {B}_n,\\ 0&{}\text {otherwise,}\end{array}\right. } \end{aligned}$$

and observe that \(m(u,v)\) is a coupling between \(m_n\) and \(m_x^+\). Therefore

$$\begin{aligned} W_1(m_n,m_x^+)&\le \sum _{u\in \mathcal {B}_n}\,\sum _{v\in \mathcal {B}_n^+}d_\mathcal {M}(u,v) m(u,v)=\sum _{u\in \mathcal {B}_n}\,\sum _{v\in \mathcal {B}_n^+\setminus \mathcal {B}_n}\!\!\frac{d_\mathcal {M}(u,v)}{|\mathcal {B}_n|\cdot |\mathcal {B}_n^+|}\\&\le 2\delta _n^+\frac{|\mathcal {B}_n^+|-|\mathcal {B}_n|}{|\mathcal {B}_n^+|}\le 2\delta _n^+\frac{|\mathcal {B}_n^+|-| \mathcal {B}_n^-|}{|\mathcal {B}_n^+|}=2\delta _n^+\frac{|\mathcal {B}_n^+\setminus \mathcal {B}_n^-|}{|\mathcal {B}_n^+|}. \end{aligned}$$

Now define \(X_n=|\mathcal {B}_n^+\setminus \mathcal {B}_n^-|\) and \(Y_n=|\mathcal {B}_n^-|\). Then \(X_n\) and \(Y_n\) are independent Poisson random variables satisfying

$$\begin{aligned} \frac{|\mathcal {B}_n^+\setminus \mathcal {B}_n^-|}{|\mathcal {B}_n^+|}=\frac{X_n}{X_n+Y_n}. \end{aligned}$$

It then follows from Lemma 5 that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_n,m_x^+)\,|\,|\mathcal {B}_n^-|\ge 1\bigr ]\le O \biggl (\frac{\delta _n^+\mathbb {E}\hspace{0.33325pt}[X_n]}{\mathbb {E}\hspace{0.33325pt}[X_n]+\mathbb {E}\hspace{0.33325pt}[Y_n]}\biggr )=O\biggl (\frac{\delta _n^+{\text {vol}}_\mathcal {M}(\mathcal {B}_\mathcal {M}(x;\delta _n^+)\setminus \mathcal {B}_\mathcal {M}(x;\delta _n^-))}{\textrm{vol}_\mathcal {M}(\mathcal {B}_\mathcal {M}(x;\delta _n^+))}\biggr ). \end{aligned}$$

Equation (19) then follows by noting that \(\textrm{vol}_\mathcal {M}(\mathcal {B}_\mathcal {M}(x;\delta _n^+)\setminus \mathcal {B}_\mathcal {M}(x;\delta _n^-))=\varTheta \hspace{0.33325pt}((\delta _n^+)^N-(\delta _n^-)^N)\). \(\square \)
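The explicit coupling used in the proof can also be evaluated numerically. The sketch below (with illustrative radii and intensity) samples a Poisson process in a planar disc, takes \(\mathcal {B}_n\) to be the points within distance \(\delta _n\) of the centre and \(\mathcal {B}_n^+\) those within \(\delta _n^+\), computes the transport cost of the coupling \(m(u,v)\) directly, and checks it against the bound \(2\delta _n^+|\mathcal {B}_n^+\setminus \mathcal {B}_n|/|\mathcal {B}_n^+|\).

```python
# A numerical sketch of the explicit coupling above (illustrative radii and intensity).
import numpy as np

rng = np.random.default_rng(3)
delta, delta_plus = 1.0, 1.1
pts = rng.uniform(-delta_plus, delta_plus, size=(rng.poisson(3000), 2))
pts = pts[np.linalg.norm(pts, axis=1) <= delta_plus]   # Poisson points in the disc B(0, delta_plus)

r = np.linalg.norm(pts, axis=1)
B = pts[r <= delta]        # the inner point set B_n
outside = pts[r > delta]   # B_n^+ \ B_n
B_plus = pts               # the outer point set B_n^+

# cost of the coupling: mass 1/|B_plus| stays put on B, mass 1/(|B| |B_plus|) moves
# from each u in B to each v in B_plus \ B
cost = sum(np.linalg.norm(B - v, axis=1).sum() for v in outside) / (len(B) * len(B_plus))
bound = 2 * delta_plus * len(outside) / len(B_plus)
print(cost, bound, cost <= bound)   # the coupling cost respects the bound
```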

5.3 Continuous and Discrete Measures on \(\mathcal {M}\)

5.3.1 Collecting Relevant Known Results

The following is a summary of results on the Wasserstein metric between empirical and uniform measures on the N-dimensional cube. The case \(N=2\) was explicitly stated in [39]. Although the results for \(N\ge 3\) are known, they are not stated in the explicit form we need. For completeness we thus include a proof here.

Proposition 6

Let \(X_1,X_2,\dots \) be independent uniformly distributed random variables on \([0,1]^N\), let \(m_n\) denote the empirical measure

$$\begin{aligned} m_n(y)=\frac{1}{n}\sum _{i=1}^n\mathbbm {1}_{\{X_i=y\}}, \end{aligned}$$

and \(\mu \) the uniform measure on \([0,1]^N\). Then

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m_n,\mu )\bigr ]={\left\{ \begin{array}{ll}O\left( \!\sqrt{\displaystyle \frac{\log n}{n}}\right) &{}\text {if }N=2,\\ O(n^{-1/N})&{}\text {if }N\ge 3.\end{array}\right. } \end{aligned}$$

Proof

The result for \(N=2\) follows from [39, (1.1)], see also the results in [22, 35]. For \(N\ge 3\) we let \(Y_1,Y_2,\dots \) be independent uniformly distributed random variables on \([0,1]^N\) and define

$$\begin{aligned} M_n:=\inf _\sigma \sum _{i=1}^n\Vert X_i-Y_{\sigma (i)}\Vert , \end{aligned}$$

where the infimum is taken over all permutations \(\sigma \) of \(\{1,2,\dots ,n\}\). Then, it follows from [38, Lemma 1] that

$$\begin{aligned} M_n=\sup _{f\in Lip _1}\left| \,\sum _{i=1}^n(f(X_i)-f(Y_i))\right| , \end{aligned}$$

where \(Lip _1\) now denotes the set of Lipschitz continuous functions with constant 1, with respect to the Euclidean distance \(d_N\).

Next, we recall the duality formula for the Wasserstein metric on the space \(\mathcal {X}\),

$$\begin{aligned} W_1(\mu _1,\mu _2)=\sup _{f\in Lip _1}\left\{ \int _\mathcal {X}f(x)\,\textrm{d}\mu _1(x)-\int _\mathcal {X}f(y)\,\textrm{d}\mu _2(y)\right\} . \end{aligned}$$

Since

$$\begin{aligned} \int _{[0,1]^N}\!f(z)\,\textrm{d}\mu (z)=\mathbb {E}\hspace{0.33325pt}[f(Y_i)], \end{aligned}$$

we have

$$\begin{aligned} W_1^N(m_n,\mu )&=\sup _{f\in Lip _1}\left| \,\frac{1}{n}\sum _{i=1}^n\left( f(X_i)-\int _{[0,1]^N}f(z)\, \textrm{d}\mu (z)\right) \right| \\&=\frac{1}{n}\sup _{f\in Lip _1}\left| \,\sum _{i=1}^n\,(f(X_i)-\mathbb {E}\hspace{0.33325pt}[f(Y_i)])\right| \\&\le \frac{1}{n}\,\mathbb {E}\left[ \sup _{f\in Lip _1}\left| \,\sum _{i=1}^n\,(f(X_i)-f(Y_i))\right| \;\bigg |\;X_1,\dots ,X_n\right] \\&=\frac{\mathbb {E}[M_n\,|\,X_1,\dots ,X_n]}{n}, \end{aligned}$$

and hence

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m_n,\mu )\bigr ]\le \frac{\mathbb {E}\hspace{0.33325pt}[M_n]}{n}. \end{aligned}$$

Finally [38, Thm. 1] implies for \(N\ge 3\),

$$\begin{aligned} \mathbb {E}\hspace{0.33325pt}[M_n]=O(n^{1-1/N}), \end{aligned}$$

which then yields

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m_n,\mu )\bigr ]=O(n^{-1/N}). \square \end{aligned}$$

5.3.2 Uniform and Discrete Measures on the Unit Cube

We first extend Proposition 6 to the case where the points correspond to a Poisson process. We will actually prove a slightly more general version which allows for intensities \((1+o(1))\hspace{0.33325pt} n\).

Lemma 6

Consider the N-dimensional unit cube \([0,1]^N\), with \(N\ge 2\), and consider a Poisson process \(\mathcal {P}\) with intensity measure \((1+f_n) n\,\textrm{d}\textrm{vol}_N\) on \([0,1]^N\), for some sequence \(f_n\rightarrow 0\). Let \(m^N_\mathcal {P}\) denote the empirical random measure with respect to \(\mathcal {P}\), i.e.,

$$\begin{aligned} m_\mathcal {P}^N(y)=\frac{1}{|\mathcal {P}|}\sum _{p\in \mathcal {P}}\mathbbm {1}_{\{p=y\}}, \end{aligned}$$

and \(\mu ^N\) the uniform measure on the cube. Then, as \(n\rightarrow \infty \),

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m^N_\mathcal {P},\mu ^N)\bigr ]=O(n^{-1/N}\log n). \end{aligned}$$

Proof

We shall establish the result by conditioning on the size \(|\mathcal {P}|\), which has a Poisson distribution with mean \((1+f_n) n\). Conditioned on \(|\mathcal {P}|=k_n\), the points are independent and uniformly distributed, and therefore it follows from Proposition 6 that, as \(k_n\rightarrow \infty \),

$$\begin{aligned} \mathbb {E}\bigl [W_1(m^N_\mathcal {P},\mu ^N)\,|\,|\mathcal {P}|=k_n\bigr ]= {\left\{ \begin{array}{ll}\displaystyle O\left( \!\sqrt{\frac{\log k_n}{k_n}}\right) &{}\text {if }N=2\\ O\bigl (k_n^{-1/N}\bigr )&{}\text {if }N\ge 3 \end{array}\right. }=\;O\bigl (k_n^{-1/N}\sqrt{\log k_n}\bigr ). \nonumber \\ \end{aligned}$$
(20)

Recall the Chernoff concentration result [32, Lemma 1.2] for a Poisson random variable \(\textrm{Po}(a)\) with mean a:

$$\begin{aligned} \mathbb {P}(|\textrm{Po}(a)-a|>x)\le 2e^{-x^2/(2(a+x))}. \end{aligned}$$
(21)

Fix a \(c > 0\). Then by (21) with \(a=(1+f_n) n\) and \(x=c\sqrt{(1+f_n) n\log n}\),

$$\begin{aligned}&\mathbb {P}\bigl (|\textrm{Po}((1+f_n) n)-(1+f_n) n|>c \sqrt{(1+f_n) n\log n}\bigr )\\&\qquad \le 2\exp {\frac{-c^2(1+f_n) n\log n}{2\bigl ((1+f_n) n+c\sqrt{n\log n}\bigr )}}=O\bigl (e^{-(c^2\log n)/2}\bigr )=O\bigl (n^{-c^2/2}\bigr ). \end{aligned}$$

Therefore, if we define

$$\begin{aligned} a_n^\pm =(1+f_n) n\pm c\sqrt{(1+f_n) n\log n}, \end{aligned}$$

it follows that

$$\begin{aligned}&\mathbb {P}(\textrm{Po}((1+f_n) n)<a_n^-)\\&\qquad =\mathbb {P}\bigl ((1+f_n) n-\textrm{Po}((1+f_n) n)>c \sqrt{(1+f_n) n\log n}\bigr )\\&\qquad \le \mathbb {P}\bigl (|\textrm{Po} ((1+f_n) n)-(1+f_n) n|>c\sqrt{(1+f_n) n\log n}\bigr )=O\bigl (n^{-c^2/2}\bigr ), \end{aligned}$$

and similarly

$$\begin{aligned} \mathbb {P}(\textrm{Po}((1+f_n) n)\ge a_n^+)=O\bigl (n^{-c^2/2}\bigr ). \end{aligned}$$

We shall use this and the upper bound (20) for \(\mathbb {E}\bigl [W_1^N(m^N_\mathcal {P},\mu ^N)\,|\,|\mathcal {P}|=k_n\bigr ]\) to compute an upper bound for \(\mathbb {E}\bigl [W_1^N(m^N_\mathcal {P},\mu ^N)\bigr ]\) as follows:

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m^N_\mathcal {P},\mu ^N)\bigr ]&=\sum _{k=0}^{a_n^--1} \mathbb {E}\bigl [W_1(m^N_\mathcal {P},\mu ^N)\,|\,|\mathcal {P}|=k\bigr ]\cdot \mathbb {P} (\textrm{Po}((1+f_n) n)=k)\\&\qquad {}+\sum _{k=a_n^-}^{a_n^+}\mathbb {E}\bigl [W_1(m^N_\mathcal {P},\mu ^N)\,|\,|\mathcal {P}|=k\bigr ] \cdot \mathbb {P}(\textrm{Po}((1+f_n) n)=k)\\&\qquad {}+\sum _{k=a_n^++1}^\infty \mathbb {E}\bigl [W_1(m^N_\mathcal {P},\mu ^N)\,|\,|\mathcal {P}|=k\bigr ] \cdot \mathbb {P}(\textrm{Po}((1+f_n) n)=k)\\&:=I_1+I_2+I_3. \end{aligned}$$

Since any two points in \([0,1]^N\) are at most at distance \(\sqrt{N}\), we have for \(I_1\)

$$\begin{aligned} I_1\le \sqrt{N}\sum _{k=0}^{a_n^--1}\mathbb {P}(\textrm{Po}((1+f_n) n)=k)=O(\mathbb {P}(\textrm{Po}((1+f_n) n)<a_n^-))=O\bigl (n^{-c^2/2}\bigr ), \end{aligned}$$

while for \(I_3\) we get, using (20),

$$\begin{aligned} I_3&\le O\Bigl ((a_n^+)^{-1/N}\mathbb {P}(\textrm{Po}((1+f_n) n)>a_n^+)\sqrt{\log a_n^+}\Bigr )\\&=O\Bigl ((a_n^+)^{-1/N}n^{-c^2/2}\sqrt{\log a_n^+}\Bigr )=O\bigl (n^{-c^2/2-1/N}\sqrt{\log n}\bigr ). \end{aligned}$$

The main contribution comes from \(I_2\) for which we use that \(k\mapsto {\mathbb {P}}(\textrm{Po}((1+f_n) n)=k)\) is concave on \([a_n^-,a_n^+]\) and attains its maximum at \(k=(1+f_n) n\) to obtain

$$\begin{aligned} I_2&\le O\bigl (n^{-1/N}\sqrt{\log n}\bigr )\,{\mathbb {P}}(\textrm{Po}((1+f_n) n)=(1+f_n) n)\,(a_n^+-a_n^-)\\&\le O\bigl ( n^{-1/N}\sqrt{\log n}\bigr )\frac{2(1+f_n) c \sqrt{(1+f_n) n\log n}}{\sqrt{2\pi }\sqrt{n}}=O(n^{-1/N}\log n), \end{aligned}$$

where we used (20) with \(k_n=(1+f_n) n\) for the first line and Stirling’s approximation for n! for the second line. Since \(c>0\) was arbitrary, we can choose it large enough so that \(I_1\) and \(I_3\) are of smaller order, and conclude that

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m^N_\mathcal {P},\mu ^N)\bigr ]=O(n^{-1/N}\log n). \end{aligned}$$
(22)

\(\square \)

5.3.3 Uniform and Discrete Measures on the Ball \(\mathcal {B}_\mathcal {M}(x;\delta _n)\)

The following result follows from Lemma 6 by a simple rescaling argument.

Corollary 3

Let \(r_n\rightarrow 0\) and consider a Poisson process \(\mathcal {P}\) with intensity n on the N-dimensional cube \([0,2r_n]^N\). Let \(m^N_\mathcal {P}\) denote the empirical measure on the cube \([0,2r_n]^N\) with respect to \(\mathcal {P}\), i.e.,

$$\begin{aligned} m^N_\mathcal {P}(y)=\frac{1}{|\mathcal {P}\cap [0,2r_n]^N|}\sum _{p\in \mathcal {P}}\mathbbm {1}_{\{p=y\}}\mathbbm {1}_{\{y\in [0,2r_n]^N\}}, \end{aligned}$$

and \(\mu ^N\) the uniform measure on the cube \([0,2r_n]^N\). Then

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m^N_\mathcal {P},\mu ^N)\bigr ]=O(n^{-1/N}\log n). \end{aligned}$$

Proof

Consider the map \(\phi :[0,2 r_n]^N\rightarrow [0,1]^N\) defined by \(\phi (x)=r_n^{-1} x/2\). Then \(\phi (\mathcal {P})\) is a Poisson point process on \([0,1]^N\) with constant intensity \(2^Nr_n^Nn\). Now let \({\hat{m}}^N_\mathcal {P}=m^N_\mathcal {P}\circ \phi ^{-1}\) and \({\hat{\mu }}^N=\mu ^N\circ \phi ^{-1}\) denote, respectively, the empirical measure with respect to \(\phi (\mathcal {P})\) and the uniform measure on \([0,1]^N\). It follows from Lemma 6, applied with n replaced by \(2^Nr_n^Nn\), that

$$\begin{aligned} \mathbb {E}\bigl [W_1^N({\hat{m}}^N_\mathcal {P},{{\hat{\mu }}}^N)\bigr ]=O \bigl (r_n^{-1}n^{-1/N}\log (nr_n^N)\bigr )=O\bigl (r_n^{-1}n^{-1/N}(\log n+N\log r_n)\bigr ). \end{aligned}$$

Since for any \(x,y\in [0,2r_n]^N\) we have \(d_N(\phi (x),\phi (y))=2^{-1}r_n^{-1}d_N(x,y)\) it follows that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m^N_\mathcal {P},\mu ^N)\bigr ]&=2 r_n\hspace{0.7222pt}\mathbb {E} \bigl [W_1({\hat{m}}^N_\mathcal {P},{\hat{\mu }}^N)\bigr ]\\&=O\bigl (n^{-1/N}(\log n+N\log r_n)\bigr )=O(n^{-1/N}\log n), \end{aligned}$$

because \(r_n\rightarrow 0\). \(\square \)

For our analysis we first extend Corollary 3 to N-dimensional balls. For this we note that if \(m_x^N\) and \(\mu _x^N\) denote, respectively, the empirical and uniform measure on the ball \(\mathcal {B}_N(x;\delta _n)\subseteq \mathbb {R}^N\), then

$$\begin{aligned} W_1^N(m_x^N,\mu _x^N)\le W_1^N(m^N,\mu ^N), \end{aligned}$$

where \(m^N\) and \(\mu ^N\) are, respectively, the empirical and uniform measure on a cube \([0,2\delta _n]^N\). It then follows from Corollary 3 that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_x^N,\mu _x^N)\bigr ]=O(n^{-1/N}\log n)=o(\lambda _n)=o(\delta _n^3). \end{aligned}$$

We thus have the following result:

Proposition 7

Let \(f_n\rightarrow 0\), \(x\in \mathbb {R}^N\), and consider a Poisson process \(\mathcal {P}\) with intensity measure \((1+f_n) n\,\textrm{d}\textrm{vol}_N\) on the N-dimensional ball \(\mathcal {B}_N(x;\delta _n)\). Let \(m_x^N\) denote the empirical measure with respect to \(\mathcal {P}\), i.e.,

$$\begin{aligned} m_x^N(y)=\frac{1}{|\mathcal {P}|}\sum _{p\in \mathcal {P}}\mathbbm {1}_{\{p=y\}}, \end{aligned}$$

and \(\mu _x^N\) the uniform measure on \(\mathcal {B}_N(x;\delta _n)\). Then

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m_x^N,\mu _x^N)\bigr ]=o(\delta _n^3). \end{aligned}$$

5.3.4 From the Manifold to the Tangent Space and Back

To prove Proposition 3 we have to extend Proposition 7 to the setting of Riemannian manifolds. For this we use that for n large enough, the ball \(\mathcal {B}_\mathcal {M}(x;\delta _n)\) can be mapped diffeomorphically by the exponential map to a slightly larger ball in the tangent space of x. Since the tangent space is diffeomorphic to \(\mathbb {R}^N\) we can use Proposition 7 to obtain the result. However, we have to be careful since the exponential map does not preserve the metric.
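As an illustration of this distortion control, the following sketch works on the unit sphere \(S^2\) with the north pole as base point, a setting where the exponential map is explicit; for points whose preimages lie in a small ball of the tangent plane, the ratio between geodesic and tangent-plane distances stays close to 1. The radius used is illustrative.

```python
# A numerical illustration of the metric distortion of the exponential map, on the unit
# sphere S^2 with the north pole as base point (illustrative radius delta).
import numpy as np

rng = np.random.default_rng(7)

def exp_north_pole(v):
    # exponential map of S^2 at the north pole, applied to a tangent vector v in R^2
    r = np.linalg.norm(v)
    if r == 0.0:
        return np.array([0.0, 0.0, 1.0])
    return np.array([np.sin(r) * v[0] / r, np.sin(r) * v[1] / r, np.cos(r)])

def geodesic(p, q):
    # great-circle (geodesic) distance between two points on the unit sphere
    return float(np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)))

delta = 0.05
tangent_pts = [rng.uniform(-delta, delta, size=2) for _ in range(200)]
sphere_pts = [exp_north_pole(v) for v in tangent_pts]

ratios = []
for i in range(0, 200, 2):
    v, w = tangent_pts[i], tangent_pts[i + 1]
    d_tangent = float(np.linalg.norm(v - w))
    d_manifold = geodesic(sphere_pts[i], sphere_pts[i + 1])
    ratios.append(d_manifold / d_tangent)
print(min(ratios), max(ratios))   # both close to 1 for small delta
```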

Proof of Proposition 3

We shall denote by \(\mathcal {B}_N(x;\delta )\) the ball of radius \(\delta \) around \(x\in {\mathbb {R}}^N\), according to the Euclidean metric. Fix a \(0<\xi <1\) and pick a small enough neighborhood U of the origin in \(T_x\mathcal {M}\) (which is allowed to depend on n) such that: 1) the exponential map restricted to U is a diffeomorphism, 2) there exists a constant \(C>1\) such that \(U\subseteq \mathcal {B}_N(0;C\delta _n)\), and 3) for any two points \(y,z\in \exp _x(U)\),

$$\begin{aligned} (1-\xi ) d_N\bigl (\exp _x^{-1}y,\exp _x^{-1}z\bigr )\le d_\mathcal {M}(y,z)\le (1+\xi ) d_N\bigl (\exp _x^{-1}y,\exp _x^{-1}z\bigr ). \end{aligned}$$

In particular, this implies that for n large enough,

$$\begin{aligned} {\mathcal {B}}_N\biggl (0;\frac{\delta _n}{1+\xi }\biggr )\subseteq \exp _x^{-1}\bigl (\mathcal {B}_\mathcal {M}(x;\delta _n)\bigr )\subseteq {\mathcal {B}}_N\biggl (0;\frac{\delta _n}{1-\xi }\biggr )\subset U. \end{aligned}$$

Next we note that the probability measures \(m_x^\mathcal {M}\) and \(\mu _x^{\delta _n}\) on \(\mathcal {B}_\mathcal {M}(x;\delta _n)\) only depend on the restriction of the Poisson process to this ball. In particular, they only depend on the restriction \(\mathcal {P}_U\) of the process to \(\exp _x(U)\), which is again a Poisson process with intensity \(n\,\textrm{d}\textrm{vol}_\mathcal {M}/{\textrm{vol}_\mathcal {M}(\mathcal {M})}\). Since \(U\subseteq \mathcal {B}_N(0;C\delta _n)\) it follows that on U, \({\textrm{vol}_\mathcal {M}}\circ {\exp _x}=(1+O(\delta _n^2)){\text {vol}}_N\). Therefore, it follows from the Mapping Theorem for Poisson processes [21] that \(\exp _x^{-1}(\mathcal {P}_U)\) is a Poisson process on U with intensity function \((1+O(\delta _n^2))\hspace{0.33325pt} n\,\textrm{d}\textrm{vol}_N/{\textrm{vol}_\mathcal {M}(\mathcal {M})}\).

Slightly abusing notation, let \(m_x^N\) and \(\mu _x^N\) denote respectively the empirical and uniform measure on \(\mathcal {B}_N(0;\delta _n/(1-\xi ))\) with respect to the Poisson Point Process \(\exp _x^{-1}(\mathcal {P}_U)\). Then, since \(\delta _n/(1-\xi )=\varTheta (\delta _n)\), Proposition 7 implies that

$$\begin{aligned} \mathbb {E}\bigl [W_1^N(m_x^N,\mu _x^N)\bigr ]=o(\delta _n^3). \end{aligned}$$

On the other hand we have, since \(\exp _x\) is a diffeomorphism on U, that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_x^\mathcal {M},\mu _x^{\delta _n})\bigr ]\le (1+\xi ) \mathbb {E}\bigl [W_1^N(m_x^N,\mu _x^N)\bigr ], \end{aligned}$$

and hence we conclude that

$$\begin{aligned} \mathbb {E}\bigl [W_1(m_x^\mathcal {M},\mu _x^{\delta _n})\bigr ]=o(\delta _n^3), \end{aligned}$$

which proves Proposition 3. \(\square \)

5.4 Weighted Graph Distances

Recall that \(\lambda _n=n^{-1/N}(\log n)^{2/N}\). To prove Proposition 4 we first show the following.

Lemma 7

Let \(Q>3\), \(U=\mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\), and define the event

$$\begin{aligned} A_n:=\bigcup _{u,v\in U\cap G_n}\biggl \{|d_G^w(u,v)-d_\mathcal {M}(u,v)|>d_\mathcal {M}(u,v)\frac{3\lambda _n}{\varepsilon _n}+2\lambda _n\biggr \}. \end{aligned}$$

Then \(\mathbb {P}\left( A_n\right) =o(\delta _n^3)\), as \(n\rightarrow \infty \).

Fig. 4: Depiction of the splitting of the geodesic between u and v into k equal segments

Proof

The proof closely follows the strategy of the proof of Lemma 3. Let \(C_n\) denote the event in Corollary 2. We will show that on this event,

$$\begin{aligned} |d_G^w(u,v)-d_\mathcal {M}(u,v)|\le \frac{3d_\mathcal {M}(u,v)\lambda _n}{\varepsilon _n}+2\lambda _n \end{aligned}$$

for all \(u,v\in U\cap G_n\). This then implies that \({\mathbb {P}}(A_n\cap C_n)=0\), from which the result follows, since by Corollary 2

$$\begin{aligned} {\mathbb {P}}(A_n)\le {\mathbb {P}}(A_n\cap C_n)+(1-{\mathbb {P}}(C_n))=o(\delta _n^3). \end{aligned}$$

Take any two \(u,v\in U\cap G_n\) and let \(\gamma (u,v)\) denote the geodesic between u and v. We then partition this geodesic into

$$\begin{aligned} k=\biggl \lceil \frac{3d_\mathcal {M}(u,v)}{\varepsilon _n}\biggr \rceil \le \frac{3d_\mathcal {M}(u,v)}{\varepsilon _n}+1 \end{aligned}$$

pieces of equal length and let \(u:=u_0,u_1,\dots ,u_{k-1},u_k:=v\) denote the \(k+1\) endpoints of the intervals, see Fig. 4. On the event \(C_n\), each \(u_t\) belongs to some ball \(B_t\) of radius \(\lambda _n/4\) which contains a vertex \(x_t\in G_n\), where we can take \(x_0=u\) and \(x_k=v\). In particular, since \(d_\mathcal {M}(u_t,x_t)\le \lambda _n/2\), \(d_\mathcal {M}(u_{t-1},u_{t})\le \varepsilon _n/3\) and \(\lambda _n=o(\varepsilon _n)\), it follows that for large enough n,

$$\begin{aligned} d_\mathcal {M}(x_t,x_{t+1})\le d_\mathcal {M}(u_t,x_t)+d_\mathcal {M}(u_{t+1},x_{t+1})+d_\mathcal {M}(u_t,u_{t+1})\le \lambda _n+\frac{\varepsilon _n}{3}\le \varepsilon _n, \end{aligned}$$

so that \(\{u=x_0,x_1,\dots ,x_{k-1},x_k=v\}\) is a path in \(G_n\) (see Fig. 4). Moreover, \(d_G^w(x_t,x_{t+1})\le d_\mathcal {M}(u_t,u_{t+1})+\lambda _n\) by the triangle inequality. Therefore,

$$\begin{aligned} d_G^w(u,v)&\le \sum _{t=0}^{k-1}d_G^w(x_t,x_{t+1})\le \sum _{t=0}^{k-1}\,(d_\mathcal {M}(u_t,u_{t+1})+\lambda _n)\\&\le d_\mathcal {M}(u,v)+k\lambda _n\le d_\mathcal {M}(u,v)\biggl (1+\frac{3\lambda _n}{\varepsilon _n}\biggr )+\lambda _n. \end{aligned}$$

To finish the proof we note that by definition \(d_G^w(u,v)\ge d_\mathcal {M}(u,v)\) and hence

$$\begin{aligned} |d_G^w(u,v)-d_\mathcal {M}(u,v)|=d_G^w(u,v)-d_\mathcal {M}(u,v)\le \frac{3d_\mathcal {M}(u,v)\lambda _n}{\varepsilon _n}+2\lambda _n. \end{aligned}$$

\(\square \)

Proof of Proposition 4

Due to Lemma 7 it suffices to show that the conditions on \(\varepsilon _n\) and \(\delta _n\) imply \(\lambda _n/\varepsilon _n=o(\delta _n^2)\). We compute that

$$\begin{aligned} \frac{\lambda _n}{\varepsilon _n\delta _n^2}=\varTheta \bigl (n^{\alpha +2\beta -1/N}(\log n)^{2/N-a-2b}\bigr ). \end{aligned}$$

The latter is o(1) precisely when either \(\alpha +2\beta <1/N\) or \(\alpha +2\beta =1/N\) and \(a+2b>2/N\), which are the conditions of Proposition 4. Thus, under the conditions of Proposition 4 it holds that the manifold-weighted graph distance \(d_G^w\) is a \(\delta _n\)-good approximation with \(\xi _n=\max {\bigl \{\sqrt{\lambda _n/\varepsilon _n},\lambda _n^{1/3}\bigr \}}\). \(\square \)

5.5 Rescaled Graph Distances

Consider the 2-dimensional Euclidean space equipped with the Euclidean distance \(d_2\). Let \(\mathcal {C}=[0,1]^2\) and take \(G_n={\mathbb {G}}_n(\varepsilon )\) to be the random geometric graph on \(\mathcal {C}\) with connection radius \(\varepsilon \). The main result in [9] relates the shortest-path distance \(d_{G_n}^s\) and the Euclidean distance \(d_2\). We state a version of this result here, which includes the error bounds that follow from [9, Propositions 2.2 and 2.4].

Theorem 5

[9, Thm. 1.1] Consider the random geometric graph \(G_n\) on the unit square \([0,1]^2\) with connection radius \(\varepsilon _n=o(1)\). Then for any pair of vertices \(x,y\in G_n\) with \(d_2(x,y)>\varepsilon _n\), the following holds:

  • If \(d_2(x,y)\ge \max {\{12(\log n)^{3/2}/(n\varepsilon _n),21\varepsilon _n\log n\}}\), then

    $$\begin{aligned} \mathbb {P}\left( d_G^s(x,y)\ge \biggl \lfloor \frac{d_2(x,y)}{\varepsilon _n}\biggl (1+\frac{1}{2 (n\varepsilon _nd_2(x,y))^{2/3}}\biggr )\biggr \rfloor \right) \ge 1-o(n^{-5/2}). \end{aligned}$$
  • If \(\varepsilon _n\ge 224\sqrt{(\log n)/n}\) then

    $$\begin{aligned} \mathbb {P}\left( d_G^s(x,y)\le \biggl \lceil \frac{d_2(x,y)}{\varepsilon _n}(1+\gamma _n)\biggr \rceil \right) \ge 1-o(n^{-5/2}) \end{aligned}$$

    with

    $$\begin{aligned} \gamma _n:=\max {\biggl \{1358\biggl (\frac{3\log n}{n\varepsilon _n^2+n\varepsilon _nd_2(x,y)}\biggr )^{\!2/3}\!\!,\frac{4\cdot 10^6(\log n)^2}{n^2\varepsilon _n^4},\biggl (\frac{30000}{n\varepsilon _n^2}\biggr )^{\!2/3}\biggr \}}. \end{aligned}$$

From this we obtain the following result, which gives bounds on the graph distance \(\varepsilon _nd_G^s\) in terms of the manifold distance, between two nodes of the graph \(G_n\) that are within manifold distance \(O(\delta _n)\).

Lemma 8

Let \(\varepsilon _n\ge 244\sqrt{(\log n)/n}\), \(Q>3\), \(U=\mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\), and define the event

$$\begin{aligned} A_n:=\bigcup _{u,v\in U\cap G_n}\!\!\!\{|\varepsilon _nd_G^s(u,v)-d_\mathcal {M}(u,v)|>d_\mathcal {M}(u,v)\gamma _n+\varepsilon _n\}. \end{aligned}$$

Then \({\mathbb {P}}(A_n)=o(\delta _n^3)\), as \(n\rightarrow \infty \).

Proof

Note that since the neighborhood U is shrinking as n increases we can map it to \({\mathbb {R}}^2\) diffeomorphically for sufficiently large n. This affects the distances at most by a constant factor and hence it suffices to prove the statement for \(\mathcal {M}={\mathbb {R}}^2\). By the second statement of Theorem 5 we have that for any two \(u,v\in U\cap G_n\),

$$\begin{aligned} {\mathbb {P}}(|\varepsilon _nd_G^s(u,v)-d_\mathcal {M}(u,v)|>d_\mathcal {M}(u,v) \gamma _n+\varepsilon _n)=o(n^{-5/2}). \end{aligned}$$

By conditioning on the number of nodes in U (\(|U\cap G_n|\)) and applying the union bound we get

$$\begin{aligned} {\mathbb {P}}(A_n\,|\,|U\cap G_n|)\le |U\cap G_n|^2\hspace{0.33325pt} o(n^{-5/2}). \end{aligned}$$

Now \(\mathbb {E}\hspace{0.33325pt}[|U\cap G_n|^2]=O(n^2\delta _n^4+n\delta _n^2)\) and therefore

$$\begin{aligned} \mathbb {P}(A_n)=\mathbb {E}[{\mathbb {P}}(A_n\,|\,|U\cap G_n|)]\le o(n^{-1/2}\delta _n^4)+o(n^{-3/2}\delta _n^2)=o(\delta _n^3), \end{aligned}$$

where we used that \(\delta _n\rightarrow 0\) and that \(n^{-3/2}=o(\delta _n)\) for all \(\delta _n=\varTheta \hspace{0.33325pt}(n^{-\beta }(\log n)^b)\) with \(\beta \le 1\). \(\square \)
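The rescaled hopcount distance can be simulated in the same way. The sketch below (again with purely illustrative parameters, and assuming the graph is connected, which holds with high probability for this choice of radius) compares \(\varepsilon \) times the unweighted hop count with the Euclidean distance between two far-apart nodes.

```python
# A simulation sketch (illustrative parameters only) of the rescaled hopcount distance
# eps * d_G^s on a two-dimensional random geometric graph.
import numpy as np
import networkx as nx

rng = np.random.default_rng(6)
n, eps = 3000, 0.06
pos = {i: rng.uniform(size=2) for i in range(n)}
G = nx.random_geometric_graph(n, eps, pos=pos)

a = min(pos, key=lambda i: pos[i].sum())          # node near the (0, 0) corner
b = max(pos, key=lambda i: pos[i].sum())          # node near the (1, 1) corner
d_eucl = float(np.linalg.norm(pos[a] - pos[b]))
hops = nx.shortest_path_length(G, a, b)           # unweighted shortest-path hop count
print(d_eucl, eps * hops, eps * hops / d_eucl)    # rescaled hopcount slightly above d_eucl
```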

We can now prove Proposition 5.

Proof of Proposition 5

First observe that \(\varepsilon _nd_G^s(u,v)\ge d_\mathcal {M}(u,v)\) for all \(u,v\in \mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\). Moreover, the conditions of the proposition imply that \((\log n)^{1/2}n^{-1/2}=o(\varepsilon _n)\). Therefore, by Lemma 8 we have that with probability \(1-o(\delta _n^3)\),

$$\begin{aligned} |\varepsilon _nd_G^s(u,v)-d_\mathcal {M}(u,v)|\le d_\mathcal {M}(u,v)\gamma _n+\varepsilon _n \end{aligned}$$

for all \(u,v\in \mathcal {B}_\mathcal {M}(x^*;Q\delta _n)\cap G_n\). Moreover, since by assumption \(\alpha \ge 3\beta \), and \(a<3b\) if \(\alpha =3\beta \), it follows that \(\varepsilon _n=o(\delta _n^3)\). Thus, to prove Proposition 5 it remains to show that \(\gamma _n=o(\delta _n^2)\). Recall that \(\gamma _n\) is the maximum of the three terms

$$\begin{aligned} 1358\biggl (\frac{3\log n}{n\varepsilon _n^2+n\varepsilon _nd_2(x,y)}\biggr )^{\!2/3}\!\!,\quad \frac{4\cdot 10^6(\log n)^2}{n^2\varepsilon _n^4},\quad \biggl (\frac{30000}{n\varepsilon _n^2}\biggr )^{\!2/3}. \end{aligned}$$

We will show that each of them is \(o(\delta _n^2)\). For the first term it suffices to show that \(n^{-1}\varepsilon _n^{-2}\log n=o(\delta _n^3)\). This follows since

$$\begin{aligned} n^{-1}\varepsilon _n^{-2}\delta _n^{-3}\log n=O\bigl (n^{-(1-2\alpha -3\beta )} (\log n)^{1 -2a - 3b}\bigr ), \end{aligned}$$

which is o(1) by the assumption that \(2\alpha +3\beta \le 1\) and \(2a+3b>1\) if \(2\alpha +3\beta =1\). We now immediately have that \((n^{-1}\varepsilon _n^{-2}\log n)^2=o(\delta _n^6)\), which proves that the second term is \(o(\delta _n^2)\). Finally, the result for the third term follows from \(n^{-1}\varepsilon _n^{-2}=o(n^{-1}\varepsilon _n^{-2}\log n)=o(\delta _n^3)\). \(\square \)