Chemical distance in geometric random graphs with long edges and scale-free degree distribution

We study geometric random graphs defined on the points of a Poisson process in $d$-dimensional space, which additionally carry independent random marks. Edges are established at random using the marks of the endpoints and the distance between points in a flexible way. Our framework includes the soft Boolean model (where marks play the role of radii of balls centred in the vertices), a version of spatial preferential attachment (where marks play the role of birth times), and a whole range of other graph models with scale-free degree distributions and edges spanning large distances. In this versatile framework we give sharp criteria for absence of ultrasmallness of the graphs and in the ultrasmall regime establish a limit theorem for the chemical distance of two points. Other than in the mean-field scale-free network models the boundary of the ultrasmall regime depends not only on the power-law exponent of the degree distribution but also on the spatial embedding of the graph, quantified by the rate of decay of the probability of an edge connecting typical points in terms of their spatial distance.


Background
An important topic in percolation theory and, more generally, the theory of geometrically embedded random graphs, is the comparison of Euclidean distances of two points with their graph distance, often called chemical distance. Starting with the work of Grimmett and Marstrand [20], this problem has been studied for Bernoulli percolation, for example by Antal and Pisztora [1] and Garet and Marchand [15,16], but also for models with long range interactions, such as random interlacements, see Černý and Popov [8], its vacant set and the Gaussian free field, see Drewitz et al. [14]. In the supercritical phase of these models Euclidean and chemical distance of points on the unbounded connected component are typically of comparable order when the points are distant, see [14] for general conditions for percolation models on $\mathbb{Z}^d$ to share this behaviour. The introduction of additional long edges can change this behaviour and the graph distance can be a power of the logarithm or even an iterated logarithm of the Euclidean distance. In the latter case the graph is called ultrasmall. The focus of this paper is to characterise ultrasmallness in geometric random graphs and provide a universal limit theorem for typical distances in such graphs.
We briefly review what is known on this problem. A classical scenario is long-range percolation. Here points $x, y$ of a Poisson process in $\mathbb{R}^d$ or of the lattice $\mathbb{Z}^d$ are connected independently with probability $p(x, y) = |x - y|^{-\delta d + o(1)}$, for some $\delta > 1$. Biskup [4,5] has shown that if $1 < \delta < 2$ then the chemical distance is $d(x, y) = (\log |x - y|)^{\Delta + o(1)}$, with high probability for fixed points $x, y$ on the infinite component with $|x - y| \to \infty$, where $\Delta = \frac{\log 2}{\log(2/\delta)}$. If $\delta > 2$ it was shown by Berger [3] that the chemical distance is at least linear in the Euclidean distance and for $\delta = 2$ there is recent progress by Ding and Sly [12], but in both cases the precise asymptotics is still an open problem. In general, ultrasmallness cannot occur in long-range percolation models.
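To get a feeling for the exponent $\Delta = \log 2 / \log(2/\delta)$ quoted above, the following minimal sketch evaluates it for a few values of $\delta$. The function name is hypothetical and the code only restates the formula from the text; it is not part of the cited results.

```python
import math


def long_range_distance_exponent(delta: float) -> float:
    """Evaluate Delta = log 2 / log(2 / delta), stated in the text for 1 < delta < 2."""
    if not 1.0 < delta < 2.0:
        raise ValueError("the formula is only stated for 1 < delta < 2")
    return math.log(2.0) / math.log(2.0 / delta)


if __name__ == "__main__":
    # Delta tends to 1 as delta -> 1 and grows without bound as delta -> 2.
    for delta in (1.1, 1.5, 1.9):
        print(delta, round(long_range_distance_exponent(delta), 3))
```

The exponent is increasing in $\delta$, which matches the intuition that faster decay of long edges makes chemical distances larger.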
Ultrasmallness is however a well established phenomenon in scale-free networks. These networks are typically not modelled as spatial graphs, so to compare the results to our scenario we restrict the graph to the vertices inside a ball of radius $R$, which now contains $N$ lattice or Poisson points, with $N$ of order $R^d$. The mean-field nature of these models is reflected in the fact that connection probabilities do not depend on the spatial position of these points. Instead, points carry independent uniform marks and connections between points are established independently given the marks, with a probability $1 \wedge \frac{1}{N} g(s, t)$ depending on the marks $s, t$ of the vertices at the ends of a potential edge. Dependencies of interest are, for example, For all these examples, the graphs have scale-free degree distributions with power-law exponent $\tau = 1 + \frac{1}{\gamma}$. When $\gamma < \frac{1}{2}$ (or, equivalently, $\tau > 3$) the chemical distance of two randomly chosen points $x, y$ in the largest component is of order $\log N$ or, equivalently, $\log |x - y|$, see Bollobás et al. [6]. If however $\gamma > \frac{1}{2}$ (or, equivalently, $2 < \tau < 3$), then the graph is ultrasmall and there is a universal limit theorem for the chemical distance of two randomly chosen points $x, y$, namely
$$d(x, y) = (c + o(1)) \, \frac{\log \log N}{\log\big(\frac{\gamma}{1-\gamma}\big)} \qquad (1)$$
where $c = 2$ for (i) and $c = 4$ for (ii), (iii), see Dommers et al. [13], van der Hofstad et al. [31] and Norros and Reittu [28] for the existence of an ultrasmall phase and Dereich et al. [11] for general lower bounds that match the upper bounds in the ultrasmall phase in all those examples.
Looking at spatially embedded graphs with a scale-free degree distribution, Deijfen et al. [9], Deprez et al. [10] and Bringmann et al. [7] investigated a range of spatial models where points are endowed with weights, which are heavy-tailed random variables corresponding loosely to negative powers $t^{-\gamma}$ of uniformly chosen marks $t$. The connection probability of two marked points depends on the product of the weights and the spatial distance of the points, which is the case in models like scale-free percolation and hyperbolic random graphs. Behaviour analogous to kernel (i) in the non-spatial case is identified in [7] for these models, namely that the transition between ultrasmall and small world behaviour occurs at $\gamma = \frac{1}{2}$ (equivalently, $\tau = 3$) and in the former case a limit theorem as in (1) with $c = 2$ holds.
We shall see in the present paper that not only the proof techniques but also the results of [9], [10] and [7] depend crucially on the fact that the connection probabilities depend on the weights of the points only through their product. In fact, the situation changes radically when other, equally natural, ways of connecting vertices are considered, and we shall see that the novel behaviour that we unlock in this paper is also of a universal nature. We now discuss two natural examples, which constitute our main motivation. In both cases the vertices of the graph are the points of a standard Poisson process in $\mathbb{R}^d$ and every point is endowed with an independent mark, which is uniformly distributed on the unit interval $(0, 1)$.
In the Boolean (graph) model on $\mathbb{R}^d$ the points carry random radii, which can be derived from the uniform marks $t$, for example as $t^{-\gamma/d}$. In the hard version of the model two points are connected by an edge if the balls around them with the associated random radii intersect. In the more powerful soft version of the Boolean model independent, identically distributed positive random variables $X = X(x, y)$ are associated with every unordered pair of vertices $\{x, y\}$ and a connection is made iff
$$\frac{|x - y|}{s^{-\gamma/d} + t^{-\gamma/d}} \le X(x, y),$$
where $s, t$ are the marks of the vertices. The choice $X = 1$ corresponds to the hard Boolean model, while the choice of $\gamma = 0$ and a heavy-tailed random variable $X$ with decay $P(X > r) \asymp r^{-\delta d}$ as $r \to \infty$, for some $\delta > 1$, replicates the long-range percolation model. While neither of these boundary cases is ultrasmall, we show that a choice of $\gamma \in (0, 1)$ and $\delta > 1$ gives
• ultrasmallness if $\gamma > \frac{\delta}{\delta+1}$, but
• no ultrasmallness if $\gamma < \frac{\delta}{\delta+1}$.
Note that this boundary depends not only on the power-law exponent of the degree distribution, which is $\tau = 1 + \frac{1}{\gamma}$, but also on $\delta$, which is a geometric quantity related to the decay in the presence of long edges between typical vertices. In particular, the onset of ultrasmallness does not coincide with the point at which the variance of the degree distribution becomes infinite, but occurs at a threshold that depends on spatial correlations influencing the graph topology beyond the degree distribution, a feature that is not present in the scale-free percolation or hyperbolic random graph models. In the ultrasmall case we also get a different form of the limit theorem for the chemical distance, namely
$$d(x, y) = (4 + o(1)) \, \frac{\log \log |x - y|}{\log\big(\frac{\gamma}{\delta(1-\gamma)}\big)} \qquad (2)$$
with high probability as $|x - y| \to \infty$, where the dependence of the limiting constant on $\delta$ is another novel feature.
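The regime boundary described above can be made concrete with a small sketch that compares $\gamma$ with $\delta/(\delta+1)$ and reports the power-law exponent $\tau = 1 + 1/\gamma$. The function names are hypothetical, and the boundary case $\gamma = \delta/(\delta+1)$, which the text does not address, is reported as such.

```python
def power_law_exponent(gamma: float) -> float:
    """Power-law exponent tau = 1 + 1/gamma of the degree distribution."""
    return 1.0 + 1.0 / gamma


def regime(gamma: float, delta: float) -> str:
    """Classify (gamma, delta) by the threshold gamma = delta / (delta + 1)."""
    if not (0.0 < gamma < 1.0 and delta > 1.0):
        raise ValueError("expected 0 < gamma < 1 and delta > 1")
    threshold = delta / (delta + 1.0)
    if gamma > threshold:
        return "ultrasmall"
    if gamma < threshold:
        return "not ultrasmall"
    return "boundary case (not covered by the stated criteria)"


if __name__ == "__main__":
    # Same gamma (tau < 3, infinite variance), different geometry:
    for delta in (1.2, 2.0):
        print(delta, round(power_law_exponent(0.6), 3), regime(0.6, delta))
```

With $\gamma = 0.6$ the degree distribution has infinite variance in both runs, yet only the slower edge decay $\delta = 1.2$ falls into the ultrasmall regime.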
In our second example we look at the age-dependent random connection model, which was introduced in Gracar et al. [17]. Here the mark of a vertex is considered to be its birth time so that the model is intrinsically dynamical. At its birth time $t$ a vertex is connected to all vertices born previously with a probability where $s < t$ is the birth-time of the older vertex and $\varphi : (0, \infty) \to [0, 1]$ is a nonincreasing profile function. As $(t/s)^{\gamma}$ is the asymptotic order of the expected degree at time $t$ of a vertex born at time $s \downarrow 0$, this infinite graph model mimics the behaviour of spatial preferential attachment networks [2,25]. An upper bound for the chemical distance for spatial preferential attachment is given by Hirsch and Mönch in [24], but lower bounds are not known. Our results show that, as in the soft Boolean model, ultrasmallness fails in the age-dependent random connection model if $\gamma < \frac{\delta}{\delta+1}$. If $\gamma > \frac{\delta}{\delta+1}$ we get a lower bound matching the upper bound of [24] and we get the precise asymptotics for the chemical distance as stated in (2).
The similarity in the behaviour of our examples is a strong hint that there is a large class of spatial graph models which displays universal behaviour markedly different from both the class of spatial scale-free graphs investigated in [7] and the non-spatial scale-free models studied, for example, in [30]. This idea is further supported by the recent paper by Gracar et al. [19] which investigates the existence of a subcritical percolation phase and reveals the same regime boundary depending on the parameters $\gamma$ and $\delta$. In the present paper we explore this universality class of spatial scale-free random graphs by providing general bounds for the chemical distance based only on upper and lower bounds on the connection probabilities between finitely many pairs of points. This approach is sufficiently flexible to yield the fine results described above for the entire range of models in this class, including of course both of the examples described above. The main difficulty here is to produce lower bounds larger than those obtainable for the non-spatial scale-free models by making substantial use of the restrictions coming from the underlying Euclidean geometry.

Framework
Suppose $\mathcal{G}$ is a graph with vertex set given by the points of a Poisson process $\mathcal{X}$ of unit intensity on $\mathbb{R}^d \times (0, 1)$. We write the points of this process as $\mathbf{x} = (x, t)$ and refer to $x$ as the location and $t$ as the mark of the vertex $\mathbf{x}$. Small marks indicate powerful vertices. We write $\mathbf{x} \sim \mathbf{y}$ if the vertices $\mathbf{x}, \mathbf{y}$ are connected by an edge in $\mathcal{G}$.
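As an illustration of the vertex set, the following sketch samples the restriction of a unit-intensity Poisson process on $\mathbb{R}^d \times (0,1)$ to a finite box. The box $[0, \text{side}]^d$, the function name and the use of exponential inter-arrival times to draw the Poisson count are choices made here for a self-contained example, not constructions from the paper.

```python
import random


def sample_marked_poisson(side: float, dim: int, rng: random.Random):
    """Sample vertices (location, mark) in [0, side]^dim x (0, 1].

    The number of points is Poisson with mean side**dim, obtained by counting
    arrivals of a rate-one Poisson process before time side**dim.
    """
    volume = side ** dim
    count = 0
    clock = rng.expovariate(1.0)
    while clock < volume:
        count += 1
        clock += rng.expovariate(1.0)
    vertices = []
    for _ in range(count):
        location = tuple(rng.uniform(0.0, side) for _ in range(dim))
        mark = 1.0 - rng.random()  # uniform on (0, 1]; small mark = powerful vertex
        vertices.append((location, mark))
    return vertices


if __name__ == "__main__":
    points = sample_marked_poisson(side=5.0, dim=2, rng=random.Random(1))
    print(len(points), "vertices; smallest mark:", min(t for _, t in points))
```

Any edge rule satisfying the assumptions of this section can then be applied to pairs of sampled vertices.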
We denote by $\mathbb{P}_{\mathcal{X}}$ the law of $\mathcal{G}$ conditioned on the Poisson process $\mathcal{X}$ and by $\mathbb{P}_{\mathbf{x}_1,\dots,\mathbf{x}_n}$ the law of $\mathcal{G}$ conditioned on the event that $\mathbf{x}_1, \dots, \mathbf{x}_n$ are points of the Poisson process $\mathcal{X}$. The following assumption depends on parameters $\delta > 1$ and $0 \le \gamma < 1$; it leads to lower bounds on chemical distances in the graph.
Assumption 1.1. There exists $\kappa > 0$ such that, for every finite set of pairs of vertices $I \subset \mathcal{X}^2$ in which each vertex appears at most twice, we have
In Section 1.4 we shall see several natural examples of geometric random graphs which satisfy Assumption 1.1. Note that the assumption does not include conditional independence of the events $\{\mathbf{x}_i \sim \mathbf{y}_i\}$, which makes several classical tools, such as the BK-inequality, unavailable in our proofs. Without the conditional independence one cannot give a precise description of the degree distribution. However, it is worth noting that Assumption 1.1 is formulated in such a way that it implies the existence of a constant $C > 0$ for which the expected degree of a vertex with mark $t$ is smaller than $C t^{-\gamma}$. The next assumption, which we use to give matching upper bounds on chemical distances in the ultrasmall regime, does however contain a conditional independence assumption.
Assumption 1.2. Given $\mathcal{X}$ edges are drawn independently of each other and there exist $\alpha, \kappa > 0$ such that, for every pair of vertices $\mathbf{x} = (x, t), \mathbf{y} = (y, s) \in \mathcal{X}$,
The weight-dependent random connection model is a class of graphs introduced in [18,19] as a general framework, which incorporates many (but not all) of our examples of spatial random graphs. In that context our assumptions roughly mean that the random graphs are stochastically dominated by the random connection model with preferential attachment kernel (Assumption 1.1) and dominate the random connection model with min kernel (Assumption 1.2). Note that these models have a scale-free degree distribution with power-law exponent $\tau = 1 + \frac{1}{\gamma}$.
Hence, as previously mentioned these examples deviate from the behaviour of non-spatial models and scale-free percolation in that the emergence of ultrasmallness does not depend only on the power-law exponent.

Statement of the main results
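We write $\mathbf{x} \overset{n}{\leftrightarrow} \mathbf{y}$ if there exists a path of length $n$ from $\mathbf{x}$ to $\mathbf{y}$ in $\mathcal{G}$, i.e. there exist vertices $\mathbf{x}_1, \dots, \mathbf{x}_{n-1}$ such that $\mathbf{x} = \mathbf{x}_0 \sim \mathbf{x}_1 \sim \dots \sim \mathbf{x}_{n-1} \sim \mathbf{x}_n = \mathbf{y}$. We write $\mathbf{x} \leftrightarrow \mathbf{y}$ if $\mathbf{x} \overset{n}{\leftrightarrow} \mathbf{y}$ holds for some $n$, i.e. if $\mathbf{x}$ and $\mathbf{y}$ are in the same connected component in $\mathcal{G}$. The graph distance, or chemical distance, is given by $d(\mathbf{x}, \mathbf{y}) = \min\{n \in \mathbb{N} : \mathbf{x} \overset{n}{\leftrightarrow} \mathbf{y}\}$. Our main results identify the regime where $\mathcal{G}$ is ultrasmall, i.e. where the graph distance behaves like an iterated logarithm of the Euclidean distance. Moreover, in this regime we provide a precise limit theorem for the behaviour of the graph distance of remote points. The first and foremost result in this context is a lower bound for the chemical distance of two points at large Euclidean distance using only Assumption 1.1.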
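(a) If $\gamma < \frac{\delta}{\delta+1}$, then $\mathcal{G}$ is not ultrasmall, i.e. for $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d \times (0, 1)$, under $\mathbb{P}_{\mathbf{x},\mathbf{y}}$, the distance $d(\mathbf{x}, \mathbf{y})$ is of larger order than $\log \log |x - y|$ with high probability as $|x - y| \to \infty$.
under $\mathbb{P}_{\mathbf{x},\mathbf{y}}$ with high probability as $|x - y| \to \infty$.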
The second result provides a matching upper bound for the chemical distance in the ultrasmall regime under Assumption 1.2. Put together we get the following limit theorem for the chemical distance under Assumptions 1.1 and 1.2 in the ultrasmall regime.
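Theorem 1.2. Let $\mathcal{G}$ be a general geometric random graph which satisfies Assumption 1.1 and Assumption 1.2 for some $\gamma > \frac{\delta}{\delta+1}$. Then $\mathcal{G}$ is ultrasmall and, for $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d \times (0, 1)$, we have under $\mathbb{P}_{\mathbf{x},\mathbf{y}}(\,\cdot \mid \mathbf{x} \leftrightarrow \mathbf{y})$ with high probability as $|x - y| \to \infty$.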

Remarks:
• For the convergence in Theorem 1.2 we fix marks $s, t \in (0, 1)$ and add points $\mathbf{x} = (x, s)$ and $\mathbf{y} = (y, t)$ to the Poisson process. Then we show that
• Stronger results, like explicit lower bounds on $d(\mathbf{x}, \mathbf{y})$ under Assumption 1.1 only and upper bounds under Assumption 1.2 only, will be formulated in Proposition 2.1 and Proposition 3.1 below.
• The results continue to hold mutatis mutandis when the underlying Poisson process is replaced by the points of the lattice $\mathbb{Z}^d$ endowed with independent uniformly distributed marks.

The soft Boolean model
As explained in the introduction, in the (soft) Boolean model on $\mathbb{R}^d$ the points $x$ carry independent identically distributed random radii $R_x$ and unordered pairs of points $\{x, y\}$ carry independent identically distributed nonnegative random variables $X(x, y)$. Given these variables two points $x$ and $y$ are connected iff For a lower bound we assume that there are constants $C_1, C_2 > 0$ such that We can put this model into our framework by constructing the radius $R_x$ of a point where $F$ is the distribution function of the radius distribution and $F^{-1}(t) = \inf\{u : F(u) \ge t\}$ its generalised inverse. Given $\mathcal{X}$, the probability of a connection of $\mathbf{x}$ and $\mathbf{y}$ is we infer that the probability of a connection of $\mathbf{x}$ and $\mathbf{y}$ is bounded by $\kappa (t \wedge s)^{-\delta\gamma} |x - y|^{-\delta d}$ and hence, using conditional independence of edges, Assumption 1.1 holds. The assumption then implies no ultrasmallness if $\gamma < \frac{\delta}{\delta+1}$, which holds in particular in the hard model for arbitrary $0 < \gamma < 1$, as $X(x, y)$ is constant and hence $\delta$ can be chosen arbitrarily large. Similarly, if $\gamma > \frac{\delta}{\delta+1}$ and for every small $\varepsilon > 0$ there are constants $c, C > 0$ such that, for all $r \ge 1$, then Assumptions 1.1 and 1.2 hold for values arbitrarily close to $\gamma$ and $\delta$ and hence the full limit theorem in probability (3) holds.
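The following sketch illustrates one way to sample a single edge of a soft Boolean model. It rests on assumptions that are not taken verbatim from the text: radii are derived from the marks as $t^{-\gamma/d}$, the pair variable $X$ is Pareto distributed with $P(X > r) = r^{-\delta d}$ for $r \ge 1$, and two points are connected when $|x - y| \le X \,(s^{-\gamma/d} + t^{-\gamma/d})$. The function name is hypothetical.

```python
import math
import random


def soft_boolean_edge(x, s, y, t, gamma, delta, rng):
    """Decide whether vertices (x, s) and (y, t) are joined (illustrative rule).

    Assumed rule: |x - y| <= X * (s**(-gamma/d) + t**(-gamma/d)), where X is a
    Pareto variable with P(X > r) = r**(-delta * d) for r >= 1.
    """
    d = len(x)
    distance = math.dist(x, y)
    u = 1.0 - rng.random()                 # uniform on (0, 1]
    pair_variable = u ** (-1.0 / (delta * d))  # inverse-transform Pareto sample
    radius_sum = s ** (-gamma / d) + t ** (-gamma / d)
    return distance <= pair_variable * radius_sum


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 10_000
    hits = sum(
        soft_boolean_edge((0.0, 0.0), 0.5, (20.0, 0.0), 0.5, 0.7, 1.5, rng)
        for _ in range(trials)
    )
    print("empirical connection frequency at distance 20:", hits / trials)
```

Replacing the Pareto sample by the constant one would give the corresponding hard model under the same assumed rule.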

Hirsch's scale-free Gilbert graph
Hirsch [23] discusses a model which in its soft version connects every unordered pair of where $R_x$, $R_y$ and $X(x, y)$ are as in Example 1.4.1. He gives a lower bound for the chemical distance of the hard model, which is of the form $|x - y| / \log |x - y|$. Our result also shows that the hard model is not ultrasmall, albeit with a much smaller lower bound of an order slightly below $\log |x - y|$. However, this bound extends uniformly to the soft model if $\delta > \frac{\gamma}{1-\gamma}$. This includes long-range percolation, which corresponds to the case $\gamma = 0$, in which we know from [4] that if $\delta < 2$ the chemical distance is indeed of the order of a power of a logarithm. Our results become best possible looking at the soft model with $X$ heavy-tailed with $\delta < \frac{\gamma}{1-\gamma}$. In that case we show that distances can be drastically smaller and satisfy the limit theorem in Theorem 1.2.

The age-dependent random connection model
This dynamical model was introduced in [17] as a simplification of the spatial preferential attachment model of Jacob and Mörters [25,26]. A vertex $\mathbf{x} = (x, t)$ is born at time $t$ and at birth connects to all vertices $\mathbf{y} = (y, s)$ born previously with probability where $\beta > 0$ is a density parameter and $\varphi : (0, \infty) \to [0, 1]$ is a non-increasing profile function standardized to satisfy $\int \varphi(|x|^d) \, dx = 1$. It is easy to see that for $t \gg s$ the expected degree at time $t$ of a vertex born at time $s$ is of asymptotic order $(t/s)^{\gamma}$, so that the model combines preferences of attachment to vertices of high degree and to nearby vertices in a balanced way. If $\varphi(r) \le C r^{-\delta}$ we see that Assumption 1.1 holds so that ultrasmallness fails if $\gamma < \frac{\delta}{\delta+1}$. But if $\gamma > \frac{\delta}{\delta+1}$ and also, for every $\varepsilon > 0$, there is $c > 0$ with $\varphi(r) \ge c r^{-\delta+\varepsilon}$ for all $r \ge 1$, then ultrasmallness holds and we get the asymptotic chemical distance as stated in (2).

Scale-free percolation
As explained in Section 1.1, for the model of Deijfen et al. [9] and other models constructed by taking products of vertex weights and distances we do not expect our results to be relevant or even sharp. In fact, the dependence on the weights in these models is so strong that the geometry does not play a significant role and the techniques developed in this paper are not needed to understand the behaviour of the chemical distance. For these models Assumption 1.1 only holds for $\gamma < \frac{1}{2}$ and in this case we recover from Theorem 1.1 the well-known result that the graph is not ultrasmall when the power-law exponent is $\tau > 3$. For recent results for the chemical distance when $\gamma < \frac{1}{2}$, see [21].

The reinforced age-dependent random connection model
We consider a reinforced version of the age-dependent random connection model described above, where the connection probability between vertices is reinforced by additional weights of the nodes. Interestingly, although edges do not occur independently of each other due to the additional weights, our results still apply in full generality. Let the vertex set be a Poisson point process $\mathcal{X}$ on $\mathbb{R}^d \times (0, 1)$ as before. We assign in addition to each point $\mathbf{x} \in \mathcal{X}$ an independent identically distributed reinforcement weight $W = W_{\mathbf{x}}$, for which we assume that the second moment exists and that it is almost surely bounded away from zero, i.e. there exists $\alpha > 0$ such that $P(W \ge \alpha) = 1$. Given $\mathcal{X}$ and the reinforcement weights, edges are then formed independently between $\mathbf{x} = (x, t)$ and $\mathbf{y} = (y, s)$ with probability where $\varphi$ is as in Example 1.4.3. Let $I \subset \mathcal{X}^2$ be a set of pairs of vertices where each vertex appears at most twice. If there is $C > 0$ such that $\varphi(r) \le C r^{-\delta}$ for all $r > 0$, where the second inequality holds since each reinforcement weight appears at most twice in the product and they are independent of $\mathcal{X}$. As the second moment of the weights exists, Assumption 1.1 holds for an appropriately chosen $\kappa$. Hence, ultrasmallness fails if $\gamma < \frac{\delta}{\delta+1}$. On the other hand, we can easily couple the reinforced age-dependent random connection model to an age-dependent random connection model with a modified density parameter, such that the latter is a subgraph of the former. Indeed, for each pair of vertices we draw an independent uniform random variable $U(\mathbf{x}, \mathbf{y})$. Given the Poisson process $\mathcal{X}$, the reinforcement weights and the family $(U(\mathbf{x}, \mathbf{y}))_{\mathbf{x}, \mathbf{y} \in \mathcal{X}}$, we can construct the age-dependent random connection model and the reinforced model in the following way. First, add an edge between any pair of vertices when This leads to the age-dependent random connection model with new density parameter $\beta \alpha^{-2/\delta}$. Since $W \ge \alpha$ almost surely, each such edge is also added in the reinforced model.
To get the full reinforced model, we add additional edges to hitherto unconnected pairs of vertices if As the age-dependent random connection model is ultrasmall when $\gamma > \frac{\delta}{\delta+1}$ and if, for every $\varepsilon > 0$, there exists $c > 0$ with $\varphi(r) \ge c r^{-\delta+\varepsilon}$ for all $r \ge 1$, the reinforced model is ultrasmall as well and we get the asymptotic chemical distance as stated in (2) under both tail assumptions stated for $\varphi$ in this section. Note that Examples 1.4.1 and 1.4.2 can similarly be reinforced, and similar conclusions can consequently be drawn.
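The coupling idea used above, where one shared uniform variable per pair of vertices produces two graphs with one contained in the other, can be sketched generically. The probability functions below are hypothetical placeholders and do not reproduce the connection probabilities of the two models; the only property used is that the base probability never exceeds the reinforced one.

```python
import random
from itertools import combinations


def coupled_edge_sets(pairs, p_base, p_reinforced, rng):
    """Build two edge sets from one uniform variable per pair.

    If p_base(pair) <= p_reinforced(pair) for every pair, then every edge of
    the base graph is also an edge of the reinforced graph.
    """
    base, reinforced = set(), set()
    for pair in pairs:
        u = rng.random()  # the shared uniform variable U for this pair
        if u < p_base(pair):
            base.add(pair)
        if u < p_reinforced(pair):
            reinforced.add(pair)
    return base, reinforced


if __name__ == "__main__":
    pairs = list(combinations(range(30), 2))
    base, reinforced = coupled_edge_sets(
        pairs,
        p_base=lambda pair: 0.1,        # hypothetical lower probability
        p_reinforced=lambda pair: 0.3,  # hypothetical higher probability
        rng=random.Random(7),
    )
    print(len(base), len(reinforced), base <= reinforced)
```

Because the smaller graph is a subgraph of the larger one, any upper bound on distances in the smaller graph carries over to the larger one.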

Ellipses percolation
In [29] Teixeira and Ungaretti introduce a model on $\mathbb{R}^2$ as a collection of random ellipses centred on points of a Poisson process $\mathcal{X}$ on $\mathbb{R}^2 \times (0, 1)$ with uniform marks $t$, from which the size of the major half-axis is derived as $t^{-\gamma/2}$ while its direction is sampled uniformly. The size of the minor half-axis is one. The random graph is then constructed by taking the Poisson process as the vertex set and, given the collection of random ellipses, forming an edge between two points of the point process if their ellipses intersect. Hilário and Ungaretti [22] show that, for $\gamma \in (1, 2)$, the model is ultrasmall.
We introduce a soft version of this model, where for each pair of vertices $\mathbf{x}, \mathbf{y}$ we consider copies of their ellipses where the sizes of the major axes are multiplied by independent, identically distributed positive heavy-tailed random variables $X = X(\mathbf{x}, \mathbf{y})$ with $P(X > r) \sim r^{-2\delta}$ for some $\delta > 1$. An edge between $\mathbf{x}$ and $\mathbf{y}$ is then formed if the new ellipses intersect. Note that given $\mathcal{X}$ edges are not drawn independently of each other, as the neighbourhood of each vertex depends on the orientation of the ellipses. Our results show that, for $\gamma \in [0, 1)$, the original model is never ultrasmall and the soft model is not ultrasmall if $\gamma < \frac{\delta}{\delta+1}$. We see that if an edge is formed between $\mathbf{x} = (x, t)$ and $\mathbf{y} = (y, s)$, this implies that balls around $x$ and $y$ with radii $X(\mathbf{x}, \mathbf{y}) t^{-\gamma/2}$ and $X(\mathbf{x}, \mathbf{y}) s^{-\gamma/2}$ intersect. Thus, there exists $\kappa > 0$ such that Since the random variables $X(\mathbf{x}, \mathbf{y})$ are independent, Assumption 1.1 holds for $\gamma \in [0, 1)$ and $\delta > 1$ and the claimed result follows.

Proof of the lower bounds for the chemical distance
Truncated first moment method
To prove the lower bounds of Theorem 1.1 we find an upper bound for $\mathbb{P}_{\mathbf{x},\mathbf{y}}\{d(\mathbf{x}, \mathbf{y}) \le 2\Delta\}$ and choose $\Delta$ as large as possible while keeping the probability sufficiently small. Note that the definition of the graph distance $d$ can be reduced to the existence of self-avoiding paths, since if there exists a path of length $n$ between two given vertices there also exists a self-avoiding path of shorter or equal length between those two. Hence, the paths considered throughout this section are assumed to be self-avoiding. The event $\{d(\mathbf{x}, \mathbf{y}) \le 2\Delta\}$ is equivalent to the existence of at least one path between $\mathbf{x}$ and $\mathbf{y}$ of length at most $2\Delta$. Hence, where $\mathbf{x} = \mathbf{x}_0$, $\mathbf{y} = \mathbf{x}_n$, $\bigcup^{\neq}$ (resp. $\sum^{\neq}$) denotes the union (resp. sum) over all possible sets of pairwise distinct vertices $\mathbf{x}_0, \dots, \mathbf{x}_n$ of the Poisson process and $\mathbb{E}$ is the expectation with respect to the law of a Poisson process with unit intensity on $\mathbb{R}^d \times (0, 1)$. To keep notation throughout the paper short we will abbreviate the previous notation and write $\sum_{\mathbf{x}_1, \dots, \mathbf{x}_m}$ for the sum over all sets of $m$ distinct vertices of the Poisson process. We get, by using Mecke's equation [27] and Assumption 1.1 that This bound is only good enough if $\gamma < \frac{1}{2}$. If $\gamma \ge \frac{1}{2}$ the expectation on the right is dominated by paths which are typically not present in the graph. These are paths which connect $\mathbf{x}$ or $\mathbf{y}$ quickly to vertices with small mark $t$. Our strategy is therefore to truncate the admissible mark of the vertices of a possible path between $\mathbf{x}$ and $\mathbf{y}$. We define a decreasing sequence $(\ell_k)_{k \in \mathbb{N}_0}$ of thresholds and call a tuple of vertices $(\mathbf{x}_0, \dots, \mathbf{x}_n)$ good if their marks satisfy $t_k \wedge t_{n-k} \ge \ell_k$ for all $k \in \{0, \dots, n\}$. A path consisting of a good tuple of vertices is called a good path. We denote by $A_k^{(\mathbf{x})}$ the event that there exists a path starting in $\mathbf{x}$ which fails this condition after exactly $k$ steps, i.e. a path $((x, t), (x_1, t_1), \dots, (x_k, t_k))$ with $t \ge \ell_0, t_1 \ge \ell_1, \dots, t_{k-1} \ge \ell_{k-1}$, but $t_k < \ell_k$. Furthermore we denote by $B_n^{(\mathbf{x},\mathbf{y})}$ the event that there exists a good path of length $n$ between $\mathbf{x}$ and $\mathbf{y}$. Then, for given vertices $\mathbf{x}$ and $\mathbf{y}$ This decomposition is the same as for the mean-field models in [11]. The main feature of our proof is to show that the geometric restrictions and resulting correlations in our spatial random graphs make it much more difficult for a path to connect to a vertex with small mark. Hence a larger sequence $(\ell_k)$ of thresholds can be chosen that still makes the two first sums on the right of (TMB) small, allowing the third sum to be small for a larger choice of $\Delta$. This requires a much deeper analysis of the graph and its spatial embedding.
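The notion of a good tuple, and the step at which a path starting in a given vertex first falls below the threshold sequence, can be checked mechanically. The sketch below works on lists of marks only; the function names are hypothetical, and the threshold list is assumed to contain at least as many entries as the path has vertices.

```python
def is_good(marks, thresholds):
    """Check t_k ∧ t_{n-k} >= l_k for all k in {0, ..., n}, with n = len(marks) - 1."""
    n = len(marks) - 1
    return all(min(marks[k], marks[n - k]) >= thresholds[k] for k in range(n + 1))


def first_failure(marks, thresholds):
    """Return the first k with t_k < l_k along the path, or None if there is none.

    This mirrors the path property used to define the event A_k for the
    starting vertex.
    """
    for k, mark in enumerate(marks):
        if mark < thresholds[k]:
            return k
    return None


if __name__ == "__main__":
    thresholds = [0.4, 0.2, 0.05, 0.02, 0.01]  # decreasing sequence (l_k)
    print(is_good([0.5, 0.3, 0.1, 0.3, 0.5], thresholds))
    print(first_failure([0.5, 0.1, 0.5], thresholds))
```

In the second call the middle vertex is too powerful for its position, so the path is not good and the failure occurs after one step.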

Outline of the proof
The characteristic feature of the shortest path connecting two typical vertices is that, starting from both ends, the path contains a subsequence of increasingly powerful vertices. The two parts started at the ends meet roughly in the middle in a vertex of exceptionally high power depending on the distance between the starting vertices. In our framework powerful vertices are characterised by small marks. For geometric random graphs fulfilling Assumption 1.1 we show that arbitrary strategies connecting increasingly powerful vertices are dominated by an optimal strategy by which paths make connections between vertices of increasingly high power in a way depending on the parameters $\gamma$ and $\delta$ in our assumption:
• If $\gamma > \frac{\delta}{\delta+1}$ we connect two powerful vertices $\mathbf{x}$ and $\mathbf{y}$ via a connector, a single vertex with a larger mark which is connected to both $\mathbf{x}$ and $\mathbf{y}$;
• if $\gamma < \frac{\delta}{\delta+1}$ we connect them by a single edge.
In both cases, we now sketch how our argument works on paths containing only the optimal type of connection between powerful vertices. The principal challenge of the proof will however be to show how these proposed optimal strategies dominate the entirety of other possible strategies. This is particularly hard in the former case, because a vast number of potential strategies leads to a massive entropic effect that needs to be controlled. Note also that at this point we need not show that the proposed optimal strategies actually work. This (easier) part of the proof requires Assumption 1.2 and is carried out in Section 3.
Figure 1: An example of a path with optimal connection type for $\gamma > \frac{\delta}{\delta+1}$. The horizontal axis corresponds to the sequential numbering of vertices on the path, the vertical axis represents the mark space. Powerful vertices (indicated by black dots) alternate with connectors (indicated by grey dots).
In the case $\gamma > \frac{\delta}{\delta+1}$ the optimal connection strategy is to follow a path of length $2n$ between $\mathbf{x}$ and $\mathbf{y}$, where we assume that $n$ is even and that the vertices $\mathbf{x}_1 = (x_1, t_1), \dots, \mathbf{x}_{2n-1} = (x_{2n-1}, t_{2n-1})$ of the path satisfy that $t_{2(k+1)} < t_{2k} < t_{2k+1}$ and $t_{2n-2(k+1)} < t_{2n-2k} < t_{2n-2k-1}$ for all $k = 0, \dots, n/2$, i.e. the vertices with even index can be seen as powerful vertices, while the ones with odd index represent the connectors between them, see Figure 1. Note that at this point we make no assumptions on the locations of these vertices.
For arbitrary $\varepsilon > 0$, we now determine a truncation sequence $(\ell_k)_{k \in \mathbb{N}_0}$, such that paths starting in $\mathbf{x}$, resp. $\mathbf{y}$, which are not good, only exist with a probability smaller than $\varepsilon$. To do so, we establish an upper bound for the probability of the event $A_n^{(\mathbf{x})}$ that there exists a path starting in $\mathbf{x}$ whose $n$-th vertex is the first vertex which has a mark smaller than the corresponding $\ell_n$. We denote by $N(\mathbf{x}, \mathbf{y}, n)$ the number of paths of length $n$ from $\mathbf{x} = (x, t)$ to a vertex $\mathbf{y} = (y, s)$ whose vertices $(x_1, t_1), \dots, (x_{n-1}, t_{n-1})$ fulfill $t_{2(k+1)} < t_{2k} < t_{2k+1}$ for all $k = 0, \dots, \lfloor n/2 \rfloor - 1$ and which is one half of a good path, i.e. $t \ge \ell_0, t_1 \ge \ell_1, \dots, t_{n-1} \ge \ell_{n-1}$. The mark of $\mathbf{y}$ is not restricted in this definition and is therefore allowed to be smaller than $\ell_n$. Hence, in this case the event $A_n^{(\mathbf{x})}$ can only occur for $n$ even, since by definition a connector is less powerful than the preceding and following vertex and therefore has a mark larger than the corresponding $\ell_n$.
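For $n$ even we have by Mecke's equation that $\mathbb{P}_{\mathbf{x}}(A_n^{(\mathbf{x})}) \le \int_{\mathbb{R}^d \times (0, \ell_n)} d\mathbf{y} \; \mathbb{E}_{\mathbf{x},\mathbf{y}} N(\mathbf{x}, \mathbf{y}, n)$.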
Since the existence of a path counted in $N(\mathbf{x}, \mathbf{y}, n)$ is equivalent to the existence of vertices $\mathbf{z}_1, \dots, \mathbf{z}_{n/2-1}$ such that the marks are bounded from below by $\ell_2, \ell_4, \dots, \ell_{n-2}$, with $\mathbf{z}_0 = \mathbf{x}$, $\mathbf{z}_{n/2} = \mathbf{y}$ the marks $u_0, \dots, u_{n/2}$ of $\mathbf{z}_0, \dots, \mathbf{z}_{n/2}$ are decreasing, and $\mathbf{z}_i, \mathbf{z}_{i+1}$ are connected via a single connector, Mecke's equation yields is the number of connectors between $\mathbf{z}_i$ and $\mathbf{z}_{i+1}$. Using Mecke's equation and Assumption 1.1 we have . We see in Lemma 2.1 that there exists $C > 0$ such that, for two given vertices $\mathbf{x} = (x, t)$ and $\mathbf{y} = (y, s)$ far enough from each other, This inequality holds for the optimal connection type between two powerful vertices of the path and we will see that this type of bound holds also for the case of multiple connectors between two powerful vertices (cf. Lemma 2.3). It also clearly displays the influence of the spatial embedding of the random geometric graph via the parameter $\delta$. Assuming (5) for the moment, we obtain
Figure 2: Representation of a path with optimal connection type by a binary tree. For a less trivial example resulting from a general connection strategy, see Figure 5.
For a sufficiently large constant $c > 0$ the right-hand side of (6) can be bounded by as shown in Lemma 2.5 considering all paths. With $\ell_0$ smaller than the mark of $\mathbf{x}$ we choose the truncation sequence $(\ell_k)$ for $\varepsilon > 0$, such that and we have Writing $\eta_n := \ell_n^{-1}$ we can deduce from (7) a recursive description of $(\ell_n)_{n \in \mathbb{N}_0}$ such that Consequently there exist $b > 0$ and $B > 0$ such that $\eta_n \le b \exp\big(B (\gamma/(\delta(1 - \gamma)))^{n/2}\big)$. We close the argument with a heuristic that leads from this truncation sequence to a lower bound for the chemical distance. Let $\mathbf{x}$ and $\mathbf{y}$ be two given vertices. If there exists a path of length $n < \log |x - y|$ between them, there must exist at least one edge in this path which is longer than $\frac{|x - y|}{\log |x - y|}$. For $|x - y|$ large, this edge typically must have an endvertex whose mark is, up to a multiplicative constant, smaller than $|x - y|^{-d}$. Hence, if we choose we ensure $\ell_\Delta$ is of larger order than $|x - y|^{-d}$. Therefore there is no good path whose vertices are powerful enough to be an endvertex of an edge longer than $\frac{|x - y|}{\log |x - y|}$ and consequently no good path of length shorter than $2\Delta$ can exist between $\mathbf{x}$ and $\mathbf{y}$.
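The heuristic above can be turned into a small numerical illustration. Assuming the bound $\eta_n \le b \exp(B \rho^{n/2})$ with $\rho = \gamma/(\delta(1-\gamma)) > 1$, the sketch below computes the largest $n$ for which this bound still guarantees $\ell_n \ge |x - y|^{-d}$. The constants `b` and `B` are hypothetical placeholders (the text only asserts their existence), the function name is made up for this example, and the input is $\log |x - y|$ to avoid overflow.

```python
import math


def heuristic_half_length(log_dist, gamma, delta, dim, b=1.0, B=1.0):
    """Largest n with b * exp(B * rho**(n/2)) <= |x - y|**dim (illustrative only).

    rho = gamma / (delta * (1 - gamma)) must exceed 1, i.e. gamma > delta / (delta + 1).
    """
    rho = gamma / (delta * (1.0 - gamma))
    if rho <= 1.0:
        raise ValueError("requires gamma > delta / (delta + 1)")
    # Taking logarithms: log b + B * rho**(n/2) <= dim * log|x - y|.
    budget = (dim * log_dist - math.log(b)) / B
    if budget <= 1.0:
        return 0
    return math.floor(2.0 * math.log(budget) / math.log(rho))


if __name__ == "__main__":
    for log_dist in (10.0, 1e3, 1e6):
        n = heuristic_half_length(log_dist, gamma=0.8, delta=2.0, dim=1)
        print(log_dist, n, "-> distance lower bound of order", 2 * n)
```

For large distances the returned value grows like $2 \log\log|x - y| / \log \rho$, so twice this value has the doubly logarithmic form of the limit stated for the ultrasmall regime.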
Turning to the case $\gamma < \frac{\delta}{\delta+1}$, we consider paths whose powerful vertices are connected directly to each other. For a path of length $n$ between two given vertices $\mathbf{x}$ and $\mathbf{y}$ we assume that $n$ is even and for the vertices $\mathbf{x}_1 = (x_1, t_1), \dots, \mathbf{x}_{n-1} = (x_{n-1}, t_{n-1})$ of the path we assume that we have $t_0 > t_1 > \dots > t_{n/2}$ and $t_n > t_{n-1} > \dots > t_{n/2}$, where $t_0$ is the mark of $\mathbf{x}$ and $t_n$ the mark of $\mathbf{y}$. We again make no restrictions on the locations of those vertices. Restricting the paths described in $A_n^{(\mathbf{x})}$ and $B_n^{(\mathbf{x},\mathbf{y})}$ to paths with this structure we follow the same argumentation as above to establish sufficiently small bounds for the event $A_n^{(\mathbf{x})}$ for a given vertex $\mathbf{x} = (x_0, t_0)$, where we again without loss of generality integrate over a larger range. For $c > 0$ large enough, the right-hand side can be further bounded by see Lemma 2.9. Choosing $\ell_0 < t_0$ and $(\ell_n)_{n \in \mathbb{N}_0}$ for $\varepsilon > 0$, such that the last displayed term equals $\frac{\varepsilon}{\pi^2 n^2}$ ensures that $\sum_n \mathbb{P}_{\mathbf{x}}(A_n^{(\mathbf{x})}) < \frac{\varepsilon}{6}$ and by induction we see that this choice is possible while for any $p > 1$ there exists $B > 0$ such that $\eta_n \le B n \log^p(n+1)$. Following the same heuristics as before leads to the choice for some constant $c > 0$ such that paths between $\mathbf{x}$ and $\mathbf{y}$ with length shorter than $2\Delta$ do not exist with high probability.

The ultrasmall regime
We now start the full proof in the case $\gamma > \frac{\delta}{\delta+1}$, considering all possible connection strategies. We prepare this by first modifying the graph, adding edges between vertices which are sufficiently close to each other. We call a path step minimizing if it connects any pair of vertices on the path by a direct edge whenever such an edge is available. Note that the length of any path connecting two fixed vertices is bounded from below by the length of a step minimizing path connecting the two vertices. Two spatial constraints emerge from this: on the one hand, vertices on a step minimizing path in the modified graph that are not neighbours on the path cannot be near each other. On the other hand, vertices connected by one of the added edges have to be near each other. To make full use of these constraints we need to distinguish between original edges and edges added to the graph. This can be done efficiently by endowing every edge with a conductance, which is one for original and two for added edges.
More precisely, we consider a graph $\tilde G$ where edges are endowed with conductances as follows: First, create a copy of G and assign to every edge conductance one. Then, between two vertices x = (x, t) and y = (y, s) of $\tilde G$ an edge with conductance two is added to $\tilde G$ whenever $|x - y|^d \le \kappa^{1/\delta} (t \wedge s)^{-\gamma} (t \vee s)^{-\gamma/\delta}$. Since all conductances and edges of $\tilde G$ are deterministic functionals of G, there exists an almost sure correspondence between G and $\tilde G$, under which an edge with conductance one in $\tilde G$ implies the existence of the same edge in G. With conductances assigned to every edge of $\tilde G$, we define the conductance of a path $P = (x_0, \dots, x_n)$ in $\tilde G$ as the sum of the conductances of the edges of P and denote it by $w_P$.
We call a self-avoiding path $P = (x_0, \dots, x_n)$ in G or $\tilde G$ step minimizing if there exists no edge between $x_i$ and $x_j$ for all i, j with $|i - j| \ge 2$.
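To make the two notions concrete, the following minimal sketch computes the conductance $w_P$ of a path and checks the step minimizing property on a small toy graph. The graph, the vertex names and the helper functions `conductance` and `is_step_minimizing` are hypothetical and serve only as illustration; spatial positions and marks are ignored.

```python
from itertools import combinations

# Hypothetical toy graph: each unordered vertex pair maps to a conductance
# (1 = original edge of G, 2 = edge added in the modified graph).
edges = {
    frozenset(("a", "b")): 1,
    frozenset(("b", "c")): 1,
    frozenset(("c", "d")): 1,
    frozenset(("a", "c")): 2,
}


def conductance(path, edges):
    """Sum of the conductances of the edges along the path (w_P)."""
    return sum(edges[frozenset(pair)] for pair in zip(path, path[1:]))


def is_step_minimizing(path, edges):
    """True if the path is self-avoiding, uses only existing edges, and
    no edge joins two path vertices that are at least two steps apart."""
    if len(set(path)) != len(path):
        return False
    if any(frozenset(pair) not in edges for pair in zip(path, path[1:])):
        return False
    return not any(
        frozenset((path[i], path[j])) in edges
        for i, j in combinations(range(len(path)), 2)
        if j - i >= 2
    )


long_path = ["a", "b", "c", "d"]   # not step minimizing: a and c are adjacent
short_path = ["a", "c", "d"]       # uses the added edge of conductance two
```

In this toy example the shortcut reduces the number of steps from three to two, while the conductance of the shortened path equals the length of the original path.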
Note that a step minimizing path in G is not necessarily step minimizing in $\tilde G$, since there could exist an edge of conductance two between two vertices of the path that would reduce the number of steps. But by removing the vertices connecting such a pair of vertices from the path, we can shorten the path to a step minimizing path in $\tilde G$ whose length and conductance are no more than the length of the original path. Hence the chemical distance d(x, y) between vertices x and y in G is greater than or equal to the conductance $d_w(x, y) := \min\{w_P : P \text{ is a path between } x \text{ and } y\}$ between them in $\tilde G$.
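The comparison between the chemical distance in G and the conductance in the modified graph can be illustrated numerically. The sketch below is an assumption-laden toy example: the graph is a hypothetical path on five vertices, the added edge is chosen by hand rather than by the spatial rule, and `weighted_distance` is a standard Dijkstra search, not a construction from the proof.

```python
import heapq


def add_edge(adj, u, v, c):
    """Insert an undirected edge {u, v} with conductance c."""
    adj.setdefault(u, {})[v] = c
    adj.setdefault(v, {})[u] = c


def weighted_distance(adj, source, target):
    """Dijkstra search: minimal total conductance of a path source -> target."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == target:
            return d
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for w, c in adj.get(v, {}).items():
            nd = d + c
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return float("inf")


# G: hypothetical path a-b-c-d-e, every edge has conductance one.
g = {}
for u, v in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]:
    add_edge(g, u, v, 1)

# Modified graph: copy of G plus one added edge of conductance two.
g_tilde = {v: dict(nbrs) for v, nbrs in g.items()}
add_edge(g_tilde, "a", "d", 2)

chemical = weighted_distance(g, "a", "e")           # hop distance in G
conductance_dist = weighted_distance(g_tilde, "a", "e")
```

Since every path in G is also a path in the modified graph with conductance equal to its length, the second quantity can never exceed the first.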
To bound the probabilities occurring in (TMB), we express the events on G with the help of corresponding events on $\tilde G$ by replacing the role of the length of a path by its conductance. The role of the conductance is crucial, as it allows us to distinguish newly added edges in a path, which is necessary to keep the bounds of the probabilities in (TMB) sufficiently small. We call a path $P = (x_0, \dots, x_n)$ in $\tilde G$ good if its marks satisfy $t_k \ge \ell_{w_P(k)}$ and $t_{n-k} \ge \ell_{w_P - w_P(n-k)}$ for all k = 0, . . . , n, where $w_P(k)$ is the conductance of P between $x_0$ and $x_k$. We denote by $\tilde A^{(x)}_k$ the event that there exists a step minimizing path starting in x in $\tilde G$ with conductance k which fails to be good on its last vertex. Notice that if there exists a path described by the event $A^{(x)}_k$, i.e. a path for which the k-th vertex is the first one whose mark is smaller than the corresponding truncation value $\ell_k$, then due to the correspondence between G and $\tilde G$ there also exists a step minimizing path P in $\tilde G$ with $w_P \le k$ which also fails the condition on its last vertex. Hence, the first two summands of the right-hand side of (TMB) can be bounded from above by sums of probabilities of events of the form $\tilde A^{(x)}_k$ and $\tilde A^{(y)}_k$. To bound $P_x(\tilde A^{(x)}_n)$, we count the expected number of paths occurring in the event $\tilde A^{(x)}_n$. If $|x - y|^d \le \kappa^{1/\delta} (t \wedge s)^{-\gamma} (t \vee s)^{-\gamma/\delta}$ holds for vertices x and y, there exist no step minimizing paths between x and y with conductance three or more and there exists exactly one step minimizing path with conductance two, since there exists an edge of conductance two between the two vertices. This property also holds for any of the subclasses of step minimizing paths introduced in the following.
For given vertices x = (x, t) and y = (y, s) define the random variable N (x, y, n) as the number of distinct step minimizing paths P between x and y with w P = n, whose connecting vertices ( n is the event that there exists a path with conductance n, where the final vertex is the first and only one which has a mark smaller than the corresponding ℓ n , the final vertex is also the most powerful vertex of the path. Hence, the number of paths described by the eventÃ (x) n can be written as the sum of N (x, y, n) over all sufficiently powerful vertices y of the graph and, by Mecke's formula, we have We now decompose N (x, y, n). For k = 1, . . . , n − 1, define N (x, y, n, k) as the number of step minimizing paths P between x and y with w P = n and • whose connecting vertices (x 1 , t 1 ), . . . (x m−1 , t m−1 ) have marks larger than the corresponding thresholds ℓ wP (1) , . . . , ℓ wP (m−1) and larger than the mark of y, and • there exists r ∈ {1, . . . , m−1} such that we have w P (r) = n−k and the connecting vertex x r = (x r , t r ) has the smallest mark among the connecting vertices and x. The vertex x r can be understood as the powerful vertex of the path which connects to y via a path of less powerful vertices with conductance k. Consequently, we write N (x, y, n, n) for the number of step minimizing paths of conductance n, which connect x and y via less powerful vertices. Then we have, for n ∈ N, For k = 1, . . . , n−1, the existence of a path counted in N (x, y, n, k) implies the existence of a vertex z such that a step minimizing path counted by N (x, z, n − k) exists which connects to y via a path of less powerful vertices with conductance k. Hence for n ∈ N and k = 1, . . . , n − 1, where we denote by K(z, y, k) the number of step minimizing paths P between z and y with w P = k whose vertices have marks larger than the marks of z and y. 
Note that unlike N (x, y, n), this random variable is symmetric in its first two arguments and by definition we have that N (x, y, n, n) = K(x, y, n).
Observe that K(z, y, 1) is the indicator whether z and y are connected by an edge with conductance one. We turn our attention to K(z, y, k) in the case k ≥ 2, i.e. two powerful vertices are connected via one or more connectors or an edge with conductance two.
Connecting powerful vertices. First consider the random variable K(x, y, 2).
If $|x - y|^d \le \kappa^{1/\delta} (t \wedge s)^{-\gamma} (t \vee s)^{-\gamma/\delta}$, the vertices x and y are connected by an edge with conductance two and we infer that K(x, y, 2) = 1. In the other case, K(x, y, 2) is equal to the number of connectors between x and y, i.e. the number of vertices with mark larger than the marks of x and y which form an edge of conductance one to both x and y. The following lemma shows the stated inequality (5) from Section 2.1 for this case. Recall that we write $\rho(x) := 1 \wedge x^{-\delta}$ and define $I_\rho := \int_{\mathbb{R}^d} \mathrm{d}x\, \rho(\kappa^{-1/\delta} |x|^d)$.
Proof. The first inequality follows directly by summing over all possible connectors and applying Assumption 1.
We consider the event that vertices x and y are connected by multiple vertices with larger marks. Recall that K(x, y, k) is the number of step minimizing paths P between x and y with w P = k whose vertices have marks larger than the marks of x and y. As before we call the vertices of such a path connectors. To control the number of such paths, notice that for any possible choice of connectors between x and y, there exists an almost surely unique connector with smallest mark, i.e the most powerful connector. For i = 1, . . . , k, we denote by K(x, y, k, i) the number of step minimizing paths between x and y where the connectors have a larger mark than x and y and there is a vertex x r with w P (r) = i which is the most powerful connector of those vertices. Then, Assume now that the connector x r is the most powerful of all connectors and w P (r) = i. In this case, the possible connectors x 1 , . . . , x r−1 and x r+1 , . . . , x m−1 need to have larger mark than x r . Hence, the paths between x r and x, resp. y, considered on their own have the same structure as the initial path and this leads to 1 ℓ x y Figure 3: Decomposing the path at the most powerful connector.
We use this decomposition together with Assumption 1.1 to find an upper bound for (13) and otherwise e K (x, y, 2) = 1 and e K (x, y, k) = 0 for k ≥ 3.
Lemma 2.2. Let $x, y \in \mathbb{R}^d \times (0, 1]$ be two given vertices. Then, for all $k \in \mathbb{N}$, we have $E_{x,y} K(x, y, k) \le e_K(x, y, k)$.
Note that by Assumption 1.1 and Lemma 2.1, we have $E_{x,y} K(x, y, 1) \le e_K(x, y, 1)$ and $E_{x,y} K(x, y, 2) \le e_K(x, y, 2)$. We prove the result for general k by induction using (12), but to do so we need to classify the possible connection strategies according to the way in which powerful vertices are placed. This classification is done by means of coloured binary trees. We write $\mathcal{T}_{k-1}$ for the set of all binary trees with k − 1 vertices. Here a binary tree is a rooted tree in which every vertex has either no child, a right child, a left child, or both. We colour the vertices of a tree $T \in \mathcal{T}_{k-1}$ in such a way that the leaves of the tree can be either blue or red, and every other vertex is coloured blue. Thus, for each $T \in \mathcal{T}_{k-1}$ there exist $2^\ell$ different colourings, where ℓ is the number of leaves of T. Let $\mathcal{T}^c_{k-1}$ be the set of all coloured trees.
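The size of the class of coloured trees can be checked by a short enumeration. The sketch below counts binary trees by their number of leaves, splitting at the root, and then weights each tree with the $2^\ell$ admissible colourings of its ℓ leaves. The function names `leaf_profile`, `count_trees` and `count_coloured_trees` are hypothetical helpers introduced for this illustration.

```python
from collections import Counter
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def leaf_profile(n):
    """Map (number of leaves) -> (number of binary trees with n vertices)."""
    if n == 0:
        return Counter({0: 1})  # the empty tree has no leaves
    if n == 1:
        return Counter({1: 1})  # a single vertex is itself a leaf
    profile = Counter()
    for i in range(n):  # i vertices in the left subtree, n-1-i in the right
        for a, count_a in leaf_profile(i).items():
            for b, count_b in leaf_profile(n - 1 - i).items():
                profile[a + b] += count_a * count_b
    return profile


def count_trees(n):
    """Number of binary trees with n vertices (the n-th Catalan number)."""
    return sum(leaf_profile(n).values())


def count_coloured_trees(n):
    """Number of coloured trees: each leaf is blue or red, inner vertices blue."""
    return sum(count * 2 ** leaves for leaves, count in leaf_profile(n).items())
```

For instance, among the five binary trees with three vertices, four are chains with a single leaf and one has two leaves, which gives 4 · 2 + 1 · 4 = 12 coloured trees.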
Before proceeding we outline the role of the tree and its coloured vertices in regard to the information they capture. We will construct the tree so as to describe the precise order of the connectors' marks. In order to distinguish between connections of vertices that are sufficiently close to form an edge with conductance two and connections between vertices which are further apart, red vertices of the tree will represent the first case and blue the second.

Figure 4a: Classification of a connection strategy by means of a binary tree. Local minima of the path correspond to branchpoints and local maxima to blue leaves of the corresponding binary tree T. Matching labels in the tree on the right are obtained by left-to-right labelling.

Figure 4b: One connector of the path in Figure 4a is replaced by an edge of conductance two. This edge corresponds to the red vertex in the tree to which no label and hence no vertex of the path is attached.
To each step minimizing path of conductance k between x and y we associate a coloured tree T ∈ T c k−1 in two steps, see Figure 4a: (1) If the connectors of the step minimizing path P of conductance k are x 1 , . . . , x m with m ≤ k − 1, we associate a vector u = (u 1 , . . . , u k−1 ) to the path defined as follows. We set u wP (i) := t i for all i ∈ 1, . . . , m and u j = 1 for all j ∈ {1, . . . , k − 1}\{w P (1), . . . , w P (m)}. Then (2) To u ∈ U k−1 we associate a coloured tree T ∈ T c k−1 as follows: -For k = 2 we have u = (u 1 ) and the set T c 1 contains two trees T , each consisting only of the root which may be coloured blue or red. If u = (1), then u is associated to the tree T with the red root and otherwise u is associated to the tree with the blue root.
- For k > 2, assume that to every tuple $u \in U_{j-1}$ with 2 ≤ j < k we have already associated a coloured tree $T \in \mathcal{T}^c_{j-1}$. Let $u = (u_1, \dots, u_{k-1})$ and let $u_i$ be the smallest value of u. Then, there exist trees $T_1 \in \mathcal{T}^c_{i-1}$ and $T_2 \in \mathcal{T}^c_{k-i-1}$ associated to $u^1 = (u_1, \dots, u_{i-1})$, resp. $u^2 = (u_{i+1}, \dots, u_{k-1})$. To u we associate the tree $T \in \mathcal{T}^c_{k-1}$ which has $T_1$ as the left subtree of the root and $T_2$ as the right subtree, and we colour the root blue.
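The recursive association of a coloured tree to a vector u can be sketched in a few lines. The representation of a tree as a nested tuple `(colour, left, right)` with `None` for the empty tree, and the function name `build_tree`, are assumptions made for this illustration. The sketch also assumes, as is the case for vectors arising from step minimizing paths, that every subvector of length at least two contains an entry smaller than one with a unique minimum.

```python
def build_tree(u):
    """Associate a coloured binary tree to the vector u.

    Returns None for the empty vector, otherwise (colour, left, right).
    An entry equal to 1 marks a position skipped by an edge of conductance two.
    """
    if len(u) == 0:
        return None
    if len(u) == 1:
        # Base case: a red root if the entry is 1, a blue root otherwise.
        colour = "red" if u[0] == 1 else "blue"
        return (colour, None, None)
    # Split at the smallest value; the root of a longer vector is always blue.
    i = min(range(len(u)), key=lambda j: u[j])
    return ("blue", build_tree(u[:i]), build_tree(u[i + 1:]))


# Hypothetical example: the middle entry is the most powerful connector,
# and the last position is covered by an edge of conductance two.
example = build_tree((0.7, 0.2, 1))
```

In the example the root corresponds to the mark 0.2, its left child is a blue leaf for the mark 0.7, and its right child is a red leaf to which no vertex of the path is attached.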
Conversely, given a tree T ∈ T c k−1 let m be the number of blue vertices of the tree. We define a labelling of the blue vertices in T by letting σ T (i) be the ith vertex removed in a left-to-right exploration of the tree consisting of the blue vertices. This exploration starts with the vertex obtained by starting at the root and going left at any branching until this is no longer possible. Remove this vertex and repeat the procedure unless the removal disconnects a part from the tree or removes the root. If a part is disconnected explore this part (which is rooted in the right child of the last removed vertex) until it is fully explored and removed, and continue from there with the remaining tree. If the root is removed while it has a right child, explore the tree rooted in that child until it is fully explored and then stop. Similarly, define a bijection by letting τ T (i) be the ith vertex seen by a left-to-right exploration of all vertices on the tree T . We also set σ −1 T (τ T (0)) := 0 and σ −1 T (τ T (k)) := m + 1. Finally, is defined recursively. For the root v of T , we set κ T (v) = (0, k). As before, removing v splits T into a left subtree T 1 and a right subtree T 2 . If these trees are nonempty, Repeat this for the subtrees until κ T (v) is defined for all v ∈ T . Thus, for each vertex v ∈ T , its image κ T (v) captures • as its first entry the labelling τ −1 T of the last vertex seen by a left-to-right exploration before the first vertex of the subtree rooted in v (and set to 0 if there is no such vertex), • as its second entry the labelling τ −1 T of the first vertex seen by a left-to-right exploration after the last vertex of the subtree rooted in v (and set to k if there is no such vertex).
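The labellings can be computed mechanically. The sketch below reads the left-to-right exploration as an in-order traversal, and it reads the recursive definition of $\kappa_T$ as follows: if $\kappa_T(v) = (a, b)$ and v has label p under $\tau_T^{-1}$, then the left child receives (a, p) and the right child receives (p, b). Both readings are assumptions inferred from the verbal description, and the tuple representation of trees and the position strings of L/R moves are hypothetical conventions.

```python
def explore(tree, pos="", out=None):
    """In-order list of (position, colour); position is the L/R route from the root."""
    if out is None:
        out = []
    if tree is not None:
        colour, left, right = tree
        explore(left, pos + "L", out)
        out.append((pos, colour))
        explore(right, pos + "R", out)
    return out


def labellings(tree):
    """Return (tau, sigma): labels of all vertices and of the blue vertices only."""
    order = explore(tree)
    tau = {i + 1: pos for i, (pos, _) in enumerate(order)}
    blue = [pos for pos, colour in order if colour == "blue"]
    sigma = {i + 1: pos for i, pos in enumerate(blue)}
    return tau, sigma


def kappa(tree):
    """Map each vertex to (label before its subtree, label after its subtree)."""
    tau, _ = labellings(tree)
    tau_inv = {pos: label for label, pos in tau.items()}
    k = len(tau) + 1  # the tree has k - 1 vertices
    result = {}

    def visit(node, pos, lo, hi):
        if node is None:
            return
        result[pos] = (lo, hi)
        _, left, right = node
        visit(left, pos + "L", lo, tau_inv[pos])
        visit(right, pos + "R", tau_inv[pos], hi)

    visit(tree, "", 0, k)
    return result


# Hypothetical tree with k - 1 = 3 vertices: blue root, blue left leaf, red right leaf.
tree = ("blue", ("blue", None, None), ("red", None, None))
tau, sigma = labellings(tree)
kappa_values = kappa(tree)
```

For the red leaf in this example, the last vertex seen before it is the root with label 2 and no vertex follows it, so its image is (2, 4).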
With these labelings at hand, we now describe four restrictions that are satisfied by the marks and locations of the connectors x 1 , . . . , x m of every step-minimizing path connecting x 0 = (x 0 , t 0 ) and x m+1 = (x m+1 , t m+1 ) to which the coloured tree T is associated, namely Note that whereas (i) and (ii) describe the order of the marks, (iii) and (iv) encode the spatial restrictions on the connectors via the colour of the tree vertices. In (iv), x i (resp. x j ) is the first vertex to the left (resp. right) with a smaller mark than x σ −1 and the inequality ensures that x i and x j are far enough apart that no edge with conductance two can exist between them. Conversely, the inequality in (iii) ensures the existence of an edge with conductance two. These conditions motivate the following definitions: • M T as the set of vectors (t 1 , . . . , t m ) ∈ (0, 1) m such that (i), (ii) hold, • I rl T as the set of pairs (i, j) ∈ {0, . . . , m + 1} 2 for which a red leaf v of T exists such that i = σ −1 T (τ T (κ (1) T (v))) and j = σ −1 T (τ T (κ (2) T (v))), • I b T as the set of pairs (i, j) ∈ {0, . . . , m + 1} 2 for which a blue vertex v of T exists such that i = σ −1 T (τ T (κ (1) T (v))) and j = σ −1 T (τ T (κ (2) T (v))), • and I bc T as the set of pairs (i, i + 1) ∈ {0, . . . , m + 1} 2 for which we have that τ −1 T (σ T (i + 1)) − τ −1 T (σ T (i)) = 1. Whereas M T captures the restrictions on the marks, I rl T and I b T contain the indices to which the the spatial restrictions (iii) and (iv) apply, as for (i, j) ∈ I b T the vertices x i and x j cannot be near to each other and for (i, j) ∈ I rl T the vertices x i and x j have to be that near to each other so that an edge of conductance two exists between them. For each pair (i, j) ∈ I rl T we have j = i + 1 and I rl T , I bc T form a partition of {(i, i + 1) : i = 0, . . . , m}, because for any (i, i + 1) ∈ I bc T , there exists an edge of conductance one between the vertices x i and x i+1 .
Proof of Lemma 2.2. For T ∈ T c k−1 , we define K T (x, y) as the number of step minimizing paths P between x and y with w P = k whose vertices have marks larger than the marks of x and y to which T is associated. Then If k = 1 (or equivalently T = ∅) we have that K T (x, y) is the indicator of the event that x and y are connected by an edge. For k = 2, if T is the tree consisting of the red For k ≥ 3 we split the tree at the root, i.e.
where T 1 and T 2 are the left, resp. right, subtree of T obtained by cutting the root. Repeat the step (15) by consecutively splitting the tree at the vertices as seen in the order of a depth first search of the blue vertices in the tree, reducing the product to terms corresponding to empty or single red vertex trees. We get where x 0 = x, x m+1 = y and v (i,i+1) ∈ T is the red leaf associated to (i, j) in the definition of I rl T . Note that the term K v (i,i+1) contains further spatial restrictions on x i and x i+1 , ensuring that these vertices are sufficiently close. Taking expectations yields By Assumption 1.1, we have Hence, using the Mecke formula for m points, we get What remains to be seen is that when the right-hand side in (17) is denoted e T K (x, y) and summed over all T ∈ T c k−1 we obtain e K (x, y, k). This is clearly true when k = 1 and k = 2. Otherwise we use (13) to decompose e K (x, y, k). By induction, the factors in this decomposition can be represented as in (17) and we obtain Writing the terms e T1 K (x, z) and e T2 K (z, y) as in (17) as integrals over x 1 , . . . , x m1 and x m1+2 , . . . , x m we can insert z as x m1+1 and note that the conditions and terms emerging in that integral are exactly the same as in (17) for the tree T with T 1 and T 2 as left and right subtree of the root. Indeed, • the vector (t 1 , . . . , t m ) of the marks of x 1 , . . . , x m is an element of M T iff (t 1 , . . . , t m1 ) ∈ M T1 , (t m1+2 , . . . , t m ) ∈ M T2 and t m1+1 > s ∨ t, • the spatial conditions described by I b T are fulfilled iff x 1 , . . . , x m1 fulfills the ones decribed by I b T1 , x m1+2 , . . . , x m the ones by I b T2 and where the values of the pairs of I rl T2 have been increased by m 1 + 1 and in the same way I bc T directly emerges from I bc T1 and I bc T2 . Hence, e K (x, y, k) can be obtained by summing e T K (x, y) over all T ∈ T c k−1 .
Proof. Choose C > 1 such that C is larger than the constants appearing in Lemma 2.1 and Lemmas A.1 and A.2 of the appendix. We now show by induction that holds for all k ≥ 2, where Cat(k−1) is the (k−1)-th Catalan number. Note that, for k ≥ 2, it holds e K (x, y, k) ≤ 1 for |x − y| dze(x, z, 2)e(z, y, 1).

Using the bounds established in Lemma 2.1 together with Lemma A.2 leads to
Let k ≥ 4 and assume that (18) holds for all j = 2, . . . , k − 1. For x, y such that (13), With (18) we hence get,

Using Lemma A.1 and Lemma A.2 the last expression can be further bounded by
If k is even, i and k − i need to be either both even or both odd, for i = 1, . . . , k − 1.
Since ℓ > 0 is chosen small enough that log( 1 ℓ ) 2 < ℓ 1−γ−γ/δ , we have that in both cases If k is odd, an analogous observation leads to Hence, we have and (18) holds for k. The observation that Cat(k) ≤ 4 k concludes the proof.
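The bound $\mathrm{Cat}(k) \le 4^k$ used to close the proof follows from $\binom{2k}{k} \le 4^k$. As a sanity check, the following sketch verifies it numerically together with the Catalan recursion that mirrors the splitting of a tree at its root. The function name `catalan` is a hypothetical helper.

```python
from math import comb


def catalan(k):
    """k-th Catalan number, Cat(k) = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


# Cat(k) <= 4^k for a range of k.
bound_holds = all(catalan(k) <= 4 ** k for k in range(200))

# Splitting at the root: Cat(n + 1) = sum_i Cat(i) * Cat(n - i).
recursion_holds = all(
    catalan(n + 1) == sum(catalan(i) * catalan(n - i) for i in range(n + 1))
    for n in range(30)
)
```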

Probability bounds for bad paths. With Lemma 2.3 we can establish a bound
for E x,y N (x, y, n), recall the definitions in Section 2.2. As in (10) and (11), we have Here z is the most powerful vertex of the path disregarding y and connects to y via less powerful vertices. As done for K(x, y, k) in the previous section we compare E x,y N (x, y, n) with a deterministic mapping defined as and for n ≥ 2 e N (x, y, n) = e K (x, y, n) + for x, y ∈ R d ×(0, 1], if |x − y| d > κ 1/δ (t ∧ s) −γ (t ∨ s) −γ/δ , and otherwise e N (x, y, 2) = 1 and e N (x, y, n) = 0 for n ≥ 3.
Lemma 2.4. Let $x, y \in \mathbb{R}^d \times (0, 1]$ be two given vertices. Then, for all $n \in \mathbb{N}$, we have $E_{x,y} N(x, y, n) \le e_N(x, y, n)$.
Proof. First recall that for $|x - y|^d \le \kappa^{1/\delta} (t \wedge s)^{-\gamma} (t \vee s)^{-\gamma/\delta}$ we have N(x, y, n) = 0 for n ≥ 3 and N(x, y, 2) = 1. Thus in this case N(x, y, n) is equal to $e_N(x, y, n)$ and consequently their expectations are equal. Otherwise, the proof follows the same argument as in Lemma 2.2, where we again classify the possible connection strategies between x and y through coloured binary trees. We therefore only briefly present the required class of trees, explain the association of a path to the corresponding tree, and state the restrictions on marks and space which a step minimizing path associated to T has to satisfy.
Let $\mathcal{T}^{cb}_n$ be a class of coloured rooted binary trees with n vertices which are constructed as follows. For k ≤ n, we have a backbone consisting of k vertices, starting with the root followed by k − 1 vertices, each a left child of the previous one. The last vertex in this line is coloured red, the others blue. Let $i_1, \dots, i_k \in \mathbb{N}$ with $i_1 + \dots + i_k = n - k$. A tree $T \in \mathcal{T}^{cb}_n$ is formed by attaching to the j-th vertex (as seen by a left-to-right exploration of the backbone) a coloured subtree $T_j \in \mathcal{T}^c_{i_j}$ rooted in its right child, for j = 1, . . . , k.
To any path P = (x 0 , x 1 , . . . , x m+1 ) with x 0 = x and x m+1 = y where the connecting vertices have larger marks than y we associate a tree T ∈ T cb n as follows. We say x i is a powerful vertex of P if t i ≤ t j for all j = 0, . . . , i − 1. By definition, the vertices x 0 and x m are always powerful vertices. We denote by {x i1 , . . . , x i k+1 } the set of powerful vertices keeping the order in the path. Then two consecutive powerful vertices x ij and x ij+1 are, by definition, connected via a path of connectors x ij +1 , . . . x ij+1−1 of conductance w j := w P (i j+1 ) − w P (i j ). If w j ≥ 2, associate the connectors of the path connecting x ij and x ij+1 to a non-empty coloured tree T j ∈ T c wj −1 as in the proof of Lemma 2.2. Let T ∈ T cb n be the coloured tree which has a backbone of length k and where T j is attached to the j-th vertex (as seen by a left-to-right exploration of the backbone) such that its right child is the root of T j , see Figure 5 for an example. x 3 x 4 x 5 x 6 x 7 x 8 x 9 y x 5 x 2 x x 1 x 3 x 4 x 8 x 9 x 6 x 7  by v 1 , . . . , v k the vertices of the backbone of T and T 1 , . . . , T k the subtrees rooted in their right child. Set i j := σ −1 (v j ), for i = 1, . . . , k, and i k+1 := m + 1. Then, the following restrictions on marks and space are satisfied by the vertices x 1 , . . . , x m of any path connecting x 0 = x and x m+1 = y to which T is associated: (iii) for j = 1, . . . , k, the vertices x ij +1 , . . . , x ij+1−1 satisfy the four restrictions on marks and space given by the coloured tree T i and x ij , x ij+1 as described prior to the proof of Lemma 2.2.
For T ∈ T cb n , we define N T (x, y) as the number of step minimizing paths to which T is associated. Denote again by v 1 , . . . , v k the vertices of the backbone of T and set i j := σ −1 (v j ), i k+1 := m + 1. Splitting the tree at each blue vertex of the backbone leads to where T j is the subtree attached to the right child of v j . Proceeding for each K Tj and using the iterative structure of e N as in the proof of Lemma 2.2 yields the result.
As a path described by the eventÃ (x) n (recall the definition from Section 2.2) has a restriction on the mark but not on the location of its last vertex, we can use the integral with y = (y, s) and s smaller than some yet to be determined value to bound P x (Ã (x) n ). Thus, we define for given x = (x, t) and n ∈ N the mapping µ Recall that we write k * := k (mod 2) and I ρ := dx ρ(κ −1/δ |x| d ). By the definition of e N (x, y, 1) we have µ x 1 (s) ≤ I ρ s −γ t γ−1 , for s ∈ (0, t], and, for n ≥ 2, with a short calculation using Lemma 2.3 we get the recursive property where C > 0 is the constant from Lemma 2.3. Here, the first summand (24) corresponds to the first summand of (19), i.e. the number of paths with conductance n where the first vertex x and the last vertex with mark s are the two most powerful vertices of the path. The summands (25) and (26) describe the second summand of (19), where (26) covers the case that the last vertex of a path is directly connected to the preceding most powerful vertex.
Using the recursive inequality in (24) -(26) we now establish bounds for µ x n . To make the proof more transparent we continue working with a general sequence (ℓ n ) n∈N0 assuming only that it is at least exponentially decaying, i.e. for any b > 0 it holds that ℓ n+2 < bℓ n . We choose b > 0 small enough such that converges. This choice is possible because in our regime γ + γ/δ is larger than one. We denote the limit of the series by c b > 1. As we have already seen for the optimal path structure in Section 2.1, the chosen sequence (ℓ n ) n∈N0 decays much faster than any exponential rate so that this assumption will not have any effect on the result. Without loss of generality we may additionally assume ℓ 0 < 1 e .
Lemma 2.5. Let x = (x, t) be a given vertex and let the sequence (ℓ n ) n∈N0 be at least exponentially decaying with ℓ 0 < t ∧ 1 e . Then, there exists a constant c such that, for n ∈ N, we have µ x n (s) ≤ C n s −γ , for s ∈ (0, t], where C n+2 = c 2 ℓ 1−γ−γ/δ n C n + c log 1 ℓn+1 C n+1 (28) and Proof. We choose the constant c > 0 such that it is larger than (γ+γ/δ−1)∧1 and larger than the constant C from Lemma 2.3. Since this also implies that c > I ρ , by the definition of µ x 1 we have For n = 2, the recursive inequality for µ x 2 yields Using the already established bound for n = 1 we have Now let n ≥ 3 and we assume that (27) holds for allñ ≤ n − 1. Then, using the already established bounds and the recursive inequality property we have Assume for the moment that holds. Then, as c > C, the term µ x n (s) can be further bounded by which by (28) is smaller than C n s −γ for s ∈ (0, t]. Hence, by induction the stated inequality holds for all n ∈ N. It remains to show that (29) holds. If k is even, a repeated application of (28) and ℓ n+2 < bℓ n yields If k is odd a similar calculation leads to Distinguishing whether n is even or odd, the second term of (29) can be bounded in a similar way and so the whole expression can be bounded by where the two sums can be bounded by c b which implies that (29) holds.
Notice that, as stated in Section 2.1, the inequality (29) shows us that the major contribution to the expected value of N (x, y, n) comes from the paths where the two most powerful vertices are connected via a single connector. To see why, notice that the right-hand side of (29) is, up to a constant, the same as the k = 2 term of the left-hand side. In fact, Lemma 2.5 shows that the dominant class of possible paths is the one described in Section 2.1.
We are now ready to bound the probability of the eventÃ (x) n , i.e. the event that there exists a path of conductance n where the final vertex is the first and only one which has a mark smaller than the corresponding ℓ n . In particular the final vertex is the most powerful vertex of the path. By Mecke's equation, we have dy E x,y N (x, y, n).
Hence, Fubini's theorem and Lemma 2.5 yield As in Section 2.1, with ℓ 0 < t ∧ 1 e we choose the sequence (ℓ n ) n∈N0 for ε > 0, such that and we have Since C n is defined recursively, we can obtain a recursive representation of the sequence (ℓ n ) n∈N0 . Let η n := ℓ −1 n for n ∈ N 0 . Then, we have Hence, there exists a different constant c > 0 such that η 1−γ n+2 ≤ cη γ/δ n + c log(η n+1 )η 1−γ n+1 . By induction, we conclude that there exist b > 0 and B > 0 such that and thus the rate of decay of (ℓ n ) n∈N0 is faster than exponential.
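The doubly exponential growth of $\eta_n$ can be observed numerically. The sketch below iterates the recursion $\eta_{n+2}^{1-\gamma} \le c\,\eta_n^{\gamma/\delta} + c \log(\eta_{n+1})\,\eta_{n+1}^{1-\gamma}$ with equality, working with $L_n = \log \eta_n$ to avoid overflow. The parameter values, the constant c = 2 and the starting values are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))


def iterate_log_eta(gamma, delta, c=2.0, steps=60, start=1.0):
    """Iterate the recursion with equality and return the list of L_n = log(eta_n)."""
    logs = [start, start]
    for _ in range(steps):
        l_n, l_n1 = logs[-2], logs[-1]
        first = (gamma / delta) * l_n                      # log of eta_n^(gamma/delta)
        second = math.log(l_n1) + (1 - gamma) * l_n1       # log of log(eta_{n+1}) * eta_{n+1}^(1-gamma)
        logs.append((math.log(c) + log_add_exp(first, second)) / (1 - gamma))
    return logs


gamma, delta = 0.8, 2.0               # assumed values with gamma > delta / (delta + 1)
growth = gamma / (delta * (1 - gamma))
logs = iterate_log_eta(gamma, delta)
ratio = logs[-1] / logs[-3]           # L_{n+2} / L_n for large n
```

In this experiment the ratio $L_{n+2}/L_n$ approaches $\gamma/(\delta(1-\gamma))$, which is consistent with a bound of the form $\eta_n \le b \exp(B(\gamma/(\delta(1-\gamma)))^{n/2})$.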
Probability bounds for good paths. We now proceed to establish a bound on the last summand = denotes the union across all possible sets of pairwise distinct vertices x 0 , . . . , x n of the Poisson process. By Mecke's equation the right-hand side can be bounded from above by The following lemma reduces this bound to a non-spatial problem for paths of "reasonable" length which only depends on the marks of x 1 , . . . , x n−1 but not on their location. This allows us to use a similar strategy as the one used by Dereich et al. in [11], where lower bounds for the typical distance of non-spatial preferential attachment models are established.
Remark 2.1. The constants a and $\tilde\kappa$ of Lemma 2.6 depend on the choice of ε and $c_\varepsilon$. But for ∆ = O(log |x − y|), for any ε > 0 there exists $c_\varepsilon > 0$ such that, for |x − y| large enough, we have $\Delta \le c_\varepsilon |x - y|^\varepsilon$. Thus, if |x − y| is large enough, the choice of a and $\tilde\kappa$ does not depend on |x − y|.
Proof. Let {x, x 1 , . . . , x n−1 , y} be a set of given vertices. By Assumption 1.1 we have As n ≤ c ε |x − y| ǫ , no matter the choice of vertices, there must exist at least one edge between two vertices x k−1 = (x k−1 , t k−1 ) and Hence, the expression above can be further bounded by where the last inequality is achieved by integration over the location of the vertices. We chooseκ > 2c d ε κ 1/δ ∨ 2I ρ . Since δ > 1, the term can be bounded by ε) and therefore there exists a constant a > 0 such that we have By Remark 2.1, with Lemma 2.6 and Fubini's theorem we obtain where x = (x, t 0 ) and y = (y, t n ). We define, and set ν x 0 (s) = δ 0 (t − s). Then, the inequality above can be rewritten as Note that as defined, ν x n (s) can be written recursively as This allows us to establish an upper bound for ν x n (s) analogous to the non-spatial case in [11]. The following lemma is a corollary of [11, Lemma 1].
Proof. For n = 1, we have by (33) that Assume (35) holds for n ∈ N. Then, by (34), we have that Hence, by induction (35) holds for all n ∈ N.
Although Lemma 2.7 holds for an arbitrary sequence (ℓ n ) n∈N , recall that we have chosen (ℓ n ) n∈N such that (30) holds. This implies by (31) that there exists a constant c 1 > 0 such that where η n := ℓ −1 n as before. Additionally, notice that (α n ) n∈N and (β n ) n∈N are nondecreasing sequences. By Lemma 2.7, we have that It follows from the definition of (α n ) n∈N and (β n ) n∈N that β n ≤ c −1 α n+1 and β n ≤ c α n ℓ 1−2γ where the second summand on the right-hand side is bounded by a multiple of the first. Therefore, there exists a constant c 2 > 0 such that β 2 n ≤ c 2 α 2 n+1 ℓ 1−2γ n+1 . This and the monotonicity of the sequences (α n ) n∈N and (ℓ n ) n∈N gives that Recall that the sequence (C n ) n∈N from Lemma 2.5 is defined as We compare this sequence to (α n ) n∈N in order to bound (38) further. By writing α n+2 in terms of α n and β n we have that As all summands on the right-hand side are bounded by a multiple of α n ℓ 1−2γ n log(1/ℓ n+1 ) and log(1/ℓ n+1 ) is smaller than a multiple of log(1/ℓ n ), there exists a constant c 3 such that α n+2 ≤ c 3 α n ℓ 1−2γ n log(1/ℓ n ). To compare (α n ) n∈N and (C n ) n∈N , notice that, up to a constant, α 1 and α 2 are equal to C 1 and C 2 . Moreover Applying this inequality recursively and expressing α n+2 we obtain that for some c 4 > 0 Hence, we have where the second inequality follows by (30) and (37). Observe that, as δ(1−γ) Hence, there exists a constant which is larger than c 1 to the power i j=1 (δ(1 − γ)/γ) i for any i ∈ N and a constant c 5 > 0 such that Furthermore since we have established that η n is of the order displayed in (32) it follows directly that the left-hand side multiplied with the product above can also be bounded by ℓ −c5 n+1 for any sufficiently large constant c 5 > 0. Hence, there exists a further constant . Therefore, we have by using (32) once more that Let D > 0 such that B(1 + c 5 )(γ/(δ(1 − γ))) 1−D 2 < a and choose ∆ ≤ 2 log log|x−y| log(γ/δ(1−γ))) − D. 
Then the above expression is of order $O((\log \log |x - y|)^{-2})$. Hence, for our choice of ∆, we have which implies the stated lower bound of Theorem 1.1(b).
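For orientation, the leading-order term of the resulting lower bound can be evaluated numerically. The sketch below assumes that paths shorter than 2∆ are excluded with $\Delta \approx 2 \log\log|x-y| / \log(\gamma/(\delta(1-\gamma)))$, and it ignores the constant D and all lower-order terms. The function name `ultrasmall_lower_bound` is a hypothetical helper.

```python
import math


def ultrasmall_lower_bound(distance, gamma, delta):
    """Leading-order term 4 log log(distance) / log(gamma / (delta (1 - gamma))).

    Only meaningful in the ultrasmall regime delta > 1, delta/(delta+1) < gamma < 1,
    and for Euclidean distances larger than e.
    """
    if not (delta > 1 and delta / (delta + 1) < gamma < 1):
        raise ValueError("parameters are outside the ultrasmall regime")
    if distance <= math.e:
        raise ValueError("distance must exceed e so that log log is positive")
    growth = gamma / (delta * (1 - gamma))
    return 4 * math.log(math.log(distance)) / math.log(growth)


# Assumed example values: gamma = 0.8, delta = 2, so the growth factor is 2.
value = ultrasmall_lower_bound(math.exp(math.exp(2)), gamma=0.8, delta=2.0)
```

With these example values the Euclidean distance is about 1618 while the leading-order term is only about 11.5, which illustrates how slowly the bound grows.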

The non-ultrasmall regime
In this section we consider the case $\gamma < \frac{\delta}{\delta+1}$ and show that the graph is not ultrasmall, i.e. the chemical distance in the graph is not of doubly logarithmic order in the Euclidean distance. In particular, we show the following.

Proposition 2.1. Let G be a geometric random graph which satisfies Assumption 1.1 for some δ > 1 and $0 < \gamma < \frac{\delta}{\delta+1}$. Then, for any p > 1, there exists c > 0 such that, for $x, y \in \mathbb{R}^d \times (0, 1)$, we have under $P_{x,y}$ with high probability as |x − y| → ∞.
The proof is structurally analogous to the ultrasmall case, but significantly easier due to the simpler nature of the dominating strategy. As in Section 2.2, we bound the probabilities in (TMB) using a suitable truncation sequence (ℓ n ) n∈N0 such that the probability that bad paths starting in a vertex x exist can be made arbitrarily small. In this case, however, the truncation sequence decreases only exponentially. Similarly to the ultrasmall case, we construct a graphG which contains a copy of G and additionally an edge is added between two vertices x = (x, t) and y = (y, s) ofG whenever Unlike done previously in Section 2.2, we assign no conductance to any edge inG and therefore only consider the lengths of paths. We declare a self-avoiding path P = (x 0 , . . . , x n ) inG step minimizing if there exists no edge between x i and x j for all i, j with |i − j| ≥ 2 and denote byÃ x n the event that there exists a step minimizing path starting in x of length n inG , where the final vertex is the first vertex which has a mark smaller than the corresponding ℓ n . Then the first two summands of the right-hand side of (TMB) can be bounded from above by n there exists a step minimizing path inG of smaller or equal length which also fails to be good on its last vertex.
To bound these probabilities, we define the random variable N (x, y, n) as the number of distinct step minimizing paths between x and y of length n, whose vertices (x 1 , t 1 ), . . . (x n−1 , t n−1 ) fulfill t ≥ ℓ 0 , t 1 ≥ ℓ 1 , . . . , t n−1 ≥ ℓ n−1 and which all have a larger mark than y. By Mecke's equation we have that dy E x,y N (x, y, n), for n ∈ N.
As before, the paths counted in $N(\mathbf x, \mathbf y, n)$ can be decomposed such that (19) holds, where $K(\mathbf x, \mathbf y, k)$ is the number of step minimizing paths between $\mathbf x$ and $\mathbf y$ of length $k$ such that the vertices $\mathbf x_1, \dots, \mathbf x_{k-1}$ between them have marks larger than those of $\mathbf x$ and $\mathbf y$. We again refer to such vertices as connectors. Note that if $|x - y|^d \le \kappa^{1/\delta} (t \wedge s)^{-\gamma} (t \vee s)^{\gamma-1}$, there exist no step minimizing paths of length at least two between $\mathbf x$ and $\mathbf y$. Hence, we have $N(\mathbf x, \mathbf y, n) = K(\mathbf x, \mathbf y, n) = 0$ for $n \ge 2$ under this assumption.
We now bound the expectation of K(x, y, k). As in Section 2.2, we define a mapping and otherwise e K (x, y, k) = 0. As before we use a binary tree to classify the connection strategies and use this together with Assumption 1.1 to obtain E x,y K(x, y, k) ≤ e K (x, y, k), for k ∈ N.
Then there exists $C > 1$ such that, for $k \ge 2$, we have We now show by induction that holds for all $k \ge 2$. This is sufficient, since $\mathrm{Cat}(k) \le 4^k$. For $k = 2$ this follows from (39). Let $k \ge 3$ and assume (40) holds for all $j = 2, \dots, k - 1$. For $|x - y|^d > \kappa^{1/\delta} (t \wedge s)^{-\gamma} (t \vee s)^{\gamma-1}$ this, together with the definition of $e_K(\mathbf x, \mathbf y, k)$, yields With (39) the right-hand side can be further bounded by we get that (40) holds for $k$.
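The bound $\mathrm{Cat}(k) \le 4^k$ used above follows from $\mathrm{Cat}(k) = \binom{2k}{k}/(k+1) \le \binom{2k}{k} \le 2^{2k}$. The following minimal sketch checks it numerically for small $k$. It assumes the standard indexing in which $\mathrm{Cat}(k)$ counts binary trees with $k$ internal nodes; the indexing convention used in the paper is not visible in this section, and the function names are illustrative only.

```python
from math import comb


def catalan(k: int) -> int:
    """Return the k-th Catalan number binom(2k, k) / (k + 1).

    Assumption: standard indexing, i.e. the number of binary trees
    with k internal nodes. The division is always exact.
    """
    return comb(2 * k, k) // (k + 1)


def catalan_bound_holds(k_max: int) -> bool:
    """Check Cat(k) <= 4**k for every k = 0, ..., k_max."""
    return all(catalan(k) <= 4 ** k for k in range(k_max + 1))


if __name__ == "__main__":
    print([catalan(k) for k in range(6)])
    print(catalan_bound_holds(200))
```

Since the inequality holds for every index, a shift of the index by one (for example counting trees by leaves rather than internal nodes) does not affect the argument.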
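
Probability bounds for bad paths
Using Lemma 2.8 and (19) we find a suitable upper bound for $\int_{\mathbb R^d} \mathbb E_{\mathbf x,\mathbf y} N(\mathbf x, \mathbf y, n)\,\mathrm dy$, with $\mathbf y = (y, s)$, which leads to a bound for $\mathbb P_{\mathbf x}(\tilde A^{\mathbf x}_n)$. Recall that by (19) we have, for $n \in \mathbb N$,
$$N(\mathbf x, \mathbf y, n) \le K(\mathbf x, \mathbf y, n) + \sum_{k=1}^{n-1} \sum_{\substack{\mathbf z = (z,u) \\ t > u > \ell_{n-k} \vee s}} N(\mathbf x, \mathbf z, n - k)\, K(\mathbf z, \mathbf y, k).$$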
As in Section 2.2, to establish an upper bound on $\mathbb E_{\mathbf x,\mathbf y} N(\mathbf x, \mathbf y, n)$, we define a mapping and otherwise $e_N(\mathbf x, \mathbf y, n) = 0$. As in Section 2.2 we have $\mathbb E_{\mathbf x,\mathbf y} N(\mathbf x, \mathbf y, n) \le e_N(\mathbf x, \mathbf y, n)$, for $n \in \mathbb N$. Thus, for a given vertex $\mathbf x = (x, t)$ and $n \in \mathbb N$, an upper bound of $\int_{\mathbb R^d} \mathrm dy\, \mathbb E_{\mathbf x,\mathbf y} N(\mathbf x, \mathbf y, n)$ is given by the mapping $\mu^{\mathbf x}_n \colon (0, t] \to [0, \infty)$ defined by where $\mathbf y = (y, s)$. We interpret $s$ as the mark of the last vertex of a path counted by the random variable $N(\mathbf x, \mathbf y, n)$. With $I_\rho = \int_{\mathbb R^d} \mathrm dx\, \rho(\kappa^{-1/\delta} |x|^d)$ we can see by the definition of $e_N(\mathbf x, \mathbf y, 1)$ that $\mu^{\mathbf x}_1(s) \le I_\rho s^{-\gamma} t^{\gamma-1}$ for $s \in (0, t]$ and for $n \ge 2$ it follows by a short calculation and Lemma 2.8 that where $C > 1$ is the constant from Lemma 2.8. To establish a bound for $\mu^{\mathbf x}_n$ no further assumptions on the truncation sequence $(\ell_n)_{n\in\mathbb N_0}$ are necessary. As discussed in Section 2.1 we will see that the major contribution to the mass of $\mu^{\mathbf x}_n(s)$ comes from the paths where the two most powerful vertices are connected directly and not via one or more connectors. This is indicated by the definition of the sequence $(C_n)_{n\in\mathbb N_0}$ and the inequality (44) in the proof of the following lemma.
Lemma 2.9. Let $\mathbf x = (x, t)$ be a given vertex and let the sequence $(\ell_n)_{n\in\mathbb N_0}$ be monotonically decreasing with $\ell_0 < t \wedge \frac1e$. Then, there exists $c > 0$ such that, for $n \in \mathbb N$, where $C_1 = c\,\ell_0^{\gamma-1}$ and $C_{n+1} = c \log(\frac{1}{\ell_n}) C_n$.
Proof. We choose the constant $c > 2(C \vee I_\rho)$, where $C$ is as in Lemma 2.8. Then by definition of $\mu^{\mathbf x}_1$ we have $\mu^{\mathbf x}_1(s) \le I_\rho s^{-\gamma} t^{\gamma-1} \le c s^{-\gamma} \ell_0^{\gamma-1} = C_1 s^{-\gamma}$ for $s \in (0, t]$. Let $n \ge 2$ and assume that (43) holds for all $\tilde n \le n - 1$. Then, by (43), We now want to show that since assuming this leads to $\mu^{\mathbf x}_n(s) \le 2 I_\rho \log(\frac{1}{\ell_{n-1}}) C_{n-1} s^{-\gamma} \le c \log(\frac{1}{\ell_{n-1}}) C_{n-1} s^{-\gamma} = C_n s^{-\gamma}$, which completes the proof. By definition of the constants $C_n$ we have that As $\log(\frac{1}{\ell_n}) > 1$ for all $n \in \mathbb N_0$, we have $C_{n+1} \ge c\, C_n$ by definition of $(C_n)_{n\in\mathbb N_0}$, and using that $c > 2C$, the right-hand side can be further bounded by which shows (44).
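As a small worked step, the recursion for the constants in Lemma 2.9 can be unrolled into a closed form. This uses only the two defining relations $C_1 = c\,\ell_0^{\gamma-1}$ and $C_{n+1} = c \log(1/\ell_n) C_n$; the empty product for $n = 1$ is read as $1$.

```latex
C_n \;=\; c^{\,n}\,\ell_0^{\gamma-1}\prod_{j=1}^{n-1}\log\Big(\frac{1}{\ell_j}\Big),
\qquad
\log C_n \;=\; n\log c \;+\; (\gamma-1)\log \ell_0 \;+\; \sum_{j=1}^{n-1}\log\log\Big(\frac{1}{\ell_j}\Big),
\qquad n\in\mathbb N .
```

In this form the growth of $C_n$ is seen to be governed by a geometric factor $c^n$ together with the product of the logarithms of the inverse truncation levels.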
Now we bound the probability of the event $\tilde A^{(\mathbf x)}_n$, i.e. the event that there exists a path of length $n$, where the last vertex is the only vertex which has a mark smaller than its truncation bound $\ell_n$. As in Section 2.2, Mecke's equation yields where we have used Fubini's theorem in the second inequality and Lemma 2.9 in the third one. With $\ell_0 < t \wedge \frac1e$ we choose the sequence $(\ell_n)_{n\in\mathbb N}$ for $\varepsilon > 0$, such that From the recursive definition of the sequence $(C_n)$ we obtain a recursive representation of $(\ell_n)_{n\in\mathbb N_0}$. Let $\eta_n := \ell_n^{-1}$ for $n \in \mathbb N_0$, then Hence, there exists a new constant $c > 0$ such that $\eta_{n+1}^{1-\gamma} \le c \log(\eta_{n+1})\, \eta_{n+1}^{1-\gamma}$ and by induction we get that for any $p > 1$ there exists $B > 1$ large enough such that
$$\eta_n \le B^{\,n \log^p(n+1)}. \qquad (45)$$

Probability bounds for good paths
We now consider the existence of good paths between two given vertices $\mathbf x$ and $\mathbf y$. We focus on the case $\gamma \in (\frac12, \frac{\delta}{\delta+1})$, as the cases $\gamma = \frac12$ and $\gamma < \frac12$ follow with analogous or simpler arguments. As before we restrict the event $B^{\mathbf x,\mathbf y}_n$ to the existence of a step minimizing good path of length $n$ connecting $\mathbf x$ and $\mathbf y$ in $\tilde G$. Deviating a bit from the method of Section 2.2 we relax the definition of $B^{\mathbf x,\mathbf y}_n$ by defining $\tilde B^{\mathbf x,\mathbf y}_n$ as the event that there exists a step minimizing path between $\mathbf x$ and $\mathbf y$ in $\tilde G$ where the most powerful vertex of the path has a mark larger than $\ell_{\lfloor n/2 \rfloor}$. Then the term $\sum_{n=1}^{2\Delta} \mathbb P_{\mathbf x,\mathbf y}(B^{\mathbf x,\mathbf y}_n)$ in (TMB) can be replaced by $\sum_{n=1}^{2\Delta} \mathbb P_{\mathbf x,\mathbf y}(\tilde B^{\mathbf x,\mathbf y}_n)$.
We characterize the paths used inB x,y n by their powerful vertices, as done for regular paths in [19]. A vertex x k of a path (x 0 , . . . , x n ) is powerful if t i ≥ t k for all i = 0, . . . , k − 1 or if t i ≥ t k for all i = k + 1, . . . , n. Note that by definition the vertices x = x 0 and y = x n are always powerful. The indices of the powerful vertices are a subset of {0, . . . , n} which we denote by {i 0 , i 1 , . . . , i m−1 , i m }, where m+1 is the number of powerful vertices in a path and i 0 = 0, i m = n. As the most powerful vertex of a good path fulfils the assumption above, there exists a k ∈ {0, . . . , m} such that x i k is the most powerful vertex of the path. We decompose the good paths at the powerful vertices first and then proceed to decompose the path between powerful vertices x ij and x ij−1 in the same way as done for the random variable K(x ij−1 , x ij , i j − i j−1 ) in Section 2.2. Using Mecke's equation, we get Then, following the same arguments as in the proof of Lemma 2.6, there exists a > 0 andκ > 0 such that P x,y (B By a simple calculation 1 the sum over k on the right-hand side can be bounded by a constant multiple of Since m−1 k=1 m−2 k−1 ≤ 2 m−2 and the second summand can be bounded by a multiple of the first, there exists a constant c 1 > 0 such that P x,y (B (x,y) n ) is bounded by where we have used (45) for the second inequality and denoted (−1)! = 1 andκ might have changed between the steps. Since for all m = 1, . . . , n and n m=1 n−1 m−1 ≤ 2 n , there exists a constant c 2 ≥ 2(C ∨κ) such that the right-hand side above can be further bounded by By Stirling's formula we have that n n−2 (n−2)! ≤ e n . Hence, there exists c 3 > 0 such that We can see that B (2γ−1)∆ log p (∆+1) dominates the right-hand side in the sense that there exist constants c 4 , c 5 > 0 such that We now set Then, we have that ≤ log(|x − y| a ) 1 − p log log log(|x−y| a ) log log(|x−y| a ) p − log(|x − y| a ).
A second order Taylor expansion shows that the right-hand side converges to −∞ as |x − y| → ∞. Hence, for such a choice of ∆, we have P x,y {d(x, y) ≤ 2∆} ≤ ε + o(1) which implies the statement of Proposition 2.1.
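The decomposition in the proof above rests on the notion of powerful vertices: a vertex $x_k$ of a path is powerful if its mark is no larger than all marks before it, or no larger than all marks after it. A minimal sketch of this definition, assuming a path is represented only by the list of marks $t_0, \dots, t_n$ of its vertices (the function name and representation are illustrative, not taken from the paper):

```python
from typing import List, Sequence


def powerful_indices(marks: Sequence[float]) -> List[int]:
    """Return the sorted indices k such that marks[i] >= marks[k] for all
    i < k, or marks[i] >= marks[k] for all i > k.

    Smaller marks correspond to more powerful vertices. The endpoints are
    always included, since one of the two conditions is vacuous for them.
    """
    powerful = set()

    # Forward scan: k qualifies if its mark is at most the minimum so far.
    running_min = float("inf")
    for k, t in enumerate(marks):
        if t <= running_min:
            powerful.add(k)
        running_min = min(running_min, t)

    # Backward scan: the same test against the minimum of later marks.
    running_min = float("inf")
    for k in range(len(marks) - 1, -1, -1):
        if marks[k] <= running_min:
            powerful.add(k)
        running_min = min(running_min, marks[k])

    return sorted(powerful)


if __name__ == "__main__":
    path_marks = [0.5, 0.8, 0.2, 0.6, 0.3]
    print(powerful_indices(path_marks))
```

The vertex with the smallest mark always appears in the returned list, which is why the most powerful vertex of a path can be written as $x_{i_k}$ for some index $i_k$ of a powerful vertex.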

Proof of the upper bound for the chemical distance
To prove the upper bound for the chemical distance, we show the following proposition.  vertices connecting 0 and x, with high probability under P 0,x ( · | 0 ↔ x) as |x| → ∞.
To prove this result, we rely on a strategy introduced in [24]. Since the vertices of $G$ are given by the points of a Poisson process, the most powerful vertex inside a box with volume of order $|x|^d$ around the midpoint between 0 and $x$ typically has a mark smaller than $|x|^{-d} \log |x|$. Hence, it is sufficient to construct a short enough path from 0, respectively $x$, to this most powerful vertex inside the box. Here, as in Section 2.1, the typical connection type between two powerful vertices is crucial. For $\gamma > \frac{\delta}{\delta+1}$ we expect two powerful vertices to be connected via a vertex with larger mark, which we again call a connector. In fact, the following lemma shows that for a powerful vertex with mark $t$ and a suitable vertex with a sufficiently smaller mark, the probability that there exists no connector which neighbours each of the two vertices decays exponentially fast as the mark $t$ gets small. This is a corollary of [24] and follows with the same calculations as in [19, Lemma 3.1]. We now fix for the rest of the section $\alpha_1 \in \big(1, \frac{\gamma}{\delta(1-\gamma)}\big)$ and $\alpha_2 \in \big(\alpha_1, \frac{\gamma}{\delta}(1 + \alpha_1 \delta)\big)$, noting that our assumptions ensure that the intervals are nonempty.
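The nonemptiness of the two intervals can be checked in two lines. The following is a short verification, assuming $\gamma \in (0,1)$ and $\delta > 0$ so that the divisions and multiplications preserve the inequalities.

```latex
\frac{\gamma}{\delta(1-\gamma)} > 1
\iff \gamma > \delta(1-\gamma)
\iff \gamma > \frac{\delta}{\delta+1},
\qquad
\frac{\gamma}{\delta}\,(1+\alpha_1\delta) > \alpha_1
\iff \frac{\gamma}{\delta} > \alpha_1(1-\gamma)
\iff \alpha_1 < \frac{\gamma}{\delta(1-\gamma)} .
```

The first equivalence is exactly the ultrasmall condition $\gamma > \frac{\delta}{\delta+1}$, and the second holds by the choice of $\alpha_1$.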
We now look into a box $H(x) = \frac{x}{2} + [-2|x|, 2|x|]^d$ and introduce a hierarchy of layers $L_1 \subset L_2 \subset \dots \subset \mathcal X \cap (H(x) \times (0, 1))$ of vertices inside the box containing 0 and $x$. While the layer $L_1$ only contains vertices with very small mark, vertices with larger and larger marks are included in layers with larger index. More precisely, as in [24] we set $L_k = \mathcal X \cap \big(H(x) \times (0, (4|x|)^{-d\alpha_1^{-k}})\big)$ and K = min k ≥ 1 : where $\eta = (\gamma - (\alpha_2 - \alpha_1 \gamma)\delta) \wedge (\alpha_2 - \alpha_1) > 0$. As the vertex set $\mathcal X$ is a Poisson process, by Lemma 3.1 for a given vertex in layer $L_{k+1}$ there exists with high probability a suitable vertex in layer $L_k$ such that both vertices are connected via a connector with high probability. As in [24] and [26] we can use an estimate as in Lemma 3.1 to see that a vertex in $L_1$ is either the most powerful vertex in the box or connected to it via a connector, with high probability as $|x| \to \infty$. Hence we get that $\mathrm{diam}(L_K) \le 4K$.
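To illustrate how quickly the mark thresholds of the layers grow, the following sketch evaluates the upper mark bound $(4|x|)^{-d\alpha_1^{-k}}$ of layer $L_k$ and counts layers until the bound reaches a fixed level. The stopping level `level` is a hypothetical stand-in introduced only for this illustration; it is not the definition of $K$ from the paper, whose defining condition is not reproduced in this section.

```python
import math


def layer_threshold(norm_x: float, d: int, alpha1: float, k: int) -> float:
    """Upper mark bound (4|x|)**(-d * alpha1**(-k)) of layer L_k."""
    return (4.0 * norm_x) ** (-d * alpha1 ** (-k))


def first_layer_reaching(norm_x: float, d: int, alpha1: float,
                         level: float = math.exp(-1)) -> int:
    """Smallest k >= 1 whose threshold is at least `level`.

    `level` is an assumed, illustrative stopping rule (not the paper's K).
    Requires alpha1 > 1 and 0 < level < 1 so that the loop terminates.
    """
    k = 1
    while layer_threshold(norm_x, d, alpha1, k) < level:
        k += 1
    return k


if __name__ == "__main__":
    for norm_x in (1e3, 1e6, 1e12, 1e24):
        k = first_layer_reaching(norm_x, d=2, alpha1=1.5)
        ratio = math.log(math.log(norm_x)) / math.log(1.5)
        print(norm_x, k, round(ratio, 2))
```

With the level $e^{-1}$ the stopping index is the smallest $k$ with $\alpha_1^{k} \ge d \log(4|x|)$, so it grows like $\log\log|x| / \log\alpha_1$, which is the doubly logarithmic scale relevant for the upper bound.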
Since $K$ is of order $(1 + o(1)) \frac{\log\log|x|}{\log \alpha_1}$, to finish the proof it suffices to show that the vertices 0 and $x$ are connected to the layer $L_K$ in $o(\log\log|x|)$ steps. To do so, we first show that 0 (resp. $x$) is connected to a vertex with sufficiently small mark and within distance smaller than $|x|$ in finitely many steps. Then, we show that this vertex is connected to a vertex of $L_K$ in $o(\log\log|x|)$ steps. To keep the existence of these two paths sufficiently independent we rely on a sprinkling argument. For $b < 1$ we assign independently to each vertex in $\mathcal X$ the color black with probability $b$ and red with probability $r = 1 - b$. Then, we denote by $G^b$ the graph induced by restricting $G$ to the black vertices and the edges between them. In the same way we define $G^r$ for the red vertices. Note that $G^r \cup G^b$ is a subgraph of $G$.
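The colouring step of the sprinkling argument can be sketched on a finite graph. The sketch below assumes the graph is given as a list of vertex labels and a list of edges (pairs of labels); the function names, the finite representation and the fixed seed are illustrative choices and not part of the paper's construction.

```python
import random
from typing import Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def sprinkle(vertices: Sequence[Hashable], b: float,
             seed: int = 0) -> Tuple[List[Hashable], List[Hashable]]:
    """Colour each vertex black with probability b, otherwise red,
    independently of all other vertices. Returns (black, red)."""
    rng = random.Random(seed)
    black, red = [], []
    for v in vertices:
        (black if rng.random() < b else red).append(v)
    return black, red


def induced_edges(edges: Sequence[Edge], kept: Sequence[Hashable]) -> List[Edge]:
    """Edges of the subgraph induced by the vertex set `kept`."""
    kept_set = set(kept)
    return [(u, v) for (u, v) in edges if u in kept_set and v in kept_set]


if __name__ == "__main__":
    vertices = list(range(10))
    edges = [(i, j) for i in range(10) for j in range(i + 1, 10) if (i + j) % 3 == 0]
    black, red = sprinkle(vertices, b=0.8)
    edges_black = induced_edges(edges, black)
    edges_red = induced_edges(edges, red)
    # Edges joining a black and a red vertex belong to neither induced graph.
    print(len(edges), len(edges_black), len(edges_red))
```

Because the colours are assigned independently, the black and red vertex sets of a Poisson process are independent Poisson processes, which is what keeps the two parts of the path construction sufficiently independent.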
We use the black vertices to ensure the existence of the first part of the path in $G^b$. Thus, we define for 0 (and similarly for $x$) the event $E^b(D, s, v)$ that there exists a black vertex $z$ with mark smaller than $s$ and at distance less than $v$ such that there exists a path in $G^b$ of length less than $D$ between 0 and $z$. Then, given $z$, we use the red vertices to show that $z$ is connected to the layer $L_K$ in sufficiently few steps. We denote by $L^r_k$ the restriction of $L_k$ to its red vertices. Observe that we still have $\mathrm{diam}(L^r_K) \le 4K$ in $G^r$, as Lemma 3.1 restricted to $G^r$ also holds if the constant $c$ is multiplied by $r$. We define $F$ to be the event that $z$ is connected by a path of length $o(\log\log|x|)$ to $L^r_K$ in $G^r$. Note that the event $0 \leftrightarrow x$ implies that with high probability 0 and $x$ are part of the unique infinite component $K_\infty$ of $G$, since $\mathbb P_{0,x}(\{0 \leftrightarrow x\} \setminus \{0, x \in K_\infty\})$ converges to zero as $|x| \to \infty$. This is a consequence of the uniqueness of the infinite component $K_\infty$, as $\{0 \leftrightarrow x\} \setminus \{0, x \in K_\infty\}$ implies that 0 and $x$ are part of the same finite component whose asymptotic proportion of vertices is zero. Thus, to prove Proposition 3.1 it is sufficient to show that for any $s > 0$ there exists an almost surely finite random variable $D(s)$ such that $\lim_{b \uparrow 1} \liminf_{s \downarrow 0} \liminf$ where $\theta$ is the asymptotic proportion of vertices in the infinite component of $G$ and $K^b_\infty$ is the infinite component of $G^b$. Note that, as $\gamma > \frac{\delta}{\delta+1}$, the critical percolation parameter of the graph $G$ is 0 by [19], and therefore $K^b_\infty$ exists and is unique. Now the probability above can be bounded from below by . We show in the following two lemmas that the last two terms converge to 0 as $s \to 0$ and $|x| \to \infty$ as in [24], which yields $\liminf_{s \downarrow 0} \liminf$ where $\theta_b$ is the asymptotic proportion of vertices in the infinite component of $G^b$.
As in [26, Proposition 7] it can be shown that the percolation probability $\theta_b$ is continuous in $b$, so that $\theta_b$ converges to $\theta$ as $b \uparrow 1$, which completes the proof.
Proof. Assume $t < s$, then we have where we used for the first inequality that, for $z \in \mathbb R^d$, either $|x - z|$ or $|y - z|$ is at least $\frac{|x-y|}{2}$ and for the third inequality that $\gamma > \frac{\delta}{\delta+1}$ implies $\gamma + \gamma/\delta - 1 > 0$.
Lemma A.2. Let $x, y \in \mathbb R^d$, $t, s \in (0, 1]$ and $\frac1e > \ell > 0$ with $\ell < t \vee s$. For $\gamma > \frac{\delta}{\delta+1}$, (δ−1)(γ+γ/δ−1)∧1 .