1 Introduction

While networks appear in many different applications, many real-world networks can be thought of as being embedded in some geometric space. In social networks, for example, the geometric space may encode people's locations and interests, while in networks of chemical reactions, it may encode properties of the involved molecules. Nearby vertices in these geometric spaces are typically more likely to be connected than far-away vertices. For this reason, many random graph models that generate geometric networks have been developed. Usually, vertices are embedded in a continuous, d-dimensional space, and nearby vertices are more likely to connect than vertices that are farther apart [4, 7, 19].

In general, cliques are an important indicator of the structural properties of a network. Large cliques, or clique-like structures, indicate the tendency of a network to cluster into groups, while small cliques are related to the frequently analyzed clustering coefficient of a network. Cliques have therefore been an object of intensive study in many random graph models [1, 2, 9, 11, 12, 15, 19]. Often, these works focus on the total number of cliques, or on the largest clique present in the network; comparatively little attention has been devoted to the properties of the cliques themselves. Here, we focus on the structure of cliques and investigate which types of edges are part of them. We investigate this problem for the soft random geometric graph [20], a random graph model in which nearby vertices are more likely to connect to each other than vertices that are farther apart, but which also allows for longer connections. In this soft version, the random geometric graph can be seen as the grand canonical ensemble that maximizes ensemble entropy under the constraints that the average number of particles (edges) and the average energy, which depends on the distances between nodes, are fixed to given values [3]. As such, properties of the soft random geometric graph, and of variants that add further constraints to the model [3, 4], have been an object of intense study [3, 4, 17, 24, 25]. One question of particular interest is how networks respond to perturbations or failures, see [5, 6]. The longest edge may be a critical factor in determining the robustness of the network structure. The study of edge-lengths thus emerges naturally, offering insight into the probabilistic aspects of network structure and potentially playing a pivotal role in understanding the resilience and response of complex systems.

Intuitively, in geometric models, cliques are formed between nearby vertices, as close-by vertices are more likely to connect. However, in the soft random geometric graph, some long edges are present as well. The extreme value properties of these edge-lengths have been investigated in [24, 25], revealing that the length of the longest edge scales polynomially in the size of a given observation window of the model. The presence of these long edges indicates that some cliques may contain longer edges as well. Our work therefore focuses on the questions: what is the longest distance between two points in any k-clique, and how many cliques contain long edges?

We investigate the number of cliques with at least one edge of length larger than some given threshold \(r_n\) in the soft random geometric graph on a d-dimensional torus of radius n, and we study their properties as \(n \rightarrow \infty \). We prove a scaling limit for the number of k-cliques with at least one edge of length exceeding \(r_n\), and prove a localization phenomenon: if a given clique contains at least one long edge, then asymptotically it contains \(k-1\) long edges sharing one common endpoint, while all other edges of the clique are short. We also show that the longest edge over all cliques of size k decreases in length with growing k. In particular, its length scales as \(n^{d/((k-1)\alpha -d)}\), where \(\alpha \) is a parameter of the random graph model that controls the likelihood of long edges. Furthermore, we show that its re-scaled version converges to a Fréchet distribution. Interestingly, for all fixed k, the total number of k-cliques scales as \(n^d\) in this model. Thus, while the total number of cliques scales similarly in the network size for all fixed k, their extreme behavior is remarkably different.

Organization of the paper

In Sect. 1.1 we give a mathematical formulation of the model we investigate. In Sect. 2 we state the main results of this article. These include a localization phenomenon, i.e., asymptotically there is only one remote vertex in cliques with sufficiently large distances, while all other vertices are close to each other, and a scaling limit for the total number of cliques with at least one edge of length at least \(r_n\). Moreover, we provide a characterization of the largest distance within cliques of arbitrary size. In particular, we find extreme value behavior by showing that a re-scaled version of the maximal clique distance converges in distribution to a Fréchet distribution. Thereby, we describe the order of magnitude at which the largest distance decreases with the clique size. The results are accompanied by simulations that support our findings. The remaining Sects. 3, 4 and 5 are devoted to the proofs of the individual results. To facilitate readability, we provide an index of notation for some objects defined in this paper at the end of the paper.

1.1 Model

We now recall the definition of a soft random geometric graph, which dates back to [20]. It is a finite version of the random connection model, a standard model of continuum percolation (see [14, 18]), and can be defined on more general bounded domains (see [10]).

Throughout this paper, let \(d, n\in \mathbb {N}\), and let \(\mathcal {P}\) be a homogeneous Poisson point process on \(\mathbb {R}^d\) with unit intensity. We will consider \(\mathcal {P}\) on the d-dimensional torus of radius n, denoted by \(\mathcal {T}_n^d\), and we view \(\mathcal {P}\) as a random countable subset of \(\mathcal {T}_n^d\). Given the vertex set \(\mathcal {P}\), the edge-set is constructed as follows. Each pair of points \(\textbf{x},\textbf{y} \in \mathcal {P}\) is connected by an edge \(\{\textbf{x},\textbf{y}\}\) with probability \(g(\textbf{x},\textbf{y})\), independently of all other pairs of points, where we assume that g is given by

$$\begin{aligned} g(\textbf{x},\textbf{y})=1-\exp (-|\textbf{x}-\textbf{y}|_T^{-\alpha }), \quad \textbf{x},\textbf{y} \in \mathbb {R}^d, \end{aligned}$$
(1)

for \(\alpha \in (d,\infty )\), where

$$\begin{aligned} |\textbf{x}-\textbf{y}|_T=\sqrt{\sum _{i=1}^d\min (|x_i-y_i|, n-|x_i-y_i|)^2}. \end{aligned}$$
(2)

In particular, g has unbounded support and decays polynomially as \(|\textbf{x}-\textbf{y}|_T \rightarrow \infty \). For \(x_i\in \mathbb {R}\), we will also denote \(|x_i|_T=\min (|x_i|, n-|x_i|)\).
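The construction above can be sketched in code. The following is a minimal sketch, not the exact setup used for the simulations later in the paper; it assumes the torus \(\mathcal {T}_n^d\) is realized as the box \([0,n)^d\) with the wrap-around metric (2), and all function names are ours.

```python
import math
import random

def torus_distance(x, y, n):
    # |x - y|_T from Eq. (2): coordinate-wise shortest wrap-around distance
    return math.sqrt(sum(min(abs(a - b), n - abs(a - b)) ** 2
                         for a, b in zip(x, y)))

def connection_probability(x, y, n, alpha):
    # g(x, y) = 1 - exp(-|x - y|_T^(-alpha)) from Eq. (1)
    dist = torus_distance(x, y, n)
    return 1.0 if dist == 0.0 else 1.0 - math.exp(-dist ** (-alpha))

def sample_soft_rgg(n, d, alpha, seed=0):
    """Sample a soft random geometric graph on a torus of volume n^d."""
    rng = random.Random(seed)
    # number of vertices: Poisson(n^d), sampled by counting unit-rate
    # exponential arrivals (the vertex set is a unit-intensity Poisson process)
    volume, num_points, t = float(n) ** d, 0, rng.expovariate(1.0)
    while t < volume:
        num_points += 1
        t += rng.expovariate(1.0)
    vertices = [tuple(rng.uniform(0.0, n) for _ in range(d))
                for _ in range(num_points)]
    # each pair of vertices is connected independently with probability g
    edges = [(i, j)
             for i in range(num_points) for j in range(i + 1, num_points)
             if rng.random() < connection_probability(vertices[i], vertices[j], n, alpha)]
    return vertices, edges
```

For instance, `sample_soft_rgg(10, 1, 4.0)` draws one realization on a one-dimensional torus of length 10; the all-pairs loop is quadratic in the number of vertices and is only meant for small n.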

1.2 Notation

We end this section by introducing some notation. For \(d\in \mathbb {N}\) we write \([d]=\{1, \ldots , d\}\). For two functions f, g we write \(f(n) = o(g(n))\) if \(\lim _{n \rightarrow \infty } f(n)/g(n)=0\), and \(f(n) = O(g(n))\) if \(\limsup _{n \rightarrow \infty } f(n)/g(n) < \infty \). For a sequence of random variables \(X_n\) and a deterministic sequence \(a(n)\), \(n\in \mathbb {N}\), we write \(X_n = O_\mathbb {P}(a(n))\) if for every \(\varepsilon >0\) there exist \(M>0\) and \(N\in \mathbb {N}\) with \(\mathbb {P} ( |X_n|/a(n) \ge M) < \varepsilon \) for all \(n>N\). Furthermore, we write \(X_n = o_\mathbb {P}(a(n))\) if for every \(\varepsilon >0\) it holds that \(\mathbb {P} ( |X_n|/a(n) \ge \varepsilon ) \rightarrow 0\), as \(n\rightarrow \infty \). Moreover, we write \(\xrightarrow [n\rightarrow \infty ]{\mathbb {P}}\) for convergence in probability, and \(\xrightarrow [n\rightarrow \infty ]{d}\) for convergence in distribution.

2 Main Results

Throughout this paper, we let \(k \in \mathbb {N}\) with \(k\ge 3\). We will consider cliques of size k (k-cliques) in the soft random geometric graph, and focus on their geometric structure. In particular, we focus on the number of cliques with at least one edge of length at least \(r_n\). We also state a localization phenomenon. That is, with high probability, cliques with long edges contain only one remote vertex, and all other vertices of the clique are nearby.

To be more precise, let \(W_n^k(r_n)\) denote the number of k-cliques with at least one edge of length at least \(r_n\), where \(r_n\) is a given sequence with \(r_n \rightarrow \infty \), as \(n \rightarrow \infty \). Let \(W_n^k(r_n,\varepsilon )\) denote the number of k-cliques with exactly \(k-1\) edges of length at least \(r_n\) and all other edges of length at most \(1/\varepsilon \), for some \(\varepsilon >0\). The following theorem shows that these cliques with exactly \(k-1\) long edges asymptotically account for all cliques with long edges:

Theorem 2.1

(Number of cliques with long edges) When \(1\ll r_n\ll n^{d/((k-1)\alpha -d)}\), then for all \(\varepsilon _n\) such that \(\lim _{n\rightarrow \infty }\varepsilon _n=0\) and \(\varepsilon _n\ge \log (n)^{-1}\),

$$\begin{aligned} \frac{W_n^k(r_n,\varepsilon _n)}{W_n^k(r_n)} \xrightarrow [n\rightarrow \infty ]{\mathbb {P}}1. \end{aligned}$$

Furthermore,

$$\begin{aligned} \frac{W_n^k(r_n)}{n^{d}r_n^{d-(k-1)\alpha }} \xrightarrow [n\rightarrow \infty ]{\mathbb {P}}\frac{dC_d\pi ^{d/2} M_k}{2\Gamma (1+\frac{d}{2})((k-1)\alpha -d)}, \end{aligned}$$

where

$$\begin{aligned} M_k = 2\int _{[-\infty ,\infty ]^d}\dots \int _{[-\infty ,\infty ]^d}\prod _{i=1}^{k-2}g(\mathbf {x_i},\textbf{0})\prod _{1\le u<v\le k-2}g(\mathbf {x_u},\mathbf {x_v})d\mathbf {x_1}\dots d\mathbf {x_{k-2}}<\infty , \end{aligned}$$
(3)

and \(C_d \) is the constant such that the volume of the d-dimensional torus is \(C_dn^d.\) In particular, for triangles (when \(k=3\)),

$$\begin{aligned} M_3 = \frac{2\pi ^{d/2}\Gamma (1-\frac{d}{\alpha })}{\Gamma (1+d/2)}. \end{aligned}$$

Moreover,

$$\begin{aligned} \frac{W^k_n(r_n) - \mathbb {E}[W^k_n(r_n)] }{\sqrt{\mathbb {E}[W^k_n(r_n)]}} \xrightarrow [n\rightarrow \infty ]{d} Z, \end{aligned}$$

where Z is a random variable with standard normal distribution.
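As a numerical sanity check on the closed form of \(M_3\) (our own code, not part of the original analysis): for \(k=3\) the product over \(1\le u<v\le k-2\) is empty, so in dimension \(d=1\) we have \(M_3 = 2\int _{\mathbb {R}} g(x,\textbf{0})dx = 4\int _0^\infty (1-\exp (-x^{-\alpha }))dx\), which can be compared against \(2\pi ^{d/2}\Gamma (1-\frac{d}{\alpha })/\Gamma (1+\frac{d}{2})\). Function names and the quadrature scheme are ours.

```python
import math

def m3_numeric(alpha, cutoff=60.0, steps=120_000):
    # M_3 = 2 * int_R g(x, 0) dx = 4 * int_0^inf (1 - exp(-x^(-alpha))) dx in d = 1,
    # by Simpson's rule on [0, cutoff] plus the analytic tail int_cutoff^inf x^(-alpha) dx
    f = lambda x: 1.0 if x == 0.0 else 1.0 - math.exp(-x ** (-alpha))
    h = cutoff / steps
    s = f(0.0) + f(cutoff)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    tail = cutoff ** (1.0 - alpha) / (alpha - 1.0)  # uses 1 - exp(-u) ~ u for small u
    return 4.0 * (s * h / 3.0 + tail)

def m3_closed_form(alpha, d=1):
    # M_3 = 2 pi^(d/2) Gamma(1 - d/alpha) / Gamma(1 + d/2)
    return (2.0 * math.pi ** (d / 2.0) * math.gamma(1.0 - d / alpha)
            / math.gamma(1.0 + d / 2.0))
```

For \(d=1\) the closed form simplifies to \(4\Gamma (1-1/\alpha )\), and the two evaluations agree to numerical precision for any \(\alpha >1\).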

This theorem shows that asymptotically, all cliques with long edges consist of \(k-1\) close-by vertices within distance \(1/\varepsilon _n\) of each other and one remote vertex, joined to them by \(k-1\) edges of length at least \(r_n\). Furthermore, the scaling of the largest distance in a k-clique decreases in k, as \(\alpha >d\).

We now focus on the boundary case of Theorem 2.1, where \(r_n\sim n^{d/((k-1)\alpha -d)}\), and show that this corresponds to the explicit asymptotics of the longest distance found in any k-clique. The following theorem shows that, for this choice of the sequence \(r_n\), we obtain convergence in distribution to a Poisson random variable:

Theorem 2.2

(Large distances in k-cliques) Let

$$\begin{aligned} r_n = \Bigg (\frac{2\Gamma (1+\frac{d}{2})((k-1)\alpha -d)}{dC_d\pi ^{d/2} M_k}\Bigg )^{\frac{1}{d-(k-1)\alpha }} r n^{\frac{d}{(k-1)\alpha -d}}, \quad r \in (0, \infty ), \end{aligned}$$

where the constants \(C_d,M_k\) are as in Theorem 2.1. Then

$$\begin{aligned} W^k_n(r_n) \xrightarrow [n\rightarrow \infty ]{d} W^k, \end{aligned}$$

where \(W^k\) denotes a random variable with Poisson distribution and mean \(r^{d-(k-1)\alpha }\).

Remark 2.1

The quantity \(W^k_n(r_n)\) can be considered to be a U-statistic of order k, for which there are powerful results regarding Poisson approximation, see for example [8, Theorem 7.1]. However, the method that we will use in this paper relies on the approach presented in [21], as it will give us stronger bounds that will also enable us to prove the central limit theorem in Theorem 2.1, see also Remark 4.1 below for a more detailed discussion.

A consequence of Theorem 2.2 is the following corollary, which gives the explicit asymptotics of the longest distance found in k-cliques and reveals its extreme value behavior.

Corollary 2.3

(Maximal distance within a clique) Let \(e_n^{k,*}\) be the largest distance within any clique of size k, i.e. the largest edge-length over all such cliques. Then,

$$\begin{aligned} \Bigg ( \frac{2\Gamma (1+\frac{d}{2})((k-1)\alpha -d)}{dC_d\pi ^{d/2} M_k} \Bigg )^{\frac{1}{(k-1)\alpha -d}} n^{\frac{d}{d-(k-1)\alpha }} e_n^{k,*} \xrightarrow [n\rightarrow \infty ]{d} \Phi _{(k-1)\alpha -d}, \end{aligned}$$

where \(\Phi _{(k-1)\alpha -d}\) denotes a Fréchet distribution with parameter \((k-1)\alpha -d\), and the constants \(C_d, M_k\) are as in Theorem 2.1.

In [24, 25] the scaling of the length of the overall longest edge (i.e. not necessarily in a clique) was found to be \(n^{\frac{d}{\alpha -d}}\), up to a constant, which would correspond to the same scaling if we plugged in \(k=2\) in Corollary 2.3. This would indicate that the overall largest edge-length is typically not part of a clique.

Finally, we focus on the overall distances in k-cliques. Let \(K_k(\varepsilon _n)\) denote the number of k-cliques in which all pairwise distances between vertices are at most \(1/\varepsilon _n\), and \(K_k\) the total number of k-cliques.

Theorem 2.4

For all \(\varepsilon _n\) such that \(\lim _{n\rightarrow \infty }\varepsilon _n=0\) and \(\varepsilon _n\ge \log (n)^{-1}\),

$$\begin{aligned} \frac{K_k(\varepsilon _n)}{K_k} \xrightarrow [n\rightarrow \infty ]{\mathbb {P}}1. \end{aligned}$$

Simulations. To illustrate our results, we simulate a soft random geometric graph on a one-dimensional torus (dimension \(d=1\), with \(\alpha =4\)) and compute the empirical cumulative distribution function (see Fig. 1), derived from repeated simulations, of the length of the longest edge among all triangles, normalized by

$$\begin{aligned} \Big [\frac{\pi \Gamma (\frac{3}{4})}{7\Gamma (\frac{3}{2})^2} \Big ]^{\frac{1}{7}} n^{\frac{1}{7}}. \end{aligned}$$

According to Corollary 2.3, this normalized length approximates a Fréchet distribution with parameter 7, which is also supported by the simulations.
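The bracket above can be cross-checked against the general constant of Corollary 2.3. The sketch below (our own helper names, assuming the convention \(C_1=1\), i.e. that the one-dimensional torus has volume n) verifies numerically that for \(d=1\), \(\alpha =4\), \(k=3\) the constant of Corollary 2.3 is the reciprocal of the bracket used for the normalization.

```python
import math

def corollary_constant_k3(d, alpha, C_d):
    # (2 Gamma(1+d/2) ((k-1)alpha - d) / (d C_d pi^(d/2) M_k))^(1/((k-1)alpha - d))
    # for k = 3, using the closed form M_3 = 2 pi^(d/2) Gamma(1-d/alpha) / Gamma(1+d/2)
    k = 3
    m3 = (2.0 * math.pi ** (d / 2.0) * math.gamma(1.0 - d / alpha)
          / math.gamma(1.0 + d / 2.0))
    base = (2.0 * math.gamma(1.0 + d / 2.0) * ((k - 1) * alpha - d)
            / (d * C_d * math.pi ** (d / 2.0) * m3))
    return base ** (1.0 / ((k - 1) * alpha - d))

# the explicit bracket from the simulation section (d = 1, alpha = 4, k = 3)
bracket = (math.pi * math.gamma(0.75) / (7.0 * math.gamma(1.5) ** 2)) ** (1.0 / 7.0)
```

Since \(\Gamma (3/2)=\sqrt{\pi }/2\), both expressions reduce to powers of \(7\Gamma (3/2)^2/(\pi \Gamma (3/4))\), so the product of the two quantities equals 1 exactly.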

Fig. 1

Empirical distribution function of the normalized largest edge-length among all triangles derived from N simulations (points) compared with the distribution function of the Fréchet distribution from Corollary 2.3 (green line) (Color figure online)

3 Proof of Theorem 2.1

In this section, we prove Theorem 2.1. We first focus on triangles, and then show how these results extend to the k-clique setting.

3.1 Special Case: Triangles

Recall that by assumption the intensity of the Poisson point process equals 1. Let \(\textbf{0}\) denote the zero vector and \(\mathbf {r_n}=[r_n,0,\dots ,0]\in \mathbb {R}^d\). Let \(\mathbb {E}\left[ T_n(r_n)\right] \) be the average number of triangles that a given edge of length \(r_n\) is part of. We compute

$$\begin{aligned} \mathbb {E}\left[ T_n(r_n)\right] = \int _{\mathcal {T}_n^d}g(\textbf{x},\textbf{0})g(\textbf{x},\mathbf {r_n})d\textbf{x}. \end{aligned}$$
(4)

Then

$$\begin{aligned} \mathbb {E}\left[ T_n(r_n)\right] = \int _{\mathcal {T}_n^d}(1-\exp (-|\textbf{x}|_T^{-\alpha }))(1-\exp (-|\textbf{x}-\textbf{r}_n|_T^{-\alpha }))d\textbf{x}. \end{aligned}$$

The following lemma then gives an explicit expression for \(\mathbb {E}\left[ T_n(r_n)\right] \):

Lemma 3.1

Let \(1 \ll r_n\ll n\) and \(\alpha >d\). Then,

$$\begin{aligned} \frac{\mathbb {E}\left[ T_n(r_n)\right] }{ r_n^{-\alpha }}= \frac{2\pi ^{d/2}\Gamma (1-\frac{d}{\alpha })}{\Gamma (1+\frac{d}{2})}(1+o(1)). \end{aligned}$$

Proof

We prove the lemma by constructing a matching upper and lower bound for the expectation.

Lower bound. Let \(\mathcal {B}_{0,1/\varepsilon _n}\) be the d-dimensional ball of radius \(1/\varepsilon _n\) around 0, and \(\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) the ball of radius \(1/\varepsilon _n\) around \(\mathbf {r_n}\), where \(\varepsilon _n\rightarrow 0\) as \(n\rightarrow \infty \). To lower bound (4) we focus on the contribution from the integral on \(\mathcal {B}_{0,1/\varepsilon _n}\) and \(\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\), which are the same by symmetry.

For \(1\ll r_n\ll n\),

$$\begin{aligned}&\int _{\mathcal {B}_{0,1/\varepsilon _n}}(1-\exp (-|\textbf{x}|_{T}^{-\alpha }))(1-\exp (-|\textbf{x}-\textbf{r}_n|_{T}^{-\alpha }))d\textbf{x}\nonumber \\&= r_n^{-\alpha }(1+o(1)) \int _{\mathcal {B}_{0,1/\varepsilon _n}}(1-\exp (-|\textbf{x}|_{T}^{-\alpha }))d\textbf{x}. \end{aligned}$$
(5)

Then, switching to spherical coordinates yields

$$\begin{aligned}&\int _{\mathcal {B}_{0,1/\varepsilon _n}}(1-\exp (-|\textbf{x}|_{T}^{-\alpha }))d\textbf{x}\nonumber \\&= \int _0^{1/\varepsilon _n}\!\int _{0}^{2\pi }\!\int _0^\pi \!\dots \!\int _0^\pi \rho ^{d-1}(1-\exp (-\rho ^{-\alpha }))\sin (\phi _1)^{d-2}\dots \sin (\phi _{d-2})d\phi _1 \dots d\phi _{d-1} d\rho \nonumber \\&= \frac{2\pi ^{d/2}}{\Gamma (\frac{d}{2})} \int _0^{1/\varepsilon _n} \rho ^{d-1}(1-\exp (-\rho ^{-\alpha })) d\rho . \end{aligned}$$
(6)

Thus,

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\left[ T_n(r_n)\right] }{r_n^{-\alpha }}\ge \frac{4\pi ^{d/2}}{\Gamma (\frac{d}{2})} \int _0^{\infty }\rho ^{d-1}(1-\exp (-\rho ^{-\alpha }))d\rho , \end{aligned}$$

where the extra factor 2 arises from the fact that the integral over \(\mathcal {B}_{0,1/\varepsilon _n}\) and \(\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) are the same by symmetry. Partial integration and a substitution of \(u=x^{-\alpha }\) gives

$$\begin{aligned} \int _0^{\infty }\!x^{d-1}(1-\exp (-x^{-\alpha }))dx&= \!\frac{\alpha }{d}\! \int _0^{\infty }\!x^{d}x^{-\alpha -1}\exp (-x^{-\alpha })dx \!+\! \frac{1}{d}x^d(1-\exp (-x^{-\alpha }))\Big |_0^\infty \nonumber \\&= \frac{1}{d} \int _0^{\infty }u^{-d/\alpha }\exp (-u)du =\frac{1}{d} \Gamma (1-d/\alpha ), \end{aligned}$$

where \(\Gamma \) denotes the gamma function. Taking the limit of \(n\rightarrow \infty \) yields

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\left[ T_n(r_n)\right] }{ r_n^{-\alpha }}\ge \frac{4\pi ^{d/2}}{d\Gamma (\frac{d}{2})}\Gamma (1-\frac{d}{\alpha }) = \frac{2\pi ^{d/2}\Gamma (1-\frac{d}{\alpha })}{\Gamma (1+\frac{d}{2})}. \end{aligned}$$
(7)
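The integral identity obtained above by partial integration can be checked numerically. The sketch below (our own helper names) compares a Simpson-rule evaluation of \(\int _0^\infty x^{d-1}(1-\exp (-x^{-\alpha }))dx\), with the tail beyond a cutoff approximated by \(\int _{\text {cutoff}}^\infty x^{d-1-\alpha }dx\), against the closed form \(\Gamma (1-d/\alpha )/d\).

```python
import math

def integral_numeric(d, alpha, cutoff=60.0, steps=120_000):
    # int_0^inf x^(d-1) (1 - exp(-x^(-alpha))) dx by Simpson's rule on [0, cutoff],
    # with the tail approximated by int_cutoff^inf x^(d-1-alpha) dx (valid as alpha > d)
    def f(x):
        if x == 0.0:
            return 1.0 if d == 1 else 0.0  # integrand limit at the origin
        return x ** (d - 1) * (1.0 - math.exp(-x ** (-alpha)))
    h = cutoff / steps
    s = f(0.0) + f(cutoff)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3.0 + cutoff ** (d - alpha) / (alpha - d)

def integral_closed_form(d, alpha):
    # Gamma(1 - d/alpha) / d, from integration by parts and the substitution u = x^(-alpha)
    return math.gamma(1.0 - d / alpha) / d
```

The condition \(\alpha >d\) is exactly what makes both the boundary term in the partial integration vanish and the tail integrable.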

Upper bound. For an upper bound, we show that the contribution to the expected value on \(\mathcal {B}_{0,1/\varepsilon _n}\) and \(\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) dominates, and that all other contributions to the expectation are asymptotically smaller. We therefore show that the integral of (4) over areas where \(\textbf{x}\notin \mathcal {B}_{0,1/\varepsilon _n}\cup \mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) is small.

We first investigate the region where \(|\textbf{x}-\mathbf {r_n}|_T\ge r_n/2\). This yields

$$\begin{aligned}&\int _{\mathcal {T}_n^d\setminus (\mathcal {B}_{0,1/\varepsilon _n}\cup \mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}):|\textbf{x}-\mathbf {r_n}|_T\ge r_n/2} (1-\exp (-|\textbf{x}|_T^{-\alpha }))(1-\exp (-|\textbf{x}-\textbf{r}_n|_T^{-\alpha }))d\textbf{x}\nonumber \\&= r_n^{-\alpha }(1+o(1))\int _{\mathcal {T}_n^d\setminus (\mathcal {B}_{0,1/\varepsilon _n}\cup \mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}):|\textbf{x}-\mathbf {r_n}|_T\ge r_n/2} (1-\exp (-|\textbf{x}|_T^{-\alpha }))d\textbf{x}\nonumber \\&\le r_n^{-\alpha }(1+o(1))\int _{\mathcal {T}_n^d\setminus \mathcal {B}_{0,1/\varepsilon _n}} (1-\exp (-|\textbf{x}|_T^{-\alpha }))d\textbf{x}. \end{aligned}$$
(8)

Furthermore, switching to spherical coordinates yields

$$\begin{aligned}&\int _{\mathcal {T}_n^d\setminus \mathcal {B}_{0,1/\varepsilon _n}} (1-\exp (-|\textbf{x}|_T^{-\alpha }))d\textbf{x}\nonumber \\&\le \int _{1/\varepsilon _n}^\infty \int _{0}^{2\pi }\int _0^\pi \dots \int _0^\pi \rho ^{d-1}(1-\exp (-\rho ^{-\alpha }))\sin (\phi _1)^{d-2}\dots \sin (\phi _{d-2})d\phi _1 \dots d\phi _{d-1} d\rho \nonumber \\&= \frac{2\pi ^{d/2}}{\Gamma (\frac{d}{2})} \int _{1/\varepsilon _n}^\infty \rho ^{d-1}(1-\exp (-\rho ^{-\alpha }))d\rho . \end{aligned}$$
(9)

Moreover, the contribution to the integral of (4) over the region where \(|\textbf{x} -\mathbf {r_n}|_T < r_n/2\) is symmetric to the contribution over the region where \(|\textbf{x}|_T< r_n/2\), which equals

$$\begin{aligned}&\int _{\mathcal {T}_n^d\setminus (\mathcal {B}_{0,1/\varepsilon _n}\cup \mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}):|\textbf{x}|_T< r_n/2} (1-\exp (-|\textbf{x}|_T^{-\alpha }))(1-\exp (-|\textbf{x}-\textbf{r}_n|_T^{-\alpha }))d\textbf{x}\nonumber \\&= r_n^{-\alpha }(1+o(1))\int _{\mathcal {B}_{0,r_n/2}\setminus \mathcal {B}_{0,1/\varepsilon _n}} (1-\exp (-|\textbf{x}|_T^{-\alpha }))d\textbf{x}. \end{aligned}$$
(10)

Again, changing to spherical coordinates then yields

$$\begin{aligned}&\int _{\mathcal {B}_{0,r_n/2}\setminus \mathcal {B}_{0,1/\varepsilon _n}} (1-\exp (-|\textbf{x}|_T^{-\alpha }))d\textbf{x}\nonumber \\&\le \int _{1/\varepsilon _n}^{r_n/2} \! \int _{0}^{2\pi }\!\int _0^\pi \dots \!\int _0^\pi \rho ^{d-1}(1-\exp (-\rho ^{-\alpha }))\sin (\phi _1)^{d-2}\dots \sin (\phi _{d-2})d\phi _1 \dots d\phi _{d-1} d\rho \nonumber \\&= \frac{2\pi ^{d/2}}{\Gamma (\frac{d}{2})} \int _{1/\varepsilon _n}^{r_n/2} \rho ^{d-1}(1-\exp (-\rho ^{-\alpha }))d\rho . \end{aligned}$$
(11)

Combining (8), (9), (10) and (11) gives that

$$\begin{aligned}&\int _{\mathcal {T}_n^d\setminus ({\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\cup \mathcal {B}_{\textbf{0},1/\varepsilon _n}})}(1-\exp (-|\textbf{x}|_T^{-\alpha }))(1-\exp (-|\textbf{x}-\textbf{r}_n|_T^{-\alpha }))dx \nonumber \\&\le \tilde{C} r_n^{-\alpha }\int _{1/\varepsilon _n}^\infty \rho ^{d-1}(1-\exp (-\rho ^{-\alpha }))d\rho \nonumber \\&= O(r_n^{-\alpha }\varepsilon _n^{\alpha -d}). \end{aligned}$$
(12)

for some \(\tilde{C}>0\).

Thus, as \(d<\alpha \), this contribution is \(o(r_n^{-\alpha })\) because \(\varepsilon _n\rightarrow 0\). Therefore,

$$\begin{aligned} \int _{\mathcal {T}_n^d\setminus ({\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\cup \mathcal {B}_{\textbf{0},1/\varepsilon _n}})}(1-\exp (-|\textbf{x}|_T^{-\alpha }))(1-\exp (-|\textbf{x}-\textbf{r}_n|_T^{-\alpha }))dx = o(r_n^{-\alpha }). \end{aligned}$$
(13)

Combining this with the contribution (7) of the integrals over \(\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) and \(\mathcal {B}_{0,1/\varepsilon _n}\) results in

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\left[ T_n(r_n)\right] }{ r_n^{-\alpha }}\le \frac{2\pi ^{d/2}\Gamma (1-\frac{d}{\alpha })}{\Gamma (1+\frac{d}{2})}. \end{aligned}$$

\(\square \)

We now integrate over all edges of length at least \(r_n\) to obtain the expected number of triangles with one edge of length at least \(r_n\):

Lemma 3.2

Let \(1 \ll r_n\ll n\) and \(\alpha >d\). Then,

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\left[ W_n^3(r_n)\right] }{n^{d}r_n^{d-2\alpha }} = \frac{dC_d\pi ^{d}\Gamma (1-\frac{d}{\alpha })}{\Gamma (1+\frac{d}{2})^2(2\alpha -d)}. \end{aligned}$$

Proof of Lemma 3.2

Lemma 3.1 gives the expected number of triangles containing a given edge of length \(r_n\). To obtain the expected number of triangles with one vertex at the origin that contain at least one edge of length at least \(r_n\), we integrate the product of \(\mathbb {E}\left[ T_n(x)\right] \) and the probability \(g(\textbf{x},\textbf{0})\) that a vertex at the origin forms an edge with a vertex at location \(\textbf{x}\), over the region at distance at least \(r_n\) from the origin. Thus, using the notation g(r, 0) for \(g(\textbf{x},\textbf{0})\) with \(|\textbf{x}|_T = r\), we get

$$\begin{aligned}&\frac{2\pi ^{d/2}\Gamma (1-\frac{d}{\alpha })}{\Gamma (1+\frac{d}{2})} \int _{r_n}^\infty \int _{0}^{2\pi }\int _0^\pi \dots \int _0^\pi \rho ^{d-1-\alpha }g(\rho ,0)\sin (\phi _1)^{d-2}\dots \nonumber \\&\qquad \sin (\phi _{d-2})d\phi _1 \dots d\phi _{d-1} d\rho \nonumber \\&= \frac{4\pi ^{d}\Gamma (1-\frac{d}{\alpha })}{\Gamma (\frac{d}{2})\Gamma (1+\frac{d}{2})}\int _{r_n}^\infty \rho ^{d-1-2\alpha }d\rho \big ( 1 + o(1) \big ) \nonumber \\&= \frac{4\pi ^{d}\Gamma (1-\frac{d}{\alpha })r_n^{d-2\alpha }}{\Gamma (\frac{d}{2})\Gamma (1+\frac{d}{2})(2\alpha -d)}\big ( 1 + o(1) \big ). \end{aligned}$$
(14)

As the volume of the d-dimensional torus equals \(C_dn^d\) and the intensity of the Poisson point process equals 1, there are on average \(C_dn^d\) vertices, and in this manner every triangle is counted twice. Thus,

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\left[ W^3_n(r_n)\right] }{n^{d}r_n^{d-2\alpha }} = \frac{2C_d\pi ^{d}\Gamma (1-\frac{d}{\alpha }) }{\Gamma (\frac{d}{2})\Gamma (1+\frac{d}{2})(2\alpha -d)}= \frac{dC_d\pi ^{d}\Gamma (1-\frac{d}{\alpha })}{\Gamma (1+\frac{d}{2})^2(2\alpha -d)}, \end{aligned}$$
(15)

where we have used that \(x\Gamma (x)=\Gamma (x+1)\). \(\square \)
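The final rewriting via \(x\Gamma (x)=\Gamma (x+1)\) can be verified numerically. The sketch below (helper names are ours) evaluates both expressions for the limiting constant in (15) and checks that they coincide.

```python
import math

def w3_constant_first_form(d, alpha, C_d=1.0):
    # 2 C_d pi^d Gamma(1 - d/alpha) / (Gamma(d/2) Gamma(1 + d/2) (2 alpha - d))
    return (2.0 * C_d * math.pi ** d * math.gamma(1.0 - d / alpha)
            / (math.gamma(d / 2.0) * math.gamma(1.0 + d / 2.0) * (2.0 * alpha - d)))

def w3_constant_second_form(d, alpha, C_d=1.0):
    # d C_d pi^d Gamma(1 - d/alpha) / (Gamma(1 + d/2)^2 (2 alpha - d)):
    # the same constant after applying x Gamma(x) = Gamma(x + 1) with x = d/2
    return (d * C_d * math.pi ** d * math.gamma(1.0 - d / alpha)
            / (math.gamma(1.0 + d / 2.0) ** 2 * (2.0 * alpha - d)))
```

Agreement holds for every \(d\) and every \(\alpha >d\), since \(2/\Gamma (d/2)=d/\Gamma (1+d/2)\) is an exact identity.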

3.2 Generalizing to k-Cliques

In this subsection, we generalize Lemma 3.2 from triangles to k-cliques to obtain the expected number of k-cliques with one edge of length at least \(r_n\):

Lemma 3.3

Let \(1\ll r_n\ll n\) and \(\alpha >d\). Then, for all \(k\ge 3\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\left[ W_n^k(r_n)\right] }{n^{d}r_n^{d-(k-1)\alpha }} = \frac{dC_d\pi ^{d/2} M_k}{2\Gamma (1+\frac{d}{2})((k-1)\alpha -d)}, \end{aligned}$$

where \(M_k\) is as in (3).

Proof

Define \(A_k(r_n)\) as the number of k-cliques that a given edge of length \(r_n\) is part of. We first compute its expected value:

$$\begin{aligned} \mathbb {E}\left[ A_k(r_n)\right] = \int _{\mathcal {T}_n^d}\dots \int _{\mathcal {T}_n^d}\prod _{i=1}^{k-2}g(\mathbf {x_i},\textbf{0})g(\mathbf {x_i},\mathbf {r_n})\prod _{1\le u<v\le k-2}g(\mathbf {x_u},\mathbf {x_v})d\mathbf {x_1}\dots d\mathbf {x_{k-2}}. \end{aligned}$$
(16)

Again, we lower bound by the contribution from \(\textbf{x}\in \mathcal {B}_{0,1/\varepsilon _n}\), such that \(g(\textbf{x},\mathbf {r_n})= r_n^{-\alpha }(1+o(1))\), and \(\textbf{x}\in \mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) such that \(g(\textbf{x},\textbf{0})= r_n^{-\alpha }(1+o(1))\). As a lower bound, this gives

$$\begin{aligned} \mathbb {E}\left[ A_k(r_n)\right]&\ge 2r_n^{-(k-2)\alpha }\int _{\mathcal {B}_{0,1/\varepsilon _n}}\dots \int _{\mathcal {B}_{0,1/\varepsilon _n}}\prod _{i=1}^{k-2}g(\mathbf {x_i},\textbf{0})\nonumber \\&\quad \quad \times \prod _{1\le u<v\le k-2}g(\mathbf {x_u},\mathbf {x_v})d\mathbf {x_1}\dots d\mathbf {x_{k-2}}(1+o(1)). \end{aligned}$$
(17)

Taking the limit of \(n\rightarrow \infty \) then yields

$$\begin{aligned}&\lim _{n\rightarrow \infty } \frac{\mathbb {E}\left[ A_k(r_n)\right] }{r_n^{-(k-2)\alpha }} \ge \nonumber \\&2\int _{[-\infty ,\infty ]^d}\dots \int _{[-\infty ,\infty ]^d}\prod _{i=1}^{k-2}g(\mathbf {x_i},\textbf{0})\prod _{1\le u<v\le k-2}g(\mathbf {x_u},\mathbf {x_v})d\mathbf {x_1}\dots d\mathbf {x_{k-2}}, \end{aligned}$$
(18)

where the factor 2 arises from the symmetric contribution from \(\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\).

As an upper bound, we bound the probability that a k-clique with a long edge between \(\textbf{0}\) and \(\mathbf {r_n}\) appears by the probability that \(k-2\) triangles appear containing the vertices at \(\textbf{0}\) and at \(\mathbf {r_n}\). This can be written as

$$\begin{aligned} \mathbb {E}\left[ A_k(r_n)\right] \le \Big (\int _{\mathcal {T}_n^d}g(\textbf{x},\textbf{0})g(\textbf{x},\mathbf {r_n})d\textbf{x}\Big )^{k-2}. \end{aligned}$$
(19)

Now the contribution to this integral from \(\textbf{x}\notin \mathcal {B}_{0,1/\varepsilon _n}\cup \mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) can be bounded using (13). Thus, the expected number \(\bar{A}_k(r_n,\varepsilon _n)\) of k-cliques containing a given edge of length at least \(r_n\) together with a vertex \(\textbf{x}\notin \mathcal {B}_{0,1/\varepsilon _n}\cup \mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) satisfies

$$\begin{aligned} \mathbb {E}\left[ \bar{A}_k(r_n,\varepsilon _n)\right]&\le Kr_n^{-\alpha (k-2)}\Bigg ( \int _{1/\varepsilon _n}^\infty x^{d-1}(1-\exp (-x^{-\alpha }))dx \Bigg ) ^{k-2}\nonumber \\&= O(r_n^{-\alpha (k-2)}\varepsilon _n^{\alpha -d})=o(r_n^{-\alpha (k-2)}), \end{aligned}$$
(20)

where the last equality holds because \(\varepsilon _n\rightarrow 0\) as \(n\rightarrow \infty \). Thus, this contribution is sufficiently small compared to the contribution from \(\mathcal {B}_{\textbf{0},1/\varepsilon _n}\) and \(\mathcal {B}_{\mathbf {r_n},1/\varepsilon _n}\) of (17) when n becomes large. This shows that

$$\begin{aligned}&\lim _{n\rightarrow \infty } \frac{\mathbb {E}\left[ A_k(r_n)\right] }{r_n^{-(k-2)\alpha }} \nonumber \\&= 2\int _{[-\infty ,\infty ]^d}\dots \int _{[-\infty ,\infty ]^d}\prod _{i=1}^{k-2}g(\mathbf {x_i},\textbf{0})\prod _{1\le u<v\le k-2}g(\mathbf {x_u},\mathbf {x_v})d\mathbf {x_1}\dots d\mathbf {x_{k-2}}. \end{aligned}$$
(21)

We then integrate this expression to obtain the average number of k-cliques with one long edge incident to a vertex at the origin. Using that the volume of the d-dimensional torus is \(C_dn^d\), this yields, similarly to (14) and (15), that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\left[ W_n^k(r_n)\right] }{n^d r_n^{d-(k-1)\alpha }}&= \frac{\pi ^{d/2} M_kC_d}{\Gamma (\frac{d}{2})((k-1)\alpha -d)}\nonumber \\&= \frac{dC_d\pi ^{d/2} M_k}{2\Gamma (1+\frac{d}{2})((k-1)\alpha -d)}, \end{aligned}$$
(22)

where we have used that \(d/2\Gamma (d/2)=\Gamma (1+d/2)\). Similarly, we obtain that

$$\begin{aligned} \mathbb {E}\left[ \bar{W}_n^k(r_n,\varepsilon _n)\right] = O(\varepsilon _n^{\alpha -d}n^dr_n^{d-(k-1)\alpha }), \end{aligned}$$
(23)

where \(\bar{W}_n^k(r_n,\varepsilon )\) denotes the number of k-cliques with at least one edge of length at least \(r_n\) that are not in \(W_n^k(r_n,\varepsilon )\). \(\square \)

Finally, we bound the variance of the number of cliques with \(k-1\) long edges and all other edges of length at most \(1/\varepsilon \):

Lemma 3.4

Suppose that \(\varepsilon _n\ge 1/\log (n)\) and that \(\lim _{n\rightarrow \infty }\varepsilon _n=0\). When \(r_n\ll n^{d/((k-1)\alpha -d)}\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{Var \left( W_n^k(r_n,\varepsilon _n)\right) }{\mathbb {E}\left[ W_n^k(r_n,\varepsilon _n)\right] ^2} = 0. \end{aligned}$$

Proof

First of all, we can interpret \(W_n^k(r_n,\varepsilon )\) as a U-statistic of order k on a marked space. Because of this, [22, Lemma 3.5] yields that

$$\begin{aligned}&Var \left( W_n^k(r_n,\varepsilon )\right) = \sum _{s=1}^ks!{k \atopwithdelims ()s}^2\int _{\mathcal {T}_n^d}\dots \int _{\mathcal {T}_n^d} \nonumber \\ {}&\times \Bigg (\int _{\mathcal {B}^{d}_{0,n}}\dots \int _{\mathcal {B}^{d}_{0,n}} f({\textbf {x}}_1,\dots ,{\textbf {x}}_s,{\textbf {y}}_1,\dots ,{\textbf {y}}_{k-s}) d{\textbf {y}}_1\dots d{\textbf {y}}_{k-s}\Bigg )^2 d{\textbf {x}}_1\dots d{\textbf {x}}_s, \end{aligned}$$
(24)

where \(f(\textbf{x}_1,\dots ,\textbf{x}_s,\textbf{y}_1,\dots ,\textbf{y}_{k-s})\) denotes the probability that \(\textbf{x}_1,\dots ,\textbf{x}_s,\textbf{y}_1,\dots ,\textbf{y}_{k-s}\) forms a clique with \(k-1\) edges of length at least \(r_n\), and all other edges of length at most \(1/\varepsilon _n\). Thus, each term inside the summation measures the expected number of pairs of cliques 'glued' together at an overlap of s vertices.

Define

$$\begin{aligned} \tilde{g}(x)=1-\exp (-x^{-\alpha }). \end{aligned}$$
(25)

Now cliques in \(W_n^k(r_n,\varepsilon _n)\) have \(k-1\) long edges of length at least \(r_n\), while all other edges have length at most \(1/\varepsilon _n\). We first focus on the contribution to the summand in (24) from \(s\in \{2,\dots , k-1\}\). When the overlap between the cliques occurs at the vertex incident to the long edges of length at least \(r_n\), then \(s-1\) long edges overlap (and possibly several short edges as well). In this case, using (21) shows that this summand equals

$$\begin{aligned}&O\left( \mathbb {E}\left[ W_n^k(r_n,\varepsilon _n)\right] \mathbb {E}\left[ A_{k-(s-1)}(r_n)\right] \tilde{g}(\varepsilon _n^{-1})^t\right) \nonumber \\&= O\left( \mathbb {E}\left[ W_n^k(r_n)\right] \mathbb {E}\left[ A_{k-(s-1)}(r_n)\right] \tilde{g}(\varepsilon _n^{-1})^t\right) \nonumber \\&= O\left( n^dr_n^{d-2(k-1)\alpha +s\alpha } \tilde{g}(\varepsilon _n^{-1})^t\right) \nonumber \\&=O\left( n^dr_n^{d-(k-1)\alpha } \tilde{g}(\varepsilon _n^{-1})^t\right) , \end{aligned}$$
(26)

for some \(t\ge 0\). Here, \(A_{k-(s-1)}(r_n)\) is defined as in the proof of Lemma 3.3. Indeed, to glue two cliques together at \(s-1\) long edges, one first needs to create one clique with \(k-1\) long edges, which gives the \(\mathbb {E}\left[ W_n^k(r_n,\varepsilon )\right] \) term of Lemma 3.3. Given the presence of this clique, sharing \(s-1\) long edges means that only \(k-s\) extra long edges have to be added to generate the second clique (plus some additional short edges, each of which appears with probability \(O(\tilde{g}(1/\varepsilon _n))\)). This is equivalent to generating a new clique of size \(k-s+1\) (where the \(+1\) comes from the fact that the vertex incident to the long edges is also counted) from a given edge of length at least \(r_n\), giving a contribution of \(\mathbb {E}\left[ A_{k-(s-1)}(r_n)\right] \), computed in (21).

When \(s=k\), then the term in the summand of (24) equals \(\mathbb {E}\left[ W_n^k(r_n,\varepsilon )\right] \).

When the overlap occurs only at short edges, however, or when \(s=1\) and no edges overlap at all, the second clique still needs to contain \(k-1\) long edges. As the probability of a short edge is \(O(\tilde{g}(1/\varepsilon _n))\), to generate the second clique there are on average \(\mathbb {E}\left[ W_n^k(r_n,\varepsilon )\right] /O(\tilde{g}(1/\varepsilon _n)^{s(s-1)/2}n^{ds})\) options, as there are \(s(s-1)/2\) short edges in the overlapping part of the cliques and on average \(O(n^{ds})\) sets of s vertices. Combined, this gives a contribution to the summand in (24) of

$$\begin{aligned}&O\left( \mathbb {E}\left[ W_n^k(r_n,\varepsilon _n)\right] ^2/\left(\tilde{g}(\varepsilon _n^{-1})^{s(s-1)/2}n^{ds}\right)\right) \nonumber \\ {}&= O\left( n^{2d-sd}r_n^{2(d-(k-1)\alpha )}\tilde{g}(\varepsilon _n^{-1})^{-s(s-1)/2}\right) \nonumber \\ {}&= O\left( n^{d}r_n^{2(d-(k-1)\alpha )}\tilde{g}(\varepsilon _n^{-1})^{-k(k-1)/2}\right) . \end{aligned}$$
(27)

Now \(\tilde{g}(\varepsilon _n^{-1}) = O(\varepsilon _n^\alpha )\). As, by assumption, \(\varepsilon _n\ge \log (n)^{-1}\) and k is fixed, combining (24), (26) and (27) yields that

$$\begin{aligned} Var \left( W_n^k(r_n,\varepsilon _n)\right) = O\left( n^dr_n^{d-(k-1)\alpha }\tilde{g}(\varepsilon _n^{-1})^{-t}\right) \end{aligned}$$
(28)

for some \(t>0\).

Now by (22) and (23) and the fact that \(\varepsilon _n\rightarrow 0\) as \(n\rightarrow \infty \), \(\mathbb {E}\left[ W_n^k(r_n,\varepsilon _n)\right] ^2 = O(n^{2d}r_n^{2(d-(k-1)\alpha )})\). Combining this with (28) yields that

$$\begin{aligned} \frac{Var \left( W_n^k(r_n,\varepsilon _n)\right) }{\mathbb {E}\left[ W_n^k(r_n,\varepsilon _n)\right] ^2} = O(n^{-d}r_n^{(k-1)\alpha -d}\varepsilon _n^{-\alpha t}). \end{aligned}$$
(29)

Now this expression tends to zero when \(r_n\ll n^{d/((k-1)\alpha -d)}\) and \(\varepsilon _n\ge \log (n)^{-1}\). \(\square \)

We are now ready to prove Theorem 2.1:

Proof of Theorem 2.1

First of all,

$$\begin{aligned} W_n^k(r_n) = W_n^k(r_n,\varepsilon _n) + \bar{W}_n^k(r_n,\varepsilon _n), \end{aligned}$$

where \(\bar{W}_n^k(r_n,\varepsilon _n)\) denotes the number of k-cliques with at least one edge of length at least \(r_n\) that are not in \(W_n^k(r_n,\varepsilon _n)\). Now by (23)

$$\begin{aligned} \mathbb {E}\left[ \bar{W}_n^k(r_n,\varepsilon _n)\right] = O\left( n^dr_n^{d-(k-1)\alpha } \varepsilon _n^{d-\alpha }\right) . \end{aligned}$$

Thus, by the Markov inequality,

$$\begin{aligned} W_n^k(r_n) = W_n^k(r_n,\varepsilon _n) + O_{\mathbb {P}}\left( n^dr_n^{d-(k-1)\alpha } \varepsilon _n^{d-\alpha }\right) . \end{aligned}$$

Now by Lemma 3.4 and the Chebyshev inequality

$$\begin{aligned} W_n^k(r_n) = \mathbb {E}\left[ W_n^k(r_n,\varepsilon _n)\right] (1+o_{\mathbb {P}}(1)) + O_{\mathbb {P}}\left( n^dr_n^{d-(k-1)\alpha }\varepsilon _n^{d-\alpha }\right) . \end{aligned}$$

Using, as in (22), the fact that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\mathbb {E}\left[ W_n^k(r_n,\varepsilon _n)\right] }{n^dr_n^{d-(k-1)\alpha }} = \frac{dC_d\pi ^{d/2} M_k}{2\Gamma (1+\frac{d}{2})((k-1)\alpha -d)} \end{aligned}$$

finishes the first part of the proof of the theorem. It remains to prove the convergence in distribution. By Lemma 3.3, for sufficiently large \(n \in \mathbb {N}\) we have

$$\begin{aligned} \mathbb {E}\left[ W_n^k(r_n)\right] \ge cn^d r_n^{d-(k-1)\alpha } \end{aligned}$$
(30)

for some constant \(c \in (0, \infty )\). In particular, we observe that

$$\begin{aligned} \lim _{n \rightarrow \infty } \mathbb {E}\left[ W_n^k(r_n)\right] = \infty \end{aligned}$$
(31)

by the assumption that \(1\ll r_n\ll n^{d/((k-1)\alpha -d)}\). By Lemma 4.2 below, for a random variable \(P_n\) having a Poisson distribution with parameter \(\mathbb {E}\left[ W_n^k(r_n)\right] \):

$$\begin{aligned} d_{\text {TV}}\left( W^k_n(r_n),P_n(\beta )\right)&\le c \mathbb {E}\left[ W_n^k(r_n)\right] ^{-1} n^d r_n^{2d-2(k-1)\alpha } \\&\le c r_n^{d-(k-1)\alpha }, \end{aligned}$$

which tends to zero as \(n \rightarrow \infty \), where we have used (30) in the last inequality. Moreover, since \(P_n\) has a Poisson distribution with parameter \(\mathbb {E}\left[ W_n^k(r_n)\right] \) satisfying  (31), we have that

$$\begin{aligned} \frac{P_n - \mathbb {E}[W^k_n(r_n)] }{\sqrt{\mathbb {E}[W^k_n(r_n)]}} \xrightarrow [n\rightarrow \infty ]{d} Z, \end{aligned}$$

where Z is a random variable with standard normal distribution. Thus, the same conclusion holds for \(W^k_n(r_n)\), since \(d_{\text {TV}}\left( W^k_n(r_n),P_n(\beta )\right) \) converges to 0, as \(n\rightarrow \infty \). This finishes the proof. \(\square \)

4 Proof of Theorem 2.2

By Lemma 3.3, when choosing

$$\begin{aligned} r_n = \Bigg ( \frac{2\Gamma (1+\frac{d}{2})((k-1)\alpha -d)}{dC_d\pi ^{d/2} M_k}\Bigg )^{\frac{1}{d-(k-1)\alpha }} r n^{\frac{d}{(k-1)\alpha -d}}, \end{aligned}$$

with \(r \in (0,\infty )\), then

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {E}\left[ W^k_n(r_n)\right] } = r^{d-(k-1)\alpha }. \end{aligned}$$

This will be used later to conclude the suitable Poisson limit, from which we will subsequently derive the extreme value behavior of the longest edge belonging to a k-clique. We first recall a formal way of constructing the underlying geometric graph by means of a marked point process. This will later be useful for applying a Poisson approximation theorem.
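The passage from the Poisson limit to the extreme value behavior can be previewed numerically: if the number of k-cliques with an edge exceeding the rescaled threshold is asymptotically Poisson with mean \(r^{d-(k-1)\alpha }\), then the probability of seeing no such clique is exactly the Fréchet distribution function with shape parameter \((k-1)\alpha -d\). A minimal sketch (the parameter values are our own choices, assuming \((k-1)\alpha >d\)):

```python
import math

d, k, alpha = 2, 3, 3.0              # illustrative choice with (k-1) * alpha > d
shape = (k - 1) * alpha - d          # Frechet shape parameter, = 4 here

for r in [0.5, 1.0, 2.0]:
    poisson_mean = r ** (d - (k - 1) * alpha)   # limiting mean of W_n^k(r_n)
    void_prob = math.exp(-poisson_mean)          # P(Poisson(mean) = 0)
    frechet_cdf = math.exp(-r ** (-shape))       # Frechet(shape) CDF at r
    assert abs(void_prob - frechet_cdf) < 1e-15  # the two agree exactly
    print(f"r={r}: P(no clique with a long edge) = {void_prob:.6f}")
```

This is only an algebraic identity between the Poisson void probability and the Fréchet distribution function; the mathematical content lies in proving the Poisson limit itself.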

4.1 Formal Construction of the Random Graph Model

Let \(( \mathbb {M}, \mathcal {M}, m)\) be a mark space, where \(\mathbb {M}=[0,1)^{\mathbb {N}_0}\), \(\mathcal {M}\) is a corresponding \(\sigma \)-algebra and m is the distribution of an infinite sequence of independent [0, 1)-uniformly distributed random variables. We let \(\eta \) be a marked Poisson point process of unit intensity on \(\mathcal {T}_n^d\) with marks in \(\mathbb {M}\) distributed according to m. Thus, \(\eta \) is a Poisson point process on \(\mathcal {T}_n^d \times \mathbb {M}\). We can denote almost every realization of \(\eta \) by

$$\begin{aligned} \eta = \Big \{ (x_i,(t_{i,0}, t_{i,1}, \dots )): {i=1,\dots , |\eta |} \Big \}, \end{aligned}$$

where \(t_{1,0}<t_{2,0}<\dots <t_{|\eta |,0}\), i.e. we use the first coordinates in order to determine the order in which the points in \(\eta \) are enumerated. Then there is an edge \(\{x_i,x_j\}\), \(1\le i <j\le |\eta |\), if and only if \(t_{i,j}\le g(x_i,x_j)\). For a set \(\eta \) of marked points as above, we denote by \(G(\eta )=(V(\eta ),E(\eta ))\) the random geometric graph constructed from \(\eta \).
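The construction above can be sketched in code. The following is a minimal, illustrative simulation (function names such as `soft_rgg` are ours; for simplicity we draw one independent uniform mark per pair of points directly, which produces the same edge distribution as the mark sequences used above):

```python
import itertools
import math
import random

def torus_dist(x, y, n):
    """Toroidal distance |x - y|_T on the d-dimensional torus of side n."""
    return math.sqrt(sum(min(abs(a - b), n - abs(a - b)) ** 2
                         for a, b in zip(x, y)))

def poisson_sample(lam, rng):
    """Poisson(lam) sample via Knuth's product method (fine for moderate lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

def soft_rgg(n, d, alpha, rng):
    """Soft random geometric graph on the torus: a unit-intensity Poisson
    number of uniform vertices, with edge {i, j} present independently
    with probability g(x_i, x_j) = 1 - exp(-|x_i - x_j|_T^(-alpha))."""
    num = poisson_sample(n ** d, rng)
    pts = [tuple(rng.uniform(0, n) for _ in range(d)) for _ in range(num)]
    edges = {(i, j) for i, j in itertools.combinations(range(num), 2)
             if rng.random() <= -math.expm1(-torus_dist(pts[i], pts[j], n) ** (-alpha))}
    return pts, edges

rng = random.Random(1)
pts, edges = soft_rgg(n=5, d=2, alpha=3.0, rng=rng)
print(len(pts), "vertices,", len(edges), "edges")
```

Note that pairwise-independent marks suffice here because each edge indicator depends on its own uniform mark only; the mark sequences in the formal construction serve the same purpose while keeping the process well-defined for any realization.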

4.2 Poisson Approximation

We denote by \(\textbf{S}\) the set of all locally finite subsets of \(\mathcal {T}_n^d\times \mathbb {M}\) and by \(\textbf{S}_l\) the set of subsets of \(\mathcal {T}_n^d\times \mathbb {M}\) with cardinality l, \(l\in \mathbb {N}\). We recall the following result.

Lemma 4.1

([21, Theorem 3.1.]) Let \(l\in \mathbb {N}\), \(f:\textbf{S}_l\times \textbf{S} \longrightarrow \{0,1\}\) a measurable function and for \(\xi \in \textbf{S}\) set

$$\begin{aligned} F(\xi ):=\sum _{\psi \in \textbf{S}_l:\psi \subset \xi }f(\psi ,\xi \setminus \psi ). \end{aligned}$$
(32)

Let \(\eta \) be a marked Poisson point process, set \(W:=F(\eta )\) and \(\beta :=\mathbb {E}[W]\). For \(x_1,\dots ,x_l\in \mathcal {T}_n^d\) set

$$\begin{aligned} p(x_1,\dots ,x_l):=\mathbb {E}\left[ f\left( \{(x_1,\tau _1),\dots ,(x_l,\tau _l)\},\eta \right) \right] , \end{aligned}$$

where the \(\tau _i\) are independent random variables in \(\mathbb {M}\) with common distribution \(\textbf{m}\).

Suppose that for almost every \(\textbf{x}=(x_1,\dots ,x_l)\), \(x_i\in \mathcal {T}_n^d\), with \(p(x_1,\dots ,x_l)>0\) we can find coupled random variables \(U_\textbf{x}\) and \(V_\textbf{x}\) such that the following holds:

  (i) \(W\overset{d}{=}U_\textbf{x}\),

  (ii) \(F\left( \eta \cup \overset{l}{\underset{i=1}{\bigcup }}\{(x_i,\tau _i)\}\right) \) conditional on \(f\left( \overset{l}{\underset{i=1}{\bigcup }}\{(x_i,\tau _i)\},\eta \right) =1\) has the same distribution as \(1+V_\textbf{x}\),

  (iii) \(\mathbb {E}\left[ \vert U_\textbf{x}-V_\textbf{x}\vert \right] \le w(\textbf{x})\), for some measurable function w.

Let \(P(\beta )\) be a random variable with Poisson distribution and mean \(\beta \). Then

$$\begin{aligned} d_{\text {TV}}\left( W,P(\beta )\right) \le \frac{\min (1,\beta ^{-1})}{l!}\int _{\mathcal {T}_n^d \times \ldots \times \mathcal {T}_n^d}w(\textbf{x})p(\textbf{x}){\text {d}} \textbf{x}, \end{aligned}$$
(33)

where \(d_{TV}\) denotes the total variation distance between two discrete random variables.

Formally, write \(\mathcal {C}_k\) for the set of all k-cliques, i.e. the set of all sets of vertices \(\{x_1, \ldots , x_k\}\) that form a clique. Making use of the formal construction described above, we will apply Lemma 4.1 with \(l=k\) and \(f :\textbf{S}_k\times \textbf{S} \longrightarrow \{0,1\}\) defined by

$$\begin{aligned}&f\left( \left\{ \left( x_1,(u^1_i)_{i\ge 0}\right) , \ldots \left( x_k,(u^k_i)_{i\ge 0}\right) \right\} ,\xi \right) \\&=\textbf{1}\{ C_k:=\{x_1, \ldots , x_k\} \in \mathcal {C}_k, \max _{x,y \in C_k}|x-y|_T \textbf{1}_{\{\{x,y\} \in E(\xi \cup \{( x_1,(u^1_i)_{i\ge 0}) , \ldots ( x_k,(u^k_i)_{i\ge 0})\})\}}>r_n\} \end{aligned}$$

for \(\left\{ \left( x_1,(u^1_i)_{i\ge 0}\right) , \ldots \left( x_k,(u^k_i)_{i\ge 0}\right) \right\} \in \textbf{S}_k\), so that \( W=F(\eta )=W^k_n(r_n)\). By definition, W is the number of sets of k vertices that form a k-clique and contain an edge longer than \(r_n\). One can formally check that f is the indicator function of a measurable set.
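On a finite configuration, the functional \(W=F(\eta )\) can be evaluated by brute force over all k-subsets. A small illustrative sketch (helper names are ours, and edge lengths are supplied explicitly rather than recomputed from positions):

```python
import itertools

def count_long_edge_cliques(num_vertices, edges, dist, k, r):
    """W^k(r) on a finite graph: the number of k-vertex subsets that form
    a clique and whose longest edge exceeds r. `edges` is a set of sorted
    pairs and `dist` maps each edge to its length."""
    count = 0
    for sub in itertools.combinations(range(num_vertices), k):
        pairs = list(itertools.combinations(sub, 2))
        if all(p in edges for p in pairs) and max(dist[p] for p in pairs) > r:
            count += 1
    return count

# Toy graph: triangle {0,1,2} with a pendant edge {2,3}.
edges = {(0, 1), (0, 2), (1, 2), (2, 3)}
dist = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 1.0, (2, 3): 5.0}
print(count_long_edge_cliques(4, edges, dist, k=3, r=1.5))  # the one triangle has an edge of length 2 > 1.5
print(count_long_edge_cliques(4, edges, dist, k=3, r=3.0))  # no triangle has an edge longer than 3
```

This exhaustive enumeration is exponential in k and is meant only to make the definition of W concrete, not to be used at scale.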

Now we define the coupled random variables \(U_\textbf{x}\) and \(V_\textbf{x}\). To this end, we enlarge \(\eta \) by adding k marked points \(x_1^{\text {m}}=(x_1, (u^1_i)_{i\ge 0}), \ldots , x_k^{\text {m}}=(x_k, (u^k_i)_{i\ge 0})\). Then we define \(V_\textbf{x}\) as the number of sets of k vertices other than \(\{x_1, \ldots , x_k\}\) in the enlarged graph \(G(\eta \cup \{x_1^{\text {m}}, \ldots , x_k^{\text {m}}\} )\) that both form a clique and contain an edge longer than \(r_n\):

$$\begin{aligned}&V_\textbf{x}= \sum \nolimits _{1} \textbf{1}\{\max |x-y|_T \textbf{1}_{\{\{x,y\} \in E(\xi \cup \{\left( x_1,(u^1_i)_{i\ge 0}\right) , \ldots \left( x_k,(u^k_i)_{i\ge 0}\right) \})\}}>r_n\} , \end{aligned}$$

where the sum \(\sum _1\) is taken over all \(\{y_1, \ldots , y_k\} \ne \{x_1, \ldots , x_k\}\) with \(y_1, \ldots , y_k \in V (G(\eta \cup \{x_1^{\text {m}}, \ldots , x_k^{\text {m}}\} ))\) that form a clique and the maximum in the indicator is with respect to all endpoints \(x,y \in \{y_1, \ldots , y_k\}\).

Now consider the restriction of \(G(\eta \cup \{x_1^{\text {m}}, \ldots , x_k^{\text {m}}\})\) to \(\eta \), that is, the subgraph of \(G(\eta \cup \{x_1^{\text {m}}, \ldots , x_k^{\text {m}}\})\) obtained by deleting the vertices \(x_1, \ldots , x_k\) and all edges having one of them as an endpoint. We denote this graph by \(\eta _R=G(\eta \cup \{x_1^{\text {m}}, \ldots , x_k^{\text {m}}\})_{|\eta } =(V(\eta _R),E(\eta _R))\) and we note that it has the same distribution as \(G(\eta )\). Remark that these two graphs are not equal almost surely, because the presence of an edge depends not only on the marks of its endpoints, but also on all other marks, as they are used to determine the order in which the points are enumerated. The random variable \(U_\textbf{x}\) is then defined as the number of sets of k vertices in the induced graph \(\eta _R\) that both form a k-clique and contain an edge longer than \(r_n\):

$$\begin{aligned} U_\textbf{x}= \sum \nolimits _2 \textbf{1}\{\max |z-y|_T \textbf{1}_{\{ \{y, z\} \in E(\eta _R) \}} > r_n\}, \end{aligned}$$

where now the sum \(\sum _2\) is taken over all cliques \(\{y_1, \ldots , y_k\}\) with \(y_1, \ldots , y_k \in V (\eta _R) \) and the maximum in the indicator is with respect to all endpoints \(z,y \in \{y_1, \ldots , y_k\}\). Observe that the coupled random variables \(U_\textbf{x}\) and \(V_\textbf{x}\) defined above satisfy Assumptions (i) and (ii) of Lemma 4.1 and are such that \(V_\textbf{x}\ge U_\textbf{x}\) by construction. It follows that

$$\begin{aligned} \mathbb {E}\left[ \vert U_\textbf{x}-V_\textbf{x}\vert \right]&=\mathbb {E}\left[ V_\textbf{x}-U_\textbf{x}\right] \\&=\mathbb {E}\Bigg [ \sum \nolimits _{1} \textbf{1}\{\max |x-y|_T \textbf{1}_{\{\{x,y\} \in E(\xi \cup \{\left( x_1,(u^1_i)_{i\ge 0}\right) , \ldots \left( x_k,(u^k_i)_{i\ge 0}\right) \})\}}>r_n\} \Bigg ]\\&\quad -\mathbb {E}\Bigg [\sum \nolimits _{2} \textbf{1}\{\max |z-y|_T \textbf{1}_{\{ \{y, z\} \in E(\eta _R) \}} > r_n\}\Bigg ]\\&=:w(\textbf{x}). \end{aligned}$$

Thus, Assumption (iii) of Lemma 4.1 is also satisfied. Moreover, from the calculations in the proof of Lemma 3.3 we obtain for \(x_1, \ldots , x_k \in \mathcal {T}_n^d\) that

$$\begin{aligned} w(\textbf{x}) \le c r_n^{d-(k-1)\alpha }. \end{aligned}$$

Similarly, we also have

$$\begin{aligned} p(\textbf{x})&=\mathbb {E} \left[ f \left( \{\left( x_1,(u^1_i)_{i\ge 0}\right) , \ldots \left( x_k,(u^k_i)_{i\ge 0}\right) \},\xi \right) \right] \le c r_n^{d-(k-1)\alpha }. \end{aligned}$$

Thus, we get the following upper bound for the integral on the r.h.s. of (33):

$$\begin{aligned} \int _{\mathcal {T}_n^d \times \ldots \times \mathcal {T}_n^d}w(\textbf{x})p(\textbf{x})\,d\textbf{x} \le c n^d r_n^{2d-2(k-1)\alpha }, \end{aligned}$$

which proves the following:

Lemma 4.2

Let \(P_n(\beta )\) be a random variable with Poisson distribution and mean \(\beta = \mathbb {E}[W^k_n(r_n)]\), where \(1\ll r_n \ll n\). Then

$$\begin{aligned} d_{\text {TV}}\left( W^k_n(r_n),P_n(\beta )\right) \le c \frac{\min (1,\beta ^{-1})}{k!} n^d r_n^{2d-2(k-1)\alpha }, \end{aligned}$$

where \(c \in (0, \infty )\) is a constant independent of n.

Now, we are in a position to complete the proof of Theorem 2.2. Recall again that

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {E}\left[ W^k_n(r_n)\right] } = r^{d-(k-1)\alpha } \end{aligned}$$

in the present case. Moreover, by definition of \(r_n\) the expression \(n^d r_n^{2d-2(k-1)\alpha }\) tends to 0, as \(n \rightarrow \infty \). Thus, since the total variation distance between two Poisson random variables is bounded by the absolute value of the difference of their parameters, Theorem  2.2 follows from Lemma 4.2. Then Corollary 2.3 is an easy consequence.
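The bound invoked here, namely that the total variation distance between two Poisson distributions is at most the absolute difference of their means, can be confirmed numerically (function names and the truncation point below are ours):

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability mass function, computed in log space for stability."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def tv_poisson(lam1, lam2, jmax=500):
    """Total variation distance between Poisson(lam1) and Poisson(lam2),
    truncated at jmax (ample for means of this size)."""
    return 0.5 * sum(abs(poisson_pmf(j, lam1) - poisson_pmf(j, lam2))
                     for j in range(jmax + 1))

# Check d_TV(Poi(a), Poi(b)) <= |a - b| on a few pairs of means.
for a, b in [(2.0, 2.5), (5.0, 5.1), (1.0, 4.0)]:
    assert tv_poisson(a, b) <= abs(a - b)
    print(f"d_TV(Poi({a}), Poi({b})) = {tv_poisson(a, b):.4f} <= {abs(a - b)}")
```

Sharper Lipschitz-type bounds on this distance exist, but the plain difference of parameters already suffices for the argument above, since the two Poisson parameters converge to the same limit.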

Remark 4.1

As mentioned in Remark 2.1 one can also use for example [8, Theorem 7.1] regarding Poisson approximation of U-statistics. However, our bound in Lemma 4.2 involves the additional factor \(\min \{1, \beta ^{-1}\}\), which is missing in [8, Theorem 7.1]. This factor thus improves the upper bound in case the expectation is large. Indeed, this is the case in the setting of Theorem 2.1, as the expectation diverges, thus making the factor crucial to prove the central limit theorem.

5 Proof of Theorem 2.4

By Lemma 3.3, the expected number of cliques with one vertex that is at least \(1/\varepsilon _n\) away from the other vertices is upper bounded by \(Cn^d\varepsilon _n^{\alpha (k-1)-d}\) for large enough n. Thus, by the Markov inequality,

$$\begin{aligned} K_k = K_k(\varepsilon _n) + O_{\mathbb {P}}\left( n^d\varepsilon _n^{\alpha (k-1)-d}\right) . \end{aligned}$$
(34)

Furthermore, as all edges in \(K_k(\varepsilon _n)\) are short edges, by the same arguments as in the proof of Lemma 3.4, we obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{Var \left( K_k(\varepsilon _n)\right) }{\mathbb {E}\left[ K_k(\varepsilon _n)\right] ^2} = 0. \end{aligned}$$

Then, the Chebyshev inequality yields that

$$\begin{aligned} K_k(\varepsilon _n) = \mathbb {E}\left[ K_k(\varepsilon _n)\right] (1+o_{\mathbb {P}}(1)). \end{aligned}$$
(35)

Now,

$$\begin{aligned}&\mathbb {E}\left[ K_k(\varepsilon _n)\right] \nonumber \\&\ge C_dn^d\int _{\mathcal {B}_{0,1/\varepsilon _n}}\dots \int _{\mathcal {B}_{0,1/\varepsilon _n}}\prod _{i\le k-1}g(\mathbf {x_i},0)\prod _{1\le u<v\le k-1}g(\mathbf {x_u},\mathbf {x_v})d\mathbf {x_1}\dots d\mathbf {x_{k-1}}, \end{aligned}$$

resulting in

$$\begin{aligned} \frac{K_k}{K_k(\varepsilon _n)} = 1 + \frac{O_{\mathbb {P}}\left( n^d\varepsilon _n^{\alpha (k-1)-d}\right) }{n^d(1+o_{\mathbb {P}}(1))}, \end{aligned}$$
(36)

so that

$$\begin{aligned} \frac{K_k}{K_k(\varepsilon _n)}\xrightarrow [n\rightarrow \infty ]{\mathbb {P}}1. \end{aligned}$$
(37)

\(\square \)

6 Conclusion and Discussion

In this paper, we have investigated the distances within cliques of arbitrary size k, with a focus on the large-deviation behavior of the largest distance in a k-clique for the soft random geometric graph on a d-dimensional torus of radius n. While all typical cliques contain only short edges, the largest distance in a k-clique scales as \(n^{d/((k-1)\alpha -d)}\), indicating that the length of the longest edge in a k-clique decreases in k, in contrast to the length of a typical edge in a k-clique. Furthermore, with high probability, such a clique contains \(k-1\) edges of this length, while all other edges are short. We further showed that the properly re-scaled length of the longest edge in a clique converges in distribution to a random variable with a Fréchet distribution.

We believe that this leads to several interesting questions for further research. First, it would be interesting to investigate whether this convergence holds for all other possible subgraph counts. We believe that for other subgraph counts, the scaling of the longest edge will only depend on the minimal degree of the subgraph, and not on the subgraph size, and it would be interesting to show this in further research. Another option would be to study asymptotic properties of clique counts and distributional limits of the maximal clique (size) in the soft random geometric graph with a similar approach and methodology as presented in this paper.

Secondly, it would be interesting to extend these results to different types of geometric graphs, particularly for the discrete version of the model in this paper, see [7]. Here, it has been shown in [24] that there are subtle differences for the longest global edge, compared to the continuous version. Other options include investigating other models with different underlying spaces such as hyperbolic random graphs [13]. In these models, the clique-behavior would also be interesting to investigate.

Moreover, this work quantifies the likelihood of atypical cliques with long edges, thereby focusing on the structure of cliques rather than their total counts. It would be interesting to see whether these insights on the behavior of the largest edge-lengths can also lead to hypothesis tests to distinguish different random graph models or to detect anomalies in graphs with statistics such as in [16].

Furthermore, in this paper we work with the particular connection probability (1). Still, our results can also be transferred to the setting in which the connection probability has exponential decay, where \(g(x,y)=\exp (-\lambda |x-y|_T^{\alpha })\), \(\lambda , \alpha >0\). In such a case, we believe that one would obtain convergence of Gumbel type for the maximal clique distance, similarly to the maximal edge-length in [24].

Finally, often one is interested in the sum of (powers of) the edge-lengths in geometric graphs (see for example [23]). As an additional insight into the structure of cliques, it would be interesting to study the total sum of edge-lengths in k-cliques, and other edge and/or edge-length functionals.

Index of notation

$$\begin{aligned}&\mathcal {T}_n^d: \quad &&d\text {-dimensional torus of radius } n \\&C_d: &&\text {constant such that the volume of the }d\text {-dimensional torus satisfies } {\text {vol}} (\mathcal {T}_n^d )= C_dn^d\\&W_n^k(r_n): &&\text {number of } k\text {-cliques with at least one edge of length at least } r_n \\&W_n^k(r_n,\varepsilon ): &&\text {number of } k\text {-cliques with exactly }k-1 \text { edges of length at least } r_n \text { and all other edges of length at most } 1/\varepsilon \\&K_k(\varepsilon _n): &&\text {number of } k\text {-cliques with all interdistances of length at most } 1/\varepsilon _n \\&K_k: &&\text {number of } k\text {-cliques} \\&T_n(r_n): &&\text {number of triangles that a given edge of length } r_n\text { is part of} \\&\mathcal {B}_{y,1/\varepsilon _n}: &&\text {ball of radius } 1/\varepsilon _n \text { around } y \\&A_k(r_n): &&\text {number of } k\text {-cliques that a given edge of length } r_n \text { is part of} \end{aligned}$$