Rare Events in Random Geometric Graphs

This work introduces and compares approaches for estimating rare-event probabilities related to the number of edges in the random geometric graph on a Poisson point process. In the one-dimensional setting, we derive closed-form expressions for a variety of conditional probabilities related to the number of edges in the random geometric graph and develop conditional Monte Carlo algorithms for estimating rare-event probabilities on this basis. We prove rigorously a reduction in variance when compared to the crude Monte Carlo estimators and illustrate the magnitude of the improvements in a simulation study. In higher dimensions, we use conditional Monte Carlo to remove the fluctuations in the estimator coming from the randomness in the Poisson number of nodes. Finally, building on conceptual insights from large-deviations theory, we illustrate that importance sampling using a Gibbsian point process can further substantially reduce the estimation variance.


Introduction
In this paper, we focus on rare events associated with the number of edges in the Gilbert graph G(X) on a homogeneous Poisson point process X = {X_i}_{i ≥ 1} with intensity λ > 0 in R^d, for d ≥ 1. Consequently, the nodes of G(X) are the points of X, and there is an edge between X_i, X_j ∈ X if ‖X_i − X_j‖ ≤ 1, where the upper bound 1 is the threshold of the Gilbert graph and ‖·‖ denotes the Euclidean norm. Our goal is to analyze the probability of the rare event that the number of edges in a bounded sampling window W ⊂ R^d deviates considerably from its expected value. More succinctly, we ask: What is the probability that the Gilbert graph has at least twice its expected number of edges? What is the probability that the Gilbert graph has at most half its expected number of edges?
These seemingly innocuous questions have an intriguing connection to large deviations of heavy-tailed sums. Indeed, suppose that {W_i} is an equal-volume partition of W such that the diameter of each W_i is at most 1. Then, letting Z_i := |X ∩ W_i| denote the number of Poisson points in W_i, the edges entirely inside W_i contribute Z_i(Z_i − 1)/2 to the total edge count. Since Z_i follows a Poisson distribution, the tail probability P(Z_i ≥ n) is of the order exp(−c n log n) for some constant c > 0. Hence, the tails of Z_i^2 are of the order exp(−c √n log(n)/2), and therefore Z_i^2 does not have exponential moments. This is critical to note, because it means that the problem at hand is tightly related to large deviations of heavy-tailed sums, where large deviations typically come from extreme realizations of the largest summand [1, 5, 11].
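The contrast can be checked numerically. The following self-contained sketch (our illustration, not part of the original analysis; function names are ours) compares tail decay rates for a Poisson variable Z: the rate −log P(Z ≥ n)/n grows with n (lighter than exponential), while −log P(Z² ≥ n)/n shrinks with n (heavier than exponential), in line with the exp(−c n log n) and exp(−c √n log(n)/2) orders above.

```python
import math

def pois_sf(n, mu):
    """P(Z >= n) for Z ~ Poisson(mu), summing the pmf from n upward."""
    if n <= 0:
        return 1.0
    term = math.exp(-mu) * mu ** n / math.factorial(n)
    total, j = term, n
    # keep adding pmf terms until they are negligible relative to the sum
    while term > 1e-300 * max(total, 1e-300):
        j += 1
        term *= mu / j
        total += term
    return total

def tail_rate(n, mu):
    """-log P(Z >= n) / n: grows in n, reflecting the exp(-c n log n) tail."""
    return -math.log(pois_sf(n, mu)) / n

def sq_tail_rate(n, mu):
    """-log P(Z**2 >= n) / n = -log P(Z >= ceil(sqrt(n))) / n: shrinks in n."""
    return -math.log(pois_sf(math.isqrt(n - 1) + 1, mu)) / n
```

For instance, with mu = 1, `tail_rate` increases between n = 10 and n = 40, while `sq_tail_rate` decreases between n = 100 and n = 1600, illustrating the absence of exponential moments for Z².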
The analysis and simulations in this paper rely on two methods: conditional Monte Carlo (MC) and importance sampling; see Chapters 9.4 and 9.7 of [7], respectively. Importance sampling techniques have been used for the analysis of rare events in the setting of the Erdős-Rényi graph [3], which is different from the Gilbert graph considered here. Other than that, there has been very little literature on the topic. Note that the same analysis extends to Gilbert graphs with a threshold different from 1 by rescaling the sampling window W and the intensity λ of the Poisson point process.
The rest of the presentation is organized in three parts. First, in Section 2, we explore the potential of conditional Monte Carlo in a one-dimensional setting, where we can frequently derive explicit closed-form expressions. Surprisingly, when moving to the far end of the upper tails, it is possible to avoid simulation altogether, as we derive a fully analytic representation. Then, in Section 3, we move to higher dimensions. Here, we apply conditional MC to remove the randomness coming from the random number of nodes. Finally, in Section 4, we present a further refinement of the conditional MC estimator by combining it with importance sampling using a Strauss-type Gibbs process. To ease notation, we henceforth identify the Gilbert graph with its edge set, so that |G(X ∩ W)| yields the number of edges in the Gilbert graph on X ∩ W.

Conditional MC in dimension 1
In this section, we consider the one-dimensional setting. More precisely, we consider a line segment W = [0, w] as the sampling window. In Section 2.1, we describe a specific conditional MC scheme that leads to estimators for the rare-event probability of the number of edges being small. In a simulation study in Section 2.2, we show that these new estimators are substantially better than a crude MC approach.
Then, Section 2.3 discusses rare events corresponding to the number of missing edges, i.e., the number of point pairs that are not connected by an edge. The analysis is motivated by the observation that the Erdős-Rényi graph with edge probability p ∈ [0, 1] exhibits a striking duality with its complement. Specifically, the missing edges of this graph again form an Erdős-Rényi graph, now with edge probability 1 − p. In Section 2.3, we point out that for the Gilbert graph such a duality is much more involved. We still elucidate how to compute the probability of observing no missing edges or precisely one missing edge.
In this section, we assume that the points {X_i}_{i ≥ 1} of the Poisson point process on [0, ∞) are ordered according to their occurrence; that is, X_i ≤ X_j whenever i ≤ j.

Few edges
Henceforth, let E_{≤k} denote the event that the number of edges in the Gilbert graph on X ∩ [0, w] is at most k ≥ 0. For fixed k and large w, the probability p_{≤k} := P(E_{≤k}) becomes small, and we discuss how to leverage both the natural ordering on the real half-line and the independence property of the Poisson point process to derive a refined estimator.
We focus only on the cases k = 0, 1. In principle, the methods could be extended to cover estimation of probabilities of the form p_{≤k} for k ≥ 2. However, for large values of k, the combinatorial analysis quickly becomes highly involved; see Remark 2 for more details.

No edges
To begin with, let k = 0. That is, we analyze the probability that all vertices in X ∩ [0, w] are isolated in the sense that their vertex degree is 0. The key idea for approaching this probability is to note that E_0 := E_{≤0} occurs if and only if X ∩ (X_1, (X_1 + 1) ∧ w] = ∅ and the Gilbert graph restricted to X ∩ [X_1 + 1, w] does not contain edges; see Figure 1. Here, we adhere to the convention that the blue interval may not contain points of X and the Gilbert graph restricted to the red interval may not contain edges.
According to the Palm theory for one-dimensional Poisson point processes, the shifted process X^(1) := (X − (X_1 + 1)) ∩ [0, ∞) again forms a homogeneous Poisson point process, which is independent of X_1; see [8, Theorem 7.2]. In particular, writing F_1 := σ(X_1, X^(1)) for the σ-algebra generated by X_1 and X^(1) allows for a partial computation of p_0 := p_{≤0} via conditional MC [7, Chapter 9.4]. More precisely, when computing P(E_0 | F_1), we explicitly discard the information about the configuration of X inside the interval [X_1, X_1 + 1].
Theorem 1 (No edges). Suppose that w ≥ 1. Then, almost surely,

P(E_0 | F_1) = e^{−λ((w − X_1) ∧ 1)} ½{G(X^(1) ∩ [0, w − X_1 − 1]) = ∅}.   (1)

Proof. First, X_1 is isolated if there are no further vertices in the interval [X_1, (X_1 + 1) ∧ w]. Moreover, after conditioning on X_1, the remaining vertices form a Poisson point process in [(X_1 + 1) ∧ w, w]; see [8, Theorem 7.2]. Therefore, identity (1) follows.

In other words, invoking the Rao-Blackwell theorem [4], Theorem 1 showcases the right-hand side of identity (1) as an attractive candidate for estimating p_0 via conditional Monte Carlo. The Rao-Blackwell theorem is a powerful tool in situations where p_0 is not available in closed form. Note that elementary properties of the conditional expectation imply that the conditional MC estimator P(E_0 | F_1) is unbiased and exhibits smaller variance than the crude MC estimator ½{E_0}. Moreover, the right-hand side of identity (1) features another indicator of an isolation event. Hence, it becomes highly attractive to refine the estimator further by proceeding iteratively. To make this precise, we define an increasing sequence X*_1 ≤ X*_2 ≤ ··· of points of X recursively as follows. First, X*_1 = X_1 denotes the left-most point of X. Next, once X*_m is available, X*_{m+1} denotes the first point of X to the right of X*_m + 1. Then, the event E_0 occurs if none of the intervals [X*_i, X*_i + 1] contains further points from X; see Figure 2.

Figure 2: For G(X ∩ [0, w]) = ∅, the blue intervals may not contain any points.
Of particular interest is the last index I* := max{i ≥ 1 : X*_i ≤ w} for which X*_i remains inside [0, w], together with the associated point X* := X*_{I*}. If X_1 > w, we set I* = 0 and X* = w. Let F* := σ({X*_i : i ≥ 1}) be the σ-algebra generated by {X*_i : i ≥ 1}.
Theorem 2 (No edges, iterated). Suppose that w ≥ 1. Then, almost surely,

P(E_0 | F*) = e^{−λ((I* − 1)_+ + (w − X*) ∧ 1)}.   (2)

Proof. To prove the claim, we first define the shifted process X^(m) := (X − (X*_m + 1)) ∩ [0, ∞) and write F_m := σ(X*_1, ..., X*_m, X^(m)) for the σ-algebra generated by X*_1, ..., X*_m and X^(m). Since all later points X*_{m+1}, X*_{m+2}, ... are measurable with respect to F_m, we have F* ⊆ F_m, so that by the tower property of conditional expectation it suffices to compute P(E_0 | F_m) for every m ≥ 0. To achieve this goal, we proceed by induction on m. For m = 0 and m = 1, we are in the setting of Theorem 1. To pass from m to m + 1, the induction hypothesis yields the contribution of the first m isolation intervals. Since X*_{m+1} is the first point of X after X*_m + 1, applying Theorem 1 to the Poisson point process X^(m) yields the contribution of the (m + 1)-st interval. Combining the two contributions yields the assertion.
The conditional MC estimator from Theorem 2 leads to Algorithm 2.1. Here, Exp(λ) denotes an exponential random variable with parameter λ that is drawn independently of everything else.

Algorithm 2.1: Conditional MC estimator for p_0
input: Number N ≥ 1 of MC runs and Poisson intensity λ > 0.
output: Conditional MC estimator of p_0.
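A minimal Python sketch of this estimator (our illustrative reimplementation of Algorithm 2.1; function and variable names are ours) exploits the memoryless property: the gap from X*_m + 1 to X*_{m+1} is again Exp(λ)-distributed, so the sequence X*_1, X*_2, ... can be simulated directly without generating the full point process.

```python
import math
import random

def cond_mc_p0(lam, w, n_runs, seed=0):
    """Conditional MC estimator of p_0 = P(no edges in G(X on [0, w]))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        x = rng.expovariate(lam)            # X*_1 = X_1
        if x > w:
            i_star, x_star = 0, w           # convention: I* = 0, X* = w
        else:
            i_star, x_star = 1, x
            while True:
                # memorylessness: next point after X*_m + 1 is a fresh Exp(lam) gap
                nxt = x_star + 1.0 + rng.expovariate(lam)
                if nxt > w:
                    break
                i_star, x_star = i_star + 1, nxt
        # Theorem 2: P(E_0 | F*) = exp(-lam * ((I* - 1)_+ + (w - X*) /\ 1))
        total += math.exp(-lam * (max(i_star - 1, 0) + min(w - x_star, 1.0)))
    return total / n_runs
```

Each run thus contributes a value in (0, 1] rather than a 0/1 indicator, which is the source of the variance reduction over crude MC.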

At most one edge
Here, let k = 1; i.e., we propose an estimator for the probability p_1 := p_{≤1} that the Gilbert graph on X ∩ [0, w] has at most one edge. Let I^+ := inf{i ≥ 2 : X_i − X_{i−1} ≤ 1} be the index of the first point of X whose predecessor is at distance at most 1, and put X^+ := X_{I^+}. Moreover, we write F^+ for the σ-algebra generated by X_1, X_2, ..., X_{I^+} and X^+.

Figure 3: For E_{≤1}, the blue interval may not contain any points and the Gilbert graph restricted to the red interval may not contain edges.
Theorem 3 (At most one edge). Suppose that w ≥ 1. Then, the conditional probability P(E_{≤1} | F^+) admits the closed-form expression stated in identity (5).

Proof. Since the proof is very similar to that of Theorem 1, we only point out the most important ideas. Identity (5) obviously holds for X_{I^+} > w. For the case X_{I^+} ≤ w, relying again on the Palm theory of the one-dimensional Poisson process yields the expression, as asserted.
Proceeding similarly to Section 2.1, Theorem 3 leads to the conditional MC estimator implemented in Algorithm 2.2.
Remark 1 (Symmetric window). The methods described above could also be applied to a Poisson point process in a symmetric interval of the form [−w/2, w/2]. Then, in addition to I* and I^+, we would need to take into account the corresponding quantities located to the left of the origin.
Remark 2 (k ≥ 2). The method to handle k = 0, 1 could certainly be extended to larger k ≥ 2, but the configurational analysis would quickly become very involved. For instance, for k = 2 we would first need to require that the interval [X_{I*−1}, X_{I*−1} + 1] contains only the point X_{I*}. Next, if X_{I*+1} ≤ X_{I*} + 1, then there may be no more edges to the right of X_{I*+1}. On the other hand, the analysis will be different if X_{I*+1} > X_{I*} + 1, because then there can still be one edge in the graph to the right of X_{I*+1}.

Simulations
In this section, we illustrate how to estimate the rare-event probabilities p_0 and p_1 via MC. After presenting the crude MC estimator, we illustrate how the conditional MC estimators described in Section 2.1 improve the efficiency drastically. In the simulation study, we estimate p_0 and p_1 for sampling windows of size w ∈ {5, 7.5, 10} and Poisson intensity λ = 2. Both the crude MC and the conditional MC estimators are computed based on N = 10^6 samples. To estimate the rare-event probabilities p_{≤k} via crude MC, we draw iid samples X^(1), ..., X^(N) of the Poisson point process on [0, w] and set

p̂_{≤k} := (1/N) Σ_{i ≤ N} ½{|G(X^(i) ∩ [0, w])| ≤ k}

for the proportion of samples leading to a Gilbert graph with at most k edges.
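For comparison, a crude MC sketch (our own illustrative code, not the paper's implementation) samples the one-dimensional Poisson process through its exponential gaps and counts edges directly on the sorted points:

```python
import random

def sample_ppp(lam, w, rng):
    """Points of a Poisson process with intensity lam on [0, w], in increasing order."""
    pts, t = [], rng.expovariate(lam)
    while t <= w:
        pts.append(t)
        t += rng.expovariate(lam)
    return pts

def edge_count(pts):
    """Number of pairs of (sorted) points at distance at most 1."""
    m = 0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if pts[j] - pts[i] <= 1.0:
                m += 1
            else:
                break  # points are sorted, so later pairs are even farther apart
    return m

def crude_mc(lam, w, k, n_runs, seed=0):
    """Crude MC estimate of p_{<=k} = P(at most k edges)."""
    rng = random.Random(seed)
    hits = sum(edge_count(sample_ppp(lam, w, rng)) <= k for _ in range(n_runs))
    return hits / n_runs
```

As the window grows, the hit proportion collapses toward zero, which is precisely the high relative error reported in Table 1.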
The estimates reported in Table 1 reveal that, as the size of the sampling window grows, the rare-event probabilities decrease rapidly and the estimators exhibit a high relative error. Next, we estimate p_{≤k} for k = 0, 1 with the conditional MC methods described in Algorithms 2.1 and 2.2. If P^(1), ..., P^(N) denote the simulation outputs, then we set

p̂^Cond_{≤k} := (1/N) Σ_{i ≤ N} P^(i).

The corresponding estimates shown in Table 2 highlight that the theoretical improvements over crude MC predicted by Theorems 2 and 3 also manifest themselves in the simulation study. This is particularly striking in the setting k = 0, where the variance can be reduced by several orders of magnitude.

Few missing edges
We now focus on computing the rare-event probabilities of having few missing edges. More precisely, we write M_w for the number of point pairs in X ∩ [0, w] that are not connected by an edge, i.e., the number of edges of the complete graph on the vertex set X ∩ [0, w] that are missing from G(X ∩ [0, w]).
We write p′_{≤k} := P(M_w ≤ k) for the probability that at most k ≥ 0 edges are missing. Surprisingly, this seemingly more complicated task is more accessible than the probability of seeing few edges considered in Section 2.1, as both p′_0 = p′_{≤0} and p′_{≤1} are amenable to closed-form expressions.
For p′_0, the key insight is to note that M_w = 0 if and only if X ∩ [X_1 + 1, w] = ∅; that is, all points lie within distance 1 of the left-most point X_1; see Figure 4.

Figure 4: For M_w = 0, the red interval may not contain any points.
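Conditioning on X_1 and inserting the Poisson void probability, this criterion yields (under our derivation from the criterion above, for w ≥ 1) the closed form p′_0 = (1 + λ(w − 1)) e^{−λ(w − 1)}; note the sanity check p′_0 = 1 at w = 1, where any two points are within distance 1. The sketch below (our code) compares this formula against a direct MC estimate:

```python
import math
import random

def p0_missing_closed_form(lam, w):
    """P(M_w = 0) = (1 + lam*(w-1)) * exp(-lam*(w-1)) for w >= 1 (our derivation)."""
    return (1.0 + lam * (w - 1.0)) * math.exp(-lam * (w - 1.0))

def p0_missing_mc(lam, w, n_runs, seed=0):
    """Crude MC estimate of P(M_w = 0): no point pair at distance more than 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        pts, t = [], rng.expovariate(lam)
        while t <= w:
            pts.append(t)
            t += rng.expovariate(lam)
        # M_w = 0 iff the process is empty or its diameter is at most 1
        if not pts or pts[-1] - pts[0] <= 1.0:
            hits += 1
    return hits / n_runs
```

For λ = 1 and w = 2, both approaches agree on a value near 2e^{−1} ≈ 0.736.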
Next, we compute the probability of observing at most one missing edge.
Proof.We decompose p ′ ≤1 as and compute the second and third probability separately.This corresponds to Case 1 and Case 2 below.Note that under the event {X ∩ [X 1 + 2, w] = ∅}, we have |X ∩ [0, w]| = 2, since more points would imply at least two missing edges.
Case 1. Conditioning on X_1 and inserting the void probability of the Poisson process yields the probability of the event {M_w ≤ 1} ∩ {X ∩ [X_1 + 2, w] ≠ ∅} in closed form. Case 2. It remains to treat the case X ∩ [X_1 + 2, w] = ∅. First, note that conditioned on X_1 = x_1, the point process X ∩ [x_1, w] is Poisson. Thinking of this Poisson point process as formally extended to −∞ on the left, we write X′_1 for the first Poisson point to the left of w. Then, Figure 5 illustrates the configuration characterizing the event {M_w ≤ 1} in this case. Assembling the different cases concludes the proof.

Conditional MC in higher dimensions
In Section 2, we analyzed rare events related to few edges or few missing edges in a one-dimensional setting. There, the natural ordering of the Poisson points was pivotal for deriving closed-form expressions for conditional probabilities. Now, we proceed to higher dimensions and also consider more general deviations from the mean number of edges. First, we again illustrate that substantial variance reductions are possible through a surprisingly simple conditional MC method. Loosely speaking, the Poisson point process consists of 1) an infinite sequence of random points determining the locations in the window and 2) a Poisson random variable determining the number of points in the sampling window. We use that, after conditioning on the spatial locations, the rare-event probability is available in closed form. This type of Poisson conditioning is novel in a spatial rare-event estimation context, but has strong ties to approaches appearing earlier in seemingly unrelated problems. For instance, in reliability theory, related conditional MC schemes lead to spectacular variance reductions [9, 13].
The rest of this section is organized as follows. First, Section 3.1 describes how to estimate the rare-event probabilities related to too few and too many edges relative to their expected number through conditional MC. Then, Section 3.2 presents a simulation study illustrating that this estimator can reduce the estimation variance by several orders of magnitude.

Conditioning on a spatial component
We consider a full-dimensional sampling window W ⊂ R^d and rare events of the form F_{<a} := {|G(X ∩ W)| < (1 − a)μ} and F_{>a} := {|G(X ∩ W)| > (1 + a)μ} for a > 0, where μ := E[|G(X ∩ W)|] denotes the mean (i.e., expected) number of edges, which can be estimated through simulations. Alternatively, if W is so large that edge effects can be neglected, then the Slivnyak-Mecke formula [8, Theorem 4.4] gives the approximation μ ≈ (1/2)|W| λ² κ_d, where κ_d denotes the volume of the unit ball in R^d.
The key idea for developing a conditional MC scheme is to use the explicit construction of a Poisson point process in a bounded sampling window. More precisely, let X_∞ = {X_n}_{n ≥ 1} be an iid family of uniform random points in W and K be an independent Poisson random variable with parameter λ|W|. Then, {X_n}_{n ≤ K} is a Poisson point process in W with intensity λ; see [8, Theorem 3.6].
In conditional MC, we use the fact that the rare-event probabilities can be computed in closed form after we condition on the locations X_∞. More precisely, we let K_{<a} denote the largest k such that the Gilbert graph on the nodes {X_1, ..., X_k} contains fewer than (1 − a)μ edges. Similarly, let K_{≤a} denote the largest k such that the Gilbert graph on the nodes {X_1, ..., X_k} contains at most (1 + a)μ edges. Then,

P(F_{<a} | X_∞) = F_Poi(K_{<a}) and P(F_{>a} | X_∞) = 1 − F_Poi(K_{≤a}),   (6)

where F_Poi : Z_{≥0} → [0, 1] denotes the cumulative distribution function of a Poisson random variable with parameter λ|W|.
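The following sketch (our illustrative implementation with our own helper names; μ is approximated via the Slivnyak-Mecke formula with edge effects ignored) realizes this conditional estimator on a small planar window. Since the edge count is non-decreasing in the number of points, both thresholds K_{<a} and K_{≤a} are found by adding uniform points one at a time:

```python
import math
import random

def poisson_cdf(k, mu):
    """Poisson CDF F_Poi(k) by direct summation (adequate for moderate mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for j in range(1, k + 1):
        term *= mu / j
        total += term
    return total

def cond_mc_edge_tails(lam, side, a, n_runs, seed=0):
    """Conditional MC estimators of q_{<a} and q_{>a} on W = [0, side]^2.

    Conditions on the uniform locations X_infty and integrates out the Poisson
    number of points in closed form, as in representation (6).
    """
    rng = random.Random(seed)
    area = side * side
    mu = 0.5 * area * lam * lam * math.pi   # Slivnyak-Mecke approximation, kappa_2 = pi
    lo, hi = (1 - a) * mu, (1 + a) * mu
    nu = lam * area                          # Poisson parameter of the point count
    est_lo = est_hi = 0.0
    for _ in range(n_runs):
        pts, edges, k_lo = [], 0, None
        while edges <= hi:                   # add points until more than (1+a)*mu edges
            x, y = rng.uniform(0, side), rng.uniform(0, side)
            edges += sum((x - p) ** 2 + (y - q) ** 2 <= 1.0 for p, q in pts)
            pts.append((x, y))
            if k_lo is None and edges >= lo:
                k_lo = len(pts) - 1          # K_{<a}: last k with fewer than (1-a)*mu edges
        k_hi = len(pts) - 1                  # K_{<=a}: last k with at most (1+a)*mu edges
        est_lo += poisson_cdf(k_lo, nu)
        est_hi += 1.0 - poisson_cdf(k_hi, nu)
    return est_lo / n_runs, est_hi / n_runs
```

Each run contributes smooth values in (0, 1) instead of 0/1 indicators, which is the source of the variance reduction documented in Table 4.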

Numerical results
We sample planar homogeneous Poisson point processes in square sampling windows X_S, X_M, and X_L of size 20 × 20, 25 × 25, and 30 × 30, respectively. Here, S, M, and L stand for small, medium, and large.
In Section 1, we mentioned that a major challenge in devising efficient estimators for rare-event probabilities related to the edge count comes from a qualitatively different tail behavior: light on the left, heavy on the right. In other words, we expect that in the left tail, the rare event is caused by changes throughout the sampling window, whereas in the right tail, a peculiar configuration in a small part of the window suffices to induce the rare event. We recall that the left tail of a random variable Z refers to the probabilities P(Z ≤ r) for small r and the right tail to the probabilities P(Z ≥ r) for large r.
Although on a bounded sampling window the theoretical difference between the left and the right tail is subtle, we illustrate in Table 3 that it does become visible when considering the quantiles Q_α and Q_{1−α} of the number of edges for small α. Here, the empirical quantiles for 10^6 samples of the edge counts in a (20 × 20)-window are shown. For instance, the 1%-quantile is 16.4% lower than the mean, a deviation similar to the 18% exceedance of the 99%-quantile. However, moving to the 0.01%-quantile, it is 25.2% lower than the mean, whereas the corresponding 99.99%-quantile exceeds the mean by 30.0%. These figures are an early indication of the difference in the tails that will reappear far more pronouncedly in the numerical results concerning the estimation of the rare-event probabilities P(F_{<0.2}) and P(F_{>0.2}).
Table 3: Empirical quantiles for the number of edges in a (20 × 20)-window: absolute values (upper two rows) and relative deviation from the mean µ (lower two rows).
In the rest of the section, we estimate the rare-event probabilities q_{<0.2} := P(F_{<0.2}) and q_{>0.2} := P(F_{>0.2}) corresponding to 20% deviations from the mean. Here, we draw N = 10^5 samples of X_∞. Then, taking into account the representation (6), we set

q̂^Cond_{<0.2} := (1/N) Σ_{i ≤ N} F_Poi(K^(i)_{<0.2}) and q̂^Cond_{>0.2} := (1/N) Σ_{i ≤ N} (1 − F_Poi(K^(i)_{≤0.2})).

The per-sample variances of the corresponding crude MC estimators are q_{<0.2}(1 − q_{<0.2}) and q_{>0.2}(1 − q_{>0.2}), respectively.
We report the estimates q̂^Cond_{<0.2} and q̂^Cond_{>0.2} in Table 4. First, we see that the exceedance probabilities q̂^Cond_{>0.2} are always smaller than the corresponding undershoot probabilities q̂^Cond_{<0.2}. This supports the preliminary impression of the difference in the tail behavior hinted at in Table 3. Moreover, in all examples, conditional MC reduces the estimation variance massively, and the efficiency gains become more pronounced the rarer the event.
Table 4: Estimates for q_{<0.2} and q_{>0.2} based on conditional MC for different window sizes, based on N = 10^5 samples. Variance improvements in comparison to the crude estimator in parentheses.

Importance sampling
The conditional MC estimators constructed in Section 3 take into account that an atypically large number of Poisson points leads to a Gilbert graph exhibiting considerably more edges than expected. However, not only the number but also the locations of points play a pivotal role. For instance, if the points tend to repel each other, then we would typically observe fewer edges. Similarly, clustered point patterns should induce more edges.
In order to implement these insights, we resort to the technique of importance sampling [7, Chapter 9.7]. That is, we draw samples from a point process with distribution Q for which the rare event becomes typical and then correct the estimation bias by weighting with likelihood ratios. In Sections 4.1 and 4.2 below, we explain how to implement these steps through configuration-dependent birth-death processes reminiscent of the Strauss process from spatial statistics [10]. Finally, in Section 4.3, we illustrate in a simulation study how importance sampling of the spatial locations reduces the estimation variance further.

Lower tails
The key observation is that under the rare event of seeing exceptionally few edges, we expect a repulsion between points. More precisely, the most likely cause of the rare event is a change to the configuration of the underlying Poisson point process throughout the entire sampling window. The large-deviation analysis of [12] suggests performing importance sampling where, instead of sampling from the distribution P of the a priori Poisson point process, we draw samples according to a different stationary point process with distribution Q such that, under Q, the original rare event becomes typical, while the deviation of Q from P, as measured through the Kullback-Leibler divergence h(Q | P), is minimized. We implement this repulsion by a dependent thinning inspired by the Strauss process.
Here, we start from a realization of the Gilbert graph on n_0 = ⌊λ|W|⌋ iid points {X_1, ..., X_{n_0}}, and then we thin out points successively. An independent thinning of points would give rise to uniformly distributed locations without interactions. In the importance sampling, we thin instead via a configuration-dependent birth-death process; see, e.g., Chapter 9.7 of [7].
To describe the death mechanism more precisely, we draw inspiration from the Strauss process and choose the probability p_i of removing point X_i proportional to γ^{deg(X_i)}, where deg(X_i) denotes the degree of X_i in the Gilbert graph and γ > 1 is a parameter of the algorithm. Algorithm 4.1 shows the pseudo-code leading to the importance sampling estimator q̂^IS_{<a} for q_{<a}. To understand the principle behind Algorithm 4.1, we briefly expound on the general approach in importance sampling, and refer the reader to Chapter 9.7 of [7] for an in-depth discussion.
As mentioned above, when thinning out points independently until the number of edges in the Gilbert graph falls below μ(1 − a), we would arrive at the random variable K_{<a} from Section 3. However, thinning out according to a configuration-dependent probability distorts its distribution towards a probability measure Q that is in general different from the true distribution P. Still, by construction, Q is absolutely continuous with respect to P, and we let ρ := dP/dQ be the likelihood ratio. Then, from a conceptual point of view, Algorithm 4.1 first draws N ≥ 1 iid samples K^(1)_{<a}, ..., K^(N)_{<a} from the distorted distribution Q with associated likelihood ratios ρ_1, ..., ρ_N, and then computes the weighted average

q̂^IS_{<a} := (1/N) Σ_{i ≤ N} ρ_i F_Poi(K^(i)_{<a}).

Algorithm 4.1: Importance sampling estimator q̂^IS_{<a} for q_{<a}
input: Number N ≥ 1 of MC runs; parameter γ; Poisson intensity λ; sampling window W.
output: Importance sampling estimator q̂^IS_{<a} for q_{<a}.

Intuitively, we would like to choose the thinning parameter γ > 1 such that the expected number of edges under the thinning matches the number of edges defining the rare event. To develop a heuristic for this choice, we consider a Strauss process, where we restrict to the two-dimensional setting to allow for an accessible presentation. Thus, we require that

(1 − a)μ = (λ_Str² |W| / 2) ∫_0^1 2πr ρ_Str(r) dr,   (7)

where λ_Str > 0 and ρ_Str : [0, ∞) → [0, ∞) denote the intensity and pair-correlation function of the Strauss process, respectively [10]. In contrast to the Poisson setting, neither λ_Str nor ρ_Str is available in closed form. Still, both quantities admit accurate saddle-point approximations λ_PS > 0 and ρ_PS : [0, ∞) → [0, ∞) that are ready to implement in the Strauss case [2]. More precisely, λ_PS is the unique positive solution of an implicit equation involving G = (1 − β)π and β = log(γ); see [2]. Then, recalling that we work in a planar setting, ρ_PS(r) is expressed through the intersection area of two unit disks at distance r; see [2]. Inserting these approximations into (7) and solving the resulting implicit equation yields β ≈ log(1.018) and hence a value of γ ≈ 1.018.
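As a sketch of the death mechanism alone (our simplified reimplementation; the full Algorithm 4.1 additionally combines this with the conditional MC weighting of Section 3, and all names here are ours), one thinning pass with likelihood-ratio bookkeeping might look as follows:

```python
import math
import random

def is_thinning_sketch(lam, side, a, gamma, seed=0):
    """One run of degree-biased thinning on W = [0, side]^2.

    Starting from n0 = floor(lam * |W|) uniform points, points are removed one
    at a time with probability proportional to gamma**deg(X_i), accumulating
    the log-likelihood ratio of uniform vs. degree-biased removals, until the
    edge count drops below (1 - a) * mu.
    """
    rng = random.Random(seed)
    area = side * side
    mu = 0.5 * area * lam * lam * math.pi     # Slivnyak-Mecke approximation
    target = (1 - a) * mu
    n0 = int(lam * area)
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n0)]

    def degree(i):
        xi, yi = pts[i]
        return sum((xi - x) ** 2 + (yi - y) ** 2 <= 1.0
                   for j, (x, y) in enumerate(pts) if j != i)

    def edges():
        return sum(degree(i) for i in range(len(pts))) // 2

    log_rho = 0.0                             # log of the likelihood ratio dP/dQ
    while pts and edges() >= target:
        weights = [gamma ** degree(i) for i in range(len(pts))]
        total = sum(weights)
        i = rng.choices(range(len(pts)), weights=weights)[0]
        # uniform removal has probability 1/m; biased removal weights[i]/total
        log_rho += math.log(total / (len(pts) * weights[i]))
        pts.pop(i)
    return len(pts), math.exp(log_rho)
```

High-degree points are removed preferentially, which mimics the Strauss-type repulsion; the accumulated ratio then reweights each run back towards the original Poisson distribution P.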

Upper tails
Similarly to the lower tails, we can strengthen the estimator by combining conditional MC with importance sampling on the spatial locations. For the upper tails, it is natural to devise an importance sampling scheme favoring clustering rather than repulsion. From the point of view of large deviations, this phenomenon was considered in [6]. There, it is shown that all excess edges come from a ball of size 1 containing a highly dense configuration of Poisson points. In other words, this method is particularly powerful in settings where the rare event is the epitome of a condensation phenomenon: a peculiar behavior of the point process in a small part of the window becomes the most likely explanation of the rare event. As laid out in [6], at least in the asymptotic regime of large deviations, the rare event of observing too many edges is governed by the above-described condensation phenomenon.
We propose to take the clustering into account via a birth mechanism favoring the generation of points in areas that would lead to a large number of additional edges. To ease implementation, the density of the birth mechanism is discretized and remains constant on the bins of a suitably chosen grid. The density in a bin at position x ∈ W is proportional to γ^{n(x)}, where γ > 1 is a parameter governing the strength of the clustering and n(x) denotes the number of Poisson points in a suitable neighborhood of x, such as the bin containing x together with all adjacent bins. As in the lower-tail case, a subtle point in this approach pertains to choosing the parameter γ > 1.
Unfortunately, an attractive Strauss process is ill-defined in the entire Euclidean space, so that the saddle-point approximation from Section 4.1 does not apply. In Section 4.3 below, we therefore rely on a pilot run, which suggests γ = 1.01 as a good choice for further variance reduction. Although the number of simulations in this pilot run is small, and the variance estimates are therefore still volatile, we found that it provides a good indication for simulations on a larger scale.
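A sketch of such a discretized birth mechanism (our illustrative construction; the bin size, the 3 × 3 neighborhood, and the initial configuration are assumptions, not specified by the text) with likelihood-ratio tracking:

```python
import math
import random

def clustered_birth_sketch(lam, side, gamma, n_new, bin_size=1.0, seed=0):
    """Clustering birth mechanism on W = [0, side]^2 for the upper tail.

    New points are born in grid bins with probability proportional to
    gamma**n(x), where n(x) counts current points in the bin and its adjacent
    bins; the log-likelihood ratio vs. uniform births is accumulated.
    """
    rng = random.Random(seed)
    nb = int(side / bin_size)                 # number of bins per axis
    # initial configuration: floor(lam * |W|) uniform points (assumption)
    pts = [(rng.uniform(0, side), rng.uniform(0, side))
           for _ in range(int(lam * side * side))]
    log_rho = 0.0
    for _ in range(n_new):
        counts = [[0] * nb for _ in range(nb)]
        for x, y in pts:
            counts[min(int(x / bin_size), nb - 1)][min(int(y / bin_size), nb - 1)] += 1

        def n_around(i, j):
            # points in the bin (i, j) and all adjacent bins
            return sum(counts[a][b]
                       for a in range(max(i - 1, 0), min(i + 2, nb))
                       for b in range(max(j - 1, 0), min(j + 2, nb)))

        weights = [gamma ** n_around(i, j) for i in range(nb) for j in range(nb)]
        total = sum(weights)
        idx = rng.choices(range(nb * nb), weights=weights)[0]
        i, j = divmod(idx, nb)
        # uniform birth picks each bin with probability 1/nb**2
        log_rho += math.log(total / (nb * nb * weights[idx]))
        pts.append((rng.uniform(i * bin_size, (i + 1) * bin_size),
                    rng.uniform(j * bin_size, (j + 1) * bin_size)))
    return pts, math.exp(log_rho)
```

Births accumulate in already-dense bins, reproducing the condensation picture, while the accumulated ratio corrects the resulting bias in the final estimator.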

Numerical results
Section 3.2 revealed that even with a sample size of N = 10^5, the conditional MC estimators still exhibit a considerable relative error. Now, we illustrate that importance sampling may be an appealing option to reduce this error. Table 5 reports the estimates q̂^IS_{<0.2} and q̂^IS_{>0.2} for the same sampling-window sizes as in Section 3.2. In the left tail, we see massive variance improvements when compared to the crude MC estimator. In the right tail, the gains are substantial, but a little less pronounced. This matches the intuition that the root of the rare events in the right tail should be a condensation phenomenon, which goes against the heuristic of changing the point process homogeneously throughout the window.

Table 5: Estimates for q_{<0.2} and q_{>0.2} based on importance sampling with Strauss-type processes for different window sizes, based on N = 10^5 samples. Variance improvements in comparison to the crude estimator in parentheses.

       q̂^IS_{<0.2}                              q̂^IS_{>0.2}
X_S    2.025 × 10^−3 ± 6.22 × 10^−6 (523.3)      5.125 × 10^−3 ± 1.57 × 10^−5 (207.9)
X_M    1.544 × 10^−4 ± 6.16 × 10^−7 (4,071.0)    6.744 × 10^−4 ± 2.66 × 10^−6 (951.8)
X_L    6.935 × 10^−6 ± 3.63 × 10^−8 (52,665.8)   6.240 × 10^−5 ± 3.09 × 10^−7 (6,537.22)
Algorithm 2.2: Conditional MC estimator for p_1
input: Number N ≥ 1 of MC runs and Poisson intensity λ > 0.
output: Conditional MC estimator of p_1.

Table 2: Conditional MC estimates for p_0 = p_{≤0} and p_1 = p_{≤1} for sampling intervals of size w ∈ {5, 7.5, 10} and Poisson intensity λ = 2. Variance improvements in comparison to the crude estimator in parentheses.