1 Introduction

In this paper, we focus on rare events associated with the number of edges in the Gilbert graph G(X) on a homogeneous Poisson point process X = {Xi}i≥ 1 with intensity λ > 0 in \(\mathbb {R}^{d}\), for d ≥ 1. That is, the nodes of G(X) are the points of X, and there is an edge between Xi, Xj ∈ X if ∥Xi − Xj∥ ≤ 1, where the upper bound 1 is the threshold of the Gilbert graph and ∥⋅∥ denotes the Euclidean norm. Our goal is to analyze the probability of the rare event that the number of edges in a bounded sampling window \(W \subset \mathbb {R}^{d}\) deviates considerably from its expected value. More succinctly, we ask:

What is the probability that the Gilbert graph has at least twice its expected number of edges? What is the probability that the Gilbert graph has at most half its expected number of edges?

These seemingly innocuous questions have an intriguing connection to large deviations of heavy-tailed sums. Indeed, suppose that {Wi} is an equal-volume partition of W such that the diameter of each Wi is at most 1. Then, letting Zi := |X ∩ Wi| denote the number of Poisson points in Wi, the edges entirely inside Wi contribute Zi(Zi − 1)/2 to the total edge count. Since Zi follows a Poisson distribution, the tail probability \(\mathbb {P}(Z_{i} \ge n)\) is of the order \(\exp (-c n \log n)\) for some constant c > 0. Hence, the tails of \({Z_{i}^{2}}\) are of the order \(\exp (- c \sqrt n \log n/2)\), so that \({Z_{i}^{2}}\) does not have exponential moments. This is critical, because it means that the problem at hand is tightly related to large deviations of heavy-tailed sums, where large deviations typically come from extreme realizations of the largest summand (Asmussen and Kroese 2006; Blanchet and Glynn 2008; Rojas-Nandayapa 2013).
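As a quick numerical illustration (our own minimal sketch in Python, with an illustrative choice of λ; it is not part of the original argument), one can check the stated decay of the Poisson tail: the ratio \(-\log \mathbb{P}(Z_{i} \ge n)/(n \log n)\) slowly settles towards a constant of order one as n grows.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import logsumexp

lam = 2.0  # illustrative intensity for Z_i ~ Poisson(lam)
for n in [10, 30, 100, 300, 1000]:
    # log P(Z_i >= n), summed stably in log-space; the series decays so
    # quickly that 200 terms are more than enough
    log_tail = logsumexp(poisson.logpmf(np.arange(n, n + 200), lam))
    print(n, -log_tail / (n * np.log(n)))
```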

The analysis and simulations in this paper rely on two methods: conditional Monte Carlo (MC) and importance sampling; see Chapters 9.4 and 9.7 of Kroese et al. (2013), respectively. Importance sampling techniques have been used for the analysis of rare events in the Erdős–Rényi graph (Bhamidi et al. 2015), which is different from the Gilbert graph considered here. Beyond that, there is very little literature on the topic. Note that the same analysis extends to Gilbert graphs with a threshold different from 1 by rescaling the sampling window W and the intensity λ of the Poisson point process.

Seen in a larger context, our work is the first to provide specific simulation-based tools for estimating rare-event probabilities in the Gilbert graph, which is the simplest example of a spatial random network. This is of particular relevance given the central place that such networks occupy in models in materials science and wireless networks (Baccelli and Błaszczyszyn 2009; Stenzel et al. 2014). When these models form the basis for applications in security-sensitive contexts, it becomes essential to provide estimates for rare-event probabilities. Indeed, when introducing a new technology, such as device-to-device networks, it is not enough to know that the system works well on average; the probabilities of catastrophic events must also be estimated precisely enough that they can be deemed negligible. A prototypical application scenario based on measurement data is outlined by Keeler et al. (2018).

We also mention that it would be possible to derive analytical approximations for rare-event probabilities, for instance by assuming that the quantity of interest is approximately Poisson or normally distributed. The appeal of such an approach lies in its simplicity, as it only requires first- and second-moment information. However, it is difficult to quantify the estimation error of such analytical approximations, whereas the simulation-based methods outlined in the present paper are unbiased and therefore do not suffer from this weakness, provided the number of simulations is large enough.

The rest of the presentation is organized in three parts. First, in Section 2, we explore the potential of conditional Monte Carlo in a one-dimensional setting, where we can frequently derive explicit closed-form expressions. Surprisingly, when moving to the far end of the upper tails, it is possible to avoid simulation altogether, as we derive a fully analytic representation. Then, in Section 3, we move to higher dimensions. Here, we apply conditional MC to remove the randomness coming from the random number of nodes. Finally, in Section 4, we present a further refinement of the conditional MC estimator by combining it with importance sampling using a Strauss-type Gibbs process. To ease notation, we henceforth identify the Gilbert graph with its edge set, so that |G(X ∩ W)| is the number of edges in the Gilbert graph on X ∩ W.

2 Conditional MC in Dimension 1

In this section, we consider the one-dimensional setting. More precisely, we take the line segment W = [0, w] as the sampling window. In Section 2.1, we describe a specific conditional MC scheme that leads to estimators for the rare-event probability of the number of edges being small. In a simulation study in Section 2.2, we show that these new estimators are substantially better than a crude MC approach.

Then, Section 2.3 discusses rare events corresponding to the number of missing edges, i.e., the number of point pairs that are not connected by an edge. The analysis is motivated by the observation that the Erdős–Rényi graph with edge probability p ∈ [0,1] exhibits a striking duality with its complement. Specifically, the missing edges of this graph again form an Erdős–Rényi graph, now with probability 1 − p. In Section 2.3, we point out that in the Gilbert graph such a duality is much more involved. Nevertheless, we elucidate how to compute the probability of observing no missing edges or precisely one missing edge.

In this section, we assume that the points {Xi}i≥ 1 of the Poisson point process on \([0, \infty )\) are ordered according to their occurrence; that is, Xi ≤ Xj whenever i ≤ j.

2.1 Few Edges

Henceforth, let

$$E_{\leq k} := \{|G(X \cap [0, w])| \le k\}$$

denote the event that the number of edges in the Gilbert graph on X ∩ [0, w] is at most k ≥ 0. For fixed k and large w, the probability

$$p_{\leq k} := \mathbb{P}(E_{\leq k} )$$

becomes small, and we discuss how to use both the natural ordering on the real half-line and the independence property of the Poisson point process to derive a refined estimator.

We focus only on the cases k = 0,1. In principle, the methods could be extended to cover estimation of probabilities of the form p≤ k for k ≥ 2. However, for large values of k, the combinatorial analysis quickly becomes highly involved; see Remark 2 for more details.

2.1.1 No Edges

To begin with, let k = 0. That is, we analyze the probability that all vertices in X ∩ [0, w] are isolated in the sense that their vertex degree is 0. The key idea for approaching this probability is to note that E0 := E≤ 0 occurs if and only if X ∩ (X1, (X1 + 1) ∧ w] = ∅ and the Gilbert graph restricted to X ∩ [X1 + 1, w] does not contain edges; see Fig. 1. Here, we adhere to the convention that [a, b] = ∅ if a > b.

Fig. 1: For G(X ∩ [0, w]) = ∅, the blue interval may not contain points of X and the Gilbert graph restricted to the red interval may not contain edges

According to the Palm theory for one-dimensional Poisson point processes, the process

$$X^{(1)} := (X - X_{1}) \cap [1, \infty)$$

again forms a homogeneous Poisson point process, which is independent of X1; see Last and Penrose (2017, Theorem 7.2). In particular, writing

$$\mathcal{F}_{1} := \sigma(X_{1}, X^{(1)})$$

for the σ-algebra generated by X1 and X(1) allows for a partial computation of p0 := p≤ 0 via conditional MC (Kroese et al. 2013, Chapter 9.4). More precisely, when computing \(\mathbb {P}(E_{0} | \mathcal {F}_{1})\), we deliberately discard the information about the configuration of X inside the interval (X1, X1 + 1).

Theorem 1 (No edges)

Suppose that w ≥ 1. Then,

$$ \mathbb{P}(E_{0} \,|\, \mathcal{F}_{1}) = e^{-\lambda ((w - X_{1}) \wedge 1)_{+}}\, \mathbb{1}\big\{G\big(X^{(1)} \cap [1, w - X_{1}]\big) = \emptyset\big\} \quad \text{almost surely}. $$
(1)

Proof

First, X1 is isolated if and only if there are no further vertices in the interval (X1, (X1 + 1) ∧ w]. Moreover, after conditioning on X1, the remaining vertices form a Poisson point process in [(X1 + 1) ∧ w, w]; see Last and Penrose (2017, Theorem 7.2). Therefore,

$$ \mathbb{P}(E_{0} \,|\, \mathcal{F}_{1}) = \mathbb{P}\big(X \cap (X_{1}, (X_{1} + 1) \wedge w] = \emptyset \,\big|\, \mathcal{F}_{1}\big)\, \mathbb{1}\big\{G\big(X^{(1)} \cap [1, w - X_{1}]\big) = \emptyset\big\} = e^{-\lambda ((w - X_{1}) \wedge 1)_{+}}\, \mathbb{1}\big\{G\big(X^{(1)} \cap [1, w - X_{1}]\big) = \emptyset\big\}, $$

as asserted. □

Invoking the Rao–Blackwell theorem (Billingsley 1995), Theorem 1 indicates that the right-hand side of identity (1) is an attractive candidate for estimating p0 via conditional Monte Carlo. The Rao–Blackwell theorem is a powerful tool in situations where p0 is not available in closed form. Note that elementary properties of conditional expectation imply that the conditional MC estimator \(\mathbb {P}(E_{0} | \mathcal {F}_{1})\) is unbiased and exhibits smaller variance than the crude MC estimator \(\mathbb{1}_{E_{0}}\).

Moreover, the right-hand side of identity (1) features another indicator of an isolation event. Hence, it becomes highly attractive to refine the estimator further by proceeding iteratively. To make this precise, we define an increasing sequence \(X_{1}^{*} \le X_{2}^{*} \le \cdots \) of points of X recursively as follows. First, \(X_{1}^{*} = X_{1}\) denotes the left-most point of X. Next, once \(X_{m}^{*}\) is available,

$$X_{m + 1}^{*} := \inf\{X_{i} \in X: X_{i} \ge X_{m}^{*} + 1\}$$

denotes the first point of X to the right of \(X_{m}^{*} + 1\). Then, the event E0 occurs if and only if none of the intervals \((X_{i}^{*}, X_{i}^{*} + 1]\) contains points of X ∩ [0, w]; see Fig. 2.

Fig. 2: For G(X ∩ [0, w]) = ∅, the blue intervals may not contain any points

Of particular interest is the last index

$$I_{*} := \sup\{i \ge 1: X_{i}^{*} \le w\},$$

where \(X_{i}^{*}\) remains inside [0, w], together with the associated point

$$X_{*} := X_{I_{*}}^{*}.$$

If X1 > w, we set \(I_{*} = 0\) and \(X_{*} = w\). Let

$$\mathcal{F}^{*} := \sigma\big(X_{1}^{*}, X_{2}^{*}, {\dots} \big),$$

be the σ-algebra generated by \(\{X_{i}^{*} : i \geq 1\}\).

Theorem 2 (No edges – iterated)

Suppose that w ≥ 1. Then,

$$ \mathbb{P}(E_{0} |\mathcal{F}^{*}) = e^{-\lambda((I_{*} - 1)_{+} + (w - X_{*}) \wedge 1)} \quad \text{almost surely}. $$

Proof

To prove the claim, we first define the shifted process

$$X^{(m)} := (X - X_{m}^{*})\cap [1, \infty)$$

and write

$$\mathcal{F}_{m} := \sigma\big(X_{1}^{*},\dots, X_{m}^{*}, X^{(m)}\big)$$

for the σ-algebra generated by \(X_{1}^{*}, \dots , X_{m}^{*}\) and X(m). Observe that \(\mathcal {F}_{1} \supseteq \mathcal {F}_{2} \supseteq {\cdots } \supseteq \mathcal {F}^{*}\). In particular, by the tower property of conditional expectation,

$$ \mathbb{P}(E_{0} \,|\, \mathcal{F}^{*}) = \mathbb{E}\big[\mathbb{P}(E_{0} \,|\, \mathcal{F}_{m}) \,\big|\, \mathcal{F}^{*}\big] \quad \text{for every } m \ge 1. $$

Hence, it suffices to show that for every m ≥ 1,

$$ \mathbb{P}(E_{0} \,|\, \mathcal{F}_{m}) = e^{-\lambda((m \wedge I_{*} - 1)_{+} + (w - X_{m \wedge I_{*}}^{*}) \wedge 1)}\, \mathbb{1}\big\{G\big(X^{(m \wedge I_{*})} \cap [1, w - X_{m \wedge I_{*}}^{*}]\big) = \emptyset\big\}, $$
(2)

because \(X^{(I_{*})} \cap [1, w - X_{*}] = \emptyset \). To achieve this goal, we proceed by induction on m. For m = 1, we are in the setting of Theorem 1. To pass from m to m + 1, the induction hypothesis yields that

$$ \mathbb{P}(E_{0} \,|\, \mathcal{F}_{m + 1}) = e^{-\lambda((m - 1)_{+} + (w - X_{m}^{*}) \wedge 1)}\, \mathbb{P}\big(G\big(X^{(m)} \cap [1, w - X_{m}^{*}]\big) = \emptyset \,\big|\, \mathcal{F}_{m + 1}\big). $$
(3)

Since \(X_{m + 1}^{*}\) is the first point of X after \(X_{m}^{*} + 1\), by applying Theorem 1 to the Poisson point process X(m), we obtain

$$ \mathbb{P}\big(G\big(X^{(m)} \cap [1, w - X_{m}^{*}]\big) = \emptyset \,\big|\, \mathcal{F}_{m + 1}\big) = e^{-\lambda ((w - X_{m + 1}^{*}) \wedge 1)_{+}}\, \mathbb{1}\big\{G\big(X^{(m + 1)} \cap [1, w - X_{m + 1}^{*}]\big) = \emptyset\big\}. $$
(4)

Combining (3) and (4) yields the assertion. □

The conditional MC estimator from Theorem 2 leads to Algorithm 1. Here, Exp(λ) is an exponential random variable with parameter λ that is independent of everything else.

Algorithm 1
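To make Algorithm 1 concrete, the following minimal Python sketch (function names and the replication loop are our own choices) implements one replication: only the points \(X_{m}^{*}\) are generated, via exponential gaps, while the configuration inside the blocking intervals is accounted for analytically through the exponential factor from Theorem 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_mc_p0(lam, w):
    """One replication of the conditional MC estimator of p_0 from Theorem 2."""
    x_star = rng.exponential(1 / lam)                # X_1^* = X_1
    if x_star > w:                                   # convention: I_* = 0, X_* = w
        return 1.0
    i_star = 1
    while True:
        nxt = x_star + 1 + rng.exponential(1 / lam)  # first point beyond X_m^* + 1
        if nxt > w:
            break
        i_star, x_star = i_star + 1, nxt
    return np.exp(-lam * ((i_star - 1) + min(w - x_star, 1.0)))

lam, w, N = 2.0, 5.0, 10**5                          # illustrative parameters
print(np.mean([cond_mc_p0(lam, w) for _ in range(N)]))   # estimate of p_0
```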

2.1.2 At Most One Edge

Here, let k = 1; i.e., we propose an estimator for the probability p1 := p≤ 1 that the Gilbert graph on X ∩ [0, w] has at most one edge. Let

$$I^{+} := \inf\{i \ge 2: X_{i} - X_{i - 1} \le 1\}$$

be the index of the first point of X whose predecessor is at distance at most 1. Putting \(X^{+} := (X - X_{I^{+}}) \cap [1, \infty )\), Fig. 3 illustrates that the event E≤ 1 is equal to the intersection of the events \(\{X_{I^{+} + 1} \ge X_{I^{+}} + 1\}\) and \(\{G(X^{+}\cap [1, w - X_{I^{+}}]) = \emptyset \}\).

Fig. 3: For |G(X ∩ [0, w])| ≤ 1, the blue interval may not contain any points and the Gilbert graph restricted to the red interval may not contain edges

Moreover, we write

$$\mathcal{F}^{+} := \sigma(X_{1}, X_{2}, \dots, X_{I^{+}}, X^{+})$$

for the σ-algebra generated by \(X_{1}, X_{2}, \dots , X_{I^{+}}\) and X+.

Theorem 3 (At most one edge)

Suppose that w ≥ 1. Then,

$$ \mathbb{P}(E_{\leq 1} \,|\, \mathcal{F}^{+}) = e^{-\lambda ((w - X_{I^{+}}) \wedge 1)_{+}}\, \mathbb{1}\big\{G\big(X^{+} \cap [1, w - X_{I^{+}}]\big) = \emptyset\big\} \quad \text{almost surely}. $$
(5)

Proof

Since the proof is very similar to that of Theorem 1, we only point to the most important ideas. Identity (5) obviously holds on \(\{X_{I^{+}} > w\}\). For the case \(X_{I^{+}} \le w\), relying again on the Palm theory of the one-dimensional Poisson process, we have

$$ \mathbb{P}(E_{\leq 1} \,|\, \mathcal{F}^{+}) = \mathbb{P}\big(X \cap (X_{I^{+}}, (X_{I^{+}} + 1) \wedge w] = \emptyset \,\big|\, \mathcal{F}^{+}\big)\, \mathbb{1}\big\{G\big(X^{+} \cap [1, w - X_{I^{+}}]\big) = \emptyset\big\} = e^{-\lambda ((w - X_{I^{+}}) \wedge 1)}\, \mathbb{1}\big\{G\big(X^{+} \cap [1, w - X_{I^{+}}]\big) = \emptyset\big\}, $$

as asserted. □

Similarly to Section 2.1.1, Theorem 3 yields a conditional MC estimator, which we describe in Algorithm 2.

Algorithm 2
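Analogously, a minimal Python sketch of Algorithm 2 (again with names of our own choosing) combines the analytic void probability from Theorem 3 with a simulated indicator for the absence of edges beyond \(X_{I^{+}} + 1\):

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_mc_p_le1(lam, w):
    """One replication of the conditional MC estimator of p_1 from Theorem 3."""
    x = rng.exponential(1 / lam)                  # X_1
    gap = rng.exponential(1 / lam)
    while gap > 1:                                # all gaps before I^+ exceed 1
        x += gap
        gap = rng.exponential(1 / lam)
    x += gap                                      # x = X_{I^+}
    if x > w:
        return 1.0                                # no edge fits inside [0, w]
    weight = np.exp(-lam * min(w - x, 1.0))       # void prob. of (X_{I^+}, (X_{I^+}+1) ^ w]
    y = x + 1 + rng.exponential(1 / lam)          # first point of X beyond X_{I^+} + 1
    while y <= w:                                 # indicator: no edges in [X_{I^+}+1, w]
        gap = rng.exponential(1 / lam)
        if gap <= 1 and y + gap <= w:
            return 0.0                            # a further edge appears
        y += gap
    return weight

lam, w, N = 2.0, 5.0, 10**5                       # illustrative parameters
print(np.mean([cond_mc_p_le1(lam, w) for _ in range(N)]))   # estimate of p_1
```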

Remark 1 (Symmetric window)

The methods described above could also be applied to a Poisson point process in an interval of the form [−w/2, w/2]. Then, in addition to \(I_{*}\) and \(I^{+}\), we would need to take into account the corresponding quantities located to the left of the origin.

Remark 2 (k ≥ 2)

The method to handle k = 0,1 could certainly be extended to larger k ≥ 2, but the configurational analysis would quickly become very involved. For instance, for k = 2 we would first need to require that the interval \([X_{I_{*} - 1}, X_{I_{*} - 1} + 1]\) contains only the point \(X_{I_{*}}\). Next, if \(X_{I_{*} + 1} \le X_{I_{*}} + 1\), then there may be no more edges to the right of \(X_{I_{*} + 1}\). On the other hand, the analysis will be different if \(X_{I_{*} + 1} > X_{I_{*}} + 1\), because then there can still be one edge in the graph to the right of \(X_{I_{*} + 1}\).

2.2 Simulations

In this section, we illustrate how to estimate the rare-event probabilities p0 and p1 via MC. After presenting the crude MC estimator, we illustrate how the conditional MC estimators described in Section 2.1 improve the efficiency drastically. In the simulation study, we estimate p0 and p1 for sampling windows of size w ∈{5,7.5,10} and Poisson intensity λ = 2. Both the crude MC and the conditional MC estimator are computed based on \(N = 10^{6}\) samples.

To estimate the rare-event probabilities \(p_{\leq k}\) using crude MC, we draw iid samples \(X^{(1)}, \dots , X^{(N)}\) of the Poisson point process on [0, w] and set

$$ p^{\textsf{MC}}_{\leq k} := \frac 1N {\sum}_{i \le N} \mathbb{1}\big\{|G(X^{(i)})| \le k\big\}, $$

the proportion of samples leading to a Gilbert graph with at most k edges.

The estimates reported in Table 1 reveal that as the size of the sampling window grows, the rare-event probabilities decrease rapidly and that the estimators exhibit a high relative error.

Table 1 Crude MC estimates for p0 = p≤ 0 and p≤ 1 for sampling intervals of size w ∈{5,7.5,10} and Poisson intensity λ = 2

Next, we estimate \(p_{\leq k}\) for k = 0,1 with the conditional MC methods described in Algorithms 1 and 2. If \(P^{(1)}, \dots , P^{(N)}\) denote the simulation outputs, then we set

$$ p^{\textsf{Cond}}_{\leq k} := \frac 1N {\sum}_{i \le N} P^{(i)}.$$

The corresponding estimates shown in Table 2 highlight that the theoretical improvements over crude MC predicted from Theorems 2 and 3 also manifest themselves in the simulation study. This is particularly striking in the setting k = 0, where the variance can be reduced by several orders of magnitude.

Table 2 Conditional MC estimates for p0 = p≤ 0 and p≤ 1 for sampling intervals of size w ∈{5,7.5,10} and Poisson intensity λ = 2

2.3 Few Missing Edges

We now focus on computing the rare-event probabilities of having few missing edges. More precisely, we write

$$M_{w} := \binom{|X\cap [0, w]|}2 - |G(X\cap [0, w])|$$

for the number of edges of G(X ∩ [0, w]) that are missing from the complete graph on the vertices X ∩ [0, w]. We write

$$p^{\prime}_{\leq k} := \mathbb{P}(M_{w} \le k)$$

for the probability that at most k ≥ 0 edges are missing. Surprisingly, this seemingly more complicated task is more accessible than the probability of seeing few edges considered in Section 2.1, as both \(p^{\prime }_{0} = p^{\prime }_{\leq 0}\) and \(p^{\prime }_{\leq 1}\) are amenable to closed-form expressions.

For \(p^{\prime }_{0}\), the key insight is to note that Mw = 0 if and only if X ∩ [X1 + 1, w] = ∅; see Fig. 4.

Fig. 4: For Mw = 0, the red interval may not contain any points

Theorem 4 (No missing edges)

Suppose that w ≥ 1. Then,

$$p^{\prime}_{0} = e^{- \lambda (w - 1)} + (w -1)\lambda e^{-\lambda(w - 1)}.$$

Proof

As observed above, we need to compute \(\mathbb {P}(X\cap [X_{1} + 1, w] = \emptyset )\). Hence, invoking the void probability for a Poisson point process,

$$ \begin{array}{@{}rcl@{}} p^{\prime}_{0} = {\int}_{0}^{\infty}\lambda e^{-\lambda x_{1}} \mathbb{P}(X\cap [x_{1} + 1, w] = \emptyset) \mathrm{d} x_{1} = e^{- \lambda (w - 1)} + {\int}_{0}^{w - 1}\lambda e^{-\lambda x_{1}} e^{-\lambda(w - 1 - x_{1})} \mathrm{d} x_{1}, \end{array} $$

which equals \(e^{- \lambda (w - 1)} + (w - 1)\lambda e^{-\lambda(w - 1)}\), as asserted. □
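As a quick sanity check (our own sketch with illustrative parameters, not part of the original study), the closed-form expression of Theorem 4 can be compared against a crude MC estimate, using that Mw = 0 precisely when all points of X ∩ [0, w] lie within an interval of length 1:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, w, N = 2.0, 5.0, 10**5

p0_closed = np.exp(-lam * (w - 1)) * (1 + lam * (w - 1))    # Theorem 4

hits = 0
for _ in range(N):
    pts = rng.uniform(0, w, rng.poisson(lam * w))
    hits += (pts.size < 2) or (pts.max() - pts.min() <= 1)  # event {M_w = 0}
print("closed form %.4e vs crude MC %.4e" % (p0_closed, hits / N))
```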

Next, we compute the probability of observing at most one missing edge.

Theorem 5 (At most one missing edge)

Suppose that w ≥ 2. Then,

$$p^{\prime}_{\leq 1} = p^{\prime}_{0} + \frac{\lambda^{2}(w - 2)^{2}}2 e^{-\lambda w} + (w - 3/2) \lambda^{2}e^{-\lambda (w - 1)}.$$

Proof

We decompose \(p^{\prime }_{\leq 1}\) as

$$p^{\prime}_{\leq 1} = p^{\prime}_{0} + \mathbb{P}\big(M_{w} = 1, X\cap [X_{1} + 2, w] \ne \emptyset\big) + \mathbb{P}\big(M_{w} = 1, X\cap [X_{1} + 2, w] = \emptyset\big)$$

and compute the second and third probability separately; this corresponds to Case 1 and Case 2 below. Note that on the event {Mw = 1}∩{X ∩ [X1 + 2, w] ≠ ∅}, we have |X ∩ [0, w]| = 2, since more points would imply at least two missing edges.

Case 1.

(X ∩ (X1, X1 + 2] = ∅ and |X ∩ [X1 + 2, w]| = 1)

Conditioning on X1, the probability of this event becomes

$${\int}_{0}^{w - 2} \lambda e^{-\lambda x_{1}}\mathbb{P}\big(X \cap (x_{1}, x_{1} + 2] = \emptyset, |X \cap[ x_{1} + 2, w]|= 1\big) \mathrm{d} x_{1}.$$

Inserting the void probability for the Poisson process, we arrive at

$$ \begin{array}{@{}rcl@{}} {\int}_{0}^{w - 2} \lambda e^{-\lambda x_{1}}e^{-2\lambda} \lambda (w - 2 - x_{1}) e^{-\lambda (w - 2 - x_{1})} \mathrm{d} x_{1} = \lambda^{2} e^{-\lambda w}{\int}_{0}^{w - 2} (w - 2 - x_{1}) \mathrm{d} x_{1} = \frac{\lambda^{2}(w - 2)^{2}}2 e^{-\lambda w}. \end{array} $$

It remains to treat the case X ∩ [X1 + 2, w] = ∅. First, note that conditioned on X1 = x1, the point process X ∩ [x1, w] is Poisson. Thinking of this Poisson point process as formally extended to \(-\infty \) on the left, we write \(X_{1}^{\prime }\) for the first Poisson point to the left of w. Then, Fig. 5 illustrates that the event {Mw = 1}∩{X ∩ [X1 + 2, w] = ∅} is equivalent to \(X_{1}^{\prime } \in [X_{1} + 1, X_{1} + 2 ]\) and \(X\cap (X_{1}, X_{1}^{\prime } - 1] = \emptyset \).

Fig. 5: Configuration where Mw = 1 and X ∩ [X1 + 2, w] = ∅. Here, \([X_{1}, X_{1}^{\prime } - 1]\) is in red and [X1 + 1, X1 + 2] in blue

For Case 2, we distinguish between the cases X1 ≤ w − 2 and X1 ∈ [w − 2, w − 1].

Case 2a.

(X1 ≤ w − 2, \(X_{1}^{\prime } \in [X_{1} + 1, X_{1} + 2]\) and \(X \cap (X_{1}, X_{1}^{\prime } - 1] = \emptyset \)) Then, we compute the desired probability as

$$ \begin{array}{@{}rcl@{}} {\int}_{0}^{w - 2}\lambda e^{-\lambda x_{1}}{\int}_{x_{1} + 1}^{x_{1} + 2}\lambda e^{-\lambda (w - x_{1}^{\prime})}\mathbb{P}\big(X\cap (x_{1}, x_{1}^{\prime} - 1] = \emptyset\big) \mathrm{d} x_{1}^{\prime} \mathrm{d} x_{1} = {\int}_{0}^{w - 2}{\int}_{x_{1} + 1}^{x_{1} + 2}\lambda^{2} e^{-\lambda (w - 1)} \mathrm{d} x_{1}^{\prime} \mathrm{d} x_{1}, \end{array} $$

which equals \((w - 2)\lambda^{2}e^{-\lambda (w - 1)}\).

Case 2b.

(X1 ∈ [w − 2, w − 1], \(X_{1}^{\prime } \in [X_{1} + 1, w]\) and \(X\cap (X_{1}, X_{1}^{\prime } - 1] = \emptyset \)) Finally,

$$ \begin{array}{@{}rcl@{}} {\int}_{w - 2}^{w - 1}\lambda e^{-\lambda x_{1}}{\int}_{x_{1} + 1}^{w}\lambda e^{-\lambda (w - x_{1}^{\prime})}\mathbb{P}\big(X\cap(x_{1}, x_{1}^{\prime} - 1] = \emptyset\big) \mathrm{d} x_{1}^{\prime} \mathrm{d} x_{1} = {\int}_{w - 2}^{w - 1}{\int}_{x_{1} + 1}^{w}\lambda^{2} e^{-\lambda (w - 1)} \mathrm{d} x_{1}^{\prime} \mathrm{d} x_{1}, \end{array} $$

which equals \(\frac {\lambda ^{2}}2 e^{-\lambda (w - 1)}\). Assembling the different cases concludes the proof. □

3 Conditional MC in Higher Dimensions

In Section 2, we analyzed rare events related to few edges or few missing edges in a one-dimensional setting. There, closed-form expressions for conditional probabilities were derived from the natural ordering of the Poisson points. Now, we proceed to higher dimensions and also consider more general deviations from the mean number of edges. First, we again illustrate that substantial variance reductions are possible through a surprisingly simple conditional MC method. Loosely speaking, the Poisson point process consists of 1) an infinite sequence of random points determining the locations in the window and 2) a Poisson random variable determining the number of points in the sampling window. We use the fact that after conditioning on the spatial locations, the rare-event probability is available in closed form. This type of Poisson conditioning is novel in a spatial rare-event estimation context, but has strong ties to approaches appearing earlier in seemingly unrelated problems. For instance, in reliability theory, related conditional MC schemes lead to spectacular variance reductions (Lomonosov and Shpungin 1999; Vaisman et al. 2015).

The rest of this section is organized as follows. First, Section 3.1 describes how to estimate, through conditional MC, the rare-event probabilities of too few and too many edges relative to their expected number. Then, Section 3.2 presents a simulation study illustrating that this estimator can reduce the estimation variance by several orders of magnitude.

3.1 Conditioning on a Spatial Component

We consider a full-dimensional sampling window \(W \subset \mathbb {R}^{d}\) and rare events of the form

$$F_{< a} := \{|G(X \cap W)| < (1 - a)\mu\}\quad \text{ and } \quad F_{>a} := \{|G(X \cap W)| > (1 + a) \mu\},$$

where \(\mu := \mathbb {E}[|G(X \cap W)|]\) denotes the expected number of edges, which can be estimated through simulations. Alternatively, if W is so large that edge effects can be neglected, the Slivnyak–Mecke formula (Last and Penrose 2017, Theorem 4.4) gives the approximation \(\mu \approx \frac 12|W|\lambda ^{2} \kappa _{d}\), where κd denotes the volume of the unit ball in \(\mathbb {R}^{d}\).

The key idea for developing a conditional MC scheme is to use the explicit construction of a Poisson point process in a bounded sampling window. More precisely, let \(X_{\infty } = \{X_{n}\}_{n \ge 1}\) be an iid family of uniform random points in W and K be an independent Poisson random variable with parameter λ|W|. Then, \(\{X_{n}\}_{n \le K}\) is a Poisson point process in W with intensity λ; see Last and Penrose (2017, Theorem 3.6).

In conditional MC, we use the fact that the rare-event probabilities can be computed in closed form after we condition on the locations \(X_{\infty }\). More precisely, we let

$$ K_{<a}(X_{\infty}) := \inf\{k \ge 1: |G(\{X_{1},\dots, X_{k}\} \cap W)| \ge (1 - a)\mu\} $$

denote the first time where the Gilbert graph on the nodes \(\{X_{1}, \dots , X_{k}\}\) contains at least (1 − a)μ edges and refer to Fig. 6 for an illustration.

Fig. 6: Conceptual illustration of \(K_{<a}\)

Similarly, let

$$ K_{>a}(X_{\infty}) := \sup\{k \ge 1: |G(\{X_{1},\dots, X_{k}\} \cap W)| \le (1 + a)\mu\}$$

denote the largest k for which the Gilbert graph on the nodes \(\{X_{1}, \dots , X_{k}\}\) contains at most (1 + a)μ edges. Then,

$$ \begin{aligned} \mathbb{P}(F_{<a}) &= \mathbb{E}[\mathbb{P}(F_{<a} | X_{\infty})] = \mathbb{E}\big[F_{\textsf{Poi}}(K_{<a}(X_{\infty}) - 1)\big],\\ \mathbb{P}(F_{>a}) &= \mathbb{E}[\mathbb{P}(F_{>a} | X_{\infty})] = \mathbb{E}\big[1 - F_{\textsf{Poi}}(K_{>a}(X_{\infty} ))\big], \end{aligned} $$
(6)

where \({F_{\textsf {Poi}}:\mathbb {Z}_{\ge 0} \to [0, 1]}\) denotes the cumulative distribution function of a Poisson random variable with parameter λ|W|.
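To make the scheme concrete, the following Python sketch (our own; the square window parametrization and the Slivnyak–Mecke approximation of μ are illustrative choices) computes one replication of both estimators in (6):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

def cond_mc_sample(lam, side, a, mu, max_pts=5000):
    """One replication of (6) on the window W = [0, side]^2: uniform points are
    added one at a time while the edge count of the Gilbert graph is tracked."""
    lo, hi = (1 - a) * mu, (1 + a) * mu
    pts = np.empty((max_pts, 2))
    edges, k_lo, k_hi = 0, None, None
    for k in range(max_pts):
        p = rng.uniform(0.0, side, 2)
        edges += int(np.sum(np.sum((pts[:k] - p) ** 2, axis=1) <= 1.0))
        pts[k] = p
        if k_lo is None and edges >= lo:
            k_lo = k + 1                          # K_{<a}(X_inf)
        if k_hi is None and edges > hi:
            k_hi = k                              # K_{>a}(X_inf)
        if k_lo is not None and k_hi is not None:
            break
    lam_w = lam * side ** 2
    return poisson.cdf(k_lo - 1, lam_w), 1.0 - poisson.cdf(k_hi, lam_w)

lam, side, a = 2.0, 20.0, 0.2                     # illustrative parameters
mu = 0.5 * side ** 2 * lam ** 2 * np.pi           # Slivnyak-Mecke, no edge effects
samples = [cond_mc_sample(lam, side, a, mu) for _ in range(1000)]
print("q_<0.2 ~ %.2e,  q_>0.2 ~ %.2e" % tuple(np.mean(samples, axis=0)))
```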

3.2 Numerical Results

We sample planar homogeneous Poisson point processes \(X^{\textsf {S}} = \{X^{\textsf {S}}_{i}\}_{i \ge 1}\), \(X^{\textsf {M}} = \{X^{\textsf {M}}_{i}\}_{i \ge 1}\) and \(X^{\textsf {L}}= \{X^{\textsf {L}}_{i}\}_{i \ge 1}\) with intensity λ = 2 in windows of size 20 × 20, 25 × 25, and 30 × 30, respectively. Here, S,M, and L stand for small, medium, and large, respectively.

In Section 1, we mentioned that a major challenge in devising efficient estimators for rare-event probabilities related to the edge count comes from a qualitatively different tail behavior: light on the left, heavy on the right. In other words, we expect that in the left tail, changes occur throughout the sampling window, whereas in the right tail, a single exceptional configuration in a small part of the window suffices to induce the rare event. We recall that the left tail of a random variable Z refers to the probabilities \(\mathbb {P}(Z \le r)\) for small r, and the right tail to the probabilities \(\mathbb {P}(Z \ge r)\) for large r.

Although the theoretical difference between the left and the right tail is subtle on a bounded sampling window, Table 3 illustrates that it becomes visible in the quantiles Qα and Q1−α of the number of edges when α is small. Here, the empirical quantiles for \(10^{6}\) samples of the edge counts in a (20 × 20)-window are shown. For instance, the 1%-quantile is 16.4% lower than the mean, which is comparable to the 18% exceedance of the 99%-quantile. However, the 0.01%-quantile is 25.2% lower than the mean, whereas the corresponding 99.99%-quantile exceeds the mean by 30.0%. These figures are an early indication of the difference in the tails that will reappear far more pronouncedly in the numerical results concerning the estimation of the rare-event probabilities \(\mathbb {P}(F_{<0.2})\) and \(\mathbb {P}(F_{>0.2})\).

Table 3 Empirical quantiles for the number of edges in a (20 × 20)-window: absolute values (upper two rows) and relative deviation from the mean μ (lower two rows)

In the rest of the section, we estimate the rare-event probabilities

$$q_{< 0.2} := \mathbb{P}(F_{<0.2}) \quad \text{ and }\quad q_{> 0.2} := \mathbb{P}(F_{>0.2})$$

corresponding to 20% deviations from the mean. Here, we draw \(N = 10^{5}\) samples of \(X_{\infty }\). Then, taking into account the representation (6), we set

$$ \begin{array}{@{}rcl@{}} p^{\textsf{Cond}}_{< 0.2} &:=& \frac 1N{\sum}_{i \le N}F_{\textsf{Poi}}\big(K_{<a}(X_{\infty}(i))- 1\big),\\ p^{\textsf{Cond}}_{> 0.2} &:=& \frac 1N{\sum}_{i \le N}\big(1 - F_{\textsf{Poi}}\big(K_{>a}(X_{\infty}(i) )\big)\big). \end{array} $$

For comparison, the single-sample variances of the corresponding crude MC estimators equal q< 0.2(1 − q< 0.2) and q> 0.2(1 − q> 0.2), respectively.

We report the estimates \(p^{\textsf {Cond}}_{< 0.2}\) and \(p^{\textsf {Cond}}_{> 0.2}\) in Table 4. First, we see that the exceedance probabilities \(p^{\textsf {Cond}}_{> 0.2}\) are always smaller than the corresponding undershoot probabilities \(p^{\textsf {Cond}}_{< 0.2}\). This supports the preliminary impression of the difference in the tail behavior hinted at in Table 3. Moreover, in all examples conditional MC reduces the estimation variance massively and the efficiency gains become more pronounced the rarer the event.

Table 4 Estimates for q< 0.2 and q> 0.2 based on conditional MC for different window sizes, based on \(N = 10^{5}\) samples

4 Importance Sampling

The conditional MC estimators constructed in Section 3 take into account that an atypically large number of Poisson points leads to a Gilbert graph exhibiting considerably more edges than expected. However, not only the number but also the locations of the points play a pivotal role. For instance, if the points tend to repel each other, then we typically observe fewer edges. Similarly, clustered point patterns should induce more edges.

In order to implement these insights, we resort to the technique of importance sampling (Kroese et al. 2013, Chapter 9.7). That is, we draw samples from a point process with distribution \(\mathbb {Q}\) for which the rare event becomes typical and then correct the estimation bias by weighting with likelihood ratios. In Sections 4.1 and 4.2 below, we explain how to implement these steps through configuration-dependent birth-death processes reminiscent of the Strauss process from spatial statistics (Møller and Waagepetersen 2004). Finally, in Section 4.3, we illustrate in a simulation study how importance sampling of the spatial locations reduces the estimation variance further.

4.1 Lower Tails

The key observation is that under the rare event of seeing exceptionally few edges, we expect a repulsion between points. More precisely, the most likely cause of the rare event is a change in the configuration of the underlying Poisson point process throughout the entire sampling window. The large-deviation analysis of Seppäläinen and Yukich (2001) suggests performing importance sampling in which, instead of sampling from the distribution \(\mathbb {P}\) of the a priori Poisson point process, we draw samples from a different stationary point process with distribution \(\mathbb {Q}\) such that under \(\mathbb {Q}\) the original rare event becomes typical and the deviation of \(\mathbb {Q}\) from \(\mathbb {P}\), as measured through the Kullback–Leibler divergence \(h(\mathbb {Q} | \mathbb {P})\), is minimized. We implement this repulsion by a dependent thinning inspired by the Strauss process.

Here, we start from a realization of the Gilbert graph on n0 = ⌊λ|W|⌋ iid points \(\{X_{1}, \dots , X_{n_{0}}\}\), and then, we thin out points successively. An independent thinning of points would give rise to uniformly distributed locations without interactions. In the importance sampling, we thin instead via a configuration-dependent birth-death process; see e.g., Chapter 9.7 of Kroese et al. (2013).

To describe the death mechanism more precisely, we draw inspiration from the Strauss process and choose the probability pi of removing point Xi proportional to \(\gamma ^{\deg (X_{i})}\), where \(\deg (X_{i})\) denotes the degree of Xi in the Gilbert graph and γ > 1 is a parameter of the algorithm. Algorithm 3 shows the pseudo-code leading to the importance sampling estimator \(q^{\textsf {IS}}_{<a}\) for q<a; in the algorithm, we write |X| for the number of points of X. To understand the principle behind Algorithm 3, we briefly expound on the general approach in importance sampling and refer the reader to Chapter 9.7 of Kroese et al. (2013) for an in-depth discussion.

Algorithm 3

As mentioned above, when thinning out points independently until the number of edges in the Gilbert graph falls below μ(1 − a), we would arrive at the random variable \(K_{<a}(X_{\infty })\) from Section 3. However, thinning out according to a configuration-dependent probability distorts its distribution towards a probability measure \(\mathbb {Q}\) that is in general different from the true distribution \(\mathbb {P}\). Still, by construction, \(\mathbb {P}\) is absolutely continuous with respect to \(\mathbb {Q}\), and we let \(\rho := \mathrm {d} \mathbb {P} / \mathrm {d} \mathbb {Q}\) denote the likelihood ratio. Then, from a conceptual point of view, Algorithm 3 first draws N ≥ 1 iid samples \(X_{\infty }(1), \dots , X_{\infty }(N)\) from the distorted distribution \(\mathbb {Q}\) with associated likelihood ratios \(\rho _{1}, \dots , \rho _{N}\), and then computes

$$q^{\textsf{IS}}_{<a} := \frac 1N \sum\limits_{i \le N} \rho_{i} F_{\textsf{Poi}}(K_{<a}(X_{\infty}(i)) - 1),$$

where we recall that FPoi denotes the cumulative distribution function of a Poisson random variable with parameter λ|W|.
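To complement Algorithm 3, the following minimal Python sketch of our own (illustrative names throughout; for simplicity, it assumes that the initial configuration already carries at least (1 − a)μ edges, which is overwhelmingly likely for the parameters below) shows how the tilted death steps and the likelihood-ratio bookkeeping fit together:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

def is_replication_lower(lam, side, a, mu, gamma):
    """One importance-sampling replication for q_{<a} in the spirit of Algorithm 3."""
    lam_w = lam * side ** 2
    n0 = int(lam_w)                                 # initial number of points
    pts = rng.uniform(0.0, side, (n0, 2))
    d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=2)
    adj = (d2 <= 1.0) & ~np.eye(n0, dtype=bool)     # Gilbert-graph adjacency
    alive = np.ones(n0, dtype=bool)
    deg = adj.sum(axis=1).astype(float)
    edges = int(adj.sum()) // 2
    lo, rho = (1.0 - a) * mu, 1.0
    assert edges >= lo, "started below the threshold (astronomically rare here)"
    while edges >= lo:
        # tilted death step: remove point i with probability ~ gamma**deg(i)
        weights = np.where(alive, gamma ** deg, 0.0)
        probs = weights / weights.sum()
        i = rng.choice(n0, p=probs)
        # under the original measure, the removal would be uniform among the
        # remaining points, so dP/dQ picks up the factor (1/#alive) / probs[i]
        rho *= (1.0 / alive.sum()) / probs[i]
        alive[i] = False
        nb = adj[i] & alive
        deg[nb] -= 1.0
        edges -= int(nb.sum())
    k_lt = int(alive.sum()) + 1                     # K_{<a} along the reversed order
    return rho * poisson.cdf(k_lt - 1, lam_w)

lam, side, a, gamma = 2.0, 20.0, 0.2, 1.018         # illustrative parameters
mu = 0.5 * side ** 2 * lam ** 2 * np.pi             # ignoring edge effects
print(np.mean([is_replication_lower(lam, side, a, mu, gamma) for _ in range(100)]))
```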

We now explain why the \(q^{\textsf {IS}}_{<a}\) returned by the algorithm is a good approximation for the probability q<a. The general philosophy is that when performing importance sampling with a proposal distribution that is close to the conditional distribution under the rare event, we can be optimistic that importance sampling decreases the estimation variance. For instance, in Kroese et al. (2013, Section 13.2), it is shown that an unbiased zero-variance estimator could be obtained if the proposal distribution coincided with the conditional distribution under the rare event. In our setting, this means that we can expect \(q^{\textsf {IS}}_{<a}\) to be a good approximation for the probability q<a if the proposal density in the importance sampling is close to the conditional distribution under the event F<a.

To achieve this goal, intuitively, we would like to choose the thinning parameter γ > 1 such that the expected number of edges under the thinning matches the number of edges characterizing the rare event F<a. To develop a heuristic for this choice, we consider a Strauss process, where we restrict to the two-dimensional setting to allow for an accessible presentation. To make the article self-contained, we briefly recall the definition of a Strauss process from Møller and Waagepetersen (2004). The Strauss process in the observation window W is a Gibbs process whose density f(φ) with respect to a unit-intensity Poisson point process on W is given by

$$f(\varphi) = \alpha \beta^{|\varphi|}\gamma^{s_{R}(\varphi)},$$

with parameters R > 0 (interaction radius), β > 0 (activity parameter) and γ ∈ [0,1) (interaction parameter), where α > 0 is a normalizing constant and \(s_{R}(\varphi ) = \#\{\{x,y\} \subseteq \varphi \colon 0 < |x - y| \le R\}\) denotes the number of unordered point pairs in φ at distance at most R.

Hence, matching the number of edges in the rare event to the expected number of edges leads to the equation

$$ \begin{array}{@{}rcl@{}} (1 - a) \mu = |W| (\lambda^{\textsf{Str}})^{2}{{\int}_{0}^{1}} \pi r \rho^{\textsf{Str}}(r) \mathrm{d} r, \end{array} $$
(7)

where λStr > 0 and \(\rho ^{\textsf {Str}}: [0,\infty ) \to [0,\infty )\) denote the intensity and pair-correlation function of the Strauss process, respectively (Møller and Waagepetersen 2004). In contrast to the Poisson setting, neither λStr nor ρStr are available in closed form. Still, both quantities admit accurate saddle-point approximations λPS > 0 and \(\rho ^{\textsf {PS}}: [0, \infty ) \to [0,\infty )\) that are ready to implement in the Strauss case (Baddeley and Nair 2012).

More precisely, λPS is the unique positive solution of the equation

$$\lambda^{\textsf{PS}} G e^{\lambda^{\textsf{PS}} G} = \lambda G,$$

where G = (1 − β)π and \(\beta = \log (\gamma )\); see Baddeley and Nair (2012). Then, recalling that we work in a planar setting, we put

$$\rho^{\textsf{PS}}(r) := \beta \exp\big((1 - \beta)^{2} b(r/2) \lambda^{\textsf{PS}}\big),$$

where

$$b(r/2) := 2 \cos^{-1}(r/2) - r \sqrt{1 - (r/2)^{2}} $$

denotes the intersection area of two unit disks at distance r. Inserting these approximations into (7) and solving the resulting implicit equation numerically yields \(\beta \approx \log (1.018)\), i.e., γ ≈ 1.018.
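Taking the displayed formulas at face value, this calibration reduces to one Lambert-W evaluation and a one-dimensional root search; the following Python sketch (function names ours) traces the main steps:

```python
import numpy as np
from scipy.special import lambertw
from scipy.integrate import quad

def lam_ps(lam, beta):
    """Saddle-point intensity: solves x * G * exp(x * G) = lam * G via Lambert W."""
    G = (1.0 - beta) * np.pi
    return float(np.real(lambertw(lam * G))) / G

def edge_mean_ps(lam, beta, area):
    """Right-hand side of (7) with the saddle-point approximations inserted."""
    lps = lam_ps(lam, beta)
    b = lambda r: 2 * np.arccos(r / 2) - r * np.sqrt(1 - (r / 2) ** 2)
    rho_ps = lambda r: beta * np.exp((1 - beta) ** 2 * b(r) * lps)
    integral, _ = quad(lambda r: np.pi * r * rho_ps(r), 0.0, 1.0)
    return area * lps ** 2 * integral

# Calibrating gamma then amounts to solving
#     edge_mean_ps(lam, beta, area) == (1 - a) * mu
# for beta (e.g., with scipy.optimize.brentq) and setting gamma = exp(beta).
```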

4.2 Upper Tails

Similarly to the lower tails, we can strengthen the estimator by combining conditional MC with importance sampling on the spatial locations. For the upper tails, it is natural to devise an importance sampling scheme favoring clustering rather than repulsion. From the point of view of large deviations, this phenomenon was considered in Chatterjee and Harel (2020), where it is shown that all excess edges come from a ball of size 1 containing a highly dense configuration of Poisson points. In other words, the method is particularly powerful in settings where the rare event is the result of a condensation phenomenon; that is, a peculiar behavior of the point process in a small part of the window becomes the most likely explanation of the rare event. As laid out in Chatterjee and Harel (2020), at least in the asymptotic regime of large deviations, the rare event of observing too many edges is governed by this condensation phenomenon.

We propose to take the clustering into account via a birth mechanism favoring the generation of points in areas that would lead to a large number of additional edges. To ease implementation, the density of the birth mechanism is discretized and remains constant in bins of a suitably chosen grid. The density in a bin at position x ∈ W is proportional to \(\gamma^{n(x)}\), where γ > 1 is a parameter governing the strength of the clustering and n(x) denotes the number of Poisson points in a suitable neighborhood around x, such as the bin containing x together with all adjacent bins.
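A minimal sketch of such a discretized birth density (the grid resolution and the 3 × 3-bin neighborhood are our illustrative choices) may look as follows; the ratio of the uniform bin probability 1/nbins² to the returned bin probability is the factor entering the likelihood-ratio bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(0)

def binned_birth_probs(pts, side, nbins, gamma):
    """Bin probabilities proportional to gamma**n(x), where n(x) counts the
    points in the bin of x together with its adjacent bins."""
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=nbins,
                                  range=[[0, side], [0, side]])
    padded = np.pad(counts, 1)                      # zero-pad the boundary
    n = sum(padded[i:i + nbins, j:j + nbins] for i in range(3) for j in range(3))
    weights = gamma ** n
    return weights / weights.sum()

def sample_birth(pts, side, nbins, gamma):
    """Draw a new point from the discretized density; return it together with
    the probability of its bin."""
    probs = binned_birth_probs(pts, side, nbins, gamma)
    i, j = divmod(int(rng.choice(nbins * nbins, p=probs.ravel())), nbins)
    cell = side / nbins
    new_pt = np.array([(i + rng.uniform()) * cell, (j + rng.uniform()) * cell])
    return new_pt, probs[i, j]

pts = rng.uniform(0, 20, (800, 2))                  # illustrative configuration
new_pt, bin_prob = sample_birth(pts, side=20.0, nbins=40, gamma=1.01)
```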

Similarly to the lower-tail case, a subtle point in this approach pertains to the choice of the parameter γ > 1. Unfortunately, an attractive Strauss process is ill-defined in the entire Euclidean space, so that the saddle-point approximation from Section 4.1 does not apply. In Section 4.3 below, we therefore rely on a pilot run suggesting γ = 1.01 as a good choice for further variance reduction. Although the number of simulations in this pilot run is small and the variance estimates are still volatile, we found that it provides a good indication for simulations on a larger scale.

4.3 Numerical Results

Section 3.2 revealed that even with a sample size of \(N = 10^{5}\), the conditional MC estimators still exhibit a considerable relative error. Now, we illustrate that importance sampling may be an appealing option to reduce this error.

Table 5 reports the estimates \(q^{\textsf {IS}}_{<0.2}\) and \(q^{\textsf {IS}}_{>0.2}\) for different sizes of the sampling window as in Section 3.2. In the left tail, we see massive variance improvements when compared to the crude MC estimator. In the right tail, the gains are substantial, but a little less pronounced. This matches the intuition that the root of the rare events in the right tails should be a condensation phenomenon, which goes against the heuristic of changing the point process homogeneously throughout the window.

Table 5 Estimates for q< 0.2 and q> 0.2 based on importance sampling with Strauss-type processes for different window sizes, based on \(N = 10^{5}\) samples