Resample-smoothing of Voronoi intensity estimators

Voronoi estimators are non-parametric and adaptive estimators of the intensity of a point process. The intensity estimate at a given location is equal to the reciprocal of the size of the Voronoi/Dirichlet cell containing that location. Their major drawback is that they tend to paradoxically under-smooth the data in regions where the point density of the observed point pattern is high, and over-smooth where the point density is low. To remedy this behaviour, we propose to apply an additional smoothing operation to the Voronoi estimator, based on resampling the point pattern by independent random thinning. Through a simulation study we show that our resample-smoothing technique improves the estimation substantially. In addition, we study statistical properties such as unbiasedness and variance, and propose a rule-of-thumb and a data-driven cross-validation approach to choose the amount of smoothing to apply. Finally we apply our proposed intensity estimation scheme to two datasets: locations of pine saplings (planar point pattern) and motor vehicle traffic accidents (linear network point pattern).

Keywords Adaptive intensity estimation · Complete separable metric space · Independent thinning · Point process · Resampling · Voronoi intensity estimator

1 Introduction

In point pattern analysis (Van Lieshout, 2000; Diggle, 2014; Baddeley et al., 2015), exploratory data analyses often start with a non-parametric analysis of the spatial intensity of events/data points. The intensity function, which is a first-order moment characterisation of the point process assumed to have generated the data, reflects the abundance of points in different regions and may be seen as a "heat map" for the events. For most datasets, assuming that the underlying point process is homogeneous, i.e. that its intensity function is constant, is not realistic. Hence, the natural way to start is to assume inhomogeneity for the underlying point process.
The most prominent approach to non-parametric intensity estimation is undoubtedly kernel estimation (Diggle, 2014; Baddeley et al., 2015; Cronie and Van Lieshout, 2018). A key point with kernel intensity estimation, and kernel-based estimation in general, is that the same amount of smoothing is applied across the whole dataset. The degree of smoothing is controlled by a smoothing parameter, the so-called bandwidth, and the resulting estimates depend heavily on the choice of bandwidth. A small bandwidth may result in under-smoothing whereas a large bandwidth may result in over-smoothing of the intensity. Regarding bandwidth selection, some progress has been made for point processes/patterns in Euclidean spaces (Cronie and Van Lieshout, 2018).
Concerning other spatial domains, recently there has been an increasing interest in point patterns on linear networks (Okabe and Sugihara, 2012; Baddeley et al., 2015; Rakshit et al., 2018); examples of data include street crimes or traffic accidents on a road network (of a city). Here the matter of kernel estimation is more delicate due to the geometry of the underlying network and the methodology is still under development. Borruso (2003, 2005, 2008) proposed several methods for kernel smoothing of network data without discussing statistical properties. Xie and Yan (2008) defined a kernel-based intensity estimator for network point patterns without taking the topography of the network into consideration and as a result the estimation errors tended to be large, thus making the estimator heavily biased. Okabe et al. (2009) further introduced a class of so-called equal-split network kernel density estimators which support both continuous and discontinuous schemes. By exploiting properties of diffusion on networks, McSwiggan et al. (2017) developed a kernel estimation method based on the heat kernel, which is the appropriate linear network analogue of the Gaussian kernel. In addition, Moradi et al. (2018) extended the classical spatial edge corrected kernel intensity estimator to point patterns on linear networks.
As a consequence of e.g. covariates, such as demography and human mobility, it is quite common to encounter situations where there are notable abrupt spatial changes in the distribution of the events, with a large number of events in particular parts of the study region and nearly empty parts close by. For example, street crimes or traffic accidents tend to happen in particular streets/roads/junctions, which are often surrounded by empty neighbourhoods. The classical kernel estimation approach often does not fit such data well.
We argue, similarly to Barr and Schoenberg (2010), that kernel-based approaches may be unsatisfactory when there are sharp boundaries between parts with high and low intensity. Indeed, a fixed kernel smoothing bandwidth results in over-smoothing in parts with high intensity and under-smoothing in low-intensity parts. In addition, choosing a fixed bandwidth is itself a well-studied and challenging problem (Cronie and Van Lieshout, 2018). By considering an adaptive estimator, i.e. an estimator that adapts locally to the distribution of events, we may reduce such problems when estimating the intensity function (Diggle, 2014; Silverman, 1986).
A first idea would be to consider adaptive kernel estimators, which use an individual bandwidth for each point of the point pattern or a spatially varying bandwidth function. In the planar case, i.e. in the 2-dimensional Euclidean setting, some efforts have been made (Davies and Hazelton, 2010; Diggle, 2014; Davies et al., 2016; Davies and Baddeley, 2018). The issue with adaptive kernel estimation, however, is that optimal bandwidth selection becomes even more challenging and important (Diggle, 2014; Silverman, 1986).
As an alternative, one could consider an approach without any choice of tuning parameters, e.g. a tessellation-based approach (Van Lieshout, 2012; Schaap, 2007). One such approach is provided by Voronoi intensity estimation (Ord, 1978; Barr and Schoenberg, 2010; Okabe and Sugihara, 2012), defined such that within a given Voronoi cell of the point pattern the intensity estimate is set to the reciprocal of the size of that Voronoi cell (Okabe et al., 2000). When employing the Voronoi intensity estimator, one thing that quickly becomes evident is that it often accentuates local features too much, in particular in regions with high event density. This reflects a previously observed phenomenon: adaptive estimators, such as the Voronoi intensity estimator, may smooth too little whereas kernel estimators may smooth too much in dense regions (Baddeley et al., 2015, Section 6.5.2). Hence, one should be able to find some middle ground and we here aim at providing a contribution to that.
Our idea is simple. In dense parts surrounded by empty neighbourhoods, Voronoi intensity estimators tend to smooth too little, thus generating excessive peaks in the intensity estimate in those parts. By removing points in such a dense part we reduce the peaks, which results in a smoother intensity estimate, with a shape more similar to the true intensity function. However, the problem of doing this "manually" is twofold: 1) it is not clear which specific points we should remove, and 2) we need to compensate for the reduced total mass. To solve these issues, we propose to independently thin the original point pattern some m ≥ 1 times, according to some retention probability, in order to obtain m different point patterns and thereby m different Voronoi intensity estimates. In order to compensate for the reduced mass, we then scale each of the m estimates by the applied retention probability and use the corresponding average as final estimate of the intensity function. We propose this technique for point patterns in arbitrary metric spaces.
The paper is structured as follows. In Section 2 we give a short background on point processes and intensity estimation. In Section 3 we introduce our resample-smoothing technique, study its statistical properties and discuss ways to choose the amount of smoothing, i.e. thinning, to apply. In Section 4 we evaluate our approach numerically for a few different planar point processes and in Section 5 we apply our methodology to two datasets: a planar point pattern and a linear network point pattern. Section 6 contains a discussion and some directions for future work and in the Appendix we provide the proofs of the theoretical results in the paper as well as bias and variance plots for the simulation study in Section 4.

2 Preliminaries
Let X be a simple point process in an arbitrary space S, by which we here mean a complete separable metric space with associated metric/distance d(·, ·) (Daley and Vere-Jones, 2008). Throughout, all subsets A ⊆ S considered are Borel sets and we endow S with some suitable locally finite Borel reference measure A → |A| ≥ 0, A ⊆ S; we denote integration with respect to this measure by du.
A realisation x = {x_1, . . . , x_n} ⊆ S, n ≥ 0, of X, i.e. a collection of distinct points in S which almost surely (a.s.) is finite over bounded Borel sets (locally finite), will be referred to as a point pattern.
The cardinality of the set X ∩ A, A ⊆ S, will be denoted by N(X ∩ A) ∈ {0, 1, . . .} and we note that by definition we a.s. have N(X ∩ A) < ∞ for bounded A ⊆ S and N(X ∩ {u}) ∈ {0, 1} for any u ∈ S. This setup is the usual one in the general study of point processes (Van Lieshout, 2000; Diggle, 2014; Baddeley et al., 2015) and common examples include:
- The underlying space is given by a linear network L, i.e. a union of line segments, where d(·, ·) is the shortest-path distance, which gives the shortest length of any path joining u, v ∈ L (Okabe and Sugihara, 2012; Rakshit et al., 2017). Treated as a graph with vertices given by the intersections and endpoints of the line segments, one assumes that L is connected. The measure |·| here corresponds to integration with respect to arc length.
- The point process X generates collections of points on the sphere S = αS^{d−1} = {x ∈ R^d : ‖x‖ = α}, α > 0, d ≥ 1, where d(·, ·) is the great circle distance and |·| is the spherical measure (Lawrence et al., 2016; Møller and Rubak, 2016).
We emphasise that in each of the above cases there exist other metrics and measures which may be more suited for a particular context.
At times, we will assume that X is stationary, or invariant. More specifically, there is a family of transformations/shifts {θ_s : s ∈ S}, θ_s : S → S, along S, which induces a so-called flow, under which the distribution of θ_s X = {θ_s(x) : x ∈ X} coincides with that of X for any s ∈ S. The underlying assumption will be that S is a so-called (unimodular) homogeneous space with a fixed origin o ∈ S, with d(·, ·) chosen such that it metrizes S and |·| chosen to be the associated (left) Haar measure (Last, 2010; Schneider and Weil, 2008). To exemplify, in Euclidean spaces with |·| chosen to be Lebesgue measure, we let θ_s(u) = u + s ∈ R^d, u, s ∈ R^d, which yields the classical notion of stationarity, and on a sphere with the corresponding spherical measure we consider the orthogonal group of rotations. Note that a more general setting is also possible (Kallenberg, 2017, Chapter 7).

2.1 Intensity functions
To characterise the first moment of X, i.e. the marginal distributional properties of its points, we consider its intensity function ρ : S → [0, ∞). It may be defined through the Campbell formula (Daley and Vere-Jones, 2008), which states that for any measurable function h : S → [0, ∞),
E[Σ_{x∈X} h(x)] = ∫_S h(u)ρ(u) du.
In particular, E[N(X ∩ A)] = ∫_A ρ(u) du for any A ⊆ S and, if X is stationary, then ρ(u) ≡ ρ ∈ (0, ∞) for any u ∈ S. Since for any infinitesimal neighbourhood du ⊆ S of u ∈ S with size/measure du, E[N(X ∩ du)] coincides with the probability of finding a point of X in du, this probability is given by ρ(u)du.
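To make the relation E[N(X ∩ W)] = ∫_W ρ(u) du concrete, the following sketch (ours; the paper's own computations use the R package spatstat, whereas this is a standalone NumPy illustration) simulates the inhomogeneous Poisson process with intensity ρ(x, y) = |10 + 90 sin(16x)| from Section 4 by thinning a dominating homogeneous process, and checks that the average point count over many realisations is close to ∫_W ρ(u) du ≈ 58.6.

```python
import numpy as np

rng = np.random.default_rng(4)
rho = lambda x: np.abs(10.0 + 90.0 * np.sin(16.0 * x))  # intensity from Section 4 (depends on x only)
rho_max = 100.0                                         # dominating constant intensity

def rpoispp(rng):
    """One realisation on W = [0, 1]^2, via thinning of a homogeneous process."""
    n = rng.poisson(rho_max)                  # N is Poisson(rho_max * |W|), with |W| = 1
    pts = rng.random((n, 2))                  # uniform locations given N
    keep = rng.random(n) < rho(pts[:, 0]) / rho_max   # independent location-dependent thinning
    return pts[keep]

counts = [len(rpoispp(rng)) for _ in range(2000)]
mean_count = float(np.mean(counts))           # ~ integral of rho over W, i.e. ~ 58.6
```

The empirical mean count estimates E[N(X ∩ W)], which the Campbell formula identifies with the integral of the intensity over the window.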

2.2 Independent thinning
A key ingredient in our smoothing technique is the notion of independent thinning (Chiu et al., 2013, Section 5.1): given some measurable retention probability function p(u) ∈ (0, 1], u ∈ S, we run through the points of X and delete a point x ∈ X with probability 1 − p(x), independently of the deletions carried out for the other points of X. It follows that the resulting thinned process, X_p say, has intensity
ρ_th(u) = p(u)ρ(u), u ∈ S,
where ρ(·) is the intensity of the original process X (Chiu et al., 2013, Section 5.1). For further details on the thinning of point processes, see e.g. Møller and Schoenberg (2010) and Daley and Vere-Jones (2008, Section 11.3).
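Independent thinning of an observed pattern is straightforward to implement. A minimal sketch (ours, in Python/NumPy rather than the R tools used for the paper's computations), accepting either a constant retention probability or a retention probability function of location:

```python
import numpy as np

def thin(points, p, rng):
    """Independent thinning: retain each row of `points` (an (n, d) array)
    with probability p(x), i.e. delete it with probability 1 - p(x),
    independently of all other points."""
    points = np.asarray(points, dtype=float)
    if callable(p):
        probs = np.array([p(x) for x in points], dtype=float)
    else:
        probs = np.full(len(points), float(p))   # constant retention: "p-thinning"
    keep = rng.random(len(points)) < probs
    return points[keep]

rng = np.random.default_rng(1)
x = rng.random((500, 2))       # a point pattern of 500 points in [0, 1]^2
x_p = thin(x, 0.5, rng)        # on average half of the points survive
```

With a constant p this is exactly the p-thinning used from Section 3 onwards; a retained pattern is always a subset of the original one.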
It is worth mentioning that a Poisson process stays Poissonian after independent thinning (Daley and Vere-Jones, 2008, Exercise 11.3.1) and, in addition, the independent thinning of an arbitrary point process X with low retention probability results in a point process which, from a distributional point of view, is approximately a Poisson process (see e.g. Daley and Vere-Jones, 2008).

2.3 Voronoi tessellations
The next key ingredient in our estimation scheme is the Voronoi/Dirichlet tessellation of a point pattern x = {x_1, . . . , x_n} contained in some subset W ⊆ S (Chiu et al., 2013; Okabe et al., 2000). Generally speaking, a tessellation of W is a tiling such that i) the union of all tiles constitutes all of W, and ii) the interiors of any two distinct tiles have empty intersection.
The Voronoi/Dirichlet cell V_x associated with x ∈ x consists of all u ∈ S which are at least as close to x as to any y ∈ x \ {x}, i.e.
V_x = V_x(x, S) = {u ∈ S : d(u, x) ≤ d(u, y) for all y ∈ x \ {x}}. (1)
The tiling {V_x}_{x∈x} is referred to as the Voronoi/Dirichlet tessellation generated by x. Clearly, the shape of each V_x depends on the distance d(·, ·) chosen for S and its size, |V_x|, depends on the chosen reference measure |·|.
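In the planar Euclidean case with W = [0, 1]², a crude but serviceable discrete approximation of the tessellation is obtained by labelling each pixel of a fine grid with its nearest generating point; the tiling properties i)–ii) then translate into the (clipped) cell sizes summing to |W|. A sketch under these assumptions (function and parameter names are ours):

```python
import numpy as np

def voronoi_cell_areas(points, res=200):
    """Approximate |V_x ∩ W| for W = [0, 1]^2 by assigning each pixel centre
    of a res-by-res grid to its nearest generating point (Euclidean d)."""
    points = np.asarray(points, dtype=float)
    g = (np.arange(res) + 0.5) / res                    # pixel-centre coordinates
    gx, gy = np.meshgrid(g, g)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    # squared distance from every pixel centre to every generator: (res^2, n)
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                          # index of the owning generator
    return np.bincount(nearest, minlength=len(points)) / res**2

rng = np.random.default_rng(2)
x = rng.random((50, 2))
areas = voronoi_cell_areas(x)
# The cells tile W, so the clipped areas sum to |W| = 1 (up to pixelation).
```

Exact planar tessellations could of course be computed instead (e.g. via scipy.spatial.Voronoi, with extra care for unbounded boundary cells), but the grid version carries over unchanged to any metric d(·, ·).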

2.4 Intensity estimation
Given a point pattern x = {x_1, . . . , x_n} in some study region W ⊆ S, |W| > 0, we next set out to estimate ρ(u), u ∈ W, under the assumption that x is a realisation of X ∩ W. Before going into details about specific estimators, we briefly mention how different estimators' performances may be evaluated and compared. To evaluate the performance of an estimator ρ̂(·) = ρ̂(·; X, W) of ρ(u), u ∈ W, it is common practice to employ the Mean Integrated Square Error (MISE):
MISE(ρ̂(·)) = ∫_W E[(ρ̂(u) − ρ(u))²] du = ∫_W {Var(ρ̂(u)) + bias(ρ̂(u))²} du,
where bias(ρ̂(u)) = E[ρ̂(u)] − ρ(u). Given k ≥ 1 realisations of X ∩ W, to obtain an estimate of MISE we average over the integrated square errors generated by each of the k realisations. Alternatively, we may find estimates of the functions Var(ρ̂(u)) and bias(ρ̂(u)), u ∈ W, based on the k patterns and integrate these over W. This is the setup chosen for the numerical evaluations presented in Section 4.
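Given k estimates of ρ evaluated on a common pixel grid, the integrated summaries of the kind reported in Section 4 can be estimated by Riemann sums; a sketch (our own helper, with IV, ISB and IAB denoting Integrated Variance, Integrated Square Bias and Integrated Absolute Bias, and the MISE estimate given by IV + ISB):

```python
import numpy as np

def error_summaries(estimates, truth, pixel_area):
    """Estimate IV, ISB and IAB from k intensity estimates on a common grid.

    estimates : (k, npix) array, one row per realisation's estimate
    truth     : (npix,) array, true intensity on the same grid
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean(axis=0) - truth            # pointwise bias estimate
    var = estimates.var(axis=0, ddof=1)              # pointwise sample variance
    IV = var.sum() * pixel_area                      # Integrated Variance
    ISB = (bias ** 2).sum() * pixel_area             # Integrated Square Bias
    IAB = np.abs(bias).sum() * pixel_area            # Integrated Absolute Bias
    return IV, ISB, IAB                              # MISE estimate: IV + ISB

truth = np.full(10_000, 60.0)                        # constant intensity on a 100x100 grid
perfect = np.tile(truth, (5, 1))                     # five error-free estimates
IV, ISB, IAB = error_summaries(perfect, truth, pixel_area=1e-4)
```

An estimator that reproduces the truth exactly yields IV = ISB = IAB = 0, and any residual values decompose the MISE estimate into its variance and bias parts.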

2.5 Voronoi intensity estimation
In practice, it is often the case that events mainly occur in specific parts of the study region, e.g. that accidents often happen in more crowded streets or on specific parts of a highway, or that trees tend to grow mainly in specific parts of a forest. In other words, there are sharp boundaries between parts with high and low intensities. We argue, similarly to Barr and Schoenberg (2010), that in order not to blur such boundaries, it is preferable to employ an adaptive intensity estimation scheme, which adapts locally to changes in the spatial distribution of the events.
We here choose to focus on a particular kind of adaptive intensity estimator, namely Voronoi intensity estimators. Recalling the Voronoi cells in expression (1), we formally define the Voronoi intensity estimator of the intensity function of X ⊆ S as follows.
Definition 1 For a point process X with intensity function ρ(·), the Voronoi intensity estimator of ρ(u), u ∈ W ⊆ S, |W| > 0, is given by
ρ̂^V(u) = ρ̂^V(u; X, W) = Σ_{x ∈ X ∩ W} 1{u ∈ V_x} / |V_x|, u ∈ W,
where the cells V_x = V_x(X ∩ W, S) are given by expression (1).

It should be noted that the points of X which lie outside W may interact with those inside W. Indeed, due to the way we define the Voronoi cells in expression (1), the Voronoi intensity estimator neglects possible edge effects. The Voronoi intensity estimator, which was introduced by Brown (1965) and Ord (1978) in the context of Euclidean spaces, has been considered by e.g. Baddeley (2007), Ogata (2011), Barr and Schoenberg (2010) and Van Lieshout (2012). Ebeling and Wiedenmann (1993) used it to study local spatial concentrations of photons, Duyckaerts et al. (1994) and Duyckaerts and Godefroy (2000) employed it to estimate neuronal density, and it has been applied in the setting of statistical seismology by Ogata (2011) and Baddeley et al. (2015). In the context of linear networks, Okabe and Sugihara (2012) briefly discussed a Voronoi-based density estimator, the network Voronoi cell histogram, for the purpose of non-parametric density estimation on linear networks; they further discussed geometric properties of Voronoi tessellations on linear networks. Barr and Schoenberg (2010) focused on the planar case and particular statistical properties.
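In the planar case with W = [0, 1]², Definition 1 can be prototyped with the same pixel approximation of the Voronoi cells as above (a sketch of ours, not the paper's implementation, which neglects edge effects exactly as the definition does). Note the mass preservation: each cell that owns at least one pixel contributes |V_x| · (1/|V_x|) = 1 to ∫_W ρ̂^V(u) du.

```python
import numpy as np

def voronoi_intensity(points, res=100):
    """Grid-approximated Voronoi intensity estimate on W = [0, 1]^2:
    the value at a pixel u is 1 / |V_x| for the cell V_x containing u."""
    points = np.asarray(points, dtype=float)
    g = (np.arange(res) + 0.5) / res
    gx, gy = np.meshgrid(g, g)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)                        # owning generator per pixel
    areas = np.bincount(labels, minlength=len(points)) / res**2
    inv = np.zeros(len(points))
    inv[areas > 0] = 1.0 / areas[areas > 0]           # 1 / |V_x| per generator
    return inv[labels]                                # estimate at every pixel

rng = np.random.default_rng(3)
x = rng.random((60, 2))
rho_hat = voronoi_intensity(x)
mass = rho_hat.sum() / 100**2   # ~ number of points (exact if every cell owns a pixel)
```

The resulting estimate is piecewise constant with jumps at the cell boundaries, which is precisely the behaviour the resample-smoothing of Section 3 sets out to soften.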

3 Resample-smoothing of intensity estimators
Barr and Schoenberg (2010) pointed out that when there are abrupt changes in the intensity, kernel-based estimators may yield substantial bias and high variance, and they showed that the Voronoi estimator can alleviate these problems. Unfortunately, the Voronoi estimator tends to under-smooth in very dense areas surrounded by nearly empty neighbourhoods. This may be said about adaptive estimators in general; there is a tendency of adapting too much to the particular features of the observed point pattern x, rather than reflecting the features of the intensity function of the underlying point process X. To see how the under-smoothing, i.e. the over-accentuation of local features, of the Voronoi intensity estimator occurs, note that for a pattern x, if x ∈ x is located in a very dense part then its Voronoi cell becomes small and, consequently, ρ̂^V(u) = 1/|V_x| becomes very large for u ∈ V_x. A further issue with the Voronoi intensity estimator is that its variance tends to be quite large, thus resulting in quite unreliable estimates.
One may further ask whether there are other tessellations which perform better than the Voronoi tessellation. Even so, the question then still remains how to explicitly generate a better one. In addition, an advantage of the kernel estimation approach is arguably that it generates a smoothly varying intensity estimate, at least when using certain kernels, as opposed to the possibly unnatural "jumps" generated by the Voronoi estimator.
As a remedy for these issues, one suggestion is to follow Barr and Schoenberg (2010) by considering the so-called centroidal Voronoi intensity estimator. A further idea, which seems appealing, is to introduce some smoothing procedure for ρ̂^V(·), which would reduce the unnaturally extreme peaks while smoothing out the "jumps". We next propose such a smoothing procedure, which we refer to as resample-smoothing.
Recall the independent thinning operation in Section 2.2. We will here focus on the simple case where p(u) ≡ p ∈ (0, 1], u ∈ W, which is referred to as p-thinning (Chiu et al., 2013, Section 5.1); we identify the case p = 1 with the unthinned process X. From Section 2.2 we have that
ρ(u) = ρ_th(u)/p, u ∈ W,
where we recall the intensity ρ_th(·) of the thinned process X_p. Hence, dividing by p is exactly what is needed to compensate for the reduced intensity caused by removing points. We exploit this relationship in the following way. Given a point pattern x and an estimator ρ̂(·) of ρ(u), u ∈ W, fix some p ∈ (0, 1] and thin the point pattern m ≥ 1 times, each time with retention probability p. This results in the thinned patterns x^1_p, . . . , x^m_p, for each of which the intensity is estimated. We now let the average of these m estimated intensity functions, divided by p, be reported as the final estimate; note the similarity with the approach considered by Baddeley (2007). The resample-smoothed Voronoi intensity estimator is formally defined as follows.
Definition 2 Consider a point process X ⊆ S with intensity function ρ(·). Given some p ∈ (0, 1] and m ≥ 1, the resample-smoothed Voronoi intensity estimator of ρ(u), u ∈ W ⊆ S, |W| > 0, is given by
ρ̂^V_{p,m}(u) = ρ̂^V_{p,m}(u; X, W) = (1/(pm)) Σ_{i=1}^m ρ̂^V(u; X^i_p, W), u ∈ W, (4)
where X^1_p, . . . , X^m_p are independent p-thinnings of X and each ρ̂^V(·; X^i_p, W) is the Voronoi intensity estimator of Definition 1 based on X^i_p.

Reflecting on the effect of the thinning procedure, for each thinned version we obtain new Voronoi cells and, consequently, different locations of the jumps in the corresponding intensity estimate ρ̂^V(·; X^i_p, W). This is what results in the "smoothing" and it is also the remedy for choosing the specific tiling in a possibly wrong/rigid way. Note also that we are in fact simply considering the average of m different estimates of ρ(·).
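The construction amounts to a loop over m independent p-thinnings. A self-contained Python sketch for the planar case W = [0, 1]² (ours, reusing the grid-approximated Voronoi cells from earlier; all names are our own), where for p = 1 every "thinning" returns the full pattern and the estimator reduces to the plain Voronoi estimator:

```python
import numpy as np

def voronoi_intensity(points, res=100):
    # grid-approximated Voronoi intensity estimate on W = [0, 1]^2 (cf. Definition 1)
    points = np.asarray(points, dtype=float)
    g = (np.arange(res) + 0.5) / res
    gx, gy = np.meshgrid(g, g)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    labels = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    areas = np.bincount(labels, minlength=len(points)) / res**2
    inv = np.zeros(len(points))
    inv[areas > 0] = 1.0 / areas[areas > 0]
    return inv[labels]

def resample_smoothed_voronoi(points, p=0.2, m=200, res=100, seed=0):
    """Average the Voronoi estimates of m independent p-thinnings of
    `points` and divide by p (cf. Definition 2)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    total = np.zeros(res * res)
    for _ in range(m):
        kept = points[rng.random(len(points)) < p]   # one p-thinning x^i_p
        if len(kept) > 0:                            # empty thinnings contribute 0
            total += voronoi_intensity(kept, res)
    return total / (p * m)
```

With the rule-of-thumb choices p ∈ [0.1, 0.3] and m = 200 discussed below, the run time is dominated by the m Voronoi computations.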

3.1 Theoretical properties
We next look closer at some statistical properties of resample-smoothed Voronoi intensity estimators. The proofs of all the results presented can be found in the Appendix.
We stress that, in the case of the restriction X ∩ W of a point process X to a (bounded) region W ⊊ S, the Voronoi cells V_x(X, W) differ from those obtained when W = S. Hereby, distributional properties of ρ̂^V_{p,m}(·) may differ depending on how W is chosen.
We start by considering the asymptotic scenario where the number of thinned patterns, m ≥ 1, in the estimator (4) tends to infinity. Note that by the result below, the limit lim_{m→∞} ρ̂^V_{p,m}(u; X, W) a.s. exists for a point process X.
Lemma 1 Given fixed p ∈ (0, 1] and k ≥ 1, for any point pattern x we have that
|ρ̂^V_{p,m}(u; x, W) − ρ̂^V_{p,m+k}(u; x, W)| → 0 a.s., u ∈ W,
as m → ∞. In turn, we have that lim_{m→∞} ρ̂^V_{p,m}(u; x, W) a.s. exists.

Bias
Turning to the first order properties of ρ̂^V_{p,m}(·), we note that, neglecting the parts of cells falling outside W,
∫_W ρ̂^V_{p,m}(u; X, W) du = (1/(pm)) Σ_{i=1}^m N(X^i_p ∩ W). (5)
Hence, when p = 1 we have preservation of mass, i.e. ∫_W ρ̂^V_{1,m}(u; X, W) du = N(X ∩ W).
Taking expectations on both sides in (5), we obtain
E[∫_W ρ̂^V_{p,m}(u; X, W) du] = E[N(X_p ∩ W)]/p = ∫_W ρ(u) du,
and, since the thinnings X^1_p, . . . , X^m_p are identically distributed, (4) yields E[ρ̂^V_{p,m}(u)] = E[ρ̂^V(u; X_p, W)]/p, u ∈ W. I.e., for any m ≥ 1 and p ∈ (0, 1], ρ̂^V_{p,m}(·) is unbiased for the estimation of the intensity of X if and only if the original Voronoi intensity estimator is unbiased for the estimation of the intensity of an arbitrary p-thinning X_p. There is unfortunately not much more to be said without explicitly assuming something about the distributional properties of X.
When X is stationary, all Voronoi cells have the same distribution and we may speak of the typical Voronoi cell V_o(X), which has the same distribution as θ_{−x}V_x(X) for any x ∈ X; here θ_{−x} denotes the transformation/shift such that x is taken to the origin o ∈ S. In particular, we have that ρ̂^V_{p,m}(u) and ρ̂^V_{p,m}(v) have the same distribution for any u, v ∈ S and it turns out that unbiasedness holds.
Theorem 1 For a stationary point process X ⊆ W = S with intensity function ρ > 0, the resample-smoothed Voronoi intensity estimator (4) is unbiased for any choice of p ∈ (0, 1] and m ≥ 1.
As our main interest lies in estimating non-constant intensity functions, stationary models are of limited practical interest. We next turn to inhomogeneous Poisson processes in Euclidean spaces.
Theorem 2 Let X ⊆ W = S = R^d, d ≥ 1, be a Poisson process with intensity function ρ(u), u ∈ R^d, which satisfies the Lipschitz condition that, for some µ_u > 0, |ρ(v) − ρ(u)| ≤ µ_u d(u, v) for all v ∈ B(u, ε) and ε > 0 sufficiently small; B(u, ε) denotes the Euclidean ball with centre u and radius ε > 0. Denoting by C_u(X) the Voronoi cell containing u ∈ R^d, assume further that the moment condition m_κ := E[|C_u(X)|^{−κ}] < ∞ holds for some κ ≥ 1. Then the absolute bias |E[ρ̂^V_{p,m}(u)] − ρ(u)| is bounded by some C > 0 that depends on the intensity, and this bound tends to 0 as the intensity tends to infinity.
Remark 1 The moment condition and the Lipschitz assumption on ρ can be relaxed to weaker versions while the bias bound still tends to 0, but the rate would be different. It has been conjectured that the size of the typical cell of a homogeneous Poisson process follows a (generalised) Gamma distribution (see e.g. Chiu et al., 2013); note in particular Lemma 2 below. The moment condition in the statement of the above result, i.e. m_κ < ∞, would be satisfied if this is indeed the case. Under such a conjectured distribution, Barr and Schoenberg (2010) showed that in the planar case the original Voronoi intensity estimator is ratio-unbiased for a given class of intensity functions.

Variance
Regarding the variance of ρ̂^V_{p,m}(u), the next result shows that, in an infinite-sized study region W, by thinning as much as possible we also obtain a variance of the resample-smoothed Voronoi estimator which is close to 0. In addition, letting m → ∞ has the same effect. Hence, for cases where the estimator is unbiased we should, in theory, smooth as much as possible, in combination with choosing m as large as possible. The problem in practice, however, is that point patterns are sampled in bounded regions W and we have to resort to finite m. This motivates the data-driven approaches suggested in Section 3.2.
Turning to the stationary case, from the proof of Theorem 1 (see the Appendix) we have that the p-thinning X_p of a stationary point process X with intensity ρ > 0 is again stationary, but with intensity pρ. For X_p, the distribution P̄_p(·) of the size of the cell that covers u is the same for any u ∈ S and it is given by (see Last (2010, Section 8) and Schneider and Weil (2008, Theorem 10.4))
P̄_p(A) = pρ ∫_A t P_{|V_o(X_p)|}(dt), A ⊆ [0, ∞), (6)
where P_{|V_o(X_p)|}(·) is the distribution of the typical cell size. Besides giving us the unbiasedness in Theorem 1, i.e.
E[ρ̂^V_{p,m}(u)] = (1/p) ∫_0^∞ t^{−1} P̄_p(dt) = ρ ∫_0^∞ P_{|V_o(X_p)|}(dt) = ρ,
the relationship (6) further yields
E[(ρ̂^V(u; X_p, W)/p)²] = (1/p²) ∫_0^∞ t^{−2} P̄_p(dt) = (ρ/p) E[|V_o(X_p)|^{−1}].
Through the proof of Theorem 3 (see the Appendix) we obtain that the variance of ρ̂^V_{p,m}(u) is given by
Var(ρ̂^V_{p,m}(u)) = (1/m) Var(ρ̂^V(u; X_p, W)/p) [1 + (m − 1) Corr(ρ̂^V(u; X^1_p, W), ρ̂^V(u; X^2_p, W))],
where Corr denotes correlation. Unfortunately, we cannot get much further in the general setup; the problem lies in that P_{|V_o(X_p)|}(·) typically is not known. There is, however, one particular case where we can say a bit more and that is for Poisson processes on R.
Lemma 2 For a Poisson process X on R with intensity ρ > 0 and any p ∈ (0, 1], the typical cell size of X_p follows an Erlang/Gamma distribution with shape and rate parameters 2 and 2pρ, respectively. Hereby, E[|V_o(X_p)|^{−1}] = 2pρ, so that for any m ≥ 1,
Var(ρ̂^V_{p,m}(u)) = (ρ²/m) [1 + (m − 1) Corr(ρ̂^V(u; X^1_p, W), ρ̂^V(u; X^2_p, W))].

Empirically, we have consistently observed that for a fixed m, the variance of ρ̂^V_{p,m}(u) decreases as p decreases, for u ∈ W located a given distance from the boundary of W ⊆ S. As this is partly supported by Theorem 3, we are led to the following conjecture.
Conjecture 1 For an arbitrary point process X ⊆ S and any m ≥ 1, the variance of ρ̂^V_{p,m}(u) is a non-decreasing function of p ∈ (0, 1]. In particular, if ρ̂^V_{p,m}(u) is unbiased, this means that MISE grows with p.
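The Gamma claim of Lemma 2 can be sanity-checked numerically: for T following a Gamma distribution with shape 2 and rate 2pρ one has E[T] = 1/(pρ), matching the mean cell size of a stationary process with intensity pρ, and E[1/T] = 2pρ, the quantity entering the variance expression above. A small NumPy quadrature of our own (trapezoidal rule; not taken from the paper):

```python
import numpy as np

def trapezoid(f_vals, t):
    # composite trapezoidal rule over the grid t
    return float((0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(t)).sum())

rho, p = 60.0, 0.2
lam = 2 * p * rho                           # rate of the Gamma(2, lam) cell-size law
t = np.linspace(0.0, 2.0, 200_001)          # exp(-2 * lam) is negligible beyond t = 2
pdf = lam**2 * t * np.exp(-lam * t)         # Gamma density with shape 2 and rate lam
mean_T = trapezoid(t * pdf, t)              # E[T] = 2/lam = 1/(p * rho)
mean_invT = trapezoid(lam**2 * np.exp(-lam * t), t)   # E[1/T]: the factor t cancels
```

Plugging E[1/T] = 2pρ into the second-moment identity above gives Var(ρ̂^V(u; X_p, W)/p) = 2ρ² − ρ² = ρ², independent of p, so in this case the p-dependence of the variance enters solely through the correlation between thinnings.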

3.2 Choosing the smoothing parameters
When using the resample-smoothed Voronoi intensity estimator (4) in practice, one needs to specify the smoothing parameters m ≥ 1 and p ∈ (0, 1] prior to finding the intensity estimate. We next discuss how to obtain proper choices for m and p.

Choosing the number of thinnings
Lemma 1 tells us that, for fixed p ∈ (0, 1] and k ≥ 1 and any point pattern x, the difference |ρ̂^V_{p,m}(u; x, W) − ρ̂^V_{p,m+k}(u; x, W)| a.s. tends to 0 as m → ∞. The question that remains, however, is for which m ≥ 1 this difference is sufficiently small. In our numerical evaluations in Section 4 we illustrate that the estimated bias and variance of ρ̂^V_{p,m}(u) do not change significantly for m ≥ 200. Hence, we propose to fix m = 200 and then proceed by finding a proper choice for p ∈ (0, 1].

Choosing the retention probability
The selection of p ∈ (0, 1] is clearly the more delicate matter here; essentially we are faced with problems similar to those of choosing bandwidths in kernel estimation. Through our numerical evaluations (see Section 4) we have found that the choice p ∈ [0.1, 0.3] always seems to generate the best intensity estimates, in the sense that the bias-variance tradeoff is taken into account by keeping both the bias and the variance relatively small. The lower limit 0.1 is based on our observation that the removal of more than 90% of the points per thinning may generate overly flat estimates in some cases; judging by Section 4, Theorem 3 and Conjecture 1, it may otherwise seem that the smaller the p, the better the estimate. We refer to the choice m = 200 and p ∈ [0.1, 0.3] as our rule-of-thumb.
If one prefers a data-driven approach over the rule-of-thumb, we also propose a cross-validation approach to select p. Recalling the comment in Section 2.2 about independent thinnings yielding approximately Poissonian distributional properties of the resulting processes, a natural approach to choosing p when the number of thinned patterns, m, is fixed is to consider Poisson process likelihood cross-validation. This method has a long history in the point process literature and has e.g. been frequently used for bandwidth selection in kernel-based estimation (Silverman, 1986; Loader, 1999). More specifically, given a point pattern x = {x_1, . . . , x_n} ⊆ W ⊆ S and some fixed m ≥ 1, we choose the corresponding resampling/retention probability as a maximiser of the cross-validation criterion
CV(p) = Σ_{i=1}^n log ρ̂^V_{p,m}(x_i; x \ {x_i}, W) − ∫_W ρ̂^V_{p,m}(u; x, W) du, p ∈ (0, 1].
Note that ρ̂^V_{p,m}(·; x \ {x_i}, W) is the leave-one-out version of ρ̂^V_{p,m}(·; x, W), i.e. the resample-smoothed Voronoi intensity estimator based on the reduced sample x \ {x_i}. As the computation of CV(p), p ∈ (0, 1], can be quite computationally costly, in practice we may exclude the integral term in its expression, since it approximately equals the number of points in the pattern. Moreover, in practice we calculate CV(p_j), j = 1, . . . , k, 0 < p_{j−1} < p_j ≤ 1, sequentially, by first generating x^i_{p_k} and then iteratively generating x^i_{p_{j−1}} = (x^i_{p_j})_{p_{j−1}/p_j}, i = 1, . . . , m, j = 2, . . . , k. Note that for small m the graph of CV(p) may not be smooth and might contain local extrema.
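The criterion can be prototyped as follows (a Python sketch of ours with the grid-approximated planar Voronoi cells used earlier; we drop the integral term as suggested above, and all function and parameter names are our own). Beware that for very small p and m a leave-one-out estimate may vanish at some x_i, driving CV(p) to −∞.

```python
import numpy as np

def cell_area_at(points, u, res=80):
    """Grid-approximated |V_x| on W = [0, 1]^2 for the cell containing u."""
    points = np.asarray(points, dtype=float)
    g = (np.arange(res) + 0.5) / res
    gx, gy = np.meshgrid(g, g)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    labels = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    areas = np.bincount(labels, minlength=len(points)) / res**2
    owner = ((points - u) ** 2).sum(axis=1).argmin()   # generator whose cell covers u
    return areas[owner]

def cv_criterion(x, p, m, res=80, seed=0):
    """CV(p) with the integral term dropped (it is approximately n)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    score = 0.0
    for i in range(n):
        reduced = np.delete(x, i, axis=0)              # leave-one-out sample
        est = 0.0
        for _ in range(m):                             # m independent p-thinnings
            kept = reduced[rng.random(n - 1) < p]
            if len(kept):
                a = cell_area_at(kept, x[i], res)
                if a > 0:                              # guard against pixelation artefacts
                    est += 1.0 / a
        score += np.log(est / (p * m))                 # log of the leave-one-out estimate at x_i
    return score
```

One then evaluates cv_criterion over a grid of candidate values p_1 < . . . < p_k and selects the maximiser; the sequential-thinning trick above avoids regenerating the thinnings from scratch for each p_j.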
Finally, if the value obtained for p through the cross-validation would deviate too much from the rule-of-thumb, we advise to proceed with the rule-of-thumb; see the log-Gaussian Cox process example in Section 4 for a situation where this occurs.

3.3 Large scale data and sparsity
In general, when the number of events, n, of an observed point pattern x = {x_1, . . . , x_n} is very large, it is often natural to consider an adaptive intensity estimator, as the scale of the intensity likely varies a lot.
It may, however, not be computationally feasible to compute ρ̂^V_{p,m}(·), p ∈ (0, 1], for an arbitrary m ≥ 1 (or any other intensity estimator, for that matter). An alternative way of exploiting the proposed setup is to consider ρ̂^V_{p_0,m}(·) for some p_0 ∈ [0.1, 0.3] and m = 1. This means that we introduce sparsity by only having to generate Voronoi cells for 10–30% of the original number of points. The results in Section 4 indicate how good an estimate one would typically obtain. Moreover, if the computation of ρ̂^V_{p_0,1}(·) is reasonably quick, one could generate a further such single-thinning estimate and average the two to obtain ρ̂^V_{p_0,2}(·). One could then continue like this in a stepwise fashion, given a total computation time frame.

4 Numerical evaluations
As previously pointed out, we evaluate our intensity estimation approach numerically, which we choose to do in the Euclidean setting.
In our simulation study, we consider four different types of models with varying degrees of variation in intensity and spatial interaction: clustering, spatial randomness and regularity. For each model we use 500 realisations on W = [0, 1]² to generate numerical estimates of relevant quantities such as bias, variance, Integrated Variance (IV), Integrated Square Bias (ISB) and Integrated Absolute Bias (IAB) for ρ̂^V_{p,m}(u), u ∈ W; recall that Mean Integrated Square Error (MISE) is obtained as the sum of IV and ISB. To carry out the analysis, we make use of the R package spatstat. For each model considered we provide, in the Appendix, plots of the estimated bias and variance for m = 200 and a range of values of p ∈ (0, 1], together with the estimated biases and variances obtained through kernel estimation.
The overall conclusion is that we clearly reduce the estimation errors by resample-smoothing the Voronoi intensity estimator. Moreover, the cross-validation approach to selecting p on average yields slightly poorer intensity estimates than the rule-of-thumb, in particular if the model is clustered.

Homogeneous Poisson process
We here consider a homogeneous Poisson process $X \subseteq W = [0, 1]^2$ with intensity $\rho = 60$. Table 1 provides estimates of IAB, ISB and IV for $\hat{\rho}^{V}_{p,m}(u)$, $u \in W$, $m = 200, 300, 400$, $p = 0.1, \ldots, 1$; recall that we use 500 realisations of $X$. Indeed, the bias seems fairly stable over the range of values for p and the variance is clearly decreasing with p; choosing p according to the rule-of-thumb keeps the MISE small. For illustrative purposes, in Figure 1 we provide estimation error plots for one of the realisations, for p = 0.2 and p = 1 with m = 200. One can clearly see the gain of the resample-smoothing; note that the under-estimation occurs in the empty regions. In addition, in the Appendix we provide plots of the estimated bias and variance for p = 0.1, 0.3, 0.5, 0.7, 0.9, 1 and m = 200; they essentially confirm what is observed in Table 1.
Turning to the cross-validation approach to selecting p, with m = 200, based on 100 realisations of the model we obtain IAB = 4.9, ISB = 30.3 and IV = 255, which are in the range of what one obtains when p is fixed in (0.1, 0.3). In Table 2 we further provide the 100 selected values for p; the majority of them fall within the range of our rule-of-thumb. Comparing with kernel estimation under uniform, or global, edge correction, using Poisson likelihood cross-validation (Loader, 1999; Baddeley et al., 2015) to select the bandwidth, we obtain IAB = 0.24, ISB = 0.11 and IV = 126.05. By instead employing the bandwidth selection method of Cronie and Van Lieshout (2018), we obtain IAB = 0.87, ISB = 1.12 and IV = 688.25.
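A generic leave-one-out version of the Poisson likelihood cross-validation criterion can be sketched as follows in Python; here `estimator` stands for any intensity estimator (such as the resample-smoothed Voronoi estimator), and the integral term of the Poisson likelihood is dropped under the assumption that the estimator is mass-preserving, so that the term does not vary with p. The function names and the grid of candidate values are our own assumptions.

```python
import math

def poisson_cv_score(points, p, estimator):
    """Leave-one-out Poisson likelihood cross-validation score for a
    smoothing level p: the sum of the log leave-one-out intensity
    estimates at the data points. The integral term is omitted,
    assuming a mass-preserving estimator (it is then constant in p)."""
    score = 0.0
    for i, x in enumerate(points):
        rest = points[:i] + points[i + 1:]   # leave point i out
        lam = estimator(rest, x, p)          # intensity at x from the rest
        if lam <= 0:
            return -math.inf                 # p infeasible for this pattern
        score += math.log(lam)
    return score

def select_p(points, estimator, grid=(0.1, 0.2, 0.3, 0.5, 0.7, 1.0)):
    """Choose the candidate p maximising the cross-validation score."""
    return max(grid, key=lambda p: poisson_cv_score(points, p, estimator))
```

In practice `estimator` would be the resample-smoothed Voronoi estimate evaluated at the held-out point, so each candidate p requires one pass of leave-one-out estimation over the pattern.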

Inhomogeneous Poisson process
More interestingly, we next consider 500 realisations of an inhomogeneous Poisson process $X \subseteq W = [0, 1]^2$ with intensity $\rho(x, y) = |10 + 90 \sin(16x)|$; the expected total point count is 58.6. Table 3 provides estimates of IAB, ISB and IV for $\hat{\rho}^{V}_{p,m}(u)$, $u \in W$, $m = 200, 300, 400$, $p = 0.1, \ldots, 1$. Moreover, in Figure 2 we provide estimation error plots for one of the realisations, for p = 0.2 and p = 1 with m = 200, and in the Appendix we provide plots of the estimated bias and variance for p = 0.1, 0.3, 0.5, 0.7, 0.9, 1 and m = 200. Turning to the cross-validation approach to selecting p, based on m = 200 and 100 realisations of the model, we obtain IAB = 25.5, ISB = 885.2 and IV = 218.5, with the majority of the selected p's coinciding with the rule-of-thumb (see Table 4).
Hence, the conclusions here are essentially the same as for the homogeneous Poisson process in Section 4.1, the main difference arguably being that inhomogeneity enforces slightly harder thinning in the cross-validation.
The cross-validation approach to selecting p, based on m = 200 and 100 realisations of the model, yields IAB = 28.4, ISB = 1118.2 and IV = 17207.5, which may be compared to the choice p ≈ 0.5. In Table 6 we further provide the 100 selected values for p. The phenomenon that too little smoothing tends to be applied (p is mainly chosen large) is not particularly surprising: since our cross-validation approach is based on a Poisson process likelihood function, it treats a realisation x of X as a realisation of a Poisson process which has the corresponding realisation of the driving (random) intensity field as its intensity function. In other words, it tries to perform state estimation, i.e. it tries to reconstruct each realisation of the driving intensity field through x. This phenomenon, and the fact that Poisson process likelihood cross-validation does not perform well for clustered inhomogeneous point processes, has previously been observed in the context of kernel intensity estimation (Cronie and Van Lieshout, 2018). Hence, if one suspects that there is clustering in addition to inhomogeneity, or if the cross-validation generates large values for p, it is wiser to stick with the proposed rule-of-thumb, p ∈ [0.1, 0.3]. In fact, cross-validation-generated deviations from the rule-of-thumb may be seen as a possible indication of clustering or inhibition. Comparing with kernel estimation under uniform, or global, edge correction, using Poisson likelihood cross-validation (Loader, 1999; Baddeley et al., 2015) to select the bandwidth, we obtain IAB = 27.75, ISB = 1031.03 and IV = 9952.85. By instead employing the bandwidth selection method of Cronie and Van Lieshout (2018), we obtain IAB = 28.97, ISB = 1117.94 and IV = 3856.79.

Thinned simple sequential inhibition point process
To study inhomogeneity in combination with inhibition, we consider a simple sequential inhibition point process in $W = [0, 1]^2$ with a total point count of 450 and inhibition distance 0.03, which we thin according to the retention probability function p(x, y). This results in an inhomogeneous point process with intensity $\rho(x, y) = 450\,p(x, y)$, which yields an expected total point count of 53.6. Table 7 provides estimates of IAB, ISB and IV for $\hat{\rho}^{V}_{p,m}(u)$, $u \in W$, $m = 200, 300, 400$, $p = 0.1, \ldots, 1$. Just as for the previous models, we argue that p should be chosen within the range of the rule-of-thumb.
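A minimal simulation sketch of this model in Python is given below: sequential proposals are accepted only if they respect the inhibition distance, after which an independent location-dependent thinning is applied. The retention probability function used in the study is not reproduced in this excerpt, so the linear trend below is a hypothetical stand-in, as are the function names.

```python
import math
import random

def simple_sequential_inhibition(n, r, rng, max_tries=100_000):
    """Propose uniform points in the unit square, accepting a proposal
    only if it lies at least distance r from every accepted point."""
    pts = []
    tries = 0
    while len(pts) < n and tries < max_tries:
        tries += 1
        cand = (rng.random(), rng.random())
        if all(math.dist(cand, q) >= r for q in pts):
            pts.append(cand)
    return pts

def independent_thinning(points, retain, rng):
    """Keep each point independently with probability retain(x, y)."""
    return [q for q in points if rng.random() < retain(*q)]

rng = random.Random(7)
ssi = simple_sequential_inhibition(450, 0.03, rng)
# Hypothetical stand-in for the paper's retention function p(x, y):
thinned = independent_thinning(ssi, lambda x, y: 0.1 + 0.8 * x, rng)
```

The thinned pattern inherits the inhibition of the parent process while its intensity varies spatially in proportion to the retention function.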
In Figure 4 we provide estimation error plots for one of the realisations, for p = 0.2 and p = 1 with m = 200. Plots of the estimated bias and variance, for p = 0.1, 0.3, 0.5, 0.7, 0.9, 1 and m = 200, can be found in the Appendix. Also here the improvements caused by the resample-smoothing are visually clear.
The cross-validation approach to selecting p, based on m = 200 and 100 realisations of the model, yields IAB = 25.1, ISB = 932.9 and IV = 595.2, which is comparable to choosing p ≈ 0.5. Moreover, Table 8 lists the selected values for p; they tend to be either very large or very small. It thus seems that approximately half of the time the cross-validation performs as it should, and approximately half of the time it chooses p too large.

Data analysis
We next apply our proposed intensity estimator (4) to two real datasets, in two types of spaces. We first visit a linear network dataset of traffic accidents in an area of Houston, USA, and then a planar dataset of spatial locations of Finnish pines.

Houston motor vehicle traffic accidents
The dataset consists of motor vehicle traffic accidents in a given area of Houston, USA, during the month of April 1999. The linear network L describing the road network in question (see Figure 5) has a total length of 708,301.7 feet, 187 vertices, i.e. crossings/intersections, with a maximum vertex degree of 4, and 253 line segments, i.e. pieces of streets connecting the intersections. Figure 5 (left) shows the reference points of the 249 accidents over the street network. The data have been collected by individual police departments in the Houston metropolitan area and compiled by the Texas Department of Public Safety. The compiled data have been obtained by the Houston-Galveston Area Council and then geocoded by N. Levine. Between 1999 and 2001, in the eight-county region considered, there were 252,241 serious accidents, with an average of 84,080 per year. Of these accidents, 1,882 were person related. See Levine (2006, 2009) for details.
In Figure 5 (right) we also provide the resample-smoothed Voronoi intensity estimate obtained for m = 200 and p = 0.15. The specific choice p = 0.15 is motivated by the rule-of-thumb p ∈ [0.1, 0.3] and Table 9, which shows the selected values for p ∈ (0, 1] obtained by carrying out cross-validation for the sequence m = 100, 110, . . . , 200. We see that most of the selected values for p are given by 0.15.
Visually, there is a good correspondence between the observed pattern and the obtained estimate. Note that for larger values of p we would have obtained more pronounced blobs in the right panel of Figure 5, in the parts corresponding to the dense parts of the left panel.

Finnish pines
The dataset, which consists of the locations of 126 pine saplings in a Finnish forest, within a rectangular window W = [−5, 5] × [−8, 2] (metres), can be found in the R package spatstat. It was recorded by S. Kellomaki, Faculty of Forestry, University of Joensuu, Finland, and further processed by A. Penttinen, Department of Statistics, University of Jyväskylä, Finland.
In Figure 6 we illustrate the estimate $\hat{\rho}^{V}_{p,m}(u)$, $u \in W$, $m = 200$, for p = 0.2 and p = 0.5, together with the locations of the saplings. We further provide the cross-validation results for the sequence m = 100, 110, . . . , 200 in Table 10; they suggest the choice p = 0.5. We argue that p = 0.2 is the preferable choice since it better respects the global features of the data.

Discussion and future work
We have proposed a general approach for resampling, or additional smoothing, of Voronoi intensity estimators. It is based on averaging over intensity estimators generated by a set of thinned samples. Since the Voronoi intensity estimator may be viewed as employing, for each point, a uniform kernel supported on the Voronoi cell centred at that point, each time we thin the pattern we change the support of that kernel. Having averaged over the thinned estimators, in essence we end up using an "average" support for each such kernel.
It may be noted that we could alternatively employ some retention probability function p(u), u ∈ W, other than the constant p(u) ≡ p ∈ (0, 1]. It is, however, not clear what the benefits of such a change would be, other than possibly decreasing the computational time. Moreover, how to make a good choice for the function p(·) is not evident.

Future work and extensions
Regarding future work, it would be relevant and interesting to study the proposed setup when the Voronoi tessellation is replaced by some other tessellation generated by the point pattern in question. One such example is provided by Delaunay tessellations, as they give rise to more tractable distributional properties.
Below follow some further possible extensions.

Sequential resample-smoothing
Since choosing the smoothing parameter p ∈ (0, 1] according to the cross-validation approach in Section 3.2 can be quite computationally demanding, and thereby time consuming, we propose an alternative and simpler version of the estimator in (4).
The challenge here is clearly how to choose the sequence $\mathbf{p}_m$; we have seen that more weight should be put on smaller retention probability values, so an equally spaced grid over (0, 1] may not be the best choice. By proposing some stepwise sequencing of (0, 1], where at each step $m \geq 1$ we obtain some $\mathbf{p}_m = (p_1, \ldots, p_m) \in (0, 1]^m$, one could keep going until $\sup_{u \in W} |\hat{\rho}^{V}_{\mathbf{p}_m}(u) - \hat{\rho}^{V}_{\mathbf{p}_{m-1}}(u)|$ falls below some prescribed tolerance.
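A stopping rule of this kind can be sketched in Python as follows; `single_estimate` stands for one p-thinned Voronoi estimate evaluated on a grid of locations, and the sup-norm tolerance `tol`, like the names, is our own assumption.

```python
def sequential_resample_smooth(points, grid, single_estimate, p_seq, tol=1e-3):
    """Average single-thinning estimates one at a time, stopping once
    the sup-norm change of the running average over the evaluation
    grid falls below tol. Returns the final average and the number of
    thinnings used."""
    running = [0.0] * len(grid)
    m = 0
    for p in p_seq:
        m += 1
        new = single_estimate(points, grid, p)
        prev = running[:]
        # Update the running average with the m-th thinned estimate.
        running = [((m - 1) * a + b) / m for a, b in zip(running, new)]
        if m > 1 and max(abs(a - b) for a, b in zip(running, prev)) < tol:
            break
    return running, m
```

Compared with fixing m in advance, this trades a predetermined computational budget for a data-driven one: stable patterns terminate early, while highly variable thinned estimates force further averaging.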

Edge correction in the linear network case
Although we have neglected edge effects here, it seems that the smoothing takes care of a significant part of them (Chiu et al., 2013). But, as noted in the data analysis, even after applying the smoothing there may be a need for edge correction (Cronie and Särkkä, 2011). In the case where X is sampled on L and is a subset of a process on a larger network, of which L is a sub-network, edge effects come into play since the points closest to the boundary have their Voronoi cells cut off through the mapping/sampling of L and the points. In Definition 4 below we propose an edge correction approach, which may be viewed as a version of Ripley's edge correction idea.
Definition 4 Given a point pattern $\mathbf{x}$ on a linear network $L$, for each boundary point $u \in \partial L$ of $L \subseteq S$, first find its closest neighbour $x_u = \arg\min_{x \in \mathbf{x}} d(u, x)$ in terms of the shortest path distance $d(\cdot, \cdot)$. If $\beta_u = \min_{x \in \mathbf{x} \setminus \{x_u\}} d(x_u, x)/2 - d(u, x_u) > 0$, extend $L$ by a new (set of) non-overlapping edge(s) connected to the node $u$, with total length $\beta_u$. Denote the resulting extended network by $L(\mathbf{x})$ and treat $\mathbf{x}$ as a linear network point pattern on/restricted to $L(\mathbf{x})$. The edge corrected resample-smoothed Voronoi intensity estimate is given by $\hat{\rho}^{V}_{p,m}(u; \mathbf{x}, L) = \hat{\rho}^{V}_{p,m}(u; \mathbf{x}, L(\mathbf{x}))$ for $u \in W$. Note that $p = 1$ results in an edge corrected version of $\hat{\rho}^{V}(\cdot)$.

Fig. 12 Estimated variance for $\hat{\rho}^{V}_{p,m}(u)$, $u \in W = [0, 1]^2$, $m = 200$, and kernel estimators, based on 500 realisations of a log-Gaussian Cox process $X \subseteq W = [0, 1]^2$, where the driving Gaussian random field has mean function $(x, y) \mapsto \log(40|\sin(20x)|)$ and covariance function $((x_1, y_1), (x_2, y_2)) \mapsto 2\exp\{-\|(x_1, y_1) - (x_2, y_2)\|/0.1\}$. From top-left to bottom-right: $\hat{\rho}^{V}_{p,m}(u)$ with $p = 0.1, 0.3, 0.5, 0.7, 0.9, 1$; kernel estimators with bandwidths selected using Poisson likelihood cross-validation (Loader, 1999) (left) and the method of Cronie and Van Lieshout (2018) (right) on the last row.