We congratulate Professors Eckardt and Moradi on their fine paper, in which they survey the state of the art of summary statistics for marked point processes. The authors focus on non-stationary point processes and distinguish between discrete, real-valued and object-valued marks. The case of point processes in a Euclidean space with discrete marks is by far the best studied. Here, centre stage is taken by the cross K, H and J statistics. After a brief detour into the frequency domain, the authors consider point processes on linear networks. The concept of intensity-reweighted moment pseudo-stationarity (IRMPS) is introduced and used to define counterparts of the cross statistics just mentioned.

From a measure-theoretic point of view, point processes are counting measures. It is therefore interesting to note that the cross K, H and J statistics have been generalised to multivariate random measures as well (Van Lieshout 2018). The basic ideas are to replace \(\lambda ^{(m)}\) in (3) by \(p_m\), the m-point coverage function, and to let the role of the generating functional be taken over by the Laplace transform. For example,

$$\begin{aligned} H_{ij}^{\textrm{inhom}}(r) = 1 - {\mathbb {E}}^{(0,i)} \exp \left[ - \int _{B(0,r)} \frac{1}{p_1(x,j)} \text {d}\Psi _j(x) \right] \end{aligned}$$

under coverage-reweighted moment stationarity assumptions that are satisfied, for instance, by the lognormal random field models of Ballani et al. (2012) for which

$$\begin{aligned} \Psi _i(B) = \int _B \Gamma _i (x) {\textbf{1}} \{ x \in X_i \} \text {d}x. \end{aligned}$$

Here, \(\Gamma _i = e^{Z_i}\) is lognormally distributed, sufficiently smooth and such that the covariance functions of \(Z = (Z_i)_i\) are translation invariant, and \((X_i)_i\) is a non-trivial multivariate stationary random closed set. Note that \(H_{ij}^{\textrm{inhom}}\) is related to the hitting intensity proposed by Stoyan and Ohser (1982) for the stationary case. In practice, such extensions are useful when complete mapping of events is too costly or impractical but smoothed quadrat counts are available. For an application to the quantification of biodiversity in a tropical rain forest, we refer to De Jongh and Van Lieshout (2022) and Van Lieshout (2018).

Point processes with real-valued marks are less well studied. For stationary point processes, a few summary statistics are available, such as Stoyan’s mark correlation function \(k_{mm}\) and the mark variogram. The construction of inhomogeneous versions appears to be an open problem and is not discussed at length in the paper. Do the authors believe that, say, \(k_{mm}\) is inherently tied to stationarity? Or do they see a way forward?

A second question concerns models. In the plane, a wide range of models is available that are either stationary or intensity-reweighted moment stationary, and in many cases the summary statistics can be calculated explicitly. For point processes on linear networks, the situation seems to be somewhat different. Can the authors comment on this point? In particular, are there flexible classes of models known that satisfy the pseudo-stationarity or IRMPS assumptions? For which models can (20)–(31) be calculated?

The paper focuses on methodology. In practice, one would need to estimate the summary statistics. For example, consider \(H_{ij}^{\textrm{inhom}}\) for \(i\ne j\). Then,

$$\begin{aligned} \widehat{H_{ij}^{\textrm{inhom}}}(r) = 1 - \left( \sum _{x_i \in X_i \cap W_{\ominus r}} \frac{1}{\lambda _i(x_i)} \right) ^{-1} \sum _{x_i \in X_i \cap W_{\ominus r}} \frac{1}{\lambda _i(x_i)} \prod _{x_j \in X_j \cap B(x_i,r)} \bigg [1- \frac{\inf {\lambda _j}}{\lambda _j(x_j)} \bigg ] \end{aligned}$$
(1)

is a ratio-unbiased estimator based on the Hamilton principle (Stoyan and Stoyan 2000). Since \(\lambda _i\) and \(\lambda _j\), the intensity functions of \(X_i\) and \(X_j\), are unknown, they must be estimated. By far the most widely used technique to do so is kernel estimation (Diggle 1985). Specifically, for \(x\in W\), the observation window, set

$$\begin{aligned} {{\widehat{\lambda }}}_i(x; h, X_i) = \frac{1}{h^d} \sum _{x_i\in X_i\cap W} w(x, x_i, h) \, \kappa \left( \frac{x - x_i}{h}\right) , \end{aligned}$$
(2)

where \(w(x, x_i, h)\) is an edge correction factor and \(\kappa \) a kernel, for example, the Gaussian one. The crucial parameter is the bandwidth \(h>0\), which governs the amount of smoothing applied. Likelihood cross-validation (Loader 1999) treats \(X_i\) as an inhomogeneous Poisson process, whose leave-one-out cross-validation log likelihood reads

$$\begin{aligned} L(h; X_i)= \sum _{x_i\in X_i \cap W} \log {{\widehat{\lambda }}}_i(x_i; h, X_i \setminus \{ x_i \}) - \int _W {{\widehat{\lambda }}}_i( u; h, X_i) \, \text {d}u, \end{aligned}$$
(3)

which is then maximised over h to select the bandwidth. An alternative is the method of Cronie and Van Lieshout (2018), which minimises the discrepancy \( F(h; X_i) = \left| T(h; X_i) - |W| \right| \) between the volume |W| of W and the test function

$$\begin{aligned} T(h; X_i) = \sum _{x_i\in X_i\cap W} \frac{1}{{{\widehat{\lambda }}}_i(x_i;h, X_i)}, \end{aligned}$$
(4)

when \(X_i \cap W\) is non-empty, and \(T(h; X_i) = |W|\) otherwise. In both cases, conditions must be imposed to ensure that the function \({{\widehat{\lambda }}}_i\) is strictly positive.
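To make the two selection criteria concrete, the following is a minimal Python sketch, not a reference implementation. It assumes a Gaussian kernel, takes the edge correction \(w \equiv 1\) (that is, no edge correction), approximates the integral in (3) by a Riemann sum on a regular grid, and searches over a finite grid of candidate bandwidths. The function names, the simulated pattern on the unit square and the grid sizes are all illustrative choices that do not come from the cited papers.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel on R^d; u has shape (..., d)."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi) ** (d / 2)

def kernel_matrix(targets, points, h):
    """M[a, b] = h^{-d} kappa((targets_a - points_b) / h), assuming w = 1."""
    d = points.shape[1]
    diff = targets[:, None, :] - points[None, :, :]
    return gaussian_kernel(diff / h) / h**d

def cvl_criterion(points, h, window_volume):
    """F(h) = |T(h) - |W||, with T(h) as in (4); T = |W| for an empty pattern."""
    if len(points) == 0:
        return 0.0
    lam_at_points = kernel_matrix(points, points, h).sum(axis=1)
    return abs(np.sum(1.0 / lam_at_points) - window_volume)

def loo_log_likelihood(points, h, quad_points, quad_weight):
    """L(h) as in (3); the integral over W is replaced by a Riemann sum."""
    K = kernel_matrix(points, points, h)
    loo = K.sum(axis=1) - np.diag(K)  # estimate at x_i without x_i itself
    lam_grid = kernel_matrix(quad_points, points, h).sum(axis=1)
    with np.errstate(divide="ignore"):  # tiny h can give log(0) = -inf
        log_terms = np.log(loo)
    return np.sum(log_terms) - quad_weight * lam_grid.sum()

# Illustrative data: 200 uniform points in W = [0, 1]^2, so |W| = 1.
rng = np.random.default_rng(0)
points = rng.uniform(size=(200, 2))
bandwidths = np.geomspace(0.01, 1.0, 40)

# Midpoint grid on W used to approximate the integral in (3).
g = (np.arange(50) + 0.5) / 50
quad_points = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
quad_weight = 1.0 / 50**2

cvl_scores = [cvl_criterion(points, h, 1.0) for h in bandwidths]
loo_scores = [loo_log_likelihood(points, h, quad_points, quad_weight)
              for h in bandwidths]
h_cvl = bandwidths[int(np.argmin(cvl_scores))]  # minimise F(h)
h_loo = bandwidths[int(np.argmax(loo_scores))]  # maximise L(h)
```

In this sketch, the criterion based on (4) only needs the estimate at the data points, whereas the one based on (3) additionally evaluates it at every quadrature point for each candidate bandwidth.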

When the density of points is highly spatially varying, using a fixed bandwidth throughout the window may lead to oversmoothing in dense regions and spurious hot spots in sparser ones. A solution is to apply adaptive kernel estimators (Abramson 1982; Davies et al. 2018; Van Lieshout 2021) defined as

$$\begin{aligned} {{\widehat{\lambda }}}_i^A(x; h, X_i) = \sum _{x_i \in X_i \cap W} \frac{w(x,x_i,h)}{c(x_i)^d \, h^d} \kappa \left( \frac{x-x_i}{h\, c(x_i) } \right) \end{aligned}$$
(5)

where

$$\begin{aligned} c(y) = \left( \frac{\lambda _i(y)}{ \prod _{x_i \in X_i\cap W} \lambda _i(x_i )^{1/N(X_i \cap W)} } \right) ^{-1/2}. \end{aligned}$$
(6)

Thus, \(h\, c(y)\) is larger when \(\lambda _i(y)\) is smaller. A two-step approach to bandwidth selection would run as follows. In the first step, given a realisation of \(X_i\), (3) or (4) is used to select a global bandwidth \(h_g\). This \(h_g\) is then used to calculate an edge-corrected pilot estimate \({{\widehat{\lambda }}}_p\), which in turn is plugged into (6) to obtain \({{\widehat{c}}}(y)\). In the second step, (3) or (4) is applied to \({{\widehat{\lambda }}}_i^A\) and optimised over h to find the best local bandwidth \(h_a\) with which to calculate the final estimate (5). From a computational point of view, (3) is more costly than (4), since (3) requires the numerical evaluation of an integral over W for every candidate h, whereas (4) only requires the estimate at the data points. The practical efficacy of both techniques for a range of point process models is investigated by Van Lieshout (2023).
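The two-step procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the implementation studied in the cited work: the kernel is Gaussian, edge correction is omitted (\(w \equiv 1\)), criterion (4) is used in both steps, the bandwidths are searched over a finite grid, and the inhomogeneous test pattern on the unit square is simulated purely for demonstration. All function names are hypothetical.

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel on R^d; u has shape (..., d)."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi) ** (d / 2)

def fixed_intensity(x, points, h):
    """Fixed-bandwidth estimate (2) at locations x, assuming w = 1."""
    d = points.shape[1]
    diff = x[:, None, :] - points[None, :, :]
    return (gauss(diff / h) / h**d).sum(axis=1)

def abramson_factors(pilot_at_points):
    """c(x_i) as in (6): (pilot / geometric mean of pilot)^(-1/2)."""
    log_pilot = np.log(pilot_at_points)
    return np.exp(-0.5 * (log_pilot - log_pilot.mean()))

def adaptive_intensity(x, points, h, c):
    """Adaptive estimate (5): data point x_i gets bandwidth h * c(x_i)."""
    d = points.shape[1]
    scale = h * c  # shape (n,), one bandwidth per data point
    diff = x[:, None, :] - points[None, :, :]
    return (gauss(diff / scale[None, :, None]) / scale[None, :] ** d).sum(axis=1)

def cvl_bandwidth(window_volume, bandwidths, intensity_at_points):
    """Grid minimiser of F(h) = |sum_i 1 / lambda_hat(x_i; h) - |W||."""
    scores = [abs(np.sum(1.0 / intensity_at_points(h)) - window_volume)
              for h in bandwidths]
    return bandwidths[int(np.argmin(scores))]

# Illustrative inhomogeneous pattern on W = [0, 1]^2, denser near x = 0.
rng = np.random.default_rng(1)
n = 300
points = np.column_stack([rng.uniform(size=n) ** 2, rng.uniform(size=n)])
bandwidths = np.geomspace(0.005, 1.0, 50)

# Step 1: global bandwidth h_g, pilot estimate and local factors c_hat.
h_g = cvl_bandwidth(1.0, bandwidths,
                    lambda h: fixed_intensity(points, points, h))
pilot = fixed_intensity(points, points, h_g)
c_hat = abramson_factors(pilot)

# Step 2: bandwidth h_a for the adaptive estimator, then the final estimate.
h_a = cvl_bandwidth(1.0, bandwidths,
                    lambda h: adaptive_intensity(points, points, h, c_hat))
lam_final = adaptive_intensity(points, points, h_a, c_hat)
```

By construction the factors \({{\widehat{c}}}(x_i)\) have geometric mean one, so \(h_a\) remains comparable in scale to a global bandwidth while individual points in sparse regions receive wider kernels.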