1 Introduction

Hierarchical relationships or interactions, where a plant species affects the locations or intensity of another species but not vice versa, often occur in ecological communities (e.g. Dieckmann et al. 2000). An example of such a hierarchical relationship is that proximity of large trees affects the intensity of seedling either positively, e.g. by protecting against wind, or negatively by giving too much shade. Mathematically, we can describe such plant communities by two point processes, \(Y\) and \(X\), where one (\(X\)) is affecting the other (\(Y\)) but not vice versa.

The hierarchical interaction assumption affects the inference for \(Y\) and \(X\) greatly since \(X\) can be modeled independently of \(Y\) and \(Y\) is modeled conditionally on \(X\). A realization of the point process \(X\) acts then as a source of heterogeneity in the distribution of \(Y\). Högmander and Särkkä (1999) modeled interaction between two territorial ant species using Gibbs point processes under such an assumption. A similar hierarchical Gibbs point process approach was used in Grabarnik and Särkkä (2009) and Genet et al. (2014). Furthermore, Illian et al. (2009) modeled the spatial pattern of resprouter species (\(Y\)) given the locations of seeders (\(X\)) in a hierarchical set-up having an inhomogeneous Poisson process as a model for the resprouters.

Here, we model the intensity of new seedlings in a spruce-dominated uneven-aged (boreal) forest given the locations and diameters at breast height (dbh) of large trees. Thus, our \(X\) process of large trees is a marked point process, where the mark of a tree is the dbh. The data consist of 14 sample plots from an experiment of continuous cover forestry involving single-tree selection in four nearby areas in Southern Finland (Fig. 1). The system relies on the natural emergence of new seedlings, and a continuous recruitment is necessary for long-term sustainability in a wide sense (e.g. Eerikäinen et al. 2014; Kuusinen et al. 2019). While a sufficient number of seedlings is necessary for the success of regeneration, our focus here is on the spatial distribution of the seedlings within the plots, and the effect of large trees on it.

Like in the resprouter and seeder case above, an inhomogeneous Poisson process would be a reasonable model since the effect of large trees could be added in the model as an explanatory variable. However, already visual inspection of the patterns of seedlings \(\varvec{y}\) indicates that the patterns tend to be rather clustered, beyond the clustering that may be explained by the patterns of large trees \(\varvec{x}\). Due to such unexplained clustering, a log Gaussian Cox process (LGCP) (Møller et al. 1998) is a more appropriate model for the conditional point process of seedlings given large trees.

To model the effect of the large trees \(X\), we assume that each tree \(x\in X\) emits a signal or impulse that describes the effect of the tree on its neighborhood. We assume that this effect decreases with the distance from the tree \(x\). In general, the size of the effect as well as the range of the effect could depend on the size or other properties of the tree, e.g. its dbh. Because we do not have precise a priori information on the size and range of the effects, we use parametric signals similar to the ones found in the literature (Adler 1996; Pommerening et al. 2011; Häbel et al. 2019; Pommerening and Grabarnik 2019). The individual signals are then superimposed to form an influence field, which describes the overall influence of the points of \(X\) on any location s in the observation window W. These kinds of models have been used to model, for example, effect of neighboring individuals on the growth of a subject tree, survival of seedlings and ground vegetation in different contexts (e.g. Wu et al. 1985; Miina and Pukkala 2002; Pommerening et al. 2011; Häbel et al. 2019; Kuuluvainen and Pukkala 1989; Kühlmann-Berenzon et al. 2005).

Our idea here is to include the superimposed individual signals in the log intensity function of the LGCP model. Using parametric models for the signals, the intensity of the LGCP is a non-linear function of the model parameters. According to Pommerening and Sánchez Meador (2018) the signals are aggregated additively or multiplicatively and there is no evidence to prefer either of these ways. We follow Pommerening et al. (2011) and Illian et al. (2009) and aggregate the signals additively.

Fig. 1
figure 1

Trees with dbh at least 7 cm (open circles with radii relative to the dbh of the tree) and new seedlings (red crosses) in areas of size \(40\text { m}\times 40\text { m}\). The headings give abbreviations for the plot locations and numbers

Our Bayesian inference algorithm is based on Markov chain Monte Carlo (MCMC) sampling for parameters, and a Laplace approximation is used for the latent random field of the LGCP to avoid high-dimensional MCMC sampling. Laplace approximations are widely used for inference of latent Gaussian fields, for instance within the popular INLA method (Rue et al. 2009). However, in contrast to INLA, MCMC is more robust, and can cope with multimodal parameter posteriors.

The large tree process typically extends beyond the borders of the sample plot. However, we have observed the process in the same observation window as the seedlings. Thus, the influence field computed only from the observed trees is weaker near the borders than the field computed from the fully observed large tree process would be. In order to account for the unobserved trees outside the observation window, we compute the influence field using an edge correction method similar to that suggested in Kühlmann-Berenzon et al. (2005): the unobserved trees are imputed based on the assumption that the locations of large trees are distributed according to a Poisson process. This rather simple edge correction method can be efficiently implemented within the Bayesian inference, in contrast to alternatives where the locations (and sizes) of unobserved large trees would be included in the Bayesian inference as unknowns and simulated within the MCMC approach.

The rest of the paper is organized as follows. In Sect. 2, we give some examples of influence kernels and introduce the conditional LGCP model. The Bayesian estimation approach, including the edge correction, is described in Sect. 3. Section 4 presents the results of a simulation experiment that was conducted to explore the performance of the proposed estimation and edge correction methods. Finally, the forestry data are described in further detail and studied in Sect. 5. Section 6 is for discussion.

2 Conditional log Gaussian Cox process model

Let us have a bivariate point process in \({\mathbb {R}}^2\) consisting of an unmarked point process \(Y\) and an unmarked or a marked point process \(X\). Let us further assume that we have observed a realization of process \(Y\), namely \(\varvec{y}=\{y_i\}\), in a bounded window \(W\subset {\mathbb {R}}^2\). Our primary interest is in the spatial pattern \(\varvec{y}\) which is affected by a realization \(\varvec{x}\) of the spatial point process \(X\). The spatial pattern \(\varvec{x}\) can consist only of the point locations \(x_j\) or of the point locations and marks, \([x_j,m_j]\), if some characteristics (marks) \(m_j\) of the points \(x_j\) are available. In our forestry application, \(\varvec{y}\) consists of the locations of seedlings, while \(\varvec{x}\) is the pattern of locations and dbh’s of large trees.

In our approach, the effect of \(\varvec{x}\) on \(\varvec{y}\) is modeled using the influence kernels around the points of \(\varvec{x}\) that are explained in Sect. 2.1. To account for the clustering in the pattern \(\varvec{y}\) not explained by \(\varvec{x}\), the LGCP model is proposed and defined in Sect. 2.2. Replicated point patterns are discussed in Sect. 2.3.

2.1 Influence kernels and influence field

We assume that each point \([x_j,m_j]\) of the process \(X\) introduces an influence kernel around its location. We focus on isotropic influence kernels of the form \(c(h; m_j, \varvec{\theta }_I)\), where \(h=\Vert s-x_j\Vert \) is the distance between the location s of interest and \(x_j\). Many kernels have been suggested in the literature for different applications (e.g. Adler 1996; Illian et al. 2008; Pommerening et al. 2011; Pommerening and Maleki 2014; Schneider et al. 2006). We used a mark independent Gaussian kernel

$$\begin{aligned} c(h; \theta ) = \exp \left( -(h/\theta )^2\right) , \end{aligned}$$
(1)

where \(\theta >0\) is an unknown influence range parameter. Here the influence of a point gradually decreases with the distance from the point.

A mark dependent generalization of (1) is given by

$$\begin{aligned} c(h, m; \varvec{\theta }_I) = m^{\alpha } \exp \left( - \left( \frac{h}{\theta m^{\delta }} \right) ^2 \right) \end{aligned}$$
(2)

with \(\varvec{\theta }_I=(\theta , \delta , \alpha )\), where \(\theta > 0\), \(\delta > 0\), and \(\alpha \ge 0\). If \(\alpha = 0\), the mark affects only the range of influence and if \(\alpha > 0\), it affects both the range and the strength (e.g. Pommerening et al. 2011).

The influence field of the process \(X\) can then be defined as a superposition of the individual influence kernels,

$$\begin{aligned} C(s; \varvec{\theta }_I, X) = \sum _{[x_j,m_j] \in X} c(\Vert s-x_j\Vert , m_j; \varvec{\theta }_I). \end{aligned}$$

2.2 Conditional model

Since \(\varvec{y}\) is affected by \(\varvec{x}\), we introduce a conditional point process model for \(\varvec{y}\) given \(X=\varvec{x}\), where the intensity of \(Y\) is affected by the influence field of \(\varvec{x}\). This conditional model is a LGCP with the intensity

$$\begin{aligned} \varLambda (s; \varvec{\beta }, \varvec{\theta }_I, \varvec{x}, Z) = \exp ( \beta _0 + \beta _1 C(s; \varvec{\theta }_I, \varvec{x}) + Z(s) ), \end{aligned}$$
(3)

where \(C(s; \varvec{\theta }_I, \varvec{x})\) is a parametric influence field, \(\varvec{\beta }=(\beta _{0}, \beta _1)\) and the unknown coefficients \(\beta _0\in \mathbb {R}\) and \(\beta _1\in \mathbb {R}\) are the intercept and the strength of the influence field, respectively. If \(\beta _1<0\), \(\varvec{x}\) affects the intensity of \(Y\) negatively and the influence field \(C(s;\varvec{\theta }_I, \varvec{x})\) can be interpreted as a thinning of the LGCP process with intensity \(\varLambda (s) = \exp (\beta _0 + Z(s))\). If, however, \(\beta _1>0\), \(\varvec{x}\) has a positive effect on the intensity of \(Y\) and there are more points of \(Y\) in areas with a high value of \(C(s;\varvec{\theta }_I, \varvec{x})\). Furthermore, \(Z := \{Z(s): s\in {\mathbb {R}}^2\}\) is a zero-mean stationary Gaussian random field with a covariance function \(C_Z(r; \varvec{\theta }_Z)\) and independent of the influence field. In our application below, we use the Matérn covariance function

$$\begin{aligned} C_Z(r; \varvec{\theta }_Z, \nu ) = \sigma _Z^{2} \frac{2^{1-\nu }}{\varGamma (\nu )} \left( \sqrt{2\nu }\frac{r}{\rho _Z}\right) ^{\nu }K_{\nu } \left( \sqrt{2\nu }\frac{r}{\rho _Z}\right) , \quad r>0, \end{aligned}$$
(4)

with the smoothness parameter \(\nu =2\) and \(\varvec{\theta }_Z = (\sigma _Z^2, \rho _Z)\), where \(\sigma _Z^2\) and \(\rho _Z\) are the variance and range parameters, respectively, and \(K_\nu \) is the modified Bessel function of the second kind (e.g. Cressie 1993; Chilés and Delfiner 1999; Banerjee et al. 2004). The choice \(\nu =2\) was made since we expect that the unobserved environmental conditions that affect the clustering of \(\varvec{y}\) in our application vary rather smoothly and since it is computationally convenient (Lindgren et al. 2011).

2.3 Replicates

Assume that we have several independent replicated point patterns \(\varvec{y}_k\), \(k = 1, \dots , N\), from the conditional distribution of the point process \(Y\) given \(X=\varvec{x}_k\), \(k = 1, \dots , N\). Conditionally on \(X=\varvec{x}_k\), the model for \(\varvec{y}_k\) is a LGCP with the intensity \(\varLambda (s; \beta _{0}, \beta _{1}, \varvec{\theta }_I, \varvec{x}_k, Z_k)\) in (3), where \(Z_k\), \(k = 1, \dots , N\), are independent replicates of the Gaussian random field with parameters \(\varvec{\theta }_Z\). For our data, it is not reasonable to assume that all replicates have the same \(\beta _{0}\), which controls the number of points of \(Y\), and we let each pattern \(\varvec{y}_k\) have its own intercept parameter \(\beta _{0}\), i.e. \(\beta _{0k}\) for \(\varvec{y}_k\), \(k=1,\dots ,N\). Consequently, in our application below, the pattern \(\varvec{y}_k\) is assumed to be a realization of the LGCP model with the intensity \(\varLambda (s; \beta _{0k}, \beta _{1}, \varvec{\theta }_I, \varvec{x}_k, Z_k)\).

3 Inference

The likelihood of the conditional LGCP model for a point pattern \(\varvec{y}\) with n points observed in W is

$$\begin{aligned} p(\varvec{y}; \varvec{\beta }, \varvec{\theta }_I, \varvec{\theta }_Z, \varvec{x}) = \mathbf {E}_{\varvec{\theta }_Z} \prod _{i = 1}^{n} \varLambda (y_{i}; \varvec{\beta }, \varvec{\theta }_I, \varvec{x}, Z) \exp \left( -\int _{W} \varLambda (u; \varvec{\beta }, \varvec{\theta }_I, \varvec{x}, Z)\mathrm{d}u\right) ,\nonumber \\ \end{aligned}$$
(5)

where \(\varvec{\beta }\), \(\varvec{\theta }_I\), \(\varvec{\theta }_Z\) are the model parameters, Z denotes the Gaussian random field and the expectation is over Z given \(\varvec{\theta }_Z\). As we use Bayesian inference we need to be able to evaluate the likelihood (5) efficiently. Below, we describe the approximations needed: discretization of the observation window (Sect. 3.1), an edge-corrected influence field (Sect. 3.2), and approximations related to the Gaussian field (Sect. 3.3), which include approximating the field by a Gaussian Markov random field and using the Laplace approximation to evaluate the likelihood. Finally, the approximated likelihood based on replicates is given in Sect. 3.4 and the MCMC algorithm is described in Sect. 3.5.

3.1 Discretization

To be able to make inference on LGCP models, the observation window W of the point pattern \(\varvec{y}\) is discretized using a regular grid in a similar manner as in Rue et al. (2009) and Møller et al. (1998). Namely, the observation window W is divided into G disjoint cells \(\{w_g\}\) with center locations \(\xi _g\) and area A. Furthermore, we let \(n^{y}_g\) denote the number of observations \(\varvec{y}\) within \(w_g\) in W and \({\mathbf {n}}^{y} = (n^{y}_1,\dots ,n^{y}_G)\). A piecewise constant approximation is used for the Gaussian field Z and the competition field C and the locations of \(\varvec{y}\) are replaced by the counts \(n^{y}_g\). The approximate likelihood for \({\mathbf {n}}^{y}\) is

$$\begin{aligned} p({\mathbf {n}}^{y}; \varvec{\beta }, \varvec{\theta }_I, \varvec{\theta }_Z, \varvec{x}) = \mathbf {E}_{\varvec{\theta }_Z} p({\mathbf {n}}^y; \varvec{\beta }, \varvec{\theta }_I, \varvec{x}, Z^D), \end{aligned}$$
(6)

where

$$\begin{aligned} p({\mathbf {n}}^{y}; \varvec{\beta }, \varvec{\theta }_I, \varvec{x}, Z^D) = \prod _{g=1}^G \text {Pois}(n^{y}_g; \varLambda _g), \end{aligned}$$

\(\varLambda _g = A\exp ( \beta _0 + \beta _1 C^D(\xi _g; \varvec{\theta }_I, \varvec{x}) + Z^D(\xi _g; \varvec{\theta }_Z) )\), and \(C^D\) and \(Z^D\) are the piecewise constant approximations of C and Z.

3.2 Edge correction

The large tree process X is only partially observed, and generating the influence field only based on the observed large trees would result in too weak influence near the borders. Therefore, we propose an imputation type approach, similar to the one proposed by Kühlmann-Berenzon et al. (2005), to correct for the unobserved points of \(X\). Specifically, we propose to replace the influence generated by the unobserved trees with the expected influence generated assuming that the whole process \(X\) is an independently marked homogeneous Poisson process. In the unmarked case, \(X\) is assumed to be a homogeneous Poisson process. In general, the point pattern outside the window would depend on the pattern inside the window, but this is not the case for the Poisson process.

Let \(\lambda \) and F be the intensity and mark distribution of \(X\), and \(X_{W^c}\) the restriction of X to \(W^c\), the complement of W. Using the Campbell theorem (e.g. Chiu et al. 2013) we can write

$$\begin{aligned} \mathbf {E}C(s; \varvec{\theta }_I, X_{W^c})&= \mathbf {E}\sum _{[x_j,m_j] \in X_{W^c}} c(\Vert s-x_j\Vert , m_j; \varvec{\theta }_I) \\&=\int _{R_+}\int _{R^2} c(s-x, m; \varvec{\theta }_I) \mathbf {1}_{W^c}(x) \lambda \mathrm {d}x \mathrm {d}F(m), \end{aligned}$$

where \(\mathbf {1}_{W^c}\) is the indicator function of the set \(W^c\), i.e. \(\mathbf {1}_{W^c}(x)=1\) if \(x\in W^c\), and 0 otherwise. By changing the order of the integrals we find that

$$\begin{aligned} \mathbf {E}C(s; \varvec{\theta }_I, X_{W^c})&= \int _{R^2} f(s-y) \mathbf {1}_{W^c}(y) \lambda \mathrm {d}y\\&= \int _{R^2} f(s-y) \lambda \mathrm {d}x - \int _{R^2} f(s-x) \mathbf {1}_W(x) \lambda \mathrm {d}x, \end{aligned}$$

where \(f(x) = \int _{R_+}c(\Vert x\Vert , m; \varvec{\theta }_I)\mathrm {d}F(m)\). By changing to polar coordinates and with a slight abuse of notation

$$\begin{aligned} \int _{R^2} f(s-x) \lambda \mathrm {d}x = \lambda 2\pi \int _0^\infty rf(r) \mathrm {d}r, \end{aligned}$$

which can be computed using numerical integration. Since we are only interested in locations \(s \in W\), we can replace the function f with \(f\mathbf {1}_{W^S}\), the restriction of f to the set \(W^S = \{s-x: s \in W, x \in W\}\), and

$$\begin{aligned} \int _{R^2} f(s-x) \mathbf {1}_W(x)\mathrm {d}x = \int _{R^2} (f\mathbf {1}_{W^S})(s-x) \mathbf {1}_W(x) \lambda \mathrm {d}x = (f\mathbf {1}_{W^S} * \mathbf {1}_W)(s). \end{aligned}$$

The discrete convolution of the piecewise constant approximations of \(f\mathbf {1}_{W^S}\), and \(\mathbf {1}_W\) can be efficiently computed using discrete Fourier transforms (Oppenheim et al. 1999; Frigo and Johnson 2005). For F, we use the empirical distribution of marks in the sample plot under study.

The edge-corrected influence field value at any location \(s\in W\) is then obtained as the sum of the influence field calculated from the observed \(\varvec{x}_{W}\), \(C(s; \varvec{\theta }_I, \varvec{x}_{W})\), and the expected influence load of the unobserved \(X_{W^c}\). In general, we use the numerical approximation explained above, but for the special case of the Gaussian influence kernel (1) and a rectangular observation window, it is easy to compute the edge correction by hand.

3.3 Approximations related to the Gaussian field

We use Laplace approximation (Tierney and Kadane 1986; Rue et al. 2009) to approximate the likelihood (6) and obtain

$$\begin{aligned} \mathbf {E}_{\varvec{\theta }_Z} p({\mathbf {n}}^y; \varvec{\beta }, \varvec{\theta }_I, \varvec{x}, Z^D) \approx \sqrt{\frac{(2\pi )^d}{\det (-{\mathbf {H}}(\hat{{\mathbf {z}}}))}} p({\mathbf {n}}^y; \varvec{\beta }, \varvec{\theta }_I, \varvec{x}, \hat{{\mathbf {z}}})p(\hat{{\mathbf {z}}} ; \varvec{\theta }_Z), \end{aligned}$$
(7)

where \({\mathbf {H}}\) and \(\hat{{\mathbf {z}}}\) are the Hessian and maximizer of \(\log p({\mathbf {n}}^y; \varvec{\beta }, \varvec{\theta }_I, \varvec{x}, {\mathbf {z}})p({\mathbf {z}} ; \varvec{\theta }_Z)\), respectively, and \(p({\mathbf {z}} ; \varvec{\theta }_Z)\) is the probability density of the vector \({\mathbf {Z}}^D\) which contains the values of \(Z^D\) at grid cells.

Since the Gaussian random field Z is assumed to have mean zero and the Matérn covariance function (4) with \(\nu = 2\), we can utilize the explicit link between Gaussian fields and Markov random fields (Lindgren et al. 2011), which tells us that the distribution of \({\mathbf {Z}}^D\) should be approximated with a Gaussian distribution with a precision matrix given by Lindgren et al. (2011).

3.4 Replicates

Since the point patterns are assumed to be conditionally independent, the likelihoods (5) for each replicate \(\varvec{y}_k\) can be multiplied to yield the final likelihood

$$\begin{aligned} p(\varvec{y}_1, \dots , \varvec{y}_N ;\varvec{\beta }, \varvec{\theta }_I, \varvec{\theta }_Z, \varvec{x}_1, \dots , \varvec{x}_N) = \prod _{k=1}^Np(\varvec{y}_k; \beta _{0k}, \beta _1, \varvec{\theta }_I, \varvec{x}_k), \end{aligned}$$
(8)

where now \(\varvec{\beta }\) contains all the regression coefficients, i.e. \(\varvec{\beta }= (\beta _{01}, \dots , \beta _{0N}, \beta _1)\). To obtain an approximation of (8), the approximations (6) and (7) are applied to each pattern separately.

3.5 MCMC

Combining the likelihood (8) with the prior \(p(\varvec{\beta }, \varvec{\theta }_I, \varvec{\theta }_Z)\) yields the approximate posterior distribution. To sample from this distribution, we use Robust Adaptive Metropolis algorithm (Vihola 2012, 2020), which uses a Gaussian random-walk proposal distribution, whose covariance is updated adaptively. The limiting proposal covariance matches the shape of the posterior, such that an average acceptance rate of 0.234 is attained, following the theoretical findings presented e.g. in Roberts et al. (1997).

4 Simulation experiment

We made a simulation experiment to study the performance of the inference approach and the edge correction method suggested above. The point pattern \(\varvec{x}\) was a realization of either a Poisson process or a regular Strauss process. The Strauss process (e.g. Illian et al. 2008) was included to see whether the edge correction based on the Poisson assumption of \(X\) would work even in a more regular case. We did not include any cluster process since in our application, the large tree patterns \(\varvec{x}\) are regular. Also, based on a small simulation study (results not shown here), it is unlikely that the Poisson correction would work well when the \(\varvec{x}\) pattern is strongly clustered. We did not include marks in the simulation experiment.

4.1 Set-up

The intensity parameters of the Poisson and Strauss processes were chosen such that they result in approximately 60 points in the observation window \(W=[0,40]\times [0,40]\). In the Strauss process (parametrized as in Baddeley et al. 2015), the intensity related parameter was 0.06, the interaction strength 0.1, and the interaction radius 2, making the resulting patterns rather regular. The \(\varvec{y}\) patterns were generated on W and the \(\varvec{x}\) patterns on the extended window \(W_{\text {ext}}=[-20,60]\times [-20,60]\) to be able to use plus sampling which represents the ideal situation where no imputation is needed as the complete pattern is known. The Gaussian kernel (1) was used as the influence kernel. Initially, the parameters of the competition field and of the Gaussian field were set to the estimates found in Sect. 5 and the intercept \(\beta _0\) was chosen such that the resulting LGCP model would have 600 points on average. First we used the estimated values \(\beta _1=-0.7\) and \(\theta =2.1\), called “estimated” in Fig. 2. In addition, we used either the values \(\beta _1=-3\), and \(\theta =2.1\) corresponding to a much stronger effect of the influence kernel (\(\beta _1\)) (“strong” in Fig. 2) or the values \(\beta _1=-0.7\), and \(\theta =6\) corresponding to a much larger range of influence \(\theta \) (“wide” in Fig. 2) than in the data. In all cases, \(\sigma _Z=1.6\) and \(\rho _Z=2.6\). We generated 100 replicates of each \(X\) process and one \(\varvec{y}\) pattern for each \(\varvec{x}\). The random intensity of the Cox process was approximated by a piecewise constant function using 0.1 m \(\times \) 0.1 m cells.

We fitted the conditional LGCP model to the simulated point patterns. We discretized the observation windows to pixels of size 1 m \(\times \) 1 m and set weakly informative independent priors for all model parameters as follows: For the parameters in \(\varvec{\beta }\), we used Gaussian distributions with mean zero and standard deviation 10. For the range parameters \(\rho _Z\) and \(\theta \), very small and very large values do not make sense based on the discretization and window used. Thus, we set the prior to be the Gamma distribution with shape parameter 2.4 and scale parameter 1.8, implying that approximately 90% of the prior probability is between 1 m and 10 m. Furthermore, the prior for the standard deviation of the Gaussian field \(\sigma _Z\) was the exponential distribution with expectation 10, slightly favoring small values.

For each point pattern, we then ran the MCMC scheme using (a) no edge correction, (b) the Poisson edge correction and (c) plus sampling edge correction with 100,000 updates using the true parameter values as the starting values. For each chain we discarded 20,000 first samples as burn-in and saved every 10th sample. When influence was strong, most chains converged and mixed well. However, there were problems with mixing if the influence was not so strong. In this case the effective sample size was estimated to be less than 1000 in half of the chains. Upon closer inspection multi-modality was often the cause. We used posterior means of each chain in the comparisons. Using posterior modes led to identical conclusions.

4.2 Results

First, we investigated the performance of the Bayesian inference approach. To avoid edge effects, we estimated the parameters using plus sampling, utilizing the true pattern \(\varvec{x}\) in the extended window. Based on the distributions of the posterior means for the plus sampling method (see Plus in Fig. 2), we can see that the Bayesian MCMC approach with the approximations used performed reasonably well for the main parameters \(\beta _1\) and \(\theta \). However, the less interesting random field parameters were clearly biased. As expected, the distribution of the \(X\) pattern did not affect the performance of the inference.

Fig. 2
figure 2

Quantiles (0.05, 0.25, 0.5, 0.75, 0.95) of differences between posterior means and reference values. For each row of the figure, we display the \(X\) process and the competition effect on the right and within each subfigure, we label the three different edge corrections (left). The quantiles are based on 100 replicates

Second, we investigated the performance of the Poisson edge correction. An example of the expected intensity field with and without edge correction for the conditional LGCP model with the parameters estimated from the EVO02 pattern and Strauss pattern \(\varvec{x}\) is shown in top row of Fig. 3. It can be seen that the Poisson corrected and the plus sampling corrected intensities are quite similar to each other. The bottom row of Fig. 3 further shows the components of the influence field for the Poisson correction, namely the contribution of the observed points (left) and the expected contribution of the unobserved points under the Poisson assumption (middle). The contribution of unobserved points is shown for comparison (right). The Poisson correction simply approximates the contribution of the unobserved points.

Fig. 3
figure 3

Top row: expected intensity of the conditional LGCP with parameters estimated from the EVO02 pattern using no edge correction (left), Poisson correction (middle) and plus sampling (right). Bottom row: influence field induced by the observed points (left), expected influence field caused by the unobserved points under the Poisson assumption (middle) and influence field caused by the unobserved points (right). The \(\varvec{x}\) pattern is a realization of a Strauss process with interaction parameter 0.1, interaction range 2, and with on average 60 points. Dark color means low intensity/high influence

To assess the performance of the proposed edge correction method, we compared the posterior means of the model parameters \(\beta _0, \beta _1 \text { and } \theta \), obtained by using plus sampling to the estimates obtained by using the Poisson correction and those obtained by using no edge correction. The distribution of the posterior means is shown in Fig. 2. It can be seen that the estimates of the different methods are very similar when the influence of the large trees was not too wide, for both \(X\) processes. However, when the influence was wide, the proposed Poisson correction produced estimates that were closer to the plus sampling based estimates than the uncorrected estimates were. The results were altogether very similar for the Poisson and Strauss processes. Thus, the edge correction plays a role if the range of influence of the \(\varvec{x}\) points on the intensity of \(Y\) is wide.

5 Application

The data shown in Fig. 1 have been collected on 40 m \(\times \) 40 m squares in southern Finland. They are part of a larger data set collected for studies on tree and stand development in managed, uneven-aged Norway spruce forests conducted under the ERIKA research project at the Natural Resources Institute Finland (Eerikäinen et al. 2007, 2014; Saksa and Valkonen 2011). Using the conditional LGCP model, we studied the effect of large trees \(\varvec{x}_i\) (black circles) to the seedling patterns \(\varvec{y}_i\) (red crosses). The patterns \(\varvec{x}_i\) consist of trees which had a vital crown with no damages and with a dbh at least 7 cm in 1991. Most trees (78% of trees, 70% basal area) were Norway spruces and the remaining ones either Scots pines or broadleaves. The seedlings were naturally generated with height at least 10 cm in 1996 and had reached this height after the data collection in 1991. The seedlings were mostly Norway spruces (98%).

We fitted the conditional LGCP model using different mark dependent and mark independent Gaussian influence fields: the full mark dependent model (2), the two reduced models where either of the mark specific parameters, namely \(\delta \) or \(\alpha \) were set to zero, the mark independent model (1), and a model without an influence field. The mark was always the dbh. We used the same discretization of the observation window (1 m \(\times \) 1 m pixel size) and the same priors as in the simulation experiment (Sect. 4.1). The pixel size 1 m \(\times \) 1 m was chosen because variations in smaller units are practically unimportant in forests. The priors for \(\alpha \) and \(\delta \) were both the exponential distribution with expectation 10. We then ran the MCMC scheme using the Poisson edge correction with 120,000 updates, leaving out the first 20,000 observations of the chains as the burn-in.

To compare the models, we used the posterior predictive model assessment based on various summary characteristics, namely the L-function (variance stabilizing version of Ripley’s K), the empty space function F, and the nearest neighbour distribution function G summarizing the spatial pattern \(\varvec{y}\) and, to investigate the relationship between the large trees and seedlings, the cross L-function, \(L_{12}\) (e.g. Illian et al. 2008; Diggle 2013). We used the standard estimators of these functions with translational (L, \(L_{12}\)) and Kaplan-Meier edge correction methods (F, G) (Baddeley and Gill 1997). For each plot, we generated 10,000 patterns of seedlings from the posterior predictive distributions of the conditional LGCP models given the observed \(\varvec{x}\) and calculated the summary functions for the data and for each of the generated patterns. The posterior predictive simulations were made using a discretization with 0.2 m \(\times \) 0.2 m cell size.

Figure 4 shows the empirical \(L_{12}\) functions together with the 95% global extreme rank length envelopes (Myllymäki et al. 2017; Myllymäki and Mrkvička 2020) constructed from the \(L_{12}\) summary functions of the simulations of the fitted model with mark independent influence kernel (1) (shaded region), mark dependent influence kernel (2) (dotted lines), and no influence kernel (dashed line) separately for each plot. The observed \(L_{12}\) function is distinctly better covered by the envelopes based on the models with influence field than without. While the envelopes of the model without an influence field are centred around zero, i.e., no interaction between trees and seedlings, the empirical \(L_{12}\) functions have the tendency to go below zero in most plots, indicating repulsion or inhibition of trees and seedlings, and the envelopes of the models with influence kernels are shifted downwards as well. The difference between the two models with influence kernels is, however, minor. Other summary functions (L, F, G) produced very similar envelopes regardless of the type or lack of influence field, see figures in Appendix 1. The empirical functions were inside the envelopes, except the nearest neighbor distance distribution functions of four sample plots VES07, VES13, VES14 and VES16, which were slightly outside the envelopes at distances less than 1 m, i.e. less than the pixel size used in the discretization. This may suggest that the spatial distribution of the seedlings is not Poisson at a very small scale, but we did not investigate this further.

The envelopes for the models with mark dependent kernels with either \(\delta \) or \(\alpha \) set to zero are omitted because they were very similar to the envelopes of the other two influence kernels.

Fig. 4
figure 4

Empirical \(L_{12}\) functions (solid line) together with the 95% global envelopes constructed from 10,000 simulations from the posterior predictive distribution of the fitted conditional LGCP models for the 14 plots in Fig. 1 with mark independent (1) (grey shade), mark dependent (2) (dotted lines), and no (dashed lines) influence

Fig. 5
figure 5

Simulated seedling pattern (red crosses) and observed large trees (black circles, radius relative to dbh). The simulation was done using the posterior predictive distribution of the fitted conditional LGCP models for the 14 plots in Fig. 1 with mark independent influence of large trees

Based on the analysis above, it is clear that an influence kernel is needed. However, since all the models with an influence kernel fitted the data equally well, we report the results of the simplest model (1). The marginal posterior distributions of the model parameters of this model are shown in Fig. 6. The influence of the large trees on the seedlings (\(\beta _1\)) is clearly negative meaning that the seedlings avoid locations in the close vicinity of the large trees. The range of influence \(\theta \) of the large trees was estimated to be around 2.1 m, indicating that the influence of a large tree decreases from its maximum influence (at the tree location) to 37% of it at distance 2.1 m from the tree, or to 5% of it at distance 3.6 m. However, there is a lot of unexplained variability, as the quite wide envelopes in Fig. 4 and Appendix 1 show.

Figure 5 shows for each plot one realization drawn from the posterior predictive distribution of the model with mark independent influence kernel. It is difficult to detect the relationship between trees and seedlings by eye, but one can compare the clustering of the seedling patterns to the observed patterns (Fig. 1). The patterns in Figs. 1 and 5 look rather similar, and according to the envelope tests (see Fig. 4 and Appendix 1) the model captures small scale structures up to 5 m distances.

Fig. 6
figure 6

Posterior quantiles (0.05, 0.25, 0.5, 0.75, 0.95) of the common parameters (top) and the sample plot specific intercepts \(\beta _0\) (bottom)

6 Discussion

We proposed a LGCP model to investigate the effect of large trees on the intensity of seedlings under the presence of unexplained clustering. The influence of large trees was modeled by using parametric influence kernels around them. Our analysis suggests that tree regeneration is affected by the pattern of large trees in the studied data. Namely, the large trees were found to have negative effect on the seedling density in the vicinity of large trees. Further, the LGCP model could capture much of the unexplained clustering. For parameter estimation, we constructed a Bayesian approach using MCMC and Laplace approximation. All computations were implemented in Julia language (Bezanson et al. 2017), while graphics were done using ggplot2 (Wickham 2016).

Estimation of the influence field parameters worked well in our simulation experiment and we did not observe any problems due to possible confounding between the influence field and the spatial random effect as reported in the literature (e.g. Dupont et al. 2020). However, the random field parameter estimates were biased. We suspect that this is caused by weak identifiability (cf. Anderes 2010; Zhang 2004) or discretization bias coupled with the Laplace approximation. We used replicates to help with the weak identifiability which was necessary for the plots with very few seedlings. In our further experiments with finer discretizations (results not shown), we observed issues with the approximation. In particular, when the number of points per cell was small, the approximate posterior appeared to degenerate. We are unaware of exact inference methods that would be feasible in our setting, but we are currently investigating new methods that could allow for more detailed investigation of this issue.

There are many alternative approaches to inference with log Gaussian Cox processes. For example, the R package INLA (Rue et al. 2009) uses Laplace approximation in a similar fashion as we did, but is somewhat restricted to linear models. Indeed, INLA can in principle accommodate our model using the rgeneric class (personal communication with Håvard Rue). However, we faced some computational difficulties in estimation. The R package lgcp (Taylor et al. 2015) uses MALA algorithm for efficient Bayesian inference for the full model including the latent field. The use of full MCMC might lead to better estimation of the random field parameters. However, the lgcp package is also restricted to linear models, whereby we were not able to apply it directly to our model. For Stan (Stan Development Team 2018) our random field model appears to be too complicated, however there are some recent advances see e.g. Margossian et al. (2020). Also, inlabru (Bachl et al. 2019) could be further investigated.

Since the large trees outside the sampling window may affect the intensity of the seedlings within the window, an edge correction assuming that the large trees were from a Poisson process was included in the estimation procedure. We demonstrated by a small simulation study that this edge correction can work well even when the large trees are from a regular process. Compared to no edge correction, it improved the parameter estimates when the range of influence was rather wide.

An obvious alternative strategy would be to include the locations of the unobserved trees outside the observation window to the MCMC estimation in a similar manner as considered in the inference for Neyman-Scott point processes (Møller and Waagepetersen 2004). This approach would allow incorporating prior information on the large tree process in the edge correction at the cost of increased complexity. Since we did not have important prior information and the effect of edge correction appeared minor, we did not explore this approach further. Ideas from Geyer (1999) or Gabriel et al. (2017) could be used to find further alternative edge correction methods.

Our proposed edge correction method can be efficiently implemented when the influence field is constructed as the sum of individual signals. In principle, a similar edge correction could be applied with different combination rules, such as product (e.g. Wu et al. 1985; Miina and Pukkala 2002) or max-fields (e.g. Penttinen and Niemi 2007). If also the influence kernel is binary, e.g. \(c(h; \theta ) = {\mathbf {1}}( h \le \theta )\), then the max-field is split into two phases as well, namely influence and influence-free zones. However, we note that our proposed calculation of the expected influence of trees outside the observation window W does not generalize directly to other combination rules.

The models introduced in this paper could be useful even for natural (e.g. Abellanas and Pérez-Moreno 2018) or urban forests (Hauru et al. 2012). Furthermore, they could be used in an experimental setup, where realizations of seedlings would be generated for different large tree patterns and the success of regeneration evaluated by some spatial summary functions such as the empty space function. In a similar manner, the effect of different thinning strategies on the regeneration of trees could be evaluated.

It could be argued that, since the management was the same for all plots and the geographical differences minor, the plots should have had a common intercept parameter. However, this was clearly not the case due to large variation in numbers of seedlings from plot to plot. Since we used plot specific intercepts, it could be argued that all other parameters should be plot specific too. This was not possible in practice due to the problems with the random field parameters. We did not explore the alternative where the intercept and the influence field parameters would be plot specific but the random field parameters shared since the envelope tests already suggested adequate fit of the model.

The observed and simulated seedling patterns in the Figures 1 and 5, respectively, are very similar in several aspects while quite different in others. For example, the clusters seemed to be clustered in the VES13 plot. Although the envelope tests suggest that the model was able to capture the variability in the data, it depends on the specific application if the model is adequate. To best of our knowledge, this is the first point process model accounting for clustering of the seedlings in these uneven-aged forests.

There are many other factors than the vicinity of large trees that may affect the intensity of seedlings (Valkonen and Maguire 2005; Kuusinen et al. 2019). Therefore, the model could be further improved and unexplained variability decreased if some covariate information on local conditions within plots would be available to be included in the model. Further, plot level covariate effects could be added to the model in order to explain the numbers of seedlings in different plots. Finally, we modeled the influence of large trees as a function of the dbh, whose effect on the influence was, however, minor in our data. Other possibly useful marks could be the height, crown ratio or crown width of the tree, for example.