In this section, we address the Bayesian inference problem of our spatio-temporal GP-ETAS model. The objective is to estimate the joint posterior density \(p(\mu ,\varvec{\theta }_\varphi | \mathcal {D})\), which encodes the knowledge (including uncertainties) about \(\mu \) and \(\varvec{\theta }_{\varphi }\) after having seen the data: the posterior density combines the information about \(\mu \) and \(\varvec{\theta }_{\varphi }\) contained in the data (via the likelihood function) with prior knowledge about \(\mu \) and \(\varvec{\theta }_{\varphi }\) (information available before seeing the data). Here, \(\mu \) denotes the entire random field of the background intensity as in (13d) and \(\varvec{\theta }_{\varphi }\) are the parameters of the triggering function.
The likelihood of observing a point pattern \(\mathcal {D}=\{(t_i,\varvec{x}_i, m_i)\}_{i=1}^{N_{\mathcal {D}}}\) under the GP-ETAS model (11) is given by the point process likelihood
$$\begin{aligned} p(\mathcal {D}|\mu ,\varvec{\theta }_\varphi )&\!=\! \prod _{i=1}^{N_{\mathcal {D}}}\lambda (t_i,\varvec{x}_i|\mu (\varvec{x}_i),\varvec{\theta }_\varphi ) \exp \left( \!-\int _{\mathcal {T}}\int _{\mathcal {X}} \lambda (t,\varvec{x}|\mu (\varvec{x}),\varvec{\theta }_\varphi )\mathop {}\!\mathrm {d}\varvec{x}\mathop {}\!\mathrm {d}t \!\right) ,\! \end{aligned}$$
(15)
where the intensity \(\lambda (\cdot )\) is given by (11), and the dependencies on \(H_t\), \(H_{t_i}\) are omitted for notational convenience.
Assuming a joint prior distribution denoted here by \(p(\mu ,\varvec{\theta }_\varphi )\) for simplicity, the posterior distribution becomes
$$\begin{aligned} p(\mu ,\varvec{\theta }_\varphi | \mathcal {D}) \propto p(\mathcal {D}| \mu ,\varvec{\theta }_\varphi ) p(\mu ,\varvec{\theta }_\varphi ). \end{aligned}$$
(16)
This posterior is intractable in practice and hence standard inference techniques are not directly applicable. More precisely, the following three main challenges arise:
(i) The background intensity \(\mu \) and the triggering function \(\varphi (\cdot |\varvec{\theta }_\varphi )\) cannot be treated separately in the likelihood function (15).

(ii) The likelihood (15) includes an intractable integral inside the exponential term due to the GP prior on f in (11), that is, the integral of f over \(\mathcal {X}\). Furthermore, normalisation of (16) requires an intractable marginalisation over \(\mu \) and \(\varvec{\theta }_\varphi \). Thus, the posterior distribution is doubly intractable (Murray et al. 2006).

(iii) We assume a Gaussian process prior for modelling the background rate. However, the point process likelihood (15) is non-Gaussian, which makes the functional form of the posterior nontrivial to treat in practice.
We approach these challenges by data augmentation based on the work of Hawkes and Oakes (1974), Veen and Schoenberg (2008), Adams et al. (2009), Polson et al. (2013), Donner and Opper (2018). We will find that this augmentation simplifies the inference problem substantially. The following three auxiliary random variables are introduced:
(1) A latent branching structure Z, as described in Sect. 2.1.1, decouples \(\mu \) and \(\varvec{\theta }_\varphi \) in the likelihood function (e.g., Veen and Schoenberg 2008). (See Sect. 4.1 and Eq. (17) for details.)

(2) A latent Poisson process \(\Pi \) enables an unbiased estimation of the integral term in the likelihood function that depends on \(\mu \), as the joint distribution of latent and observed data results in a homogeneous Poisson process with a constant integral term. (See Sect. 4.2, paragraph Augmentation by a latent Poisson process, and Eq. (21) for details.)

(3) We make use of the fact that the logistic sigmoid function can be written as an infinite scale mixture of Gaussians using latent Pólya–Gamma random variables \(\omega \sim p_{\scriptscriptstyle \mathrm {PG}}(\omega )\) (Polson et al. 2013), defined in Appendix 2. This leads to a likelihood representation that is conditionally conjugate to all the priors, including the Gaussian process prior for the background component of the likelihood function (Donner and Opper 2018). (See Sect. 4.2, paragraph Augmentation by Pólya–Gamma random variables, and Eqs. (23, 24) for details.)
These three augmentations allow one to implement a Gibbs sampling procedure (Geman and Geman 1984) that produces samples from the posterior distribution in (16). More precisely, random samples are generated in a Gibbs sampler by drawing one variable (or a block of variables) from the conditional posterior given all the other variables. Hence, we need to derive the required conditional posterior distributions as outlined next.
The suggested sampler consists of three modules using the solutions (data augmentations) sketched above: sampling the latent branching structure, inference of the background \(\mu \), and inference of the triggering parameters \(\varvec{\theta }_\varphi \). Our overall Gibbs sampling algorithm for the posterior distribution is summarised in Algorithm 1. After an initial burn-in (a sufficiently long run of the three modules, Sects. 4.1–4.3), the generated samples converge to the desired joint posterior distribution \(p(\mu ,\varvec{\theta }_\varphi |\mathcal {D})\).
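To fix ideas, a minimal sketch of this loop is shown below; the three update functions are placeholders for the modules of Sects. 4.1–4.3, and the function names and `state` layout are illustrative assumptions rather than a published implementation.

```python
# Minimal sketch of the overall Gibbs loop (Algorithm 1), assuming the three
# module updates of Sects. 4.1-4.3 are supplied as callables. Function names
# and the `state` dictionary layout are illustrative, not a published API.
def gp_etas_gibbs(data, state, sample_branching, sample_background,
                  sample_triggering, n_burnin=1000, n_samples=5000):
    samples = []
    for k in range(n_burnin + n_samples):
        state["Z"] = sample_branching(data, state)            # Sect. 4.1, Eq. (19)
        state = sample_background(data, state)                # Sect. 4.2, Eqs. (25a-e)
        state["theta_phi"] = sample_triggering(data, state)   # Sect. 4.3, Eq. (28)
        if k >= n_burnin:                                     # discard burn-in
            samples.append(dict(state))
    return samples
```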
In the following, we discuss some important aspects of the three modules through which the Gibbs sampler cycles repeatedly.
Sampling the latent branching structure
Augmentation by the latent branching structure. We consider an auxiliary variable \(z_i\) for each data point i, which represents the latent branching structure as defined in Sect. 2.1.1. Recall that it gives the index of the parent event; if \(z_i=0\), the event is a spontaneous background event. Further, we define \(Z=\{z_i\}_{i=1}^{N_{\mathcal {D}}}\), the overall branching structure of the data \(\mathcal {D}\). The likelihood \(p(\mathcal {D},Z| \mu ,\varvec{\theta }_\varphi )\) of the augmented model can be written as in (17),
$$\begin{aligned}&p(\mathcal {D},Z| \mu ,\varvec{\theta }_\varphi ) = \underbrace{\prod _{i=1}^{N_{\mathcal {D}}} \mu (\varvec{x}_i)^{{{\,\mathrm {\mathbb {I}}\,}}(z_i=0)} \exp \left( -|\mathcal {T}|\int _{\mathcal {X}}\mu (\varvec{x}) \mathop {}\!\mathrm {d}\varvec{x}\right) }_{\mathrm{(a)}\,=\,p(\mathcal {D}_0 \vert Z,\mu )} \nonumber \\&\quad \times \underbrace{\prod _{i=1}^{N_{\mathcal {D}}} \prod _{j=1}^{i-1} \varphi _{ij}(\varvec{\theta }_\varphi ) ^{{{\,\mathrm {\mathbb {I}}\,}}(z_i=j)} \prod _{i=1}^{N_{\mathcal {D}}} \exp \left( -\int _{\mathcal {T}_i}\int _{\mathcal {X}} \varphi _{i}(\varvec{\theta }_\varphi ) \mathop {}\!\mathrm {d}\varvec{x}\mathop {}\!\mathrm {d}t\right) }_{\mathrm{(b)}\,=\,p(\mathcal {D}\vert Z,\varvec{\theta }_\varphi )}\,p(Z), \end{aligned}$$
(17)
where \({{\,\mathrm{\mathbb {I}}\,}}(\cdot )\) denotes the indicator function, i.e., \({{\,\mathrm{\mathbb {I}}\,}}(z_i=j)\) takes the value 1 if \(z_i=j\) and 0 otherwise, \(\varphi _{ij}(\varvec{\theta }_\varphi )=\varphi (t_i-t_j,\varvec{x}_i-\varvec{x}_j | m_j,\varvec{\theta }_\varphi )\), \(\varphi _{i}(\varvec{\theta }_\varphi )=\varphi (t-t_i,\varvec{x}-\varvec{x}_i | m_i,\varvec{\theta }_\varphi )\), \(\mathcal {T}_i =[t_i,|\mathcal {T}|] \subset \mathcal {T}\), and all possible branching structures are equally likely, i.e. \(p(Z) = \text{ const }\). Furthermore, \(\mathcal {D}_0=\{\varvec{x}_i\}_{i:z_i=0}\) denotes the set of \(N_{\mathcal {D}_0}\) background events. Note that marginalising over Z in (17) recovers (15), because \(\sum _{z_i=0}^{i-1}\mu (\varvec{x}_i)^{{{\,\mathrm{\mathbb {I}}\,}}(z_i=0)}\prod _{j=1}^{i-1} \varphi _{ij}(\varvec{\theta }_\varphi ) ^{{{\,\mathrm{\mathbb {I}}\,}}(z_i=j)}=\lambda (t_i,\varvec{x}_i\vert \mu (\varvec{x}_i),\varvec{\theta }_\varphi )\). The augmented likelihood factorises into two independent components: (a) a likelihood component for the background intensity, which depends on \(\mu \) (first two terms on the rhs of (17)), and (b) a likelihood component for the triggering function, which depends on \(\varvec{\theta }_\varphi \) (last two terms on the rhs of (17)).
From (17) one can derive the conditional distribution of \(z_i\) given all the other variables. Note that, given \(\mu \) and \(\varvec{\theta }_\varphi \), all \(z_i\)’s are independent. The conditional distribution is proportional to a categorical distribution,
$$\begin{aligned} p(z_i| \mathcal {D},\mu (\varvec{x}_i),\varvec{\theta }_\varphi )&\propto \left[ \mu (\varvec{x}_i) \right] ^{{{\,\mathrm{\mathbb {I}}\,}}(z_i=0)} \prod _{j=1}^{i-1} \left[ \varphi _{ij}(\varvec{\theta }_\varphi ) \right] ^{{{\,\mathrm{\mathbb {I}}\,}}(z_i=j)}\nonumber \\&= \prod _{j=0}^{i-1} p_{ij}^{{{\,\mathrm{\mathbb {I}}\,}}(z_i=j)}, \end{aligned}$$
(18)
with the probabilities \(p_{ij}\) given by (3) and (4), which we collect in the vector \(\varvec{p}_i\in \mathbb {R}^i\).
From (18) one can see that the latent branching structure at the kth iteration of the Gibbs sampler is sampled from a categorical distribution, \(\forall i=1,\ldots ,N_{\mathcal {D}}\)
$$\begin{aligned}&z_i^{(k)}|\mathcal {D},(\mu (\varvec{x}_i),\varvec{\theta }_\varphi ) ^{(k-1)} \sim \mathrm {Categorical}(\varvec{p}_i). \end{aligned}$$
(19)
Here \((\mu (\varvec{x}_i),\varvec{\theta }_\varphi )^{(k-1)}\) denotes the values of \(\mu (\varvec{x}_i)\) and \(\varvec{\theta }_\varphi \) from the previous iteration.
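For illustration, a minimal sketch of this categorical update is given below, assuming hypothetical arrays `mu_at_events` (holding \(\mu (\varvec{x}_i)\)) and `trig` (holding \(\varphi _{ij}\) for \(j<i\)) precomputed from the current Gibbs state.

```python
# Sketch of the branching-structure update (Eqs. 18-19). `mu_at_events[i]` and
# `trig[i, j]` (phi_ij for j < i) are assumed to be precomputed from the
# current values of mu and theta_phi; indexing is zero-based.
import numpy as np

def sample_branching_structure(mu_at_events, trig, rng):
    n = len(mu_at_events)
    z = np.zeros(n, dtype=int)
    for i in range(n):
        # index 0 = background, index j >= 1 = triggered by the j-th earlier event
        weights = np.concatenate(([mu_at_events[i]], trig[i, :i]))
        z[i] = rng.choice(i + 1, p=weights / weights.sum())
    return z

# usage: z = sample_branching_structure(mu_at_events, trig, np.random.default_rng(0))
```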
Inference for the background intensity
Given an instance of a branching structure Z, the background intensity in (17) depends only on events i for which \(z_i=0\). One finds that the resulting term is a Poisson likelihood of the form
$$\begin{aligned} p(\mathcal {D}_0\vert f,\bar{\lambda }, Z)&= \prod _{i=1:z_i=0}^{N_\mathcal {D}}\bar{\lambda }\sigma (f_i) \exp \left( -|\mathcal {T}|\int _{\mathcal {X}}\bar{\lambda }\sigma (f(\varvec{x}))\mathop {}\!\mathrm {d}\varvec{x}\right) ,\! \end{aligned}$$
(20)
where \(\mu (\varvec{x})\) has been replaced by (11) and \(f_i=f(\varvec{x}_i)\) has been used for notational convenience.
Because of the challenges listed at the beginning of Sect. 4, sampling the conditional posterior \(p(f,\bar{\lambda }|\mathcal {D}_0, Z)\) is still non-trivial and requires further augmentations, which we describe next.
Augmentation by a latent Poisson process. We can resolve issue (ii) from Sect. 4 by introducing an independent latent Poisson process \(\Pi =\{\varvec{x}_l\}_{l=N_\mathcal {D}+ 1}^{N_{\mathcal {D}\cup \Pi }}\) on the data space with rate \(\hat{\lambda }(\varvec{x})=\bar{\lambda }(1-\sigma (f(\varvec{x})))=\bar{\lambda }\sigma (-f(\varvec{x}))\), using \(1-\sigma (z)=\sigma (-z)\). The points in \(\mathcal {D}\) and \(\Pi \) form the joint set \(\mathcal {D}\cup \Pi \) with cardinality \(N_{\mathcal {D}\cup \Pi }\). Note that the number of elements in \({\Pi }\), i.e. \(N_{\Pi }\), is also a random variable. The joint likelihood of \(\mathcal {D}_0\) and the new random variable \(\Pi \) is
$$\begin{aligned} \begin{aligned} p(\mathcal {D}_0,{\Pi }\vert f, \bar{\lambda }, Z)&=\!\! \prod _{i=1:z_i=0}^{N_\mathcal {D}} \bar{\lambda }\sigma (f_i)\!\! \prod _{l=N_\mathcal {D}+ 1}^{N_{\mathcal {D}\cup {\Pi }}} \bar{\lambda }\sigma (-f_l) \exp \left( -|\mathcal {X}||\mathcal {T}|\bar{\lambda }\right) \!, \end{aligned} \end{aligned}$$
(21)
where \(f_l=f(\varvec{x}_l)\). Thus, by introducing the latent Poisson process \(\Pi \), we obtain a likelihood representation of the augmented system in which the formerly intractable integral inside the exponential term reduces to a constant.
We can gain some intuition by recalling the thinning algorithm (Lewis and Shedler 1976) of Sect. 3.2. Considering \(\mathcal {D}_0\) as a resulting set of this algorithm, we wish to find the set \({\Pi }\) such that the joint set \(\mathcal {D}_0\cup {\Pi }\) comes from a homogeneous Poisson process with rate \(\bar{\lambda }\). Because \(\mathcal {D}_0\) is a sample of a Poisson process with rate \(\bar{\lambda }\sigma (f)\), the superposition theorem of Poisson processes (Kingman 1993) implies that if \({\Pi }\) is distributed according to a Poisson process with rate \(\bar{\lambda }\sigma (-f)\), the joint set \(\mathcal {D}_0\cup {\Pi }\) has rate \(\bar{\lambda }\sigma (f)+\bar{\lambda }\sigma (-f)=\bar{\lambda }\). As we will see later, for the augmented model only the cardinality \(\vert \mathcal {D}_0\cup {\Pi }\vert \) determines the posterior distribution of \(\bar{\lambda }\).
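In code, this complementary thinning step (formalised as the draw (25a) below) might look as follows for a rectangular spatial window; the domain bounds and the callable `f`, standing for a pointwise evaluation of the current GP sample, are assumptions made for the sketch.

```python
# Sketch of drawing the latent process Pi (cf. Eq. 25a) by thinning a homogeneous
# Poisson process with rate lambda_bar on T x X. The rectangular domain bounds
# and the callable `f` (evaluating the current GP sample at spatial locations)
# are assumptions made for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_latent_pi(lambda_bar, f, x_bounds, y_bounds, T_len, rng):
    area = (x_bounds[1] - x_bounds[0]) * (y_bounds[1] - y_bounds[0])
    n_candidates = rng.poisson(lambda_bar * area * T_len)
    candidates = np.column_stack([
        rng.uniform(x_bounds[0], x_bounds[1], n_candidates),
        rng.uniform(y_bounds[0], y_bounds[1], n_candidates),
    ])
    keep = rng.uniform(size=n_candidates) < sigmoid(-f(candidates))
    return candidates[keep]            # spatial locations of the latent points
```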
Having a closer look at the augmented likelihood (21) and considering only terms depending on the function f, one finds a resemblance to a classical classification problem, namely logistic regression. Given the joint set \(\mathcal {D}_0\cup {\Pi }\), the probability of a point belonging to \(\mathcal {D}_0\) is \(\sigma (f)\) and to \({\Pi }\) it is \(1-\sigma (f)=\sigma (-f)\). Since we know which points belong to which set, the aim is to find the function f that best classifies/separates these two sets.
While the above provides some intuition, the latent Poisson process \({\Pi }\) can be derived rigorously following Donner and Opper (2018). Note that (20) implies
$$\begin{aligned}&\exp \left( -|\mathcal {T}|\int _{\mathcal {X}}\bar{\lambda }\sigma (f(\varvec{x}))\mathop {}\!\mathrm {d}\varvec{x}\right) \nonumber \\&\quad =\exp \left( \int _{\mathcal {T}}\int _{\mathcal {X}}\bar{\lambda }(\sigma (-f(\varvec{x}))-1)\mathop {}\!\mathrm {d}\varvec{x}\mathop {}\!\mathrm {d}t\right) \nonumber \\&\quad = \mathbb {E}_{\bar{\lambda }}\left[ \prod _{\varvec{x}_l\in {\Pi }} \sigma (-f(\varvec{x}_l))\right] , \end{aligned}$$
(22)
where the expectation is over random sets \({\Pi }\) with respect to a Poisson process measure with rate \(\bar{\lambda }\) on the space-time window of the data, \(\mathcal {T}\times \mathcal {X}\); here one uses Campbell’s theorem (Kingman 1993). Writing the likelihood parts of (17) that depend on f and \(\bar{\lambda }\) in terms of the new random variable \({\Pi }\), we obtain (21). Note that marginalisation over the augmented variable \({\Pi }\) leads back to the background likelihood in (20) conditioned on the branching structure Z.
Note that at this stage, with the augmentation of this section, the inference problem becomes tractable, because the augmented likelihood (21) depends on the function f only at a finite set of points. In principle, we could now employ acceptance–rejection algorithms as in Adams et al. (2009). However, to improve efficiency we introduce one more variable augmentation in the next paragraph, which allows rejection-free sampling of f.
Augmentation by Pólya–Gamma random variables. Inspecting the augmented likelihood (21), issue (iii) from Sect. 4 is still present, because (21) is not conjugate to the GP prior that we assume for the function f in the GP-ETAS model. However, we noted above the relation to a logistic regression problem. Polson et al. (2013) introduced the so-called Pólya–Gamma random variables, which allow one to solve the inference problem of logistic GP classification efficiently (Wenzel et al. 2019). Here, we utilise the same methodology and make use of the fact that the sigmoid function can be written as an infinite scale mixture of Gaussians using latent Pólya–Gamma variables (Polson et al. 2013), that is,
$$\begin{aligned} \sigma (z)&= \frac{e^\frac{z}{2}}{2\cosh (\frac{z}{2})} = \frac{1}{2}e^\frac{z}{2}\int _{0}^\infty e^{-\frac{z^2}{2}\omega }p_{\scriptscriptstyle \mathrm {PG}}(\omega \vert 1,0) \mathop {}\!\mathrm {d}\omega , \end{aligned}$$
(23)
where the new random Pólya–Gamma variable \(\omega \) is distributed according to the Pólya–Gamma density \(p_{\scriptscriptstyle \mathrm {PG}}(\omega \vert 1,0)\), see Appendix 2. Inserting the Pólya–Gamma representation of the sigmoid function (23) into (21) yields
$$\begin{aligned} \begin{aligned}&p(\mathcal {D}_0,\varvec{\omega }_{\mathcal {D}},{\Pi },\varvec{\omega }_{{\Pi }} \vert \varvec{f}, \bar{\lambda }, Z)\\&\quad =\prod _{\begin{subarray}{l} i:z_i=0 \end{subarray}}^{N_\mathcal {D}} \frac{\bar{\lambda }}{2} e^{\frac{f_i}{2}-\frac{f_i^2}{2}\omega _i} p_{\scriptscriptstyle \mathrm {PG}}(\omega _i\vert 1,0) \prod _{l=N_\mathcal {D}+1}^{N_{\mathcal {D}\cup {\Pi }}} \frac{\bar{\lambda }}{2} e^{-\frac{f_l}{2}-\frac{f_l^2}{2}\omega _l} p_{\scriptscriptstyle \mathrm {PG}}(\omega _l\vert 1,0) \\&\qquad \times \exp \left( -\bar{\lambda }|\mathcal {X}||\mathcal {T}|\right) , \end{aligned} \end{aligned}$$
(24)
where we set the Pólya–Gamma variables of all events \(\varvec{\omega }_{\mathcal {D}}=(\omega _1,\ldots ,\omega _{N_\mathcal {D}})\) to \(\omega _i=0\) if \(z_i\ne 0\). For the latent Poisson process the Pólya–Gamma variables are denoted by \(\varvec{\omega }_{{\Pi }}=(\omega _{N_\mathcal {D}+1},\ldots ,\omega _{N_{\mathcal {D}\cup {\Pi }}})\). The likelihood representation of the augmented system (24) has a Gaussian form with respect to \(\varvec{f}\) (that is, only linear or quadratic terms of \(\varvec{f}\) appear in the exponential function) and is therefore conditionally conjugate to the GP prior denoted by \(p(\varvec{f})\). Hence, we can implement an efficient Gibbs sampler for the background intensity function.
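The Pólya–Gamma updates in (25b, 25c) below only require draws from the tilted density \(p_{\scriptscriptstyle \mathrm {PG}}(\omega \vert 1,c)\). As a rough illustration, the defining infinite Gamma sum of Polson et al. (2013) can be truncated (the truncation level is an arbitrary choice); in practice one of the freely available dedicated samplers mentioned below should be used.

```python
# Approximate draw from the tilted Polya-Gamma density p_PG(omega | 1, c) via a
# truncated version of the infinite Gamma sum in Polson et al. (2013). This is
# a biased illustration only; exact dedicated samplers are preferable.
import numpy as np

def sample_pg_1_c(c, rng, n_terms=200):
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=1.0, scale=1.0, size=n_terms)     # g_k ~ Gamma(1, 1)
    return np.sum(g / ((k - 0.5) ** 2 + (c / (2.0 * np.pi)) ** 2)) / (2.0 * np.pi ** 2)
```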
Employing a Gaussian process prior over \(\varvec{f}\) and a Gamma distributed prior over \(\bar{\lambda }\), one gets from (24) the following conditional posteriors for the kth Gibbs iteration:
$$\begin{aligned}&{\Pi }^{(k)} \ | \ (\bar{\lambda },\varvec{f})^{(k-1)} \sim \mathrm {PP}\big (\bar{\lambda }\,\sigma (-f(\varvec{x}))\big ) \end{aligned}$$
(25a)
$$\begin{aligned}&\forall \ l :N_{\mathcal {D}}+1,\ldots ,N_{\mathcal {D}\cup {\Pi }} \nonumber \\&\omega _l^{(k)}\ | \ f_l^{(k-1)},{\Pi }^{(k)} \sim p_{\scriptscriptstyle \mathrm {PG}}(1, |f_l|) \end{aligned}$$
(25b)
$$\begin{aligned}&\forall \ i :z_i=0 \nonumber \\&\omega _i^{(k)}\ | \ f_i^{(k-1)},\mathcal {D},Z^{(k)} \sim p_{\scriptscriptstyle \mathrm {PG}}(1, |f_i|) \end{aligned}$$
(25c)
$$\begin{aligned}&\bar{\lambda }^{(k)} \ | \ Z^{(k)},{\Pi }^{(k)} \sim \mathrm {Gamma}\bigg (N_{\mathcal {D}_0\cup {\Pi }} +\alpha _0,|\mathcal {X}||\mathcal {T}|+\beta _0\bigg ) \end{aligned}$$
(25d)
$$\begin{aligned}&\varvec{f}^{(k)} \ | \ \mathcal {D}, (\varvec{\omega }_{\mathcal {D}},{\Pi },\varvec{\omega }_{{\Pi }},Z)^{(k)}\nonumber \\&\quad \sim \mathcal {N}\bigg ((\varvec{\Omega }+\varvec{K}^{-1})^{-1}\varvec{u}, (\varvec{\Omega }+\varvec{K}^{-1})^{-1}\bigg ) \end{aligned}$$
(25e)
where \(\varvec{f}=(\varvec{f}_{\mathcal {D}},\varvec{f}_{\Pi }) \in \mathbb {R}^{N_{\mathcal {D}\cup {\Pi }}}\) is the Gaussian process evaluated at the locations of \(\mathcal {D}\) and \({\Pi }\); \(\mathrm {PP}(\cdot )\) denotes an inhomogeneous Poisson process with intensity \(\bar{\lambda }\sigma (-f(\varvec{x}))\); \(\varvec{\Omega }\) is a diagonal matrix with \((\varvec{\omega }_{\mathcal {D}},\varvec{\omega }_{{\Pi }})\) as diagonal entries; and \(\varvec{K}\in \mathbb {R}^{N_{\mathcal {D}\cup {\Pi }} \times N_{\mathcal {D}\cup {\Pi }}}\) is the covariance matrix of the Gaussian process prior at the positions of \(\mathcal {D}\) and \({\Pi }^{(k)}\). It can be shown that the vector \(\varvec{u}\) is 1/2 for all entries in \(\mathcal {D}_0\), zero for all entries of the remaining data \(\mathcal {D}\backslash \mathcal {D}_0\), and \(-1/2\) for the corresponding entries of \({\Pi }\). Gamma\((\cdot )\) denotes a Gamma distribution, and the Gamma prior on \(\bar{\lambda }\) has shape and rate parameters \(\alpha _0,\beta _0\). We used \(e^{-\frac{c^2}{2}\omega } p_{\scriptscriptstyle \mathrm {PG}}(\omega \vert 1,0)\propto p_{\scriptscriptstyle \mathrm {PG}}(\omega \vert 1,c)\), which follows from the definition of a tilted Pólya–Gamma density (34) as given in Polson et al. (2013), see Appendix 2. Note that one does not need an explicit form of the Pólya–Gamma density for our inference approach since it is sampling based. In other words, we only need an efficient way to sample from the tilted \(p_{\scriptscriptstyle \mathrm {PG}}\) density (34), which was provided by Polson et al. (2013) and Windle et al. (2014). Several \(p_{\scriptscriptstyle \mathrm {PG}}\) samplers are freely available for different computer languages.
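As an illustration of the conditional draws (25d) and (25e), the following sketch assumes that the covariance matrix \(\varvec{K}\), the stacked Pólya–Gamma variables, and boolean masks marking the background events and the latent points have been assembled from the current Gibbs state.

```python
# Sketch of the conditional draws (25d) and (25e); all inputs are assumed to be
# assembled from the current Gibbs state (K: GP prior covariance at D and Pi,
# omega: stacked PG variables with omega_i = 0 for events with z_i != 0,
# is_background / is_latent: boolean masks over the stacked points).
import numpy as np

def sample_lambda_bar(n_bg_plus_pi, area, T_len, alpha0, beta0, rng):
    # Eq. (25d): Gamma with shape N_{D0 u Pi} + alpha0 and rate |X||T| + beta0
    return rng.gamma(shape=n_bg_plus_pi + alpha0, scale=1.0 / (area * T_len + beta0))

def sample_f(K, omega, is_background, is_latent, rng, jitter=1e-6):
    # Eq. (25e): f | ... ~ N((Omega + K^{-1})^{-1} u, (Omega + K^{-1})^{-1})
    n = K.shape[0]
    u = np.zeros(n)
    u[is_background] = 0.5          # events with z_i = 0
    u[is_latent] = -0.5             # points of the latent process Pi
    precision = np.diag(omega) + np.linalg.inv(K + jitter * np.eye(n))
    cov = np.linalg.inv(precision)
    return rng.multivariate_normal(cov @ u, cov)
```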
In summary, we first introduced a latent Poisson process \({\Pi }\) to render the inference problem of f tractable. The additional Pólya–Gamma augmentation allows us to sample f rejection-free, given samples of the augmented sets \({\Pi },\varvec{\omega }_\mathcal {D}, \varvec{\omega }_{\Pi }\). A detailed step-by-step derivation of the conditional distributions is given in Appendix 3.
Hyperparameters. The Gaussian process covariance kernel given in (12) depends on the hyperparameters \(\varvec{\nu }\), cf. Sect. 3. We use exponentially distributed priors \(p(\nu _i)\) on each hyperparameter and sample \(\varvec{\nu }\) using a standard MH algorithm, as no closed form of the conditional posterior is available. The hyperparameters \(\varvec{\nu }\) enter only through the Gaussian process prior, and hence the relevant terms are
$$\begin{aligned}&\ln p(\varvec{\nu }|\varvec{f},\mathcal {D},{\Pi },\varvec{\omega }_{\mathcal {D}},\varvec{\omega }_{{\Pi }})\nonumber \\&\quad = - \frac{1}{2}\varvec{f}^\top \varvec{K}_{\varvec{\nu }}^{-1}\varvec{f}- \frac{1}{2}\ln \det \varvec{K}_{\varvec{\nu }} + \ln p(\varvec{\nu }) +\mathrm {const.}, \end{aligned}$$
(26)
where \(\varvec{K}_{\varvec{\nu }}\) is the Gaussian process prior covariance matrix depending on \(\varvec{\nu }\) via (12).
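A minimal sketch of one such MH update based on (26): the squared-exponential kernel below is only a stand-in for the covariance function (12), and the exponential prior rates and the step size are illustrative assumptions.

```python
# Sketch of one MH update of the GP hyperparameters via the log posterior (26).
# The squared-exponential kernel is a stand-in for the covariance (12), and the
# exponential prior rates `prior_rate` and step size are illustrative.
import numpy as np

def se_kernel(X, nu):
    amp, length = nu                                           # illustrative parameterisation
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return amp * np.exp(-0.5 * d2 / length ** 2)

def log_post_nu(nu, f, X, prior_rate, jitter=1e-6):
    K = se_kernel(X, nu) + jitter * np.eye(len(f))
    _, logdet = np.linalg.slogdet(K)
    quad = f @ np.linalg.solve(K, f)
    log_prior = np.sum(np.log(prior_rate) - prior_rate * nu)   # exponential priors on nu
    return -0.5 * quad - 0.5 * logdet + log_prior

def mh_step_nu(nu, f, X, prior_rate, rng, step=0.1):
    log_nu_prop = np.log(nu) + step * rng.standard_normal(len(nu))
    nu_prop = np.exp(log_nu_prop)
    # symmetric proposal in log space -> Jacobian factor prod(nu_prop / nu)
    log_ratio = (log_post_nu(nu_prop, f, X, prior_rate)
                 - log_post_nu(nu, f, X, prior_rate)
                 + np.sum(log_nu_prop - np.log(nu)))
    return nu_prop if np.log(rng.uniform()) < log_ratio else nu
```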
Conditional predictive posterior distribution of the background intensity
Given the kth posterior sample \((\bar{\lambda }^{(k)},\varvec{f}^{(k)},\varvec{\nu }^{(k)})\), the background intensity \(\mu (\varvec{x}^*)^{(k)}\) at any set of positions \(\{\varvec{x}_i^*\} \subset \mathcal {X}\) (predictive conditional posterior) can be obtained as follows, see (13d). Conditioned on \(\varvec{f}^{(k)}\) and the hyperparameters \(\varvec{\nu }^{(k)}\), the latent function values \(\varvec{f}^*\) are sampled from the conditional prior \(p(\varvec{f}^*|\varvec{f}^{(k)},\varvec{\nu }^{(k)})\) using (43) with the covariance function given in (12) (Williams and Rasmussen 2006). Using (11), one then gets \(\mu (\varvec{x}^*)^{(k)}=\bar{\lambda }^{(k)} \sigma (\varvec{f}^*)\).
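In code, this is the standard conditional Gaussian of a GP. The following sketch reuses the illustrative stand-in kernel from above; all other inputs are taken from the kth posterior sample.

```python
# Sketch of drawing mu(x*) for the k-th posterior sample: f* is drawn from the
# GP conditional p(f* | f, nu), then mu* = lambda_bar * sigmoid(f*). The kernel
# callable is the same illustrative stand-in for (12) used above.
import numpy as np

def predict_background(X_star, X, f, lambda_bar, nu, kernel, rng, jitter=1e-6):
    K = kernel(np.vstack([X, X_star]), nu)
    n = len(X)
    K_xx = K[:n, :n] + jitter * np.eye(n)
    K_sx, K_ss = K[n:, :n], K[n:, n:]
    mean = K_sx @ np.linalg.solve(K_xx, f)                        # conditional mean
    cov = K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T) + jitter * np.eye(len(X_star))
    f_star = rng.multivariate_normal(mean, cov)
    return lambda_bar / (1.0 + np.exp(-f_star))                   # mu(x*) = lambda_bar * sigmoid(f*)
```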
Inference for the parameters of the triggering function
Given an instance of a branching structure Z, the likelihood function in (17) factorises into terms involving \(\mu \) and terms involving \(\varvec{\theta }_\varphi \). The relevant terms related to \(\varvec{\theta }_\varphi \) are
$$\begin{aligned} \begin{aligned} p( \mathcal {D}\vert Z ,\varvec{\theta }_\varphi ) =&\prod _{i=1:z_i\ne 0}^{N_\mathcal {D}} \varphi (t_i-t_{z_i},\varvec{x}_i-\varvec{x}_{z_i}|m_{z_i},\varvec{\theta }_\varphi ) \\&\times \prod _{i=1}^{N_{\mathcal {D}}} \exp \bigg (-\int _{\mathcal {T}_i}\int _{\mathcal {X}} \varphi (t-t_i,\varvec{x}-\varvec{x}_i | m_i,\varvec{\theta }_\varphi ) \mathop {}\!\mathrm {d}\varvec{x}\mathop {}\!\mathrm {d}t\bigg ). \end{aligned} \end{aligned}$$
(27)
The conditional posterior \(p(\varvec{\theta }_\varphi \vert \mathcal {D},Z)\propto p( \mathcal {D}\vert Z, \varvec{\theta }_\varphi )p(\varvec{\theta }_\varphi )\) with prior \(p(\varvec{\theta }_\varphi )\) has no closed form. The dimension of \(\varvec{\theta }_\varphi \) is usually small (\(\le 7\)), and we therefore employ MH sampling (Hastings 1970), which can be considered a nested step within the overall Gibbs sampler. We use a random-walk MH scheme where proposals are generated by a Gaussian in log space. The acceptance probability of \(\varvec{\theta }_\varphi ^{(k)}\) based on (27) is given by
$$\begin{aligned} p_\mathrm{accept} = \min \left\{ 1, \frac{p(\mathcal {D}\vert Z^{(k)},\varvec{\theta }_\varphi ^\mathrm{proposed})p(\varvec{\theta }_\varphi ^\mathrm{proposed})}{p(\mathcal {D}\vert Z^{(k)},\varvec{\theta }_\varphi ^{(k-1)})p(\varvec{\theta }_\varphi ^{(k-1)})}\right\} . \end{aligned}$$
(28)
We take 10 proposals before returning to the overall Gibbs sampler, that is, to the step in Sect. 4.1.
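A minimal sketch of this nested MH step: `log_lik` stands for the logarithm of (27) given the current branching structure and `log_prior` for \(\log p(\varvec{\theta }_\varphi )\), both hypothetical callables. Because the sketch proposes with a Gaussian that is symmetric in \(\log \varvec{\theta }_\varphi \), its Hastings ratio includes the corresponding Jacobian factor.

```python
# Sketch of the nested random-walk MH step (Sect. 4.3): proposals are Gaussian
# in log space and the acceptance ratio follows (28); `log_lik` stands for the
# log of (27) given Z, `log_prior` for log p(theta_phi). Both callables and the
# step size are assumptions made for illustration.
import numpy as np

def mh_triggering(theta, log_lik, log_prior, rng, n_proposals=10, step=0.05):
    for _ in range(n_proposals):
        log_theta_prop = np.log(theta) + step * rng.standard_normal(len(theta))
        theta_prop = np.exp(log_theta_prop)
        # symmetric proposal in log space -> Jacobian factor prod(theta_prop / theta)
        log_ratio = (log_lik(theta_prop) + log_prior(theta_prop)
                     - log_lik(theta) - log_prior(theta)
                     + np.sum(log_theta_prop - np.log(theta)))
        if np.log(rng.uniform()) < log_ratio:
            theta = theta_prop
    return theta
```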