Inference for a class of partially observed point process models
Martin, J.S., Jasra, A. & McCoy, E. Ann Inst Stat Math (2013) 65: 413. doi:10.1007/s10463-012-0375-8
Abstract
This paper presents a simulation-based framework for sequential inference from partially and discretely observed point process models with static parameters. Adopting a Bayesian perspective for the static parameters, we build upon sequential Monte Carlo (SMC) methods, investigating the problems of performing sequential filtering and smoothing in complex examples where current methods often fail. We consider various approaches for approximating posterior distributions using SMC. Our approaches, with some theoretical discussion, are illustrated on a doubly stochastic point process applied in the context of finance.
Keywords: Point processes · Sequential Monte Carlo · Intensity estimation

1 Introduction
Partially observed point processes provide a rich class of models to describe real data. For example, such models are used for stochastic volatility in finance (Barndorff-Nielsen and Shephard 2001), descriptions of queuing data in operations research (Fearnhead 2004), seismological models (Daley and Vere-Jones 1988) and applications in nuclear physics (Snyder and Miller 1998). For complex dynamic models, i.e., when data arrive sequentially in time, studies date back to at least Snyder (1972). However, fitting Bayesian models requires sequential Monte Carlo (SMC) (e.g. Doucet et al. 2001) and Markov chain Monte Carlo (MCMC) methods. The main developments in this field include the work of Centanni and Minozzo (2006a, b), Green (1995), Del Moral et al. (2006, 2007), Doucet et al. (2006), Roberts et al. (2004), Rydberg and Shephard (2000); see also Whiteley et al. (2011). As we describe below, SMC methodology may fail in some scenarios, and we describe methodology to deal with these problems.
One of the first works applying computational methods to point process (PP) models was Rydberg and Shephard (2000). They focus upon a Cox model where an unobserved PP parameterizes the intensity of the observations. Rydberg and Shephard (2000) used the auxiliary particle filter (Pitt and Shephard 1997) to simulate from the posterior density of the intensity at a given time point. This was superseded by Centanni and Minozzo (2006a, b), which allows one to infer the intensity at any given time, up to the current observation. Centanni and Minozzo (2006a, b) perform an MCMC-type filtering algorithm, estimating static parameters using stochastic EM. The methodology cannot easily be adapted to the case where the static parameters are given a prior distribution. In addition, the theoretical validity of the approach had not been established; we verify it in Proposition 1.
1. the sequence of distributions,
2. the mechanism by which particles are propagated.
Two solutions are proposed. The first is to saturate the state-space: it is supposed that the observation interval, \([0,T]\), of the PP is known a priori. The sequence of target distributions is then defined on the whole interval and one sequentially introduces likelihood terms, i.e. the sequence of target distributions is initially the prior distribution with the unobserved process allowed to lie on \([0,T]\). As the likelihood can be written as a product of \(r_T\) terms, each subsequent target (up to proportionality) is the old one multiplied by the density of the next data-point in the sequence. This idea circumvents the problem of extending the space, at an extra computational cost. Inference for the original density of interest can be achieved by importance sampling (IS). This approach cannot be used if \(T\) is unknown. In the second approach, termed data-point tempering, the sequence of target distributions is defined by sequentially introducing likelihood terms, as above, except that the hidden process can only lie on \([0,t_n]\). This is achieved as follows: given that the PP has been sampled on \([0,t_n]\), the target is extended onto \([0,t_{n+1}]\) by sampling the missing part of the PP. Then one introduces likelihood terms into the target that correspond to the data (as in Chopin 2002). Once all of the data have been introduced, the target density is (1). It should be noted that neither of the methods is online, but some simple fixes are detailed.
Section 2 introduces a doubly stochastic PP model from finance which serves as a running example. In Sect. 3, the ideas of Centanni and Minozzo (2006a, b) are discussed; it is established that the method is theoretically valid under some assumptions. The difficulty of extending the state space is also demonstrated. In Sect. 4, we introduce our SMC methods. In Sect. 5 our methods are illustrated on the running example. In Sect. 6, we detail extensions to our work.
Some notation is introduced. We consider a sequence of probability measures \(\{\varpi _n\}_{1\le n \le m^*}\) on spaces \(\{(G_n,\mathcal G _n)\}_{1\le n\le m^*}\), with dominating \(\sigma \)-finite measures. The class of bounded measurable functions \(f_n:G_n\rightarrow \mathbb R \) on \(G_n\) is written \(\mathcal B _b(G_n)\), with \(\Vert f_n\Vert =\sup _{x\in G_n}|f_n(x)|\). \(\varpi _n\) will refer to either the probability measure \(\varpi _n({\text{ d}}x)\) or the density \(\varpi _n(x)\).
2 Model
The model we use to illustrate our ideas is from statistical finance. An important type of financial data is ultra high-frequency data which consist of the irregularly spaced times of financial transactions and their corresponding monetary value. Standard models for the fitting of such data have relied upon stochastic differential equations driven by Wiener dynamics, a debatable assumption due to the continuity of the sample paths. As noted in Centanni and Minozzo (2006b), it is more appropriate to model the data as a Cox process. Due to the high frequency of the data, it is important to be able to perform sequential/on-line inference. Data are observed in \([0,T]\). In the context of finance, the assumption that \(T\) be fixed is entirely reasonable. For example, when the model is used in the context of equities, the model is run for the trading day; indeed due to different (deterministic) patterns in financial trading, it is likely that the fixed parameters below are varied according to the day.
It is of interest to compute expectations w.r.t. the \(\{\pi _n\}_{1\le n\le m^*}\), and this is possible, using the SMC methods below (Sect. 3.1). However, such algorithms are not of fixed computational cost; the sequence of spaces over which the \(\{\pi _n\}_{1\le n\le m^*}\) lie is increasing. These methods can also be used to draw inference from the marginal posterior of the process, over \((t_{n-1},t_n]\); such algorithms can be designed to be of fixed computational complexity, for example by constraining any simulation to a fixed-size state-space. This idea is considered further in Sect. 4.3.
3 Previous approaches
The above algorithm can be justified, theoretically, using the Poisson equation (e.g. Glynn and Meyn 1996) and induction arguments. Below the assumption (A) is made; see appendix for the assumption (A) as well as the proof. The expectation below is w.r.t. the simulated process discussed above, given the observed data.
Proposition 1
This result helps to establish the theoretical validity of the method in Centanni and Minozzo (2006a), which to our knowledge had not been established in that paper or elsewhere. In addition, it allows us to understand where and when the method may be of use; this is discussed in Sect. 3.2.
3.1 SMC methods
SMC samplers aim to approximate a sequence of related probability measures \(\{\pi _n\}_{0\le n \le m^*}\) defined upon a common space \((E,\mathcal E )\). Note that \(m^*>1\) can depend upon the data and may not be known prior to simulation. For partially observed PPs the probability measures are defined upon nested state-spaces; this case can be similarly handled with minor modification. SMC samplers introduce a sequence of auxiliary probability measures \(\{\widetilde{\pi }_n\}_{0\le n \le m^*}\) on state-spaces of increasing dimension \((E_{[0,n]}:=E_0\times \cdots \times E_n,\mathcal E _{[0,n]}:=\mathcal E _0\otimes \cdots \otimes \mathcal E _n)\), such that they admit the \(\{\pi _n\}_{0\le n\le m^*}\) as marginals.
The ESS in Algorithm 1 refers to the effective sample size (Liu 2001). This measures the weight degeneracy of the algorithm; if the ESS is close to \(N\), the weights are approximately uniform and the particle set provides a good representation of the target. This is a standard metric by which to assess the performance of the algorithm. The resampling method used throughout the paper is systematic resampling.
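For concreteness, the ESS and systematic resampling steps can be sketched as follows; this is an illustrative implementation, not the authors' code, and the function names are our own:

```python
import numpy as np

def ess(log_w):
    """Effective sample size computed from unnormalised log-weights."""
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(w, rng):
    """Systematic resampling: a single uniform draw generates N evenly
    spaced points on [0,1), which are inverted through the weight CDF."""
    N = len(w)
    positions = (rng.uniform() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)
```

With uniform weights the ESS equals \(N\); as the weight mass concentrates on a single particle it approaches 1, which motivates resampling when the ESS falls below a threshold such as \(N/2\).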
One generic approach is to set \(K_n\) as an MCMC kernel of invariant distribution \(\pi _n\) and \(L_{n-1}\) as the reversal kernel \( L_{n-1}(x_n,x_{n-1}) = \pi _n(x_{n-1})K_n(x_{n-1},x_n)/\pi _n(x_n) \), which we term the standard reversal kernel. One can iterate the MCMC kernels; we denote the number of iterates by the positive integer \(M\). It is also possible to apply the algorithm when \(K_n\) is a mixture of kernels; see Del Moral et al. (2006) for details.
Algorithm 1 A generic SMC sampler. Note that \(T(N)\) is termed a threshold function such that \(1\le T(N) \le N\) and ESS is the effective sample size
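For intuition, the structure of Algorithm 1 can be sketched on a toy problem: a sequence of zero-mean Gaussian targets with shrinking scales, random-walk Metropolis–Hastings kernels invariant for each \(\pi_n\), and the standard reversal kernel, under which the incremental weight reduces to \(\pi_n(x_{n-1})/\pi_{n-1}(x_{n-1})\). The targets, step size and constants below are our own toy choices, not the point process model of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 500, 5                        # particles, MCMC iterates per target
sigmas = [1.0, 0.8, 0.6, 0.5]        # toy sequence: pi_n = N(0, sigmas[n]^2)

def logpdf(x, s):
    return -0.5 * (x / s) ** 2 - np.log(s)

x = rng.normal(0.0, sigmas[0], N)    # exact draws from pi_0
log_w = np.zeros(N)
for n in range(1, len(sigmas)):
    # standard reversal kernel => incremental weight pi_n(x)/pi_{n-1}(x)
    log_w += logpdf(x, sigmas[n]) - logpdf(x, sigmas[n - 1])
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    if 1.0 / np.sum(w ** 2) < N / 2:                 # resample if ESS < N/2
        pos = (rng.uniform() + np.arange(N)) / N     # systematic resampling
        x = x[np.searchsorted(np.cumsum(w), pos)]
        log_w = np.zeros(N)
    for _ in range(M):               # random-walk MH kernel invariant for pi_n
        prop = x + 0.5 * rng.normal(size=N)
        alpha = logpdf(prop, sigmas[n]) - logpdf(x, sigmas[n])
        x = np.where(np.log(rng.uniform(size=N)) < alpha, prop, x)
```

At termination the weighted particle set approximates the final target; here its weighted variance should be close to \(0.5^2\).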
3.1.1 Nested spaces
- Birth. A new jump is sampled uniformly in \([\phi _{k_{t_{n-1}}},t_n]\) and a new mark from the prior. The incremental weight is
$$\begin{aligned} W_n(\bar{x}_{n-1:n},\mu ,\sigma ) \propto \frac{\pi _n(\bar{x}_n,\mu ,\sigma |\bar{y}_n)(t_n-\phi _{k_{t_{n-1}}})}{\pi _{n-1}(\bar{x}_{n-1},\mu ,\sigma |\bar{y}_n)\mathsf p (\zeta _{k_{t_n}})}. \end{aligned}$$
- Extend. A new jump is generated according to a Markov kernel that corresponds to the random walk
$$\begin{aligned} \log \left\{ \frac{\phi _{k_{t_n}} - \phi _{k_{t_n}-1}}{t_n-\phi _{k_{t_n}}}\right\} = \vartheta Z + \log \left\{ \frac{\phi _{k_{t_{n-1}}} - \phi _{k_{t_{n-1}}-1}}{t_n-\phi _{k_{t_{n-1}}}}\right\} \end{aligned}$$
with \(Z\sim \mathcal N (0,1)\), \(\vartheta >0\). The new mark is sampled from the prior. The backward kernel and incremental weight are discussed in Del Moral et al. (2007), Sect. 4.3.
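The random walk above can be inverted in closed form: writing \(r\) for the exponentiated right-hand side, the new jump is \(\phi_{k_{t_n}} = (a + t_n r)/(1+r)\), which lies in \((a, t_n)\) by construction. A minimal sketch, assuming the new jump is appended after the current last jump (so that \(\phi_{k_{t_n}-1}\) coincides with \(\phi_{k_{t_{n-1}}}\)); names are ours:

```python
import math
import random

def extend_jump(a, b, t_n, theta, rng):
    """Propose the new jump phi in (a, t_n) via a Gaussian random walk on
    log((phi - a)/(t_n - phi)); a is the current last jump, b the jump
    before it (b < a < t_n)."""
    z = rng.gauss(0.0, 1.0)
    r = math.exp(theta * z) * (a - b) / (t_n - a)  # exponentiated RHS
    return (a + t_n * r) / (1.0 + r)               # invert the log-ratio map
```

Since the map \(\phi \mapsto \log\{(\phi - a)/(t_n - \phi)\}\) is a bijection from \((a, t_n)\) to \(\mathbb{R}\), the proposal always respects the support constraint, whatever \(Z\).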
In addition to the above steps an MCMC sweep is included after the decision of whether or not to resample the particles is taken (see step 1. of Algorithm 1): an MCMC kernel of invariant measure \(\pi _n\) is applied. The kernel is much the same as in Green (1995).
3.1.2 Simulation experiment
We applied the benchmark sampler, as detailed above, to some synthetic data in order to monitor the performance of the algorithm. Standard practice in the reporting of financial data is to represent the time of a trade as a positive real number, with the integer part representing the number of days passed since January 1st 1900 and the non-integer part representing the fraction of 24 h that has passed during that day; thus, 1 min corresponds to an interval of length 1/1,440. Therefore we use a synthetic data set with intensity of order of magnitude \(10^3\). The ticks \(\omega _i\) were generated from a specified intensity process \(\left\{ \lambda _t\right\} \) that varied smoothly between three levels of constant intensity at \(\lambda =6{,}000\), \(\lambda =2{,}000\) and \(\lambda =4{,}000\). The log returns \(\xi _i\) were sampled from the Cauchy distribution with location \(\mu =0\) and scale \(\sigma =2.5\times 10^{-4}\). The entire data set was of size \(r_T=3{,}206\), \([0,T]=[0,0.9]\) with \(t_n= n *0.003\). The intensity from which the data were generated had constant levels at 6,000 in the interval [0.05, 0.18]; at 4,000 in the interval [0.51, 0.68]; and at 2,000 in the intervals [0.28, 0.42] and [0.78, 0.90].
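A synthetic PP of this kind can be generated by Lewis–Shedler thinning. The sketch below uses an illustrative piecewise-constant intensity at the stated levels; the paper's intensity moves smoothly between levels, and the off-level value here is our own assumption:

```python
import random

def thin_poisson(lam, lam_max, T, rng):
    """Lewis-Shedler thinning: simulate an inhomogeneous Poisson process
    on [0, T] with intensity lam(t) bounded above by lam_max."""
    ticks, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)        # next candidate from rate-lam_max process
        if t > T:
            return ticks
        if rng.random() < lam(t) / lam_max:  # accept with prob lam(t)/lam_max
            ticks.append(t)

def lam(t):
    # illustrative piecewise-constant intensity at the stated levels;
    # the paper's process varies smoothly between them
    if 0.05 <= t <= 0.18:
        return 6000.0
    if 0.51 <= t <= 0.68:
        return 4000.0
    if 0.28 <= t <= 0.42 or 0.78 <= t <= 0.90:
        return 2000.0
    return 1000.0                            # our own choice between levels

ticks = thin_poisson(lam, 6000.0, 0.9, random.Random(2))
```

The expected number of ticks is \(\int_0^{0.9}\lambda_t\,{\text d}t\), here roughly 2,300, of the same order as the \(r_T=3{,}206\) events in the actual synthetic data set.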
The sampler was implemented with all combinations \(\{(M,N)\}\) for \(N\in \{100, 1{,}000\}\) and \(M\in \{1, 5, 20\}\), resampling whenever the effective sample size fell below \(N/2\) (recall \(N\) is the number of particles and \(M\) the MCMC iterations). When performing statistical inference, the intensity (3) used parameters \(\gamma =0.001, \nu =150\) and \(s=20\).
3.2 Discussion
We have reviewed two existing techniques for the analysis of partially observed PPs. It should be noted that there are other methods, for example in Varini (2007). In that paper, the intensity has a finite number of functional forms and the uncertainty is related to the type of form at each inference time \(t_n\).
3.3 Possible solutions to the problems of extending the state-space
An important remark associated with the simulations in Sect. 3.1.2 is that it cannot be expected that simply increasing the number of particles will yield a significantly better estimation procedure. The algorithm collapses to a single particle and it appears that naively increasing computation will not improve the simulations.
As discussed above, the inherent difficulty of sampling from the given sequence of distributions is that of extending the state-space. It is known that conditional on all parameters except the final jump, the optimal importance distribution is the full conditional density (Del Moral et al. 2006). In practice, for many problems it is either not possible to sample from this density or to evaluate it exactly (which is required). In the case that it is possible to sample from the full conditional, but the normalizing constant is unknown, the normalizing constant problem can be dealt with via the random weight idea (Rousset and Doucet 2006). In the context of this problem we found that the simulation from the full conditional density of \(\phi _{k_{t_n}}\) was difficult, to the extent that sensible rejection algorithms and approximations for the random weight technique were extremely poor.
Another solution, in Del Moral et al. (2007), consists of stopping the algorithm when the ESS drops and using an additional SMC sampler to facilitate the extension of the state-space. However, in this example, the ESS is so low that this cannot be expected to help. Given the above discussion, it is clear that a new technique is required to sample from the sequence of distributions; two ideas are presented below. One idea that could be adopted in the context of estimating static parameters is SMC\(^2\) (Chopin et al. 2012), which appeared after the first versions of this article.
4 Proposed methods
In the following section, two approaches are presented to deal with the problems in Sect. 3.1.2. First, a state-space saturation approach, where sampling of PP trajectories is performed over a state space corresponding to a fixed observation interval. Second, a data-point tempering approach. In this approach, as the time parameter increases, the (artificial) target in the new region is simply the prior and the data are then sequentially added to the likelihood, softening the state-space extension problem. Both of these procedures use the basic structure of Algorithm 1, with some refinements, that are mentioned in the text. As for the procedure in Algorithm 1 we add dynamic resampling steps; when MCMC kernels are used, one can resample before sampling—see Del Moral et al. (2006) for details.
4.1 Saturating the state-space
A simple idea, which has been used in the context of reversible jump, is to saturate the state-space. The idea relies upon knowing the observation period of the PP (\([0,T]\)) a priori to the simulation. This is realistic in a variety of applications. For example, in Sect. 2, often we may only be interested in performing inference for a day of trading and thus can set \([0,T]\).
4.2 Data-point tempering
A simple solution to the state-space extension problem, which allows data to be incorporated sequentially, albeit not with fixed computational complexity, is as follows. When the time parameter increases, the new part of the process is simulated according to the prior. Then each new data point is added to the likelihood in a sequential manner. In other words, if there are \(n\) data points, then there are \(m^* = n + \widetilde{m}\) time-steps of the algorithm.
The potential advantage of this idea is that, when extending the state-space, there are no extra data to complicate the likelihood. Thus, it is expected that if the prior does not propose a significant number of new jumps, the incremental weights should be of relatively low variance. The subsequent steps, when considering the jumps in \([t_n,t_{n+1})\), are performed on a common state-space and hence should not be subject to such substantial variability as when the state-space changes. This idea could also be adapted to the case that the likelihood on the new interval is tempered instead (e.g. Jasra et al. 2007).
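The mechanics of introducing likelihood terms one at a time can be illustrated on a toy conjugate model of our own (a Gamma prior on a Poisson rate, with the data and constants hypothetical). Because no resampling or MCMC moves are shown between steps, this sketch collapses to plain importance sampling; in the full algorithm an MCMC sweep and dynamic resampling would follow each inclusion:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
y = np.array([3, 5, 4, 6, 2])          # toy observations (ours)
lam = rng.gamma(2.0, 1.0, N)           # particles from the Gamma(2, 1) prior
log_w = np.zeros(N)
for y_k in y:                          # one SMC step per data point:
    log_w += y_k * np.log(lam) - lam   # multiply in the next Poisson likelihood term
w = np.exp(log_w - log_w.max()); w /= w.sum()
post_mean = np.sum(w * lam)            # approximates the Gamma(22, 6) posterior mean
```

By conjugacy the exact posterior here is Gamma(shape 22, rate 6) with mean \(22/6 \approx 3.67\), so the weighted estimate can be checked directly.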
Proposition 2
The upper-bound does not grow with the number of data. That is, by increasing the computational complexity linearly in the number of data, one has an algorithm whose error does not grow as more data (and regions) are added. This is similar to the observation of Beskos et al. (2011), when increasing the dimension of the target density. We note that the result is derived under exceptionally strong assumptions. In general, when one considers \(r_{t_1}\) growing, one requires sharper tools than the Dobrushin coefficients used here (e.g. Eberle and Marinelli 2012); this is beyond the scope of the current article and our result above is illustrative (and hence potentially over-optimistic).
4.3 Online implementation
A key characteristic that has not yet been addressed is the fact that each approach has a computational complexity that increases with time. In a procedure that would otherwise be well suited to providing online inference, this is an unattractive feature. A large contribution to this increasing computational budget derives from the MCMC sweeps at the end of each iteration. As the space over which the invariant MCMC kernel is applied grows, so does the expense of the algorithm. An improvement to the computational demand of the samplers can therefore be made by keeping the space over which the MCMC kernel is applied constant. The reduced computational complexity (RCC) alternative to each of the samplers is designed by amending the algorithms such that, at time \(t_n\), the MCMC sweep operates over, at most, 20 changepoints, i.e. over the interval \([\phi _{k_{t_n}-19},t_n)\). Due to the well-known path degeneracy problem in SMC (see Kantas et al. 2011), the estimates will be poor approximations of the true values when including static parameters and extending the space of the point process for a long time. We note that, at least for our application, it is reasonable to consider \(T\) fixed and thus this is less problematic.
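The windowing rule itself is simple; a hypothetical helper (names ours) returning the indices of the changepoints the RCC sweep is allowed to update:

```python
def rcc_window(num_jumps, max_points=20):
    """Indices of the changepoints the RCC MCMC sweep may update:
    only the most recent `max_points` jumps (all of them if fewer exist)."""
    return list(range(max(0, num_jumps - max_points), num_jumps))
```

Everything before the window is frozen, which is what caps the cost of each sweep and, equally, what induces the path degeneracy discussed above.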
5 The finance problem revisited
We now return to the example from Sect. 2 and the settings as in Sect. 3.1.2.
5.1 Simulated data
The saturated and tempered samplers, as well as their RCC alternatives, were implemented using the simulated data set (in Sect. 3.1.2), in order to compare their respective performances against the benchmark sampler and to compare the accuracy of the resulting intensity estimates against an observed intensity process. All of the alternative samplers were implemented under the same conditions, using the algorithm and model parameters as described for the implementation of the benchmark sampler. All results are averaged over 10 runs of the algorithm.
Table 1 Resampling rates (%) of each of the three SMC samplers and their reduced computational complexity alternatives, for the six algorithm parameterizations that were tested

| Sampler | \(M=1\), \(N=100\) | \(M=1\), \(N=1{,}000\) | \(M=5\), \(N=100\) | \(M=5\), \(N=1{,}000\) | \(M=20\), \(N=100\) | \(M=20\), \(N=1{,}000\) |
|---|---|---|---|---|---|---|
| Benchmark | 31.3 | 52.0 | 42.3 | 94.4 | 74.0 | 99.7 |
| Benchmark-RCC | 37.6 | 88.1 | 69.0 | 99.7 | 99.4 | 99.7 |
| Saturated | 21.0 | 21.3 | 19.7 | 20.1 | 18.2 | 17.6 |
| Saturated-RCC | 20.7 | 20.7 | 18.5 | 18.8 | 15.4 | 15.4 |
| Tempered | 2.0 | 2.0 | 1.9 | 1.9 | 1.7 | 1.7 |
| Tempered-RCC | 2.0 | 2.0 | 1.7 | 1.8 | 1.4 | 1.4 |
Table 2 Minimum ESS encountered during implementation by each of the three SMC samplers and their reduced computational complexity alternatives, for the six algorithm parameterizations that were tested

| Sampler | \(M=1\), \(N=100\) | \(M=1\), \(N=1{,}000\) | \(M=5\), \(N=100\) | \(M=5\), \(N=1{,}000\) | \(M=20\), \(N=100\) | \(M=20\), \(N=1{,}000\) |
|---|---|---|---|---|---|---|
| Benchmark | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Benchmark-RCC | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Saturated | 38.1 | 410.2 | 38.6 | 397.0 | 38.6 | 398.9 |
| Saturated-RCC | 38.5 | 401.2 | 40.6 | 394.4 | 43.0 | 425.9 |
| Tempered | 47.6 | 484.7 | 47.7 | 475.5 | 47.9 | 483.4 |
| Tempered-RCC | 47.8 | 475.7 | 48.4 | 481.7 | 48.3 | 486.6 |
Table 3 Processing time, in seconds, for each of the three samplers and their reduced computational complexity alternatives, for the six algorithm parameterizations that were tested

| Sampler | \(M=1\), \(N=100\) | \(M=1\), \(N=1{,}000\) | \(M=5\), \(N=100\) | \(M=5\), \(N=1{,}000\) | \(M=20\), \(N=100\) | \(M=20\), \(N=1{,}000\) |
|---|---|---|---|---|---|---|
| Benchmark | 612.9 | 9,689.1 | 2,849.7 | 45,690.4 | 13,352.1 | 144,621.3 |
| Benchmark-RCC | 449.0 | 7,910.9 | 1,132.7 | 10,657.6 | 3,106.2 | 31,208.5 |
| Saturated | 1,125.3 | 10,667.8 | 3,234.3 | 39,061.1 | 15,381.9 | 141,817.3 |
| Saturated-RCC | 637.5 | 6,215.2 | 1,200.7 | 11,412.6 | 4,391.9 | 47,662.8 |
| Tempered | 1,160.2 | 10,633.4 | 3,138.4 | 38,679.6 | 14,086.7 | 130,899.1 |
| Tempered-RCC | 666.0 | 6,424.4 | 1,156.3 | 11,209.1 | 3,231.3 | 34,795.3 |
We use the posterior medians to report intensities. Since we have access to a ‘true’ intensity process, the accuracy of these estimated intensity processes is measured using the root mean square error (RMSE). Table 4 presents the RMSEs of the intensity estimates (given the data up to \(t_n\), averaged over each \(t_n\)) and Table 5 presents the RMSEs of the smoothed (conditional upon the entire data set) intensity estimates resulting from each of the three samplers and their RCC alternatives. The most important result to note is the performance of the saturated and tempered samplers in comparison with the benchmark sampler. In terms of the accuracy of the intensity estimates, the two proposed alterations to the sampler improve performance consistently and significantly. Looking at the resampling rates and processing times, in Tables 1 and 3, respectively, we can see that, as expected, although the tempered sampler resampled the particles significantly less than the benchmark sampler, the individual incorporation of each data point resulted in a greater computational cost. These two aspects of the benchmark and tempered samplers appear to have countered each other, resulting in their processing times being largely similar.
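For definiteness, the error criterion compares the estimated and ‘true’ intensities on a common time grid; an illustrative helper, not the authors' code:

```python
import math

def rmse(est, truth):
    """Root mean square error between an estimated and a 'true' intensity,
    both evaluated on the same grid of time points."""
    n = len(est)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / n)
```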
Table 4 Root mean square error of the intensity

| Sampler | \(M=1\), \(N=100\) | \(M=1\), \(N=1{,}000\) | \(M=5\), \(N=100\) | \(M=5\), \(N=1{,}000\) | \(M=20\), \(N=100\) | \(M=20\), \(N=1{,}000\) |
|---|---|---|---|---|---|---|
| Benchmark | 688.561 | 1,116.639 | 620.432 | 1,942.992 | 1,330.232 | 1,501.263 |
| Benchmark-RCC | 676.932 | 2,026.956 | 880.824 | 2,247.313 | 1,472.126 | 1,264.533 |
| Saturated | 242.834 | 192.580 | 228.390 | 193.778 | 237.315 | 198.223 |
| Saturated-RCC | 229.449 | 189.279 | 224.692 | 193.379 | 225.592 | 194.623 |
| Tempered | 254.396 | 196.928 | 247.754 | 201.681 | 248.367 | 202.501 |
| Tempered-RCC | 256.012 | 191.407 | 227.241 | 197.043 | 230.805 | 200.227 |
Table 5 Smoothed root mean square error of the intensity

| Sampler | \(M=1\), \(N=100\) | \(M=1\), \(N=1{,}000\) | \(M=5\), \(N=100\) | \(M=5\), \(N=1{,}000\) | \(M=20\), \(N=100\) | \(M=20\), \(N=1{,}000\) |
|---|---|---|---|---|---|---|
| Benchmark | 768.702 | 670.656 | 495.019 | 627.909 | 489.243 | 571.107 |
| Benchmark-RCC | 698.640 | 1,034.890 | 572.794 | 572.841 | 535.004 | 599.031 |
| Saturated | 360.794 | 264.331 | 296.953 | 114.064 | 153.444 | 89.397 |
| Saturated-RCC | 478.871 | 265.477 | 405.767 | 266.980 | 468.853 | 205.243 |
| Tempered | 350.015 | 170.321 | 271.712 | 128.078 | 157.709 | 81.666 |
| Tempered-RCC | 485.825 | 249.529 | 475.348 | 193.898 | 514.107 | 180.914 |
Finally, using the simulated data, we consider the performance of the samplers when limiting the space over which the invariant MCMC kernels are applied, i.e. the RCC alternatives. As can be seen from Table 4, the RCC alteration does not sacrifice any accuracy in the estimates of the intensity (given the data up to each time \(t_n\)); however, it can be seen from Table 5 that the accuracy of the smoothed intensity estimates is rather poor. This is to be expected, due to path degeneracy; we note that one cannot estimate static parameters with the RCC approach unless the time window \(T\) is quite small.
5.2 Real data
All three samplers were also tested on real financial data, with the RCC alternatives also being used to generate intensity estimates, given the data up to \(t_n\): the share price of ARM Holdings, plc., traded on the LSE was used. The entire data set was of size \(r_T=1{,}819\), with \([0,T]=[0,0.3]\) (representing 3/10 of 24 h; the first trade is just after 9 a.m. and the last around 16:15) and \(t_n= n*0.001\). Genuine financial data are likely to correspond to a more volatile latent intensity process than that which was used to generate the synthetic data set, and so the parameterization of the target posterior should be chosen such that large jumps in the intensity process are possible, and such that the intensity may also revert quickly to a lower level. Hence, we specify \(\{\gamma ,\nu ,s\}=\{0.001,500,250\}\). Each of the samplers was run using \(N=1{,}000\) particles, applying \(M=5\) MCMC sweeps at each iteration, whilst the resampling rates and the minimum ESS obtained for each procedure were monitored to ensure that the algorithms did not collapse.
Table 6 Root mean square prediction errors for the intensity estimates [given data up to time \(t_n\) and entire data (smoothed)] given by each of the samplers for the parameter values \(N=1{,}000\), \(M=5\)

| Sampler | RMSPE (data up to \(t_n\)) | Smoothed RMSPE | Processing time (s) | Resampling rate (%) |
|---|---|---|---|---|
| Saturated | 2.18876 | 2.13479 | 4,064.5 | 39.5 |
| Saturated-RCC | 2.19112 | – | 2,193.1 | 39.9 |
| Tempered | 2.34671 | 2.11468 | 4,605.5 | 19.8 |
| Tempered-RCC | 2.42776 | – | 2,237.3 | 19.9 |
Table 6 presents the RMSPEs for the intensity estimates resulting from the samplers and the RCC alternatives. It was observed that, in calculating the RMSPEs for lag indices \(i=1,\ldots ,100\) using each sampler, both the saturated and the tempered samplers displayed the smallest error at \(i=1\), i.e. their respective one-step-ahead predictions were more accurate than those made for lags up to 2.64 h (each observation interval corresponds to 0.0264 h = 1.584 min).
The RCC samplers provide significant computational savings and do not seem to degrade substantially, w.r.t. the error criteria. Again, we remark that, in general, one should not trust the estimates of the RCC, but as seen here, they can provide a guideline for the intensity values.
6 Summary
In this paper, we have considered SMC simulation for partially observed point processes and implemented it for a particular doubly stochastic PP. Two solutions were given: one based upon saturating the state-space, which is suitable in a wide variety of applications, and data-point tempering, which can be used in sequential problems. We also discussed RCC versions of these algorithms, which reduce computation, but will be subject to the path degeneracy problem when including static parameters and considering the smoothing distribution. We saw that the methods can be successful, in terms of weight degeneracy, versus the benchmark approach detailed in Del Moral et al. (2007). In addition, for real data it was observed that predictions using the RCC could be reasonable (relative to the normal versions of the algorithms), but caution should be exercised in using these estimates.
The methodology we have presented is not online. As we have seen, when one modifies the approaches to have fixed computational complexity, the path degeneracy problem occurs and one cannot deal with scenarios with static parameters. In this case, we are working with Dr. N. Whiteley on a technique based upon fixed-window filtering. This is an online algorithm which allows data to be incorporated as they arrive, with computational cost which is non-increasing over time, but it is biased. The approach involves sampling from a sequence of distributions constructed such that, at time \(t_n\), previously sampled events in \([0,t_{n-\ell }]\) can be discarded. In order to be exact (in the sense of targeting the true posterior distributions), this scheme would involve point-wise evaluation of an intractable density. We are working on a sensible approximation of this density, at the cost of introducing a small bias.
7 Appendix
7.1 Proposition 1
The following assumption is made.
Assumption (A). There exists an \(\epsilon _1\in (0,1)\) and a probability measure \(\kappa _1\) on \(\bar{E}_1\) such that for any \(\bar{x}_1\in \bar{E}_1\)
For any \(n\ge 2\), there exists an \(\epsilon _n\in (0,1)\) and a probability measure \(\kappa _n\) on \(\bar{E}_n\setminus \bar{E}_{n-1}\) such that for any \(\widetilde{x}_n\in \bar{E}_n\) and any collection of points \((\chi _{n-1}^{(1)},\dots ,\chi _{n-1}^{(N)})\in \bar{E}_{n-1}^N\)
For any \(n\ge 2\)
where \(g_n\) is as in (11).
Proof 1
The proof is inductive on \(n\). Some details are omitted as the proof is quite similar to the control of adaptive MCMC chains, e.g. Andrieu et al. (2011). It should be noted that the proof for this algorithm differs as the kernel possesses an invariant measure that does not change with the iteration \(i\in \{1,\dots ,N\}\).
7.2 Proof of Proposition 2
Acknowledgments
We thank Nick Whiteley for conversations on this work. The first author acknowledges the support of an EPSRC grant. The second author was supported by an MOE grant. We thank two referees and an associate editor for their comments, which have vastly improved the article.