Abstract
Integro-difference equations (IDEs) provide a flexible framework for dynamic modeling of spatio-temporal data. The choice of kernel in an IDE model relates directly to the underlying physical process modeled, and it can affect model fit and predictive accuracy. We introduce Bayesian non-parametric methods to the IDE literature as a means to allow flexibility in modeling the kernel. We propose a mixture of normal distributions for the IDE kernel, built from a spatial Dirichlet process for the mixing distribution, which can model kernels with shapes that change with location. This allows the IDE model to capture non-stationarity with respect to location and to reflect a changing physical process across the domain. To address the computational demands of inference, we represent both the process and the IDE kernel in a basis of Hermite polynomials, and we incorporate Hamiltonian Markov chain Monte Carlo steps into the posterior simulation method. An example with synthetic data demonstrates that the model can successfully capture location-dependent dynamics. Moreover, using a data set of ozone pressure, we show that the spatial Dirichlet process mixture model outperforms several alternative models for the IDE kernel, including the state of the art in the IDE literature, that is, a Gaussian kernel with location-dependent parameters.
References
Brown, P.E., Roberts, G.O., Kåresen, K.F., Tonellato, S.: Blur-generated non-separable space-time models. J. R. Stat. Soc. Ser. B 62, 847–860 (2000)
Chen, T., Fox, E.B., Guestrin, C.: Stochastic gradient Hamiltonian Monte Carlo. In: ICML, pp. 1683–1691 (2014)
Cressie, N.: Statistics for Spatial Data. Wiley, New York (1993)
Cressie, N., Huang, H.-C.: Classes of nonseparable, spatio-temporal stationary covariance functions. J. Am. Stat. Assoc. 94, 1330–1339 (1999)
Cressie, N., Wikle, C.K.: Statistics for Spatio-Temporal Data. Wiley, New York (2011)
Frühwirth-Schnatter, S.: Data augmentation and dynamic linear models. J. Time Ser. Anal. 15, 183–202 (1994)
Gelfand, A.E., Kottas, A., MacEachern, S.N.: Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Stat. Assoc. 100, 1021–1035 (2005)
Gelman, A., Rubin, D.B.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–472 (1992)
Geweke, J.: Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments, vol. 196. Federal Reserve Bank of Minneapolis, Research Department, Minneapolis (1991)
Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B 73, 123–214 (2011)
Gneiting, T., Stanberry, L.I., Grimit, E.P., Held, L., Johnson, N.A.: Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test 17, 211–235 (2008)
Heine, V.: Models for two-dimensional stationary stochastic processes. Biometrika 42, 170–178 (1955)
Higdon, D.: A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environ. Ecol. Stat. 5, 173–190 (1998)
Hooten, M.B., Wikle, C.K.: A hierarchical Bayesian non-linear spatio-temporal model for the spread of invasive species with application to the Eurasian collared-dove. Environ. Ecol. Stat. 15, 59–70 (2008)
Jones, R.H., Zhang, Y.: Models for continuous stationary space-time processes. In: Gregoire, T.G., Brillinger, D.R., Diggle, P.J., Russek-Cohen, E., Warren, W.G., Wolfinger, R.D. (eds.) Modelling Longitudinal and Spatially Correlated Data, pp. 289–298. Springer, New York (1997)
Kot, M., Lewis, M.A., van den Driessche, P.: Dispersal data and the spread of invading organisms. Ecology 77, 2027–2042 (1996)
Kottas, A., Duan, J.A., Gelfand, A.E.: Modeling disease incidence data with spatial and spatio-temporal Dirichlet process mixtures. Biom. J. 50, 29–42 (2008)
Ma, C.: Nonstationary covariance functions that model space-time interactions. Stat. Probab. Lett. 61, 411–419 (2003)
Neal, R.M.: MCMC using Hamiltonian dynamics. Handb. Markov Chain Mt. Carlo 2, 113–162 (2011)
Neubert, M.G., Kot, M., Lewis, M.A.: Dispersal and pattern formation in a discrete-time predator-prey model. Theor. Popul. Biol. 48, 7–43 (1995)
Nolan, J.: Stable Distributions: Models for Heavy-Tailed Data. Birkhäuser, New York (2003)
Olver, F.W.: NIST Handbook of Mathematical Functions. Cambridge University Press, Cambridge (2010)
Richardson, R., Kottas, A., Sansó, B.: Flexible integro-difference equation modeling for spatio-temporal data. To appear in Comput. Stat. Data Anal. (2017)
Sethuraman, J.: A constructive definition of Dirichlet priors. Stat. Sin. 4, 639–650 (1994)
Smith, B.J.: boa: an R package for MCMC output convergence assessment and posterior inference. J. Stat. Softw. 21, 1–37 (2007)
Storvik, G., Frigessi, A., Hirst, D.: Stationary space-time Gaussian fields and their time autoregressive representation. Stat. Model. 2, 139–161 (2002)
Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 681–688 (2011)
West, M., Harrison, J.: Bayesian Forecasting and Dynamic Models, 2nd edn. Springer, New York (1997)
Wikle, C.K.: A kernel-based spectral model for non-Gaussian spatio-temporal processes. Stat. Model. 2, 299–314 (2002)
Wikle, C.K., Cressie, N.: A dimension-reduced approach to space-time Kalman filtering. Biometrika 86, 815–829 (1999)
Wikle, C.K., Hooten, M.B.: A general science-based framework for dynamical spatio-temporal models. Test 19, 417–451 (2010)
Wikle, C.K., Holan, S.H.: Polynomial nonlinear spatio-temporal integro-difference equation models. J. Time Ser. Anal. 32, 339–350 (2011)
Xu, K., Wikle, C.K., Fox, N.I.: A kernel-based spatio-temporal dynamical model for nowcasting weather radar reflectivities. J. Am. Stat. Assoc. 100, 1133–1144 (2005)
Acknowledgements
This research is part of the first author’s Ph.D. dissertation completed at University of California, Santa Cruz. A. Kottas was supported in part by the National Science Foundation under award DMS 1310438. B. Sansó was supported in part by the National Science Foundation under award DMS 1513076. The authors wish to thank an Associate Editor and two reviewers for constructive feedback and for comments that improved the presentation of the material in the paper.
Appendices
Appendix 1: Stationarity of IDE processes
Brown et al. (2000) show that IDE models are stationary in space when the IDE kernel parameters do not vary spatially. The following lemma handles the more general case of spatially varying IDE kernels.
Lemma 1
Consider an IDE process, \(X_t(s)\), with a stationary initial process \(X_0(s)\), and with a kernel that belongs to a location family of distributions. The covariance function of the process is non-stationary with respect to location for all \(t>0\), when the kernel parameters depend on the location s.
Proof
Assume \(\hbox {Cov}[X_{t-1}(s),X_{t-1}(r)]=\rho (|s-r|)\), a stationary covariance function, and that the error process \(\omega _t(s)\) is also stationary with covariance function \(\gamma (|s-r|)\). Then,
\[
\hbox {Cov}[X_t(s),X_t(r)] = \int \int k(u \mid s,\varvec{\theta }_s) \, k(v \mid r,\varvec{\theta }_r) \, \rho (|u-v|) \, du \, dv + \gamma (|s-r|).
\]
Using the transformations \(\eta = u-v\) and \(w=v-s\), and the assumption of a kernel that belongs to a location family, i.e., \(k(u \mid s,\varvec{\theta }_s)=\) \(k(u-s \mid \varvec{\theta }_s)\), the covariance function can be written as
\[
\hbox {Cov}[X_t(s),X_t(r)] = \int \int k(\eta + w \mid \varvec{\theta }_s) \, k(w + s - r \mid \varvec{\theta }_r) \, \rho (|\eta |) \, d\eta \, dw + \gamma (|s-r|).
\]
Through \(\varvec{\theta }_s\) and \(\varvec{\theta }_r\), the covariance depends on s and r individually, not only on the distance \(|s-r|\), which implies non-stationarity in space. \(\square \)
When the parameter vector does not depend on the location, i.e., \(\varvec{\theta }_s=\varvec{\theta }\), the covariance depends on s and r only through their difference. This is consistent with previous work showing that Gaussian kernel IDE models are stationary when the parameters do not change with location.
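The lemma can be illustrated numerically. On a grid, one step of the IDE induces the covariance \(C_1 = K C_0 K^{\top } + \Gamma \), where K is the discretized kernel operator. The sketch below (grid, parameter choices, and the exponential covariances are our illustrative assumptions, not taken from the paper) uses a Gaussian kernel whose spread is either constant or location-dependent:

```python
import numpy as np

def one_step_cov(grid, sd_fn, rho_scale=1.0, noise=0.1):
    """One-step IDE covariance C_1 = K C_0 K' + Gamma on a grid,
    for a Gaussian kernel k(u | s) with mean s and s.d. sd_fn(s)."""
    du = grid[1] - grid[0]
    sds = np.array([sd_fn(s) for s in grid])[:, None]
    # kernel matrix: row i is the discretized kernel centered at grid[i]
    K = np.exp(-0.5 * ((grid[None, :] - grid[:, None]) / sds) ** 2)
    K *= du / (sds * np.sqrt(2.0 * np.pi))
    D = np.abs(grid[:, None] - grid[None, :])
    C0 = np.exp(-D / rho_scale)              # stationary rho(|s - r|)
    Gamma = noise * np.exp(-D / rho_scale)   # stationary error covariance
    return K @ C0 @ K.T + Gamma
```

With a spatially constant kernel, the variances at interior points \(s=-1\) and \(s=1\) agree up to numerical error; with a location-dependent spread they differ, matching the non-stationarity asserted by the lemma.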
Appendix 2: Hamiltonian Markov chain Monte Carlo
Hamiltonian Markov chain Monte Carlo (HMCMC) requires the derivative, with respect to the unknown parameters, of the negative log of the target function, which in this case is the posterior (Neal 2011). Letting \(\varvec{W} = \tau ^2 \varvec{G} \varvec{V} \varvec{G}\), the negative log of the relevant parts of the posterior is
which can be expanded out as
Using matrix calculus, the derivative is
where \(\frac{\partial \varvec{B}_\theta }{\partial \theta _i}\) is the element-wise derivative of \(\varvec{B}_\theta \) with respect to \(\theta _i\), the i-th element of the parameter vector \(\varvec{\theta }\).
Let \(E(\varvec{\theta })\) denote the negative log posterior. A step size \(\epsilon \) and a number of leapfrog iterations, L, must be chosen before running the algorithm. Latent momentum variables, \(p_i\), are introduced for each parameter as independent normal variables with zero mean and variance \(M_i\). Then, one “leapfrog” step given current iteration \((\varvec{\theta }^b, {{\mathbf {p}}}^b)\) is
\[
p_i^{b+1/2} = p_i^{b} - \frac{\epsilon }{2} \frac{\partial E}{\partial \theta _i}(\varvec{\theta }^{b}), \qquad \theta _i^{b+1} = \theta _i^{b} + \epsilon \, \frac{p_i^{b+1/2}}{M_i}, \qquad p_i^{b+1} = p_i^{b+1/2} - \frac{\epsilon }{2} \frac{\partial E}{\partial \theta _i}(\varvec{\theta }^{b+1}).
\]
The parameters are updated through L leapfrog steps, ending at a new proposal. The function \(H(\varvec{\theta },p)\) is defined to be the sum of \(E(\varvec{\theta })\) and \(K(p) = \frac{1}{2}\sum \frac{p_i^2}{M_i}\). The new values \((\varvec{\theta }^{(b+1)}, {{\mathbf {p}}}^{(b+1)})\) are accepted with probability \(\min \left( 1,\exp \left( H(\varvec{\theta }^b, {{\mathbf {p}}}^b)-H(\varvec{\theta }^{(b+1)}, {{\mathbf {p}}}^{(b+1)})\right) \right) \). If the proposal is rejected, the chain remains at the previous values. The method must be tuned to accept and reject at reasonable rates, for instance accepting between 40 and 60% of proposed samples. Both L and \(\epsilon \) can be tuned, and the product \(L \times \epsilon \) is closely associated with the acceptance rate.
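The update described above can be sketched generically. The following is a minimal illustration of one HMCMC update, not code from the paper; the target \(E\), its gradient, the mass terms \(M_i\), and the tuning values \(\epsilon \) and L are all user-supplied:

```python
import numpy as np

def hmc_step(theta, grad_E, E, eps, L, M, rng):
    """One HMCMC update: L leapfrog steps followed by a
    Metropolis accept/reject on H(theta, p) = E(theta) + K(p)."""
    p = rng.normal(0.0, np.sqrt(M))                   # p_i ~ N(0, M_i)
    theta_new, p_new = theta.copy(), p.copy()
    p_new = p_new - 0.5 * eps * grad_E(theta_new)     # initial half step for momentum
    for l in range(L):
        theta_new = theta_new + eps * p_new / M       # full step for parameters
        if l < L - 1:
            p_new = p_new - eps * grad_E(theta_new)   # full step for momentum
    p_new = p_new - 0.5 * eps * grad_E(theta_new)     # final half step for momentum
    H_old = E(theta) + 0.5 * np.sum(p ** 2 / M)
    H_new = E(theta_new) + 0.5 * np.sum(p_new ** 2 / M)
    # accept with probability min(1, exp(H_old - H_new))
    if rng.random() < np.exp(min(0.0, H_old - H_new)):
        return theta_new, True
    return theta, False
```

Run on a standard normal target (\(E(\theta ) = \theta ^2/2\)), repeated calls produce draws whose sample mean and variance match the target.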
For a normal distribution, the derivative \(\frac{\partial B_\theta }{\partial \theta _i}\) is found by taking element-wise derivatives of the coefficients found in Eq. (6). The required derivative with respect to the mean parameter using Hermite basis functions is
Again, \(H_{n,k}\) is the k-th coefficient in the n-th Hermite polynomial and \(m_k\) is the k-th raw moment of a normal distribution with mean \(\mu /(\sigma ^2+1)\) and variance \(\sigma ^2/(\sigma ^2+1)\). In terms of the basis coefficients, the derivative can be written as
Mixtures of normals lead to more involved calculations than a single normal, but the basis coefficients of the mixture as a whole are given by the weighted sum of the basis coefficients of the individual components. Accordingly, the derivative of the basis coefficients needed for the HMCMC is the weighted sum of the derivatives of the basis coefficients of the normal components.
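The linearity property above can be checked numerically. The sketch below uses a generic projection onto probabilists' Hermite polynomials on a grid; the grid, weight function, normalization, and mixture parameters are our illustrative choices and do not reproduce the specific expansion of Eq. (6):

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def npdf(x, m, s):
    """Normal density with mean m and standard deviation s."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def herm_coeffs(f_vals, n_max=8):
    """Project a density sampled on x onto the first n_max probabilists'
    Hermite polynomials: c_n = (1/n!) * int f(x) He_n(x) phi(x) dx."""
    phi = npdf(x, 0.0, 1.0)
    coeffs = []
    for n in range(n_max):
        He_n = hermeval(x, np.eye(n_max)[n])   # evaluates He_n on the grid
        coeffs.append(np.sum(f_vals * He_n * phi) * dx / math.factorial(n))
    return np.array(coeffs)

# a two-component normal mixture and its components
w, mus, sds = np.array([0.6, 0.4]), [-1.0, 2.0], [0.7, 1.2]
comps = [npdf(x, m, s) for m, s in zip(mus, sds)]
mix = w[0] * comps[0] + w[1] * comps[1]
```

Because the projection is a linear functional, the mixture coefficients equal the weighted sum of the component coefficients, and the same holds for their derivatives with respect to a component mean.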
Cite this article
Richardson, R., Kottas, A. & Sansó, B. Bayesian non-parametric modeling for integro-difference equations. Stat Comput 28, 87–101 (2018). https://doi.org/10.1007/s11222-016-9719-1