Abstract
Integro-difference equations (IDEs) provide a flexible framework for dynamic modeling of spatio-temporal data. The choice of kernel in an IDE model relates directly to the underlying physical process modeled, and it can affect model fit and predictive accuracy. We introduce Bayesian non-parametric methods to the IDE literature as a means to allow flexibility in modeling the kernel. We propose a mixture of normal distributions for the IDE kernel, built from a spatial Dirichlet process for the mixing distribution, which can model kernels with shapes that change with location. This allows the IDE model to capture non-stationarity with respect to location and to reflect a changing physical process across the domain. To address the computational demands of inference, we represent both the process and the IDE kernel in a basis of Hermite polynomials, and we incorporate Hamiltonian Markov chain Monte Carlo steps into the posterior simulation method. An example with synthetic data demonstrates that the model can successfully capture location-dependent dynamics. Moreover, using a data set of ozone pressure, we show that the spatial Dirichlet process mixture model outperforms several alternative models for the IDE kernel, including the state of the art in the IDE literature, that is, a Gaussian kernel with location-dependent parameters.
References
Brown, P.E., Roberts, G.O., Kåresen, K.F., Tonellato, S.: Blur-generated non-separable space-time models. J. R. Stat. Soc. Ser. B 62, 847–860 (2000)
Chen, T., Fox, E.B., Guestrin, C.: Stochastic gradient Hamiltonian Monte Carlo. In: ICML, pp. 1683–1691 (2014)
Cressie, N.: Statistics for Spatial Data. Wiley, New York (1993)
Cressie, N., Huang, H.-C.: Classes of nonseparable, spatio-temporal stationary covariance functions. J. Am. Stat. Assoc. 94, 1330–1339 (1999)
Cressie, N., Wikle, C.K.: Statistics for Spatio-Temporal Data. Wiley, New York (2011)
Frühwirth-Schnatter, S.: Data augmentation and dynamic linear models. J. Time Ser. Anal. 15, 183–202 (1994)
Gelfand, A.E., Kottas, A., MacEachern, S.N.: Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Stat. Assoc. 100, 1021–1035 (2005)
Gelman, A., Rubin, D.B.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–472 (1992)
Geweke, J.: Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments, vol. 196. Federal Reserve Bank of Minneapolis, Research Department, Minneapolis (1991)
Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B 73, 123–214 (2011)
Gneiting, T., Stanberry, L.I., Grimit, E.P., Held, L., Johnson, N.A.: Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test 17, 211–235 (2008)
Heine, V.: Models for two-dimensional stationary stochastic processes. Biometrika 42, 170–178 (1955)
Higdon, D.: A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environ. Ecol. Stat. 5, 173–190 (1998)
Hooten, M.B., Wikle, C.K.: A hierarchical Bayesian non-linear spatio-temporal model for the spread of invasive species with application to the Eurasian collared-dove. Environ. Ecol. Stat. 15, 59–70 (2008)
Jones, R.H., Zhang, Y.: Models for continuous stationary space-time processes. In: Gregoire, T.G., Brillinger, D.R., Diggle, P.J., Russek-Cohen, E., Warren, W.G., Wolfinger, R.D. (eds.) Modelling Longitudinal and Spatially Correlated Data, pp. 289–298. Springer, New York (1997)
Kot, M., Lewis, M.A., van den Driessche, P.: Dispersal data and the spread of invading organisms. Ecology 77, 2027–2042 (1996)
Kottas, A., Duan, J.A., Gelfand, A.E.: Modeling disease incidence data with spatial and spatio-temporal Dirichlet process mixtures. Biom. J. 50, 29–42 (2008)
Ma, C.: Nonstationary covariance functions that model space-time interactions. Stat. Probab. Lett. 61, 411–419 (2003)
Neal, R.M.: MCMC using Hamiltonian dynamics. Handb. Markov Chain Mt. Carlo 2, 113–162 (2011)
Neubert, M.G., Kot, M., Lewis, M.A.: Dispersal and pattern formation in a discrete-time predator-prey model. Theor. Popul. Biol. 48, 7–43 (1995)
Nolan, J.: Stable Distributions: Models for Heavy-Tailed Data. Birkhäuser, New York (2003)
Olver, F.W.: NIST Handbook of Mathematical Functions. Cambridge University Press, Cambridge (2010)
Richardson, R., Kottas, A., Sansó, B.: Flexible integro-difference equation modeling for spatio-temporal data. To appear in Comput. Stat. Data Anal. (2017)
Sethuraman, J.: A constructive definition of Dirichlet priors. Stat. Sin. 4, 639–650 (1994)
Smith, B.J.: boa: an R package for MCMC output convergence assessment and posterior inference. J. Stat. Softw. 21, 1–37 (2007)
Storvik, G., Frigessi, A., Hirst, D.: Stationary space-time Gaussian fields and their time autoregressive representation. Stat. Model. 2, 139–161 (2002)
Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 681–688 (2011)
West, M., Harrison, J.: Bayesian Forecasting and Dynamic Models, 2nd edn. Springer, New York (1997)
Wikle, C.K.: A kernel-based spectral model for non-Gaussian spatio-temporal processes. Stat. Model. 2, 299–314 (2002)
Wikle, C.K., Cressie, N.: A dimension-reduced approach to space-time Kalman filtering. Biometrika 86, 815–829 (1999)
Wikle, C.K., Hooten, M.B.: A general science-based framework for dynamical spatio-temporal models. Test 19, 417–451 (2010)
Wikle, C.K., Holan, S.H.: Polynomial nonlinear spatio-temporal integro-difference equation models. J. Time Ser. Anal. 32, 339–350 (2011)
Xu, K., Wikle, C.K., Fox, N.I.: A kernel-based spatio-temporal dynamical model for nowcasting weather radar reflectivities. J. Am. Stat. Assoc. 100, 1133–1144 (2005)
Acknowledgements
This research is part of the first author’s Ph.D. dissertation completed at University of California, Santa Cruz. A. Kottas was supported in part by the National Science Foundation under award DMS 1310438. B. Sansó was supported in part by the National Science Foundation under award DMS 1513076. The authors wish to thank an Associate Editor and two reviewers for constructive feedback and for comments that improved the presentation of the material in the paper.
Appendices
Appendix 1: Stationarity of IDE processes
Brown et al. (2000) show that IDE models are stationary in space when the IDE kernel parameters do not vary spatially. The following lemma handles the more general case of spatially varying IDE kernels.
Lemma 1
Consider an IDE process, \(X_t(s)\), with a stationary initial process \(X_0(s)\), and with a kernel that belongs to a location family of distributions. The covariance function of the process is non-stationary with respect to location for all \(t>0\), when the kernel parameters depend on the location s.
Proof
Assume \(\hbox {Cov}[X_{t-1}(s),X_{t-1}(r)]=\rho (|s-r|)\), a stationary covariance function, and that the error process \(\omega _t(s)\) is also stationary with covariance function \(\gamma (|s-r|)\). Then,
\[
\hbox {Cov}[X_t(s),X_t(r)] = \int \int k(u \mid s,\varvec{\theta }_s) \, k(v \mid r,\varvec{\theta }_r) \, \rho (|u-v|) \, du \, dv + \gamma (|s-r|).
\]
Using the transformations \(\eta = u-v\) and \(w=v-s\), and the assumption of a kernel that belongs to a location family, i.e., \(k(u \mid s,\varvec{\theta }_s)=\) \(k(u-s \mid \varvec{\theta }_s)\), the covariance function can be written as
\[
\hbox {Cov}[X_t(s),X_t(r)] = \int \int k(\eta + w \mid \varvec{\theta }_s) \, k(w + s - r \mid \varvec{\theta }_r) \, \rho (|\eta |) \, d\eta \, dw + \gamma (|s-r|).
\]
Through \(\varvec{\theta }_s\) and \(\varvec{\theta }_r\), the covariance depends on s and r individually, not only on the distance \(|s-r|\), which implies non-stationarity in space. \(\square \)
When the parameter vector does not depend on the location, i.e., \(\varvec{\theta }_s=\varvec{\theta }\), the covariance depends on s and r only through their difference. This is consistent with previous work showing that Gaussian kernel IDE models are stationary when the parameters do not change with location.
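The lemma can be illustrated numerically. On a grid, one step of the IDE induces the covariance \(C_1 = K C_0 K^{\top } + \Gamma \), where K is the discretized kernel operator. The sketch below (grid, parameter choices, and the exponential covariances are our illustrative assumptions, not taken from the paper) uses a Gaussian kernel whose spread is either constant or location-dependent:

```python
import numpy as np

def one_step_cov(grid, sd_fn, rho_scale=1.0, noise=0.1):
    """One-step IDE covariance C_1 = K C_0 K' + Gamma on a grid,
    for a Gaussian kernel k(u | s) with mean s and s.d. sd_fn(s)."""
    du = grid[1] - grid[0]
    sds = np.array([sd_fn(s) for s in grid])[:, None]
    # kernel matrix: row i is the discretized kernel centered at grid[i]
    K = np.exp(-0.5 * ((grid[None, :] - grid[:, None]) / sds) ** 2)
    K *= du / (sds * np.sqrt(2.0 * np.pi))
    D = np.abs(grid[:, None] - grid[None, :])
    C0 = np.exp(-D / rho_scale)              # stationary rho(|s - r|)
    Gamma = noise * np.exp(-D / rho_scale)   # stationary error covariance
    return K @ C0 @ K.T + Gamma
```

With a spatially constant kernel, the variances at interior points \(s=-1\) and \(s=1\) agree up to numerical error; with a location-dependent spread they differ, matching the non-stationarity asserted by the lemma.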
Appendix 2: Hamiltonian Markov chain Monte Carlo
Hamiltonian Markov chain Monte Carlo (HMCMC) requires the derivative, with respect to the unknown parameters, of the negative log of the target function, which in this case is the posterior (Neal 2011). Letting \(\varvec{W} = \tau ^2 \varvec{G} \varvec{V} \varvec{G}\), the negative log of the relevant parts of the posterior is
which can be expanded out as
Using matrix calculus, the derivative is
where \(\frac{\partial \varvec{B}_\theta }{\partial \theta _i}\) is the element-wise derivative of \(\varvec{B}_\theta \) with respect to \(\theta _i\), the i-th element of the parameter vector \(\varvec{\theta }\).
Let \(E(\varvec{\theta })\) denote the negative log posterior. A step size \(\epsilon \) and a number of leapfrog iterations, L, must be chosen before running the algorithm. Latent momentum variables, \(p_i\), are introduced for each parameter as independent normal variables with zero mean and variance \(M_i\). Then, one “leapfrog” step given current iteration \((\varvec{\theta }^b, {{\mathbf {p}}}^b)\) is
\[
p_i^{b+1/2} = p_i^{b} - \frac{\epsilon }{2} \frac{\partial E}{\partial \theta _i}(\varvec{\theta }^{b}), \qquad \theta _i^{b+1} = \theta _i^{b} + \epsilon \, \frac{p_i^{b+1/2}}{M_i}, \qquad p_i^{b+1} = p_i^{b+1/2} - \frac{\epsilon }{2} \frac{\partial E}{\partial \theta _i}(\varvec{\theta }^{b+1}).
\]
The parameters are updated through L leapfrog steps, ending at a new proposal. The function \(H(\varvec{\theta },p)\) is defined to be the sum of \(E(\varvec{\theta })\) and \(K(p) = \frac{1}{2}\sum \frac{p_i^2}{M_i}\). The new values \((\varvec{\theta }^{(b+1)}, {{\mathbf {p}}}^{(b+1)})\) are accepted with probability \(\min \left( 1,\exp \left( H(\varvec{\theta }^b, {{\mathbf {p}}}^b)-H(\varvec{\theta }^{(b+1)}, {{\mathbf {p}}}^{(b+1)})\right) \right) \). If the proposal is rejected, the chain remains at the previous values. The method must be tuned to accept and reject at reasonable rates, for instance accepting between 40 and 60% of proposed samples. Both L and \(\epsilon \) can be tuned, and the product \(L \times \epsilon \) is closely associated with the acceptance rate.
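The update described above can be sketched generically. The following is a minimal illustration of one HMCMC update, not code from the paper; the target \(E\), its gradient, the mass terms \(M_i\), and the tuning values \(\epsilon \) and L are all user-supplied:

```python
import numpy as np

def hmc_step(theta, grad_E, E, eps, L, M, rng):
    """One HMCMC update: L leapfrog steps followed by a
    Metropolis accept/reject on H(theta, p) = E(theta) + K(p)."""
    p = rng.normal(0.0, np.sqrt(M))                   # p_i ~ N(0, M_i)
    theta_new, p_new = theta.copy(), p.copy()
    p_new = p_new - 0.5 * eps * grad_E(theta_new)     # initial half step for momentum
    for l in range(L):
        theta_new = theta_new + eps * p_new / M       # full step for parameters
        if l < L - 1:
            p_new = p_new - eps * grad_E(theta_new)   # full step for momentum
    p_new = p_new - 0.5 * eps * grad_E(theta_new)     # final half step for momentum
    H_old = E(theta) + 0.5 * np.sum(p ** 2 / M)
    H_new = E(theta_new) + 0.5 * np.sum(p_new ** 2 / M)
    # accept with probability min(1, exp(H_old - H_new))
    if rng.random() < np.exp(min(0.0, H_old - H_new)):
        return theta_new, True
    return theta, False
```

Run on a standard normal target (\(E(\theta ) = \theta ^2/2\)), repeated calls produce draws whose sample mean and variance match the target.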
For a normal distribution, the derivative \(\frac{\partial B_\theta }{\partial \theta _i}\) is found by taking element-wise derivatives of the coefficients found in Eq. (6). The required derivative with respect to the mean parameter using Hermite basis functions is
Again, \(H_{n,k}\) is the k-th coefficient in the n-th Hermite polynomial and \(m_k\) is the k-th raw moment of a normal distribution with mean \(\mu /(\sigma ^2+1)\) and variance \(\sigma ^2/(\sigma ^2+1)\). In terms of the basis coefficients, the derivative can be written as
Mixtures of normals lead to more involved calculations than a single normal, but the basis coefficients of the mixture as a whole are given by the weighted sum of the basis coefficients of the individual components. Accordingly, the derivative of the basis coefficients needed for the HMCMC is the weighted sum of the derivatives of the basis coefficients of the normal components.
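The linearity property above can be checked numerically. The sketch below uses a generic projection onto probabilists' Hermite polynomials on a grid; the grid, weight function, normalization, and mixture parameters are our illustrative choices and do not reproduce the specific expansion of Eq. (6):

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def npdf(x, m, s):
    """Normal density with mean m and standard deviation s."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def herm_coeffs(f_vals, n_max=8):
    """Project a density sampled on x onto the first n_max probabilists'
    Hermite polynomials: c_n = (1/n!) * int f(x) He_n(x) phi(x) dx."""
    phi = npdf(x, 0.0, 1.0)
    coeffs = []
    for n in range(n_max):
        He_n = hermeval(x, np.eye(n_max)[n])   # evaluates He_n on the grid
        coeffs.append(np.sum(f_vals * He_n * phi) * dx / math.factorial(n))
    return np.array(coeffs)

# a two-component normal mixture and its components
w, mus, sds = np.array([0.6, 0.4]), [-1.0, 2.0], [0.7, 1.2]
comps = [npdf(x, m, s) for m, s in zip(mus, sds)]
mix = w[0] * comps[0] + w[1] * comps[1]
```

Because the projection is a linear functional, the mixture coefficients equal the weighted sum of the component coefficients, and the same holds for their derivatives with respect to a component mean.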
Cite this article
Richardson, R., Kottas, A. & Sansó, B. Bayesian non-parametric modeling for integro-difference equations. Stat Comput 28, 87–101 (2018). https://doi.org/10.1007/s11222-016-9719-1