Bayesian non-parametric modeling for integro-difference equations

Abstract

Integro-difference equations (IDEs) provide a flexible framework for dynamic modeling of spatio-temporal data. The choice of kernel in an IDE model relates directly to the underlying physical process modeled, and it can affect model fit and predictive accuracy. We introduce Bayesian non-parametric methods to the IDE literature as a means to allow flexibility in modeling the kernel. We propose a mixture of normal distributions for the IDE kernel, built from a spatial Dirichlet process for the mixing distribution, which can model kernels with shapes that change with location. This allows the IDE model to capture non-stationarity with respect to location and to reflect a changing physical process across the domain. We address the computational demands of inference by leveraging Hermite polynomials as a basis for the representation of the process and the IDE kernel, and by incorporating Hamiltonian Markov chain Monte Carlo steps in the posterior simulation method. An example with synthetic data demonstrates that the model can successfully capture location-dependent dynamics. Moreover, using a data set of ozone pressure, we show that the spatial Dirichlet process mixture model outperforms several alternative models for the IDE kernel, including the state of the art in the IDE literature, that is, a Gaussian kernel with location-dependent parameters.

References

  • Brown, P.E., Roberts, G.O., Kåresen, K.F., Tonellato, S.: Blur-generated non-separable space-time models. J. R. Stat. Soc. 62, 847–860 (2000)

  • Chen, T., Fox, E.B., Guestrin, C.: Stochastic gradient Hamiltonian Monte Carlo. In: ICML, pp. 1683–1691 (2014)

  • Cressie, N.: Statistics for Spatial Data. Wiley, New York (1993)

  • Cressie, N., Huang, H.-C.: Classes of nonseparable, spatio-temporal stationary covariance functions. J. Am. Stat. Assoc. 94, 1330–1339 (1999)

  • Cressie, N., Wikle, C.K.: Statistics for Spatio-Temporal Data. Wiley, New York (2011)

  • Frühwirth-Schnatter, S.: Data augmentation and dynamic linear models. J. Time Ser. Anal. 15, 183–202 (1994)

  • Gelfand, A.E., Kottas, A., MacEachern, S.N.: Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Stat. Assoc. 100, 1021–1035 (2005)

  • Gelman, A., Rubin, D.B.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–472 (1992)

  • Geweke, J.: Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments, vol. 196. Federal Reserve Bank of Minneapolis, Research Department, Minneapolis (1991)

  • Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. 73, 123–214 (2011)

  • Gneiting, T., Stanberry, L.I., Grimit, E.P., Held, L., Johnson, N.A.: Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test 17, 211–235 (2008)

  • Heine, V.: Models for two-dimensional stationary stochastic processes. Biometrika 42, 170–178 (1955)

  • Higdon, D.: A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environ. Ecol. Stat. 5, 173–190 (1998)

  • Hooten, M.B., Wikle, C.K.: A hierarchical Bayesian non-linear spatio-temporal model for the spread of invasive species with application to the Eurasian collared-dove. Environ. Ecol. Stat. 15, 59–70 (2008)

  • Jones, R.H., Zhang, Y.: Models for continuous stationary space-time processes. In: Gregoire, T.G., Brillinger, D.R., Diggle, P.J., Russek-Cohen, E., Warren, W.G., Wolfinger, R.D. (eds.) Modelling Longitudinal and Spatially Correlated Data, pp. 289–298. Springer, New York (1997)

  • Kot, M., Lewis, M.A., van den Driessche, P.: Dispersal data and the spread of invading organisms. Ecology 77, 2027–2042 (1996)

  • Kottas, A., Duan, J.A., Gelfand, A.E.: Modeling disease incidence data with spatial and spatio-temporal Dirichlet process mixtures. Biom. J. 50, 29–42 (2008)

  • Ma, C.: Nonstationary covariance functions that model space-time interactions. Stat. Probab. Lett. 61, 411–419 (2003)

  • Neal, R.M.: MCMC using Hamiltonian dynamics. Handb. Markov Chain Monte Carlo 2, 113–162 (2011)

  • Neubert, M.G., Kot, M., Lewis, M.A.: Dispersal and pattern formation in a discrete-time predator-prey model. Theor. Popul. Biol. 48, 7–43 (1995)

  • Nolan, J.: Stable Distributions: Models for Heavy-Tailed Data. Birkhäuser, New York (2003)

  • Olver, F.W.: NIST Handbook of Mathematical Functions. Cambridge University Press, Cambridge (2010)

  • Richardson, R., Kottas, A., Sansó, B.: Flexible integro-difference equation modeling for spatio-temporal data. Comput. Stat. Data Anal. (2017, to appear)

  • Sethuraman, J.: A constructive definition of Dirichlet priors. Stat. Sin. 4, 639–650 (1994)

  • Smith, B.J.: boa: an R package for MCMC output convergence assessment and posterior inference. J. Stat. Softw. 21, 1–37 (2007)

  • Storvik, G., Frigessi, A., Hirst, D.: Stationary space-time Gaussian fields and their time autoregressive representation. Stat. Model. 2, 139–161 (2002)

  • Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 681–688 (2011)

  • West, M., Harrison, J.: Bayesian Forecasting and Dynamic Models, 2nd edn. Springer, New York (1997)

  • Wikle, C.K.: A kernel-based spectral model for non-Gaussian spatio-temporal processes. Stat. Model. 2, 299–314 (2002)

  • Wikle, C.K., Cressie, N.: A dimension-reduced approach to space-time Kalman filtering. Biometrika 86, 815–829 (1999)

  • Wikle, C.K., Hooten, M.B.: A general science-based framework for dynamical spatio-temporal models. Test 19, 417–451 (2010)

  • Wikle, C.K., Holan, S.H.: Polynomial nonlinear spatio-temporal integro-difference equation models. J. Time Ser. Anal. 32, 339–350 (2011)

  • Xu, K., Wikle, C.K., Fox, N.I.: A kernel-based spatio-temporal dynamical model for nowcasting weather radar reflectivities. J. Am. Stat. Assoc. 100, 1133–1144 (2005)

Acknowledgements

This research is part of the first author’s Ph.D. dissertation, completed at the University of California, Santa Cruz. A. Kottas was supported in part by the National Science Foundation under award DMS 1310438. B. Sansó was supported in part by the National Science Foundation under award DMS 1513076. The authors wish to thank an Associate Editor and two reviewers for constructive feedback and for comments that improved the presentation of the material in the paper.

Author information

Correspondence to Robert Richardson.

Appendices

Appendix 1: Stationarity of IDE processes

Brown et al. (2000) show that IDE models are stationary in space when the IDE kernel has parameters that do not vary spatially. The following lemma handles the more general case of spatially varying IDE kernels.

Lemma 1

Consider an IDE process, \(X_t(s)\), with a stationary initial process \(X_0(s)\), and with a kernel that belongs to a location family of distributions. The covariance function of the process is non-stationary with respect to location for all \(t>0\), when the kernel parameters depend on the location s.

Proof

Assume \(\hbox {Cov}[X_{t-1}(s),X_{t-1}(r)]=\rho (|s-r|)\), a stationary covariance function, and that the error process \(\omega _t(s)\) is also stationary with covariance function \(\gamma (|s-r|)\). Then,

$$\begin{aligned} \hbox {Cov}[X_t(s),X_t(s+r)]&= \hbox {Cov}\left[ \int k(u \mid s,\varvec{\theta }_s)X_{t-1}(u) \,\hbox {d}u + \omega _t(s),\; \int k(v \mid s+r,\varvec{\theta }_{s+r})X_{t-1}(v) \,\hbox {d}v + \omega _t(s+r)\right] \\&= \int \int k(u \mid s,\varvec{\theta }_s)\, k(v \mid s+r,\varvec{\theta }_{s+r})\,\rho (|u-v|) \,\hbox {d}u\,\hbox {d}v + \gamma (r). \end{aligned}$$

Using the transformations \(\eta = u-v\) and \(w=v-s\), and the assumption of a kernel that belongs to a location family, i.e., \(k(u \mid s,\varvec{\theta }_s)= k(u-s \mid \varvec{\theta }_s)\), the covariance function can be written as

$$\begin{aligned} \hbox {Cov}[X_t(s),X_t(s+r)] = \int \int k(\eta +w \mid \varvec{\theta }_{s})\, k(w-r \mid \varvec{\theta }_{s+r})\,\rho (|\eta |) \,\hbox {d}\eta \,\hbox {d}w + \gamma (r). \end{aligned}$$

Since the parameter vectors \(\varvec{\theta }_s\) and \(\varvec{\theta }_{s+r}\) depend on the location s, the covariance is a function of both s and r, which implies non-stationarity in space. \(\square \)

When the parameter vector does not depend on the location, i.e., \(\varvec{\theta }_s=\varvec{\theta }\), the location s disappears from the covariance. This is consistent with previous work showing that Gaussian-kernel IDE models are stationary when the parameters do not change with location.
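
To see the lemma numerically, the following minimal sketch (all settings illustrative, not from the paper) discretizes a one-dimensional IDE with a Gaussian location-family kernel whose shift \(\mu (s)\) varies with location, propagates a stationary initial covariance one step via \(\hbox {Cov}(X_1) = K R_0 K' + \Gamma \), and compares the covariance at the same lag for two different locations.

```python
# Minimal numerical check of Lemma 1 (illustrative settings, not from the
# paper): discretize X_t(s) = \int k(u | s, theta_s) X_{t-1}(u) du + w_t(s)
# on a grid and propagate the covariance matrix one step.
import numpy as np
from scipy.stats import norm

ds = 0.05
grid = np.arange(-10.0, 10.0 + ds, ds)        # spatial grid
n = grid.size

# Location-family Gaussian kernel with location-dependent shift mu(s):
# k(u | s, theta_s) = N(u; s + mu(s), 0.5^2), with mu(s) = 0.5 * tanh(s).
mu = 0.5 * np.tanh(grid)
K = norm.pdf(grid[None, :], loc=(grid + mu)[:, None], scale=0.5) * ds

# Stationary initial covariance rho(|s-r|) and error covariance gamma(|s-r|).
lags = np.abs(grid[:, None] - grid[None, :])
R0 = np.exp(-lags / 2.0)                      # Cov[X_0(s), X_0(r)]
Gam = 0.1 * np.exp(-lags)                     # Cov[w_t(s), w_t(r)]

# One IDE step: Cov(X_1) = K R0 K' + Gamma.
R1 = K @ R0 @ K.T + Gam

# Same lag r = 1 at two locations; equal under stationarity, unequal here.
i = int(np.abs(grid + 3.0).argmin())          # s = -3
j = int(np.abs(grid - 3.0).argmin())          # s =  3
shift = int(round(1.0 / ds))
print("Cov[X_1(-3), X_1(-2)] =", R1[i, i + shift])
print("Cov[X_1( 3), X_1( 4)] =", R1[j, j + shift])
```

With a constant shift \(\mu (s)=\mu \), the two printed values agree up to discretization and boundary effects, matching the remark above.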

Appendix 2: Hamiltonian Markov chain Monte Carlo

Hamiltonian Markov chain Monte Carlo (HMCMC) requires the gradient, with respect to the unknown parameters, of the negative log of the target density, which in this case is the posterior (Neal 2011). Letting \(\varvec{W} = \tau ^2 \varvec{G} \varvec{V} \varvec{G}\), the negative log of the relevant parts of the posterior is

$$\begin{aligned} -l(\varvec{\theta }) = -\log (p(\varvec{\theta })) + \frac{1}{2}\sum _{t=1}^T \left( \varvec{a}_t-\varvec{G}' \varvec{B}_\theta \varvec{a}_{t-1}\right) '\,\varvec{W}^{-1}\left( \varvec{a}_t-\varvec{G}' \varvec{B}_\theta \varvec{a}_{t-1}\right) , \end{aligned}$$

which can be expanded out as

$$\begin{aligned} -l(\varvec{\theta }) = -\log (p(\varvec{\theta })) + \frac{1}{2}\sum _{t=1}^T \left( \varvec{a}_t'\varvec{W}^{-1}\varvec{a}_t - 2\,\varvec{a}_t'\varvec{W}^{-1}\varvec{G}'\varvec{B}_\theta \varvec{a}_{t-1} + \varvec{a}_{t-1}'\varvec{B}_\theta '\varvec{G}\, \varvec{W}^{-1} \varvec{G}' \varvec{B}_\theta \varvec{a}_{t-1}\right) . \end{aligned}$$

Using matrix calculus, the derivative is

$$\begin{aligned} \frac{\partial (-l(\varvec{\theta }))}{\partial \theta _i} = \frac{\partial (-\log (p(\varvec{\theta })))}{\partial \theta _i} + \frac{1}{2}\sum _{t=1}^T \left( -2\,\varvec{a}_t'\varvec{W}^{-1}\varvec{G}'\frac{\partial \varvec{B}_\theta }{\partial \theta _i}\varvec{a}_{t-1} + 2\,\hbox {tr}\left( \varvec{a}_{t-1}\varvec{a}_{t-1}'\varvec{B}_\theta '\,\varvec{G}\, \varvec{W}^{-1}\, \varvec{G}'\frac{\partial \varvec{B}_\theta }{\partial \theta _i}\right) \right) , \end{aligned}$$

where \(\frac{\partial \varvec{B}_\theta }{\partial \theta _i}\) is the element-wise derivative of \(\varvec{B}_\theta \) with respect to \(\theta _i\), the i-th element of the parameter vector \(\varvec{\theta }\).
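
For concreteness, the sum above can be coded directly. The following numpy sketch computes the gradient with respect to a single parameter \(\theta _i\); the function and argument names are illustrative, and \(\varvec{G}\), \(\varvec{W}^{-1}\), \(\varvec{B}_\theta \), and \(\partial \varvec{B}_\theta /\partial \theta _i\) are assumed to be precomputed.

```python
import numpy as np

def neg_log_post_grad_i(a, G, W_inv, B, dB_i, dlogprior_i):
    """Gradient of -l(theta) w.r.t. the single parameter theta_i (a sketch).

    a           : (T+1, p) array holding the state vectors a_0, ..., a_T
    G, W_inv    : (p, p) matrices; W = tau^2 G V G precomputed and inverted
    B, dB_i     : B_theta and its element-wise derivative w.r.t. theta_i
    dlogprior_i : scalar d log p(theta) / d theta_i
    """
    M = G @ W_inv @ G.T              # G W^{-1} G', shared by the trace term
    grad = -dlogprior_i              # d(-log p(theta)) / d theta_i
    for t in range(1, a.shape[0]):
        a_t, a_tm1 = a[t], a[t - 1]
        # (1/2) * (-2 a_t' W^{-1} G' dB a_{t-1})
        grad -= a_t @ W_inv @ G.T @ dB_i @ a_tm1
        # (1/2) * 2 tr(a_{t-1} a_{t-1}' B' G W^{-1} G' dB), as a quadratic form
        grad += a_tm1 @ B.T @ M @ dB_i @ a_tm1
    return grad
```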

Let \(E(\varvec{\theta })\) denote the negative log posterior. A step size \(\epsilon \) and a number of leapfrog steps, L, must be chosen before running the algorithm. A latent momentum variable \(p_i\) is introduced for each parameter, as independent normal variables with zero mean and variance \(M_i\). Then, one “leapfrog” step from the current state \((\varvec{\theta }^{(b)}, {{\mathbf {p}}}^{(b)})\) is

$$\begin{aligned} {{\mathbf {p}}}^{(b+\epsilon /2)}&= {{\mathbf {p}}}^{(b)} - \frac{\epsilon }{2}\, \frac{\partial E}{\partial \varvec{\theta }}\left( \varvec{\theta }^{(b)}\right) \\ \varvec{\theta }^{(b+\epsilon )}&= \varvec{\theta }^{(b)}+\epsilon \,\varvec{M}^{-1}\,{{\mathbf {p}}}^{(b+\epsilon /2)} \\ {{\mathbf {p}}}^{(b+\epsilon )}&= {{\mathbf {p}}}^{(b+\epsilon /2)}-\frac{\epsilon }{2}\,\frac{\partial E}{\partial \varvec{\theta }}\left( \varvec{\theta }^{(b+\epsilon )}\right) , \end{aligned}$$

where \(\varvec{M} = \hbox {diag}(M_1, M_2, \ldots )\). The parameters leapfrog L times, ending at the new proposal. The function \(H(\varvec{\theta },{{\mathbf {p}}})\) is defined to be the sum of \(E(\varvec{\theta })\) and \(K({{\mathbf {p}}}) = \frac{1}{2}\sum _i \frac{p_i^2}{M_i}\). The new values \((\varvec{\theta }^{(b+1)}, {{\mathbf {p}}}^{(b+1)})\) are accepted with probability \(\min \left( 1,\exp \left( H(\varvec{\theta }^{(b)}, {{\mathbf {p}}}^{(b)})-H(\varvec{\theta }^{(b+1)}, {{\mathbf {p}}}^{(b+1)})\right) \right) \); if the proposal is rejected, the chain remains at the previous values. The method must be tuned to accept and reject at reasonable rates, for example accepting between 40 and 60% of proposed samples. Both L and \(\epsilon \) can be tuned, and the product \(L \times \epsilon \) is closely associated with the acceptance rate.
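
The full update is summarized in the sketch below; this is a generic HMCMC step written in the notation above (function and variable names are illustrative, not code from the paper).

```python
import numpy as np

def hmc_step(theta, E, grad_E, eps, L, M, rng):
    """One HMCMC update: E is the negative log posterior, grad_E its
    gradient, M the vector of momentum variances M_i (a minimal sketch)."""
    p = rng.normal(0.0, np.sqrt(M))                  # p_i ~ N(0, M_i)
    theta_new, p_new = theta.astype(float).copy(), p.copy()
    for _ in range(L):                               # L leapfrog steps
        p_new = p_new - 0.5 * eps * grad_E(theta_new)
        theta_new = theta_new + eps * p_new / M
        p_new = p_new - 0.5 * eps * grad_E(theta_new)
    # Accept with probability min(1, exp(H_old - H_new)), where H = E + K.
    H_old = E(theta) + 0.5 * np.sum(p ** 2 / M)
    H_new = E(theta_new) + 0.5 * np.sum(p_new ** 2 / M)
    if np.log(rng.uniform()) < H_old - H_new:
        return theta_new                             # accept the proposal
    return theta                                     # reject: keep old values
```

In practice, eps and L would be adjusted until the acceptance rate falls in the 40–60% range mentioned above.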

For a normal kernel, the derivative \(\frac{\partial \varvec{B}_\theta }{\partial \theta _i}\) is found by taking element-wise derivatives of the coefficients in Eq. (6). The required derivative with respect to the mean parameter, using Hermite basis functions, is

$$\begin{aligned} \frac{\partial b_n}{\partial \mu } = \frac{1}{\sigma ^2}\,\frac{1}{\sqrt{\left( \sqrt{\pi }\,2^n n!\right) (1+\sigma ^2)}}\, \exp \left( -\frac{\mu ^2}{2(1+\sigma ^2)}\right) \sum _{k=0}^n H_{n,k} \left( \mu m_k-m_{k+1}\right) . \end{aligned}$$

Again, \(H_{n,k}\) is the k-th coefficient in the n-th Hermite polynomial and \(m_k\) is the k-th raw moment of a normal distribution with mean \(\mu /(\sigma ^2+1)\) and variance \(\sigma ^2/(\sigma ^2+1)\). In terms of the basis coefficients, the derivative can be written as

$$\begin{aligned} \frac{\partial b_n}{\partial \mu } = \frac{1}{\sigma ^2}\left( \mu b_n-b_{n+1}+\frac{1}{\sqrt{\left( \sqrt{\pi }\,2^n n!\right) \left( 1+\sigma ^2\right) }}\, \exp \left( -\frac{\mu ^2}{2\left( 1+\sigma ^2\right) }\right) \right) . \end{aligned}$$
(13)

Mixtures of normals lead to more involved calculations than a single normal, but the basis coefficients of the mixture as a whole are obtained as the weighted sum of the basis coefficients of the individual normal components. Likewise, the derivative of the basis coefficients needed for the HMCMC is the weighted sum of the derivatives of the basis coefficients of the normal components.
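
As a sketch, the first displayed expression for \(\partial b_n/\partial \mu \) can be evaluated with the standard raw-moment recursion \(m_{k+1} = \tilde{\mu }\, m_k + k \tilde{\sigma }^2 m_{k-1}\) for a normal with mean \(\tilde{\mu } = \mu /(\sigma ^2+1)\) and variance \(\tilde{\sigma }^2 = \sigma ^2/(\sigma ^2+1)\); the function name is hypothetical, and numpy's herm2poly supplies the Hermite coefficients \(H_{n,k}\).

```python
from math import exp, factorial, pi, sqrt
from numpy.polynomial.hermite import herm2poly

def db_dmu(n, mu, sigma2):
    """Derivative d b_n / d mu for a normal kernel (sketch of the display)."""
    H = herm2poly([0.0] * n + [1.0])      # H[k] = H_{n,k}, physicists' H_n
    mt = mu / (sigma2 + 1.0)              # mean of the moment distribution
    vt = sigma2 / (sigma2 + 1.0)          # variance of the moment distribution
    m = [1.0, mt]                         # raw moments m_0, m_1
    for k in range(1, n + 1):             # m_{k+1} = mt*m_k + k*vt*m_{k-1}
        m.append(mt * m[k] + k * vt * m[k - 1])
    const = exp(-mu ** 2 / (2.0 * (1.0 + sigma2))) / (
        sigma2 * sqrt(sqrt(pi) * 2 ** n * factorial(n) * (1.0 + sigma2)))
    return const * sum(H[k] * (mu * m[k] - m[k + 1]) for k in range(n + 1))

# For a mixture of normals, weight the component-wise derivatives:
# sum(w_j * db_dmu(n, mu_j, s2_j) for each component j).
```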

Cite this article

Richardson, R., Kottas, A. & Sansó, B. Bayesian non-parametric modeling for integro-difference equations. Stat Comput 28, 87–101 (2018). https://doi.org/10.1007/s11222-016-9719-1
