Encyclopedia of GIS

2017 Edition
| Editors: Shashi Shekhar, Hui Xiong, Xun Zhou

Hierarchical Spatial Models

  • Ali Arab
  • Mevin B. Hooten
  • Christopher K. Wikle
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-17885-1_564


A hierarchical spatial model is the product of three conditional distributions: the data conditioned on a spatial process and parameters, the spatial process conditioned on the parameters defining the spatial dependence between process locations, and the parameters themselves.

Historical Background

Scientists across a wide range of disciplines have long recognized the importance of spatial dependencies in their data and in the underlying process of interest. Initially, owing to computational limitations, they dealt with such dependencies through randomization and blocking rather than through explicit characterization of the dependencies in their models. Early developments in spatial modeling began in the 1950s and 1960s, motivated by problems in mining engineering and meteorology (Cressie 1993), and were followed by the introduction of Markov random fields (Besag 1974). The application of hierarchical spatial and spatiotemporal models has become increasingly popular since the advancement of computational techniques, such as MCMC methods, in the later years of the twentieth century.

Scientific Fundamentals

Methods for spatial and spatiotemporal modeling are becoming increasingly important in the environmental sciences and other sciences where data arise from processes in spatial settings. Unfortunately, the application of traditional covariance-based spatial statistical models is either inappropriate or computationally inefficient in many problems. Moreover, conventional methods are often incapable of allowing the researcher to quantify uncertainties corresponding to the model parameters since the parameter space of most complex spatial and spatiotemporal models is very large.

Hierarchical Models

A main goal in the rigorous characterization of natural phenomena is the estimation and prediction of processes, as well as of the parameters governing those processes. Thus, a flexible framework capable of accommodating complex relationships between data and process models, while incorporating various sources of uncertainty, is necessary. Traditional likelihood-based approaches to modeling have allowed for scientifically meaningful data structures, though in complicated situations with heavily parameterized models and limited or missing data, estimation by likelihood maximization is often problematic or infeasible. Developments in numerical approximation methods have been useful in many cases, especially for high-dimensional parameter spaces (e.g., Newton-Raphson and EM methods; Givens and Hoeting 2005), though these can still be difficult or impossible to implement and make no provision for accommodating uncertainty at multiple levels.

Hierarchical models, whereby a problem is decomposed into a series of levels linked by simple rules of probability, provide a very flexible framework capable of accommodating uncertainty and potential a priori scientific knowledge while retaining many advantages of a strict likelihood approach (e.g., multiple sources of data and scientifically meaningful structure). The years since the introduction of the Bayesian hierarchical model and the development of Markov chain Monte Carlo (MCMC) methods have brought an explosion of research, both theoretical and applied, utilizing and developing hierarchical models.

Hierarchical modeling is based on a simple fact from probability that the joint distribution of a collection of random variables can be decomposed into a series of conditional models. For example, if a, b, c are random variables, then basic probability allows the factorization:
$$\displaystyle{ [a,b,c] = [a\vert b,c][b\vert c][c]\;, }$$
where the notation [·] specifies a probability distribution, and [x | y] refers to the distribution of x conditioned on y. In the case of spatial and spatiotemporal models, the joint distribution describes the behavior of the process at all spatial locations of potential interest (and, possibly, all times). This joint distribution (left-hand side of (1)) is difficult to specify for complicated processes. Typically, it is much easier to specify the conditional models (right-hand side of (1)). In that case, the product of the series of relatively simple conditional models gives a joint distribution that can be quite complex.
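To make the factorization concrete, the following sketch (with illustrative Gaussian conditionals chosen purely for simplicity; none of the numerical settings come from the entry) samples from the joint distribution [a, b, c] by drawing from [c], then [b | c], then [a | b, c]:

```python
import numpy as np

# Illustrative Gaussian chain: [a,b,c] = [a|b,c][b|c][c].
# Sampling from the joint reduces to sampling each conditional in turn.
rng = np.random.default_rng(0)

def sample_joint(n):
    c = rng.normal(0.0, 1.0, size=n)      # [c]
    b = rng.normal(0.5 * c, 1.0)          # [b | c]
    a = rng.normal(b + 0.25 * c, 1.0)     # [a | b, c]
    return a, b, c

a, b, c = sample_joint(200_000)
# Moments of the joint follow from the chain, e.g., E[a] = 0 here.
```

The same logic underlies hierarchical simulation and MCMC: only simple conditionals are ever specified, yet the implied joint distribution can be quite complex.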
When modeling complicated processes in the presence of data, it is helpful to write the hierarchical model in three basic stages:
  1. Stage 1.

    Data Model: [data | process, data parameters]

  2. Stage 2.

    Process Model: [process | process parameters]

  3. Stage 3.

    Parameter Model: [data and process parameters].


The basic idea is to approach the complex problem by breaking it into simpler subproblems. Although hierarchical modeling is not new to statistics (Lindley and Smith 1972), this basic formulation for modeling complicated spatial and spatiotemporal processes in the environmental sciences is a relatively new development (e.g., Berliner 1996; Wikle et al. 1998). The first stage is concerned with the observational process or “data model,” which specifies the distribution of the data given the fundamental process of interest and the parameters that describe the data model. The second stage then describes the process, conditional on other process parameters. Finally, the last stage models the uncertainty in the parameters, from both the data and process stages. Note that each of these stages can have many substages (e.g., Wikle et al. 1998, 2001).


The goal is to estimate the distribution of the process and parameters given the data. Bayesian methods are naturally suited to estimation in such hierarchical settings, although non-Bayesian methods can sometimes be utilized but often require additional assumptions. Using a Bayesian approach, the “posterior distribution” (i.e., the joint distribution of the process and parameters given the data) is obtained via Bayes’ theorem:

$$\displaystyle\begin{array}{rcl} [\mathrm{process,parameters\vert data}]& \propto & [\mathrm{data\vert process,parameters}] \times \\ & & [\mathrm{process\vert parameters}][\mathrm{parameters}].{}\end{array}$$

Bayesian statistics involves drawing statistical conclusions from the posterior distribution, which is proportional to the data model (i.e., the likelihood) times the a priori knowledge (i.e., the prior). Bayes’ theorem is thus the mechanism that provides access to the posterior. Although simple in principle, the implementation of Bayes’ theorem for complicated models can be challenging. One challenge concerns the specification of the parameterized component distributions on the right-hand side of (2). Although there has long been a debate in the statistics community concerning the appropriateness of “subjective” specification of such distributions, such choices are a natural part of scientific modeling. In fact, the use of scientific knowledge in the prior distribution allows the uncertainty related to these specifications to be incorporated explicitly in the model. Another, perhaps more important, challenge from a practical perspective is the calculation of the posterior distribution. The complex and high-dimensional nature of many scientific models (and indeed, most spatiotemporal models) prohibits the direct evaluation of the posterior. However, MCMC approaches can be utilized to estimate the posterior distribution through iterative sampling. As previously mentioned, the use of MCMC has been critical for the implementation of Bayesian hierarchical models, in that realistic (i.e., complicated) models can be considered; this is especially evident in the analysis of spatial and spatiotemporal processes. Yet, typically the computational burden must be considered when formulating the conditional models in such problems. Thus, the model-building phase requires not only scientific understanding of the problem but also an understanding of how that knowledge can be adapted to fit into the computational framework.
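As a minimal illustration of MCMC-based posterior computation, the following sketch implements a random-walk Metropolis sampler for a deliberately simple, non-spatial model (a normal mean with a normal prior) so that the MCMC estimate can be checked against the known conjugate answer; all numerical settings are illustrative assumptions:

```python
import numpy as np

# Random-walk Metropolis for theta with y_i ~ N(theta, 1), theta ~ N(0, 10^2).
# The posterior is known only up to proportionality, which is all MCMC needs.
rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=50)

def log_post(theta):
    # log likelihood + log prior, up to an additive constant
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * theta**2 / 100.0

theta, chain = 0.0, []
for _ in range(20_000):
    prop = theta + rng.normal(0.0, 0.5)    # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                        # accept with Metropolis probability
    chain.append(theta)

post_mean = np.mean(chain[5_000:])          # discard burn-in
analytic = y.sum() / (len(y) + 1.0 / 100.0)  # conjugate posterior mean
```

In realistic hierarchical spatial models, the same accept/reject logic is applied within a Gibbs scheme that cycles through the conditional distributions of the process and parameters.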

Hierarchical models without analytical solutions can be fitted to data using high-level programming languages (such as R, S-plus, MATLAB) or low-level languages (such as C, C++, FORTRAN). High-level languages allow for efficient programming, whereas low-level languages often allow for more efficient execution. Alternatively, the freely distributed Bayesian computation software WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/) or its open-source version, OpenBUGS (http://www.openbugs.net/w/FrontPage), and its spatial package GeoBUGS can be used to carry out Bayesian computations (Banerjee et al. 2015). Another similar tool is JAGS (http://mcmc-jags.sourceforge.net/), which is based on BUGS but runs on many platforms (as opposed to WinBUGS/OpenBUGS, which are limited to Windows). The developers of the automated Gibbs sampler programs BUGS and WinBUGS have been quick to point out the caveats that come with misuse of the software (and Bayesian methods in general), cautioning that MCMC methods are not as robust as analytical methods and that the analyst should utilize such methods mindfully, especially when choosing prior distributions. Another computational tool that has recently enjoyed great popularity is the integrated nested Laplace approximation (INLA; http://www.r-inla.org/; Rue et al. 2009). The INLA approach is a numerically implemented analytical solution for approximating posterior marginals in hierarchical models with latent Gaussian processes. A more recent tool that is gaining popularity for carrying out Bayesian computations is Stan (http://mc-stan.org/), open-source software written in C++ that is based on Hamiltonian Monte Carlo methods and has several interfaces, including an R interface (RStan).

Spatial Processes

In this section we focus on the process model stage of the hierarchical framework described in the previous section and specifically applied in spatial settings. We consider the two important cases of continuous and areal data and discuss popular modeling choices and their hierarchical forms.

General Hierarchical Spatial Model Framework

In order to consider spatial models in a general hierarchical framework, let \(\boldsymbol{Z} = (Z(r_{1}),\ldots, Z(r_{m}))^{{\prime}}\) be a vector of observations of a spatial process denoted by \(\boldsymbol{y} = (y(s_{1}),\ldots, y(s_{n}))^{{\prime}}\), where the spatial locations of the observations (\(r_{i}\)) do not necessarily correspond to the support of the underlying spatial process of interest (\(s_{i}\)). The general framework for hierarchical spatial modeling is given by the following simple and flexible structure based on the three-stage component models described previously:
$$\displaystyle{[\boldsymbol{Z}\mid \boldsymbol{\mu },\boldsymbol{\theta }_{z}][\boldsymbol{\mu }\mid \boldsymbol{y},\boldsymbol{\theta }_{\mu }][\boldsymbol{\theta }_{z},\boldsymbol{\theta }_{\mu }],}$$
where \(\boldsymbol{\theta }_{z}\) and \(\boldsymbol{\theta }_{\mu }\) are parameter vectors. The data model typically specifies that we have measurements from an underlying “true” process in the presence of a measurement error process. A generalized linear framework for the process component can be considered, which generalizes the Gaussian model to the exponential family (see Diggle et al. 1998). Thus, the process model can be written:
$$\displaystyle{h(\boldsymbol{\mu }) =\boldsymbol{ Xy}+\boldsymbol{\eta },}$$
where h(. ) is a known link function, \(\boldsymbol{X} = (\mathbf{x}^{{\prime}}(s_{1}),\ldots, \mathbf{x}^{{\prime}}(s_{n}))^{{\prime}}\) denotes the covariates, and \(\boldsymbol{\eta }\) is process model noise containing explicit spatial covariance structure. This process specification can easily accommodate most conventional spatial models (e.g., see next section on models for spatially continuous data).
For example, in case of normal data, the data model can be written as,
$$\displaystyle{\boldsymbol{Z}\mid \boldsymbol{y},\boldsymbol{\varSigma }\sim N(\boldsymbol{K}\boldsymbol{y},\boldsymbol{\varSigma }),}$$
where \(\boldsymbol{Z}\) denotes measurements from an underlying “true” process \(\boldsymbol{y}\) in the presence of a measurement error process, and \(\boldsymbol{K}\) is a matrix that maps the observations to process locations (allowing for differing observation and process supports, e.g., Wikle and Berliner 2005). Similarly for count data arising from an appropriate distribution such as the Poisson, the data model can be written as
$$\displaystyle{\boldsymbol{Z}\mid \boldsymbol{\lambda } \sim \mathrm{ Poisson}(\boldsymbol{K\lambda }),}$$
where the observations \(\boldsymbol{Z} = (Z(s_{1}),\ldots, Z(s_{n}))^{{\prime}}\) are assumed to be conditionally independent, and \(\boldsymbol{\lambda }= (\lambda (s_{1}),\ldots, \lambda (s_{n}))^{{\prime}}\) is the unknown spatially varying Poisson intensity. The Poisson intensity process can then be modeled in the process stage using covariates or latent variables (see e.g., Wikle and Hooten 2006).
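A forward simulation of this Poisson data model can be sketched as follows (the 1-D transect, log link, and exponential spatial covariance are illustrative assumptions, not prescribed by the entry):

```python
import numpy as np

# Process stage: log-intensity = X beta + eta, with spatially correlated eta;
# data stage: conditionally independent Poisson counts given the intensity.
rng = np.random.default_rng(2)
n = 30
s = np.linspace(0.0, 1.0, n)                 # site locations on a transect
X = np.column_stack([np.ones(n), s])         # intercept + spatial covariate
beta = np.array([1.0, 0.5])                  # illustrative coefficients

D = np.abs(s[:, None] - s[None, :])          # pairwise distances
Sigma = 0.3 * np.exp(-D / 0.2)               # exponential spatial covariance
eta = rng.multivariate_normal(np.zeros(n), Sigma)
lam = np.exp(X @ beta + eta)                 # spatially varying intensity

Z = rng.poisson(lam)                         # conditionally independent counts
```

Fitting the model reverses this simulation: given Z, the intensity process and its parameters are inferred through the hierarchy.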

Process Models for Spatially Continuous Data

Spatially continuous data (also known as geostatistical data) refer to spatially indexed data at locations s, where s varies continuously over some region R. The modeling of spatially continuous data has long been the dominant theme in spatial statistics. The most common class of models for spatially continuous data are known as Kriging models, which extend minimum mean squared error prediction to spatial settings. A general continuous spatial model has the following form:
$$\displaystyle{ \boldsymbol{y} =\boldsymbol{ X\beta }+\boldsymbol{\eta },\;\text{where}\;\boldsymbol{\eta } \sim N(\boldsymbol{0},\boldsymbol{\varSigma }), }$$
where \(\boldsymbol{y} = \left [\begin{array}{c} \boldsymbol{y}_{1} \\ \boldsymbol{y}_{2}\end{array} \right ]\), \(\boldsymbol{y}_{1} = (y(s_{1}),\ldots, y(s_{n}))^{{\prime}}\) represents the spatial process at locations for which there are data, and \(\boldsymbol{y}_{2}\) denotes the process at a new location (e.g., s0) or locations. Furthermore, the term \(\boldsymbol{X\beta }\;(=\boldsymbol{\mu })\) represents the mean of the process (possibly explained by covariates \(\boldsymbol{X}\)), and \(\boldsymbol{\eta }\) is a spatially correlated error with covariance,
$$\displaystyle{\boldsymbol{\varSigma }= \left (\begin{array}{cc} \boldsymbol{\varSigma }_{11}\quad \boldsymbol{\varSigma }_{12}\\ \boldsymbol{\varSigma }_{ 21}\quad \boldsymbol{\varSigma }_{22} \end{array} \right ).}$$
Kriging then involves finding the best linear predictor of \(\boldsymbol{y}_{2}\) given \(\boldsymbol{y}_{1}\):
$$\displaystyle\begin{array}{rcl} & & \boldsymbol{y}_{2}\mid \boldsymbol{y}_{1} \sim N(\boldsymbol{\mu }_{2} +\boldsymbol{\varSigma } _{21}\boldsymbol{\varSigma }_{11}^{-1}(\boldsymbol{y}_{ 1} -\boldsymbol{\mu }_{1}), {}\\ & & \boldsymbol{\varSigma }_{22} -\boldsymbol{\varSigma }_{21}\boldsymbol{\varSigma }_{11}^{-1}\boldsymbol{\varSigma }_{ 12}), {}\\ \end{array}$$
where \(\boldsymbol{\mu }=\left [\begin{array}{c} \boldsymbol{\mu }_{1}\\ \boldsymbol{\mu }_{ 2} \end{array} \right ]\), \(E(\boldsymbol{y}_{1})=\boldsymbol{\mu }_{1}\), and \(E(\boldsymbol{y}_{2})=\boldsymbol{\mu }_{2}\).
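The conditional-Gaussian prediction formulas above can be sketched numerically as follows (the exponential covariance function and known mean are illustrative assumptions); with no measurement error, the predictor interpolates the data exactly at observed locations:

```python
import numpy as np

# Kriging as the conditional mean/covariance of y2 given y1.
rng = np.random.default_rng(3)

def cov(s, t):
    # illustrative exponential covariance with range 0.5
    return np.exp(-np.abs(s[:, None] - t[None, :]) / 0.5)

s1 = np.array([0.0, 0.3, 0.7, 1.0])          # data locations
s2 = np.array([0.5, 0.7])                    # prediction locations
mu1, mu2 = np.zeros(4), np.zeros(2)          # known means

S11, S21, S22 = cov(s1, s1), cov(s2, s1), cov(s2, s2)
y1 = rng.multivariate_normal(mu1, S11)       # one process realization

# mu2 + S21 S11^{-1} (y1 - mu1)  and  S22 - S21 S11^{-1} S12
pred = mu2 + S21 @ np.linalg.solve(S11, y1 - mu1)
pvar = S22 - S21 @ np.linalg.solve(S11, S21.T)
```

Because s2 includes the data location 0.7, the corresponding prediction equals the observed value and its conditional variance is zero, which is the exact-interpolation property of Kriging without a nugget.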
The model in (3) is known as a universal Kriging model. A hierarchical representation of this model can be written as:
$$\displaystyle{ \boldsymbol{z}\mid \boldsymbol{y},\sigma ^{2} \sim N(\boldsymbol{Ky},\sigma ^{2}\boldsymbol{I}), }$$
$$\displaystyle{ \boldsymbol{y}\mid \boldsymbol{\beta },\boldsymbol{\varSigma }_{y} \sim N(\boldsymbol{X\beta },\boldsymbol{\varSigma }_{y}), }$$
$$\displaystyle{ \boldsymbol{\beta }\mid \boldsymbol{\beta }_{0},\boldsymbol{\varSigma }_{\beta } \sim N(\boldsymbol{\beta }_{0},\boldsymbol{\varSigma }_{\beta }), }$$
$$\displaystyle{ \{\sigma ^{2},\boldsymbol{\theta }_{ y},\boldsymbol{\beta }_{0},\boldsymbol{\varSigma }_{\beta }\} \sim [\sigma ^{2},\boldsymbol{\theta }_{ y},\boldsymbol{\beta }_{0},\boldsymbol{\varSigma }_{\beta }]. }$$
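The stages of this hierarchical Kriging model can be simulated from the top down, as in the following sketch (the hyperparameter values are illustrative, and K is taken as the identity so that observations coincide with process locations):

```python
import numpy as np

# Top-down simulation of the hierarchy:
#   beta | beta0, Sigma_beta;  y | beta, Sigma_y;  z | y, sigma^2 (K = I).
rng = np.random.default_rng(4)
n, p = 25, 2
s = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), s])          # covariates

beta0, Sigma_b = np.zeros(p), np.eye(p)       # illustrative stage-3 values
Sigma_y = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.3)
sigma2 = 0.1

beta = rng.multivariate_normal(beta0, Sigma_b)   # parameter stage
y = rng.multivariate_normal(X @ beta, Sigma_y)   # process stage
z = rng.normal(y, np.sqrt(sigma2))               # data stage
```

Bayesian fitting inverts this generative ordering, conditioning on z to learn y, beta, and the hyperparameters.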

Process Models for Areal Data

Areal data (also known as lattice data) are spatially indexed data associated with geographic regions or areas, such as counties or zip codes, and are often presented as aggregated values over an areal unit with well-defined boundaries. Spatial association among the areal units is specified by defining a neighborhood structure for the (regular or irregular) areas of interest. Examples of such data include a wide variety of problems, from disease mapping in counties to modeling air pollution on a grid. Models described in this section are based on Markov random fields (MRFs). MRFs are a special class of spatial models suitable for data on discrete (countable) spatial domains, in which the joint distribution of \(y_{i}\) (for \(i = 1,\ldots, n\), where \(y_{i}\) is the spatial process at spatial unit i) is determined by a set of locally specified conditional distributions for each spatial unit conditioned on its neighbors. MRFs include a wide class of spatial models, such as auto-Gaussian models for spatial Gaussian processes, auto-logistic models for binary spatial random variables, auto-gamma models for nonnegative continuous processes, and auto-Poisson models for spatial count processes. Here we focus on two popular auto-Gaussian models, the CAR and SAR models.

Conditionally Autoregressive (CAR) Models

As introduced by Besag (1974), conditionally autoregressive (CAR) models are popular hierarchical spatial models for use with areal data. Here, we consider the Gaussian case. Assume \(y_{i} \equiv y(i^{\text{th}}\text{ spatial area})\) and \(\boldsymbol{y} = (y_{1},\ldots, y_{n})^{{\prime}}\), then the CAR model can be defined using the following n conditional distributions:
$$\displaystyle\begin{array}{rcl} & & y_{i}\mid y_{j},\tau _{i}^{2} \sim N(\mu _{ i} +\sum _{j\in N_{i}}c_{ij}(y_{j} -\mu _{j}),\tau _{i}^{2}), \\ & & i,j = 1,\ldots, n, {}\end{array}$$
where \(N_{i}\) is the set of neighbors of area i, \(E(y_{i}) =\mu _{i}\), \(\tau _{i}^{2}\) is the conditional variance, and the \(c_{ij}\) are constants such that \(c_{ii} = 0\) for \(i = 1,\ldots, n\), and \(c_{ij}\tau _{j}^{2} = c_{ji}\tau _{i}^{2}\). It can be shown (Cressie 1993) that the joint distribution of \(\boldsymbol{y}\) implied by these conditional distributions can be written as
$$\displaystyle{ \boldsymbol{y} \sim N(\boldsymbol{\mu },(\boldsymbol{I} -\boldsymbol{ C})^{-1}\boldsymbol{M}), }$$
where \(\boldsymbol{\mu }= (\mu _{1},\ldots, \mu _{n})^{{\prime}}\), \(\boldsymbol{C} = [c_{ij}]_{n\times n}\), and \(\boldsymbol{M} =\mathrm{ diag}(\tau _{1}^{2},\ldots, \tau _{n}^{2})\). Note that \((\boldsymbol{I} -\boldsymbol{ C})\) must be invertible and \((\boldsymbol{I} -\boldsymbol{ C})^{-1}\boldsymbol{M}\) must be symmetric and positive definite.

The implementation of the CAR model is convenient in hierarchical Bayesian settings because of its explicit conditional structure. Perhaps the most popular implementation of the CAR model is the pairwise difference formulation proposed by Besag et al. (1991), in which C is decomposed into an adjacency matrix and a diagonal matrix containing the number of neighbors of each areal unit, resulting in a simple and easy-to-fit version of the model. Although the convenient specification of CAR models makes them attractive for modeling areal data, their usage often involves theoretical and computational difficulties (e.g., singularity of the joint covariance specification results in an improper joint distribution, known as the intrinsic CAR or ICAR; for more details, see Banerjee et al. 2015). Several methods to overcome such difficulties have been proposed (e.g., Cressie 1993; Carlin and Banerjee 2003); however, the development of strategies to address the difficulties of CAR models remains a topic of ongoing research.
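A proper Gaussian CAR specification can be checked numerically, as in the following sketch for a path graph of areal units (the choices C = φW and M = τ²I are illustrative and satisfy the symmetry condition stated above):

```python
import numpy as np

# Proper Gaussian CAR on a path graph of n areas: C = phi * W, M = tau^2 * I.
n, phi, tau2 = 10, 0.2, 1.0
W = np.zeros((n, n))                     # 0/1 adjacency: neighbors on a line
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

C, M = phi * W, tau2 * np.eye(n)
Sigma = np.linalg.inv(np.eye(n) - C) @ M     # joint covariance (I - C)^{-1} M

# Validity checks: the joint covariance must be symmetric positive definite.
sym_err = np.abs(Sigma - Sigma.T).max()
min_eig = np.linalg.eigvalsh(Sigma).min()
```

Here |φ| is chosen small enough that I − C is invertible; taking φ so that I − C becomes singular would give the improper intrinsic (ICAR) limit mentioned above.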

Simultaneous Autoregressive (SAR) Models

Simultaneous autoregressive (SAR) models, introduced by Whittle (1954), are another class of spatial models for areal data. SAR models are a subset of MRFs and are popular in econometrics. Here we consider the Gaussian case,
$$\displaystyle\begin{array}{rcl} & & y_{i} =\mu _{i} +\sum _{j}b_{ij}(y_{j} -\mu _{j}) +\varepsilon _{i}, \\ & & i,j = 1,\ldots, n, {}\end{array}$$
or equivalently, in matrix notation,
$$\displaystyle{ \boldsymbol{y} =\boldsymbol{\mu } +(\boldsymbol{I} -\boldsymbol{ B})^{-1}\boldsymbol{\varepsilon }, }$$
where \(\boldsymbol{B} = [b_{ij}]_{n\times n}\) is a matrix that can be interpreted as the spatial-dependence matrix, \(\boldsymbol{\varepsilon }\sim N(\boldsymbol{0},\boldsymbol{\varLambda })\), and \(\boldsymbol{\varLambda }\) is an \(n \times n\) diagonal covariance matrix of \(\boldsymbol{\varepsilon }\) (e.g., \(\boldsymbol{\varLambda }=\sigma ^{2}\boldsymbol{I}\)). Thus, \(\boldsymbol{\varepsilon }\) induces the following distribution for \(\boldsymbol{y}\),
$$\displaystyle{ \boldsymbol{y} \sim N(\boldsymbol{\mu },(\boldsymbol{I} -\boldsymbol{ B})^{-1}\boldsymbol{\varLambda }(\boldsymbol{I} -\boldsymbol{ B}^{{\prime}})^{-1}), }$$
where \((\boldsymbol{I} -\boldsymbol{ B})\) must be full rank. There are two common choices for \(\boldsymbol{B}\); one is based on a spatial autoregression parameter (ρ) and an adjacency matrix, and the other is based on a spatial autocorrelation parameter (α) and a normalized adjacency matrix. Thus, the following alternative models can be considered:
$$\displaystyle{ y_{i} =\rho \sum _{j\in N_{i}}w_{ij}y_{j} +\varepsilon _{i}, }$$
$$\displaystyle{ y_{i} =\alpha \sum _{j\in N_{i}} \frac{w_{ij}} {\sum _{k}w_{ik}}y_{j} +\varepsilon _{i} }$$
where the \(w_{ij}\) are elements of an adjacency matrix \(\boldsymbol{W}\), with 0 or 1 entries, describing the neighborhood structure for each unit. CAR and SAR models are equivalent if and only if their covariance matrices are equal (i.e., \((\boldsymbol{I} -\boldsymbol{ C})^{-1}\boldsymbol{M} = (\boldsymbol{I} -\boldsymbol{ B})^{-1}\boldsymbol{\varLambda }(\boldsymbol{I} -\boldsymbol{ B}^{{\prime}})^{-1}\)). Any SAR model can be represented as a CAR model, but the converse is not necessarily true. One main difference between SAR and CAR models is that the spatial-dependence matrix for CAR models (\(\boldsymbol{C}\)) is symmetric, while the spatial-dependence matrix for SAR models (\(\boldsymbol{B}\)) need not be. Although this might be interpreted as an advantage for SAR models in situations where the spatial dependence of neighboring sites is defined asymmetrically, non-identifiability issues related to the estimation of model parameters make CAR models preferable for cases with symmetric dependency structures (for more details on the comparison between SAR and CAR models, see Cressie (1993, pages 408–410)).
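The SAR covariance construction can be sketched analogously (again with an illustrative path-graph neighborhood and B = ρW); the symmetric positive definite precision confirms that this SAR has an equivalent CAR representation:

```python
import numpy as np

# Gaussian SAR on a path graph: B = rho * W, Lambda = sigma^2 * I.
n, rho, sigma2 = 10, 0.2, 1.0
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

B, Lam = rho * W, sigma2 * np.eye(n)
A = np.eye(n) - B                            # (I - B) must be full rank
Sigma = np.linalg.inv(A) @ Lam @ np.linalg.inv(A.T)   # SAR covariance

# SAR precision (I - B')' Lambda^{-1} (I - B): symmetric positive definite,
# hence a valid CAR precision, as noted in the text.
Q = A.T @ np.linalg.inv(Lam) @ A
min_eig = np.linalg.eigvalsh(Q).min()
```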

Spatiotemporal Processes

Spatiotemporal processes are often complex, exhibiting different scales of spatial and temporal variability. Such processes are typically characterized by a large number of observations and prediction locations in space and time; differing spatial and temporal support, orientation, and alignment (relative to the process of interest); and complicated underlying dynamics. The complexity of such processes in “real-world” situations is often intensified by the failure of simplifying assumptions such as Gaussianity, spatial and temporal stationarity, linearity, and space-time separability of the covariance function. Thus, a joint perspective for modeling spatiotemporal processes, although relatively easy to formulate, is challenging to implement. In contrast, a hierarchical formulation allows the modeling of complicated spatial and temporal structures by decomposing an intricate joint spatiotemporal process into relatively simple conditional models. The main advantage of the Bayesian hierarchical model over traditional covariance-based methods is that the complicated structure can be modeled at a lower level of the hierarchy, rather than attempting to model the complex joint dependencies directly.

General Spatiotemporal Model

Let Z(s, t) be a spatiotemporal process, where \(s \in D_{s}\), \(D_{s}\) is a continuous or discrete (and potentially time-varying) spatial domain, and \(t \in D_{t}\) is a discrete temporal domain. The generality of this definition of the spatial domain allows the spatiotemporal process to be applicable to both continuous data and areal data. A general decomposition of the process (where Z(s, t) and Y (s, t) have the same spatial support and no missing data) can be written as
$$\displaystyle{ Z(s,t) = Y (s,t) +\varepsilon (s,t), }$$
where Y (s, t) is the “true” underlying correlated process of interest, and ɛ(s, t) is a zero-mean measurement error process (Wikle and Hooten 2010; Cressie and Wikle 2011). The underlying process Y (s, t) can be further decomposed into a mean process, additive error process, and spatial or temporal random effects. Recent approaches to spatiotemporal modeling have focused on the specification of joint space-time covariance structures (e.g., Cressie and Huang 1999; Stein 2005). However, in high-dimensional settings with complicated nonlinear spatiotemporal behavior, such covariance structures are very difficult to formulate. An alternative approach to modeling such complicated processes is to use spatiotemporal dynamic models in a hierarchical fashion.

Dynamical Spatiotemporal Models

Many spatiotemporal processes are dynamic in the sense that the current state of the process is a function of previous states. There are many examples of spatiotemporal models with dynamic components in the literature (e.g., Huang and Cressie 1996; Waller et al. 1997; Wikle et al. 1998, 2001; Wikle and Cressie 1999). The joint spatiotemporal process can be factored into conditional models based on a Markovian assumption:
$$\displaystyle{ [\mathbf{Y}\mid \boldsymbol{\theta }_{t},t = 1,\ldots, T] = [\mathbf{y}_{0}]\prod _{t=1}^{T}[\mathbf{y}_{ t}\vert \mathbf{y}_{t-1},\boldsymbol{\theta }_{t}], }$$
where \(\mathbf{Y} = (\mathbf{y}_{1},\ldots, \mathbf{y}_{T})\), \(\mathbf{y}_{t} = (y(s_{1},t),\ldots, y(s_{n},t))^{{\prime}}\), and the conditional distribution \([\mathbf{y}_{t}\vert \mathbf{y}_{t-1},\boldsymbol{\theta }_{t}]\) depends on a vector of parameters \(\boldsymbol{\theta }_{t}\) that govern the dynamics of the spatiotemporal process of interest. An example of such dynamical spatiotemporal models arises when the process has a first-order Markovian structure:
$$\displaystyle{ \mathbf{y}_{t} = \mathbf{H}(\theta _{t})\mathbf{y}_{t-1} +\boldsymbol{\eta } _{t},\;\text{where}\;\boldsymbol{\eta }_{t} \sim N(\mathbf{0},\boldsymbol{\varSigma }_{\eta }), }$$
where \(\boldsymbol{\eta }_{t}\) is a spatial error process, and H(θ t ) is a “propagator matrix” (sometimes called a transition or evolution matrix) which includes the parameters that govern the dynamics of the process. If these parameters are known or easy to estimate, an implementation of the model through Kalman filtering is possible (e.g., West and Harrison 1989; Huang and Cressie 1996; Wikle and Cressie 1999). If the parameters are unknown, H(θ t ) can be modeled in a hierarchical fashion by specifying prior distributions for H(θ t ) or its parameters \(\boldsymbol{\theta }_{t}\). The hierarchical form for spatiotemporal dynamic models is sometimes motivated by partial differential equations (PDEs) that describe an approximate behavior of underlying physical processes. For example, Wikle et al. (2001) used the shallow-water PDEs that approximate atmospheric processes in the tropics to develop prior distributions for predicting high-resolution wind fields over the tropical ocean.
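A forward simulation of the first-order dynamical model can be sketched as follows (the damped local-averaging propagator H and the error covariance are illustrative assumptions):

```python
import numpy as np

# Simulate y_t = H y_{t-1} + eta_t on a 1-D spatial grid.
rng = np.random.default_rng(5)
n, T = 20, 50
s = np.linspace(0.0, 1.0, n)

# Illustrative propagator: damped local averaging (spectral radius < 1,
# so the dynamics are stable).
H = np.zeros((n, n))
for i in range(n):
    H[i, i] = 0.6
    if i > 0:
        H[i, i - 1] = 0.15
    if i < n - 1:
        H[i, i + 1] = 0.15

Sigma_eta = 0.05 * np.exp(-np.abs(s[:, None] - s[None, :]) / 0.1)
y = np.zeros((T, n))
for t in range(1, T):
    eta = rng.multivariate_normal(np.zeros(n), Sigma_eta)
    y[t] = H @ y[t - 1] + eta                 # Markovian evolution
```

In a hierarchical fit, H (or its parameters) would receive a prior distribution rather than being fixed, as described in the text.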

Multivariate Spatial and Spatiotemporal Models

Spatial and spatiotemporal models have recently been extended to accommodate multivariate situations (e.g., popular univariate models such as continuous spatial models (e.g., Kriging) and CAR models have been extended to include multivariate cases). The distinction between continuous data and areal data, as described for the univariate case, holds true for the multivariate case (Cressie and Wikle 2011). Multivariate approaches have the added advantage of not only being able to rely on covariate- and covariance-based information but to “borrow strength” between observation vectors (i.e., response variables) as well. Examples of such multivariate models are cokriging for multivariate continuous data, multivariate CAR models for areal data, and multivariate dynamic models.

Multivariate dynamical spatiotemporal models may be described from different perspectives. A comprehensive discussion of several approaches including augmenting the state process, conditioning on a common process, and conditional specification are provided in Cressie and Wikle (2011). This is an active area of research.

Dimension Reduction

Spatial and spatiotemporal models are typically high dimensional. This characteristic complicates the modeling process and necessitates development of efficient computational algorithms on the one hand and implementation of dimension reduction methods (e.g., recasting the problem in a spectral context) on the other hand.

In spatiotemporal dynamical models, often the dynamics of the process can be described based on a relatively low-dimensional manifold (Cressie and Wikle 2011; Wikle and Hooten 2010). A common approach is to write the process vector, y t , in terms of basis function expansions (e.g., orthogonal functions, wavelets, splines, or discrete kernel convolutions):
$$\displaystyle{ \mathbf{y}_{t} =\boldsymbol{ \Phi }\boldsymbol{\alpha }_{t}, }$$
where \(\boldsymbol{\Phi }\) is a matrix of basis functions, and \(\boldsymbol{\alpha }_{t}\) is a vector of corresponding spectral coefficients. Often the dimension of \(\boldsymbol{\alpha }_{t}\) is much smaller than the dimension of y t .

Similarly, for spatial models, the spatial process of interest may be written in terms of a linear combination of spatial basis functions. Comprehensive discussions of the choices of basis functions can be found in Wikle (2010) and Cressie and Wikle (2011). There are many examples of rank-reduced or low-rank spatial models in the literature including discrete process convolutions (Higdon 1998), empirical orthogonal functions (EOFs) (Wikle and Cressie 1999), spatial random effects model (Cressie and Johannesson 2008), and “predictive processes” (Banerjee et al. 2008).
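A basis-function dimension reduction of the form y_t = Φα_t can be sketched using empirical orthogonal functions (EOFs) computed from a simulated space-time field (the simulated field is an illustrative stand-in for real data):

```python
import numpy as np

# EOF-based rank reduction: represent a T x n space-time field with a
# small number k of spatial basis functions and time-varying coefficients.
rng = np.random.default_rng(6)
n, T, k = 50, 200, 5                           # space, time, reduced rank
s = np.linspace(0.0, 1.0, n)
Sigma = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.3)
Y = rng.multivariate_normal(np.zeros(n), Sigma, size=T)   # T x n field

# EOFs are the leading right singular vectors of the centered field.
Yc = Y - Y.mean(axis=0)
U, sv, Vt = np.linalg.svd(Yc, full_matrices=False)
Phi = Vt[:k].T                                 # n x k basis matrix
alpha = Yc @ Phi                               # T x k coefficient vectors

# Fraction of variance captured by the k leading EOFs
frac = (sv[:k] ** 2).sum() / (sv ** 2).sum()
```

The k-dimensional coefficients α_t, rather than the n-dimensional field, can then be modeled dynamically, which is the computational payoff of the reduction.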

Key Applications

Technological advances in remote sensing and monitoring networks and other methods of collecting spatial data in recent decades have revolutionized scientific endeavor in fields such as agriculture, climatology, ecology, economics, transportation, epidemiology, and health management, as well as many other areas. However, such technological advancements require a parallel effort in the development of techniques that enable researchers to make rigorous statistical inference given the wealth of new information at hand. The advancements of computational techniques for hierarchical spatial modeling in the last two decades have provided a flexible modeling framework for researchers to take advantage of available massive datasets for modeling complex problems.

Future Directions

In this entry, a brief overview of hierarchical spatial and spatiotemporal models has been presented. In recent decades, hierarchical models have drawn the attention of scientists in many fields and are especially well suited to studying spatial and spatiotemporal processes. Recent computational advances and the development of efficient algorithms have provided the tools necessary for performing the extensive computations involved in hierarchical modeling. Advances in hierarchical modeling have created opportunities for scientists to take advantage of massive spatially referenced databases. Although the literature on hierarchical spatial modeling is rich, many problems and issues remain to be considered. Below we briefly review some of these challenges.

In most spatial and spatiotemporal applications, researchers must deal with data obtained from different sources and at different scales. For example, a combination of Eulerian and Lagrangian data is often collected in sciences such as oceanography. Alignment and change of spatial support often present significant challenges for analysts. Spatial confounding and related identifiability issues are also challenging topics in spatial modeling (Hodges and Reich 2010). Spatial confounding has primarily been discussed for areal spatial data (e.g., Paciorek 2010); however, Hanks et al. (2015) recently studied spatial confounding for geostatistical processes (i.e., processes with continuous spatial support). Multivariate spatial and spatiotemporal models, as well as nonlinear dynamical spatiotemporal models (e.g., Wikle and Hooten 2010), are active areas of research. Finally, given the growth in size and complexity of spatial and spatiotemporal data, there is also a need for more effective and efficient distributed computing, database storage, and data management.
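The spatial confounding issue can be demonstrated with a small simulation. The sketch below is purely illustrative and not from the entry: it uses one-dimensional locations and exactly collinear Fourier terms as a stand-in for a smooth spatial random-effect basis, and all names are hypothetical. A covariate that varies smoothly in space shares structure with the spatial effect, so adding spatial basis columns to the regression changes the fixed-effect estimate, in the spirit of the phenomenon discussed by Hodges and Reich (2010).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
s = rng.uniform(0, 1, size=n)        # 1-D spatial locations, for simplicity

x = np.sin(2 * np.pi * s)            # spatially smooth covariate
eta = np.cos(2 * np.pi * s + 0.3)    # smooth (unobserved) spatial effect
y = 1.5 * x + eta + rng.normal(0, 0.1, n)

# Regression without any spatial term: the spatial effect is partly
# absorbed into the slope estimate for x
X = np.column_stack([np.ones(n), x])
beta_naive = np.linalg.lstsq(X, y, rcond=None)[0]

# Adding smooth spatial basis columns that are collinear with x
# (here the sin term duplicates x exactly) shifts the slope estimate:
# the covariate and the spatial effect are confounded
B = np.column_stack([np.sin(2 * np.pi * s), np.cos(2 * np.pi * s)])
Xs = np.column_stack([X, B])
beta_spatial = np.linalg.lstsq(Xs, y, rcond=None)[0]

print(beta_naive[1], beta_spatial[1])  # the two slope estimates differ markedly
```

Neither estimate recovers the true coefficient of 1.5, which is the essence of the identifiability problem: the data alone cannot separate a smooth covariate effect from a smooth spatial random effect.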

References

  1. Banerjee S, Gelfand AE, Finley AO, Sang H (2008) Gaussian predictive process models for large spatial data sets. J R Stat Soc Ser B (Stat Methodol) 70:825–848. doi:10.1111/j.1467-9868.2008.00663.x
  2. Banerjee S, Carlin BP, Gelfand AE (2015) Hierarchical modeling and analysis for spatial data, 2nd edn. CRC, Boca Raton
  3. Berliner LM (1996) Hierarchical Bayesian time series models. In: Hanson K, Silver R (eds) Maximum entropy and Bayesian methods. Kluwer Academic, Dordrecht/Boston, pp 15–22
  4. Besag J (1974) Spatial interactions and the statistical analysis of lattice systems (with discussion). J R Stat Soc Ser B 36:192–236
  5. Besag J, York JC, Mollie A (1991) Bayesian image restoration, with two applications in spatial statistics (with discussion). Ann Inst Stat Math 43:1–59
  6. Carlin BP, Banerjee S (2003) Hierarchical multivariate CAR models for spatio-temporally correlated survival data (with discussion). In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M (eds) Bayesian statistics, vol 7. Oxford University Press, Oxford, pp 45–63
  7. Cressie NAC (1993) Statistics for spatial data. Wiley, New York
  8. Cressie N, Huang H-C (1999) Classes of nonseparable spatio-temporal stationary covariance functions. J Am Stat Assoc 94:1330–1340
  9. Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. J R Stat Soc Ser B (Stat Methodol) 70:209–226. doi:10.1111/j.1467-9868.2007.00633.x
  10. Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken
  11. Diggle PJ, Tawn JA, Moyeed RA (1998) Model-based geostatistics (with discussion). Appl Stat 47(3):299–350
  12. Givens GH, Hoeting JA (2005) Computational statistics. Wiley, Hoboken
  13. Hanks EM, Schliep EM, Hooten MB, Hoeting JA (2015) Restricted spatial regression in practice: geostatistical models, confounding, and robustness under model misspecification. Environmetrics. doi:10.1002/env.2331
  14. Higdon D (1998) Space and space-time modeling using process convolutions. In: Quantitative methods for current environmental issues. Springer, London, pp 37–56
  15. Hodges JS, Reich BJ (2010) Adding spatially-correlated errors can mess up the fixed effect you love. Am Stat 64(4):325–334
  16. Huang H-C, Cressie N (1996) Spatio-temporal prediction of snow water equivalent using the Kalman filter. Comput Stat Data Anal 22:159–175
  17. Lindley DV, Smith AFM (1972) Bayes estimates for the linear model. J R Stat Soc Ser B 34:1–41
  18. Paciorek CJ (2010) The importance of scale for spatial-confounding bias and precision of spatial regression estimators. Stat Sci 25(1):107–125
  19. Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B 71:319–392
  20. Stein ML (2005) Space-time covariance functions. J Am Stat Assoc 100:320–321
  21. Waller LA, Xia BP, Gelfand AE (1997) Hierarchical spatio-temporal mapping of disease rates. J Am Stat Assoc 92:607–617
  22. West M, Harrison J (1989) Bayesian forecasting and dynamic models. Springer, New York
  23. Whittle P (1954) On stationary processes in the plane. Biometrika 41(3):434–449
  24. Wikle CK (2010) Low-rank representations for spatial processes. In: Handbook of spatial statistics, pp 107–118
  25. Wikle CK, Berliner LM (2005) Combining information across spatial scales. Technometrics 47:80–91
  26. Wikle CK, Cressie N (1999) A dimension-reduced approach to space-time Kalman filtering. Biometrika 86(4):815–829
  27. Wikle CK, Hooten MB (2006) Hierarchical Bayesian spatio-temporal models for population spread. In: Clark JS, Gelfand AE (eds) Hierarchical modelling for the environmental sciences. Oxford University Press, Oxford, pp 145–169
  28. Wikle CK, Hooten MB (2010) A general science-based framework for dynamical spatio-temporal models. Test 19(3):417–451
  29. Wikle CK, Berliner LM, Cressie N (1998) Hierarchical Bayesian space-time models. Environ Ecol Stat 5:117–154
  30. Wikle CK, Milliff RF, Nychka D, Berliner LM (2001) Spatiotemporal hierarchical Bayesian modeling: tropical ocean surface winds. J Am Stat Assoc 96:382–397

Recommended Reading

  1. Wikle CK (2015) Modern perspectives on statistics for spatio-temporal data. Wiley Interdiscip Rev Comput Stat 7(1):86–98

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ali Arab (1)
  • Mevin B. Hooten (2)
  • Christopher K. Wikle (3)

  1. Department of Mathematics and Statistics, Georgetown University, Washington, USA
  2. U.S. Geological Survey, Colorado Cooperative Fish and Wildlife Research Unit, Department of Fish, Wildlife, and Conservation Biology, and Department of Statistics, Colorado State University, Fort Collins, USA
  3. Department of Statistics, University of Missouri-Columbia, Columbia, USA