Hierarchical Spatial Models
A hierarchical spatial model is the product of three conditional distributions: the data conditioned on a spatial process and parameters, the spatial process conditioned on the parameters that define the spatial dependencies between process locations, and the parameters themselves.
Scientists across a wide range of disciplines have long recognized the importance of spatial dependencies in their data and the underlying process of interest. Initially, because of computational limitations, they dealt with such dependencies through randomization and blocking rather than by characterizing the dependencies explicitly in their models. Early developments in spatial modeling started in the 1950s and 1960s, motivated by problems in mining engineering and meteorology (Cressie 1993), followed by the introduction of Markov random fields (Besag 1974). The application of hierarchical spatial and spatiotemporal models has become increasingly popular since the advancement of computational techniques, such as MCMC methods, in the later years of the twentieth century.
Methods for spatial and spatiotemporal modeling are becoming increasingly important in the environmental sciences and other sciences where data arise from processes in spatial settings. Unfortunately, the application of traditional covariance-based spatial statistical models is either inappropriate or computationally inefficient in many problems. Moreover, conventional methods are often incapable of allowing the researcher to quantify uncertainties corresponding to the model parameters since the parameter space of most complex spatial and spatiotemporal models is very large.
A main goal in the rigorous characterization of natural phenomena is the estimation and prediction of processes as well as the parameters governing those processes. Thus, a flexible framework capable of accommodating complex relationships between data and process models while incorporating various sources of uncertainty is necessary. Traditional likelihood-based approaches to modeling have allowed for scientifically meaningful data structures; in complicated situations with heavily parameterized models and limited or missing data, however, estimation by likelihood maximization is often problematic or infeasible. Developments in numerical approximation methods (e.g., Newton-Raphson and E-M methods; Givens and Hoeting 2005) have been useful in many cases, especially for high-dimensional parameter spaces, but these methods can still be difficult or impossible to implement and have no provision for accommodating uncertainty at multiple levels.
Hierarchical models, whereby a problem is decomposed into a series of levels linked by simple rules of probability, provide a very flexible framework capable of accommodating uncertainty and potential a priori scientific knowledge while retaining many advantages of a strict likelihood approach (e.g., multiple sources of data and scientifically meaningful structure). The years after the introduction of the Bayesian hierarchical model and the development of MCMC (i.e., Markov chain Monte Carlo) have brought an explosion of research, both theoretical and applied, using and/or developing hierarchical models.
- Stage 1.
Data Model: [data | process, data parameters]
- Stage 2.
Process Model: [process | process parameters]
- Stage 3.
Parameter Model: [data and process parameters]
The basic idea is to approach the complex problem by breaking it into simpler subproblems. Although hierarchical modeling is not new to statistics (Lindley and Smith 1972), this basic formulation for modeling complicated spatial and spatiotemporal processes in the environmental sciences is a relatively new development (e.g., Berliner 1996; Wikle et al. 1998). The first stage is concerned with the observational process or “data model,” which specifies the distribution of the data given the fundamental process of interest and parameters that describe the data model. The second stage then describes the process, conditional on other process parameters. Finally, the last stage models the uncertainty in the parameters, from both the data and process stages. Note that each of these stages can have many substages (e.g., Wikle et al. 1998, 2001).
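The three stages can be read top-down as a recipe for generating data. The following is a minimal Python sketch of that reading, not a model taken from the literature cited here: the function name `simulate_hierarchy`, the Gaussian and uniform distributions, and all numeric values are assumptions chosen for illustration, and the process is drawn independently at each site, so no spatial dependence is modeled yet.

```python
import random


def simulate_hierarchy(n_sites=5, n_reps=3, seed=0):
    """Draw parameters, then the process, then the data (top-down)."""
    rng = random.Random(seed)

    # Stage 3, parameter model: [data and process parameters].
    # These prior choices are hypothetical.
    mu = rng.gauss(0.0, 10.0)      # process mean
    tau = rng.uniform(0.5, 2.0)    # process standard deviation
    sigma = rng.uniform(0.1, 1.0)  # measurement-error standard deviation

    # Stage 2, process model: [process | process parameters].
    # Independent sites here; a spatial model would add dependence.
    process = [rng.gauss(mu, tau) for _ in range(n_sites)]

    # Stage 1, data model: [data | process, data parameters].
    # Each site is observed n_reps times with measurement error.
    data = [[rng.gauss(y, sigma) for _ in range(n_reps)] for y in process]

    return {"mu": mu, "tau": tau, "sigma": sigma,
            "process": process, "data": data}


draw = simulate_hierarchy()
print(len(draw["process"]), "sites,", len(draw["data"][0]), "observations each")
```

Inference runs in the opposite direction: given only `draw["data"]`, the aim is to recover the distribution of the process and the parameters.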
The goal is to estimate the distribution of the process and parameters given the data. Bayesian methods are naturally suited to estimation in such hierarchical settings; non-Bayesian methods can sometimes be used but often require additional assumptions. Using a Bayesian approach, the “posterior distribution” (i.e., the joint distribution of the process and parameters given the data) is obtained via Bayes’ theorem:
\[ [\text{process}, \text{parameters} \mid \text{data}] \propto [\text{data} \mid \text{process}, \text{parameters}]\,[\text{process} \mid \text{parameters}]\,[\text{parameters}] \qquad (2) \]
Bayesian statistics involves drawing statistical conclusions from the posterior distribution, which is proportional to the data model (i.e., the likelihood) times the a priori knowledge (i.e., the prior). Bayes’ theorem is thus the mechanism that provides access to the posterior. Although simple in principle, the implementation of Bayes’ theorem for complicated models can be challenging. One challenge concerns the specification of the parameterized component distributions on the right-hand side of (2). Although there has long been a debate in the statistics community concerning the appropriateness of “subjective” specification of such distributions, such choices are a natural part of scientific modeling. In fact, the use of scientific knowledge in the prior distribution allows the uncertainty related to these specifications to be incorporated explicitly in the model.
Another, perhaps more important, challenge from a practical perspective is the calculation of the posterior distribution. The complex and high-dimensional nature of many scientific models (and indeed, most spatiotemporal models) prohibits the direct evaluation of the posterior. However, MCMC approaches can be used to estimate the posterior distribution through iterative sampling. As previously mentioned, the use of MCMC has been critical for the implementation of Bayesian hierarchical models, in that realistic (i.e., complicated) models can be considered; this is especially evident in the analysis of spatial and spatiotemporal processes. Yet the computational burden typically must be considered when formulating the conditional models in such problems. Thus, the model-building phase requires not only scientific understanding of the problem but also an understanding of how that knowledge can be adapted to fit into the computational framework.
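To make the idea of estimating a posterior by iterative sampling concrete, here is a minimal Gibbs sampler (one kind of MCMC algorithm) for a toy hierarchy. Everything specific in it is an assumption made for illustration: a single scalar process value `y`, Gaussian data and process models with known variances `s2` and `t2`, a Gaussian prior on the process mean `mu`, and the made-up observations `z`. Real spatial models have many more unknowns, but the alternation between full conditional distributions follows the same pattern.

```python
import random


def gibbs_sampler(z, s2=0.25, t2=1.0, m0=0.0, v0=100.0,
                  n_iter=5000, burn_in=1000, seed=1):
    """Gibbs sampler for the toy model (all settings hypothetical):
         data:      z_i | y ~ N(y, s2)
         process:   y | mu  ~ N(mu, t2)
         parameter: mu      ~ N(m0, v0)
    """
    rng = random.Random(seed)
    n = len(z)
    y, mu = 0.0, 0.0  # arbitrary starting values
    samples = []
    for it in range(n_iter):
        # Full conditional of the process: [y | mu, z] is Gaussian.
        prec_y = n / s2 + 1.0 / t2
        mean_y = (sum(z) / s2 + mu / t2) / prec_y
        y = rng.gauss(mean_y, prec_y ** -0.5)

        # Full conditional of the parameter: [mu | y] is Gaussian.
        prec_mu = 1.0 / t2 + 1.0 / v0
        mean_mu = (y / t2 + m0 / v0) / prec_mu
        mu = rng.gauss(mean_mu, prec_mu ** -0.5)

        if it >= burn_in:  # discard the early, unconverged draws
            samples.append((y, mu))
    return samples


z = [1.2, 0.8, 1.1, 0.9, 1.0]  # made-up observations
samples = gibbs_sampler(z)
post_mean_y = sum(y for y, _ in samples) / len(samples)
print("posterior mean of the process:", round(post_mean_y, 2))
```

Because this toy model is conjugate, the posterior mean of `y` is also available in closed form, which gives a convenient check on the sampler; in realistic spatial models no such closed form exists and the samples are the only route to the posterior.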
Nonanalytical hierarchical models can be fitted to data using high-level programming languages (such as R, S-PLUS, or MATLAB) or low-level languages (such as C, C++, or FORTRAN). High-level languages allow for efficient programming, whereas low-level languages often allow for more efficient execution. Alternatively, the freely distributed Bayesian computation software WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/) or its open-source version, OpenBUGS (http://www.openbugs.net/w/FrontPage), together with the spatial package GeoBUGS, can be used to carry out Bayesian computations (Banerjee et al. 2015). A similar tool is JAGS (http://mcmc-jags.sourceforge.net/), which is based on BUGS but can be used on many platforms (as opposed to WinBUGS/OpenBUGS, which primarily target Windows). The developers of the automated Gibbs sampler programs BUGS and WinBUGS have been quick to point out the caveat that comes with misuse of the software (and Bayesian methods in general), cautioning that MCMC methods are not as robust as analytical methods and that the analyst should use them mindfully, especially when choosing prior distributions. Another computational tool that has recently become popular is the integrated nested Laplace approximation (INLA; http://www.r-inla.org/; Rue et al. 2009). The INLA approach is a numerically implemented analytical solution for approximating posterior marginals in hierarchical models with latent Gaussian processes. A more recent tool for carrying out Bayesian computations is Stan (http://mc-stan.org/), open-source software written in C++ and based on Hamiltonian Monte Carlo methods, with several interfaces including one for R (RStan).
In this section we focus on the process model stage of the hierarchical framework described in the previous section and specifically applied in spatial settings. We consider the two important cases of continuous and areal data and discuss popular modeling choices and their hierarchical forms.
General Hierarchical Spatial Model Framework
Process Models for Spatially Continuous Data
Process Models for Areal Data
Areal data (also known as lattice data) are spatially indexed data associated with geographic regions or areas such as counties or zip codes and are often presented as aggregated values over an areal unit with well-defined boundaries. Spatial association among the areal units is specified by defining a neighborhood structure for the areas (regular or irregular) of interest. Examples of such data include a wide variety of problems from disease mapping in counties to modeling air pollution on a grid. Models described in this section are based on Markov random fields (MRFs). MRFs are a special class of spatial models that are suitable for data on discrete (countable) spatial domains in which a joint distribution of \(y_i\) (for \(i = 1,\ldots, n\), where \(y_i\) is the spatial process at spatial unit \(i\)) is determined by using a set of locally specified conditional distributions for each spatial unit conditioned on its neighbors. MRFs include a wide class of spatial models, such as auto-Gaussian models for spatial Gaussian processes, auto-logistic models for binary spatial random variables, auto-Gamma models for nonnegative continuous processes, and auto-Poisson models for spatial count processes. Here we focus on two popular auto-Gaussian models, CAR and SAR models.
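A neighborhood structure is usually stored as an adjacency matrix. The sketch below builds one for a regular grid under the assumption that two cells are neighbors when they share an edge (often called rook adjacency); the function name `rook_adjacency` and the row-by-row cell numbering are illustrative choices, and irregular areas such as counties would need a boundary-based definition instead.

```python
def rook_adjacency(n_rows, n_cols):
    """Adjacency matrix W for a regular grid: W[i][j] = 1 when
    cells i and j share an edge, and 0 otherwise."""
    n = n_rows * n_cols
    W = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c  # cells are numbered row by row
            # Link each cell to the cell below and to the right;
            # symmetry fills in the reverse directions.
            for dr, dc in ((1, 0), (0, 1)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    j = rr * n_cols + cc
                    W[i][j] = W[j][i] = 1
    return W


W = rook_adjacency(3, 3)
neighbor_counts = [sum(row) for row in W]
print(neighbor_counts)  # [2, 3, 2, 3, 4, 3, 2, 3, 2]
```

Corner cells have two neighbors, edge cells three, and the center cell four; the conditional distribution of each unit in an MRF depends only on the units flagged in its row of `W`.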
Conditionally Autoregressive (CAR) Models
The implementation of the CAR model is convenient in hierarchical Bayesian settings because of its explicit conditional structure. Perhaps the most popular implementation of the CAR model is the pairwise difference formulation proposed by Besag et al. (1991), where C is decomposed into an adjacency matrix and a diagonal matrix containing the number of neighbors for each areal unit, which results in a simple and easy-to-fit version of the model. Although the convenient specification of CAR models makes them attractive for modeling areal data, their use often involves theoretical and computational difficulties. For example, when the inverse covariance (precision) matrix of the joint distribution is singular, the joint distribution is improper; this specification is called the intrinsic CAR, or ICAR (for more details see Banerjee et al. 2015). Several methods to overcome such difficulties have been proposed (e.g., Cressie 1993; Carlin and Banerjee 2003); however, the development of strategies to address the difficulties of CAR models is a topic of ongoing research.
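The following sketch illustrates, under stated assumptions, why the intrinsic specification is improper. It assumes a precision matrix of the common form Q = D - rho W (up to a variance scale), where W is the adjacency matrix, D is the diagonal matrix of neighbor counts, and `rho` is a spatial dependence parameter; the function names and the four-unit example are hypothetical. With `rho = 1`, every row of Q sums to zero, so Q is singular, and the quadratic form reduces to a sum of squared differences between neighbors.

```python
def car_precision(W, rho=1.0):
    """Precision matrix Q = D - rho * W, where D holds neighbor counts."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - rho * W[i][j]
             for j in range(n)] for i in range(n)]


def quadratic_form(Q, y):
    """Compute y' Q y."""
    n = len(y)
    return sum(y[i] * Q[i][j] * y[j] for i in range(n) for j in range(n))


def pairwise_difference(W, y):
    """Sum of squared differences over each neighboring pair, counted once."""
    n = len(y)
    return sum(W[i][j] * (y[i] - y[j]) ** 2
               for i in range(n) for j in range(i + 1, n))


# Four areal units arranged in a line: 0 - 1 - 2 - 3 (hypothetical).
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1.0, 3.0, 2.0, 5.0]

Q_icar = car_precision(W, rho=1.0)
row_sums_icar = [sum(row) for row in Q_icar]      # all zero: Q is singular
Q_proper = car_precision(W, rho=0.9)
row_sums_proper = [sum(row) for row in Q_proper]  # all positive

print(quadratic_form(Q_icar, y), pairwise_difference(W, y))  # 14.0 14.0
```

Taking `rho` strictly below 1 in absolute value makes Q strictly diagonally dominant, hence positive definite whenever every unit has at least one neighbor, which is one simple way to obtain a proper joint distribution.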
Simultaneous Autoregressive (SAR) Models
Spatiotemporal processes are often complex, exhibiting different scales of spatial and temporal variability. Such processes are typically characterized by a large number of observations and prediction locations in space and time, differing spatial and temporal support, orientation and alignment (relative to the process of interest), and complicated underlying dynamics. In “real-world” situations the complexity is often intensified because simplifying assumptions such as Gaussianity, spatial and temporal stationarity, linearity, and space-time separability of the covariance function do not hold. Thus, a joint perspective for modeling spatiotemporal processes, although relatively easy to formulate, is challenging to implement. In contrast, a hierarchical formulation allows the modeling of complicated spatial and temporal structures by decomposing an intricate joint spatiotemporal process into relatively simple conditional models. The main advantage of the Bayesian hierarchical model over traditional covariance-based methods is that it allows the complicated structure to be modeled at a lower level in the hierarchy, rather than attempting to model the complex joint dependencies.
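One way to picture such a decomposition is a first-order Markov process in time, in which the field at each time step is specified conditionally on the field at the previous step rather than through a joint space-time covariance. The sketch below is an illustration only: the one-dimensional lattice, the weights `stay` and `spread`, the Gaussian noise, and the function name are all assumptions, not a model from the sources cited here.

```python
import random


def simulate_dynamic_process(n_sites=6, n_times=4, stay=0.6, spread=0.2,
                             noise_sd=0.1, seed=2):
    """Simulate y_t | y_{t-1} on a 1-D lattice: each site keeps a share
    of its own value and receives a share from each adjacent site."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]  # initial field y_0
    path = [y]
    for _ in range(n_times):
        new = []
        for i in range(n_sites):
            # Sites beyond the lattice edge contribute nothing.
            left = y[i - 1] if i > 0 else 0.0
            right = y[i + 1] if i < n_sites - 1 else 0.0
            mean = stay * y[i] + spread * (left + right)
            new.append(mean + rng.gauss(0.0, noise_sd))
        y = new
        path.append(y)
    return path


path = simulate_dynamic_process()
print(len(path), "time points,", len(path[0]), "sites each")
```

Each conditional step involves only a site and its immediate neighbors, yet repeated application produces dependence across the whole lattice and across time, which is the sense in which simple conditional models can build up a complicated joint structure.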
General Spatiotemporal Model
Dynamical Spatiotemporal Models
Multivariate Spatial and Spatiotemporal Models
Spatial and spatiotemporal models have recently been extended to accommodate multivariate situations; for example, popular univariate models such as continuous spatial models (e.g., kriging) and CAR models have been extended to multivariate cases. The distinction between continuous data and areal data, as described for the univariate case, holds true for the multivariate case (Cressie and Wikle 2011). Multivariate approaches have the added advantage of relying not only on covariate- and covariance-based information but also on the ability to “borrow strength” between observation vectors (i.e., response variables). Examples of such multivariate models are cokriging for multivariate continuous data, multivariate CAR models for areal data, and multivariate dynamic models.
Multivariate dynamical spatiotemporal models may be described from different perspectives. A comprehensive discussion of several approaches, including augmenting the state process, conditioning on a common process, and conditional specification, is provided in Cressie and Wikle (2011). This is an active area of research.
Spatial and spatiotemporal models are typically high dimensional. This characteristic complicates the modeling process and necessitates development of efficient computational algorithms on the one hand and implementation of dimension reduction methods (e.g., recasting the problem in a spectral context) on the other hand.
Similarly, for spatial models, the spatial process of interest may be written as a linear combination of spatial basis functions. Comprehensive discussions of the choices of basis functions can be found in Wikle (2010) and Cressie and Wikle (2011). There are many examples of rank-reduced or low-rank spatial models in the literature, including discrete process convolutions (Higdon 1998), empirical orthogonal functions (EOFs) (Wikle and Cressie 1999), spatial random effects models (Cressie and Johannesson 2008), and “predictive processes” (Banerjee et al. 2008).
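As a rough illustration of a basis-function representation, the sketch below writes a process at 50 sites as a linear combination of 5 Gaussian-shaped basis functions centered at fixed knots, so that only 5 random coefficients need to be estimated. The choice of Gaussian kernels, the knot locations, the bandwidth, and the function names are assumptions made for this example; it is not an implementation of any of the specific methods cited above.

```python
import math
import random


def gaussian_basis(sites, knots, bandwidth=0.2):
    """Basis matrix Phi with Phi[i][k] = kernel(site_i - knot_k)."""
    return [[math.exp(-0.5 * ((s - k) / bandwidth) ** 2) for k in knots]
            for s in sites]


def low_rank_process(Phi, alpha):
    """Process values y_i = sum_k Phi[i][k] * alpha[k]."""
    return [sum(p * a for p, a in zip(row, alpha)) for row in Phi]


rng = random.Random(3)
sites = [i / 49 for i in range(50)]   # 50 locations on [0, 1]
knots = [0.0, 0.25, 0.5, 0.75, 1.0]   # 5 knots (hypothetical)
Phi = gaussian_basis(sites, knots)
alpha = [rng.gauss(0.0, 1.0) for _ in knots]  # random coefficients
y = low_rank_process(Phi, alpha)

print(len(y), "process values from", len(alpha), "coefficients")
```

The dimension reduction comes from the fact that all dependence among the 50 process values is induced by the 5 coefficients, so the implied covariance matrix has rank at most 5.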
Technological advances in remote sensing and monitoring networks and other methods of collecting spatial data in recent decades have revolutionized scientific endeavor in fields such as agriculture, climatology, ecology, economics, transportation, epidemiology, and health management, as well as many other areas. However, such technological advancements require a parallel effort in the development of techniques that enable researchers to make rigorous statistical inference given the wealth of new information at hand. The advancements of computational techniques for hierarchical spatial modeling in the last two decades have provided a flexible modeling framework for researchers to take advantage of available massive datasets for modeling complex problems.
In this entry, a brief overview of hierarchical spatial and spatiotemporal models is presented. In recent decades, hierarchical models have drawn the attention of scientists in many fields and are especially suited to studying spatial and spatiotemporal processes. Recent computational advances and the development of efficient algorithms have provided the tools necessary for performing the extensive computations involved in hierarchical modeling. Advances in hierarchical modeling have created opportunities for scientists to take advantage of massive spatially referenced databases. Although the literature on hierarchical spatial modeling is rich, there are still many problems and issues yet to be considered. Below we briefly review some of these challenges.
In most spatial and spatiotemporal processes, researchers have to deal with data obtained from different sources and at different scales. For example, a combination of Eulerian and Lagrangian data is often collected in sciences such as oceanography. Alignment and change of spatial support often present a significant challenge for analysts. Spatial confounding and related identifiability issues are also challenging topics in spatial models (Hodges and Reich 2010). Spatial confounding has primarily been discussed for areal spatial data (e.g., Paciorek 2010); however, Hanks et al. (2015) recently studied spatial confounding for geostatistical processes (i.e., continuous spatial support). Multivariate spatial and spatiotemporal models as well as nonlinear dynamical spatiotemporal models (e.g., Wikle and Hooten 2010) are active areas of research. Given the growth in size and complexity of spatial and spatiotemporal data, there is also a need for distributed computing that supports more effective and efficient computation, database storage, and data management. Efficient methods to address these issues still need to be developed.
- Carlin BP, Banerjee S (2003) Hierarchical multivariate CAR models for spatio-temporally correlated survival data (with discussion). In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M (eds) Bayesian statistics, vol 7. Oxford University Press, Oxford, pp 45–63
- Higdon D (1998) Space and space-time modeling using process convolutions. In: Quantitative methods for current environmental issues. Springer, London, pp 37–56
- Wikle CK (2010) Low-rank representations for spatial processes. In: Handbook of spatial statistics. pp 107–118
- Wikle CK, Hooten MB (2006) Hierarchical Bayesian spatio-temporal models for population spread. In: Clark JS, Gelfand AE (eds) Hierarchical modelling for the environmental sciences. Oxford University Press, Oxford, pp 145–169