1 Introduction

Modelling variation in disease or other events underlying observed totals for geographic areas is important for detecting elevated rates (Beale et al. 2008). In disease mapping, the observations often consist of incidence totals for chronic or infectious disease. Such data are subject to stochastic variation, and the underlying area-specific incidence risks are often the focus in data mining studies. In such studies, the objectives include extraction of underlying spatial and spatiotemporal patterns, including detection of elevated risk (hotspots) and spatial outliers (Shekhar et al. 2015). The particular focus of this paper is on ecological epidemiology, in the sense of focusing on population aggregates (Morgenstern 1995), namely geographic areas, and on environmental and socio-economic risk factors for infectious disease (Ploubidis et al. 2012). The applications are to human infectious disease, namely TB incidence.

Different forms of spatial correlation analysis and spatial model have been proposed in disease applications (human and veterinary), environmental science, ecology, crime and other settings. For example, Wikle (2003) reviews hierarchical spatial models applied in environmental science, including irregular lattice data (such as geographic areas) and regular lattice data (such as air pollution grids). Beale et al. (2010) consider how regression findings for spatial ecology data are affected by the method used (if at all) to reflect spatial dependence. To exemplify hierarchical models for veterinary data, Pioz et al. (2012) apply simultaneous autoregressive (SAR) models to investigate bluetongue spread in French municipalities, while Farnsworth and Ward (2009) apply Bayesian conditional autoregressive (CAR) models to avian influenza H5N1 outbreak data. In such applications, identifying elevated risk in particular areas, detecting elevated risk clusters, or assessing significant predictors of risk are emphasized, using methods that recognize the explicitly spatial structure of the data. However, the underlying assumptions of such techniques should be assessed, and modified when indicated.

Hierarchical models involving spatial random effects, both CAR and SAR forms, can be estimated by classical methods (Alam et al. 2015; Horabik and Nahorski 2010) or Bayesian methods (Waller and Carlin 2010; Lesage 1997). CAR spatial priors imply local smoothing of outcome rates, that is, smoothing towards the local rather than the global average (Gelman 1996; Waller and Carlin 2010). Such smoothing may be inappropriate where risks differ sharply between neighbouring areas, and local discontinuity of this kind is demonstrated in the England TB application considered below. Marked variability in risks has been detected in other area studies of infectious disease (Duarte-Cunha 2015; Varga et al. 2015; Ploubidis et al. 2012), whereas spatial variability in relative risks of chronic diseases (cancer, diabetes, etc.) is generally less pronounced. When there are spatial discontinuities in risk, it is preferable to allow differing strengths of association between neighbouring areas, as opposed to uniform local smoothing under CAR priors (Gelman 1996; Smith et al. 2015).

Bayesian applications in disease mapping and ecological epidemiology commonly employ a CAR prior (Lee 2011) to express spatial clustering in underlying risks (Besag et al. 1991; Best 1999), including human TB incidence (Nunes 2007; Maciel et al. 2010). Most applications of CAR priors assume a normal density for the underlying risks combined with uniform local smoothing. However, normality assumptions may be vitiated by heteroscedasticity linked to spatial outliers or to marked discrepancies in risk between neighbouring areas. It is also desirable that spatial disease models represent variation in area disease risks that is not attributable to spatial dependence (i.e. heterogeneity as against clustering). Some spatial priors may represent this feature by using more than one set of random effects, but at the cost of identifiability.

This paper considers modification of the local smoothing principle when there are spatial discontinuities, namely discrepant levels of outcome rates (e.g. disease or crime incidence) between neighbouring areas. In particular, we consider modifications of the normality assumption for area random effects based on a scale mixture version of the Leroux et al. (1999) model, allowing for heterogeneity and clustering in a single set of random effects, but with the scale mixture providing adaptivity to local discontinuity and spatial outliers. The relevance of such an approach is illustrated with simulated data on TB incidence in 49 mainland US states, and an application to observed TB incidence in 326 English local authorities.

2 Defining conditional spatial priors

As discussed by Besag and Kooperberg (1995), one may use properties of the multivariate normal to obtain the univariate conditional autoregressive prior from a joint spatial prior and vice versa. Thus consider a joint multivariate normal density for the spatial risk effects \( {\text{s}} = ({\text{s}}_{1} , \ldots ,{\text{s}}_{{\text{n}}} ) \) for \( \text{n} \) areas, with mean zero and covariance \( \Sigma _{{\text{s}}} \),

$${\text{p(s)}} = \tfrac{1}{{(2\pi )^{{{\text{n/2}}}} }}\left| {\Sigma _{{\text{s}}} } \right|^{{ - 0.5}} \exp \left( { - 0.5{\text{s}}^{\prime } \Sigma_{{\text{s}}}^{{ - 1}} {\text{s}} } \right) $$
(1)

Denote

$$ {\text{Q = [q}}_{\text{ij}} ] { = }\Sigma_{\text{s}}^{ - 1} $$

as the precision matrix, and \( {\text{s}}_{{{\text{[i]}}}} = ({\text{s}}_{1} , \ldots ,{\text{s}}_{{{\text{i}} - 1}} ,{\text{s}}_{{{\text{i}} + 1}} , \ldots ,{\text{s}}_{{\text{n}}} ) \) as the totality of effects omitting the ith effect. The conditional distributions for each \( {\text{s}}_{\text{i}} \) take a univariate normal form (Rue and Held 2005, p. 22), namely

$$ {\text{s}}_{{\text{i}}} |{\text{s}}_{{[{\text{i}}]}} \sim {\text{N}}\left( {\sum\limits_{{{\text{j}} \ne {\text{i}}}} {\left[ { - \tfrac{{{\text{q}}_{{{\text{ij}}}} }}{{{\text{q}}_{{{\text{ii}}}} }}} \right]{\text{s}}_{{\text{j}}} } ,\tfrac{1}{{{\text{q}}_{{{\text{ii}}}} }}} \right) $$
(2)

Following Besag and Kooperberg (1995, p 734) define \( {\text{h}}_{\text{ii}} { = 0,} \) and set

$$ {\text{h}}_{\text{ij}}\,=\,- {\text{q}}_{\text{ij}} / {\text{q}}_{\text{ii}} \;\quad ( {\text{i}} \ne {\text{j)}} . $$

Also set

$$ {\text{q}}_{\text{ii}} {\text{ = a}}_{\text{i}} /\updelta $$

with variance parameter \( \updelta \), so that

$$ {\text{h}}_{\text{ij}}\,=\,- {\text{q}}_{\text{ij}} \updelta / {\text{a}}_{\text{i}} $$
(3)

The density (2) is then in the conditional autoregressive form specified by Besag (1974), namely

$$ {\text{s}}_{\text{i}} | {\text{s}}_{{ [ {\text{i]}}}} \sim {\text{N}}\left( {\sum\limits_{{{\text{j}} \ne {\text{i}}}} {{\text{h}}_{\text{ij}} {\text{s}}_{\text{j}} ,\,\updelta / {\text{a}}_{\text{i}} } } \right) $$

To obtain the joint density from the conditional one, symmetry of \( \text{Q} \) means \( - {\text{q}}_{\text{ij}}\,=\,- {\text{q}}_{\text{ji}} \), so that from (3) the constraint

$$ {\text{h}}_{\text{ij}} {\text{a}}_{\text{i}}\,=\,{\text {h}}_{\text{ji}} {\text{a}}_{\text{j}} $$
(4)

applies.
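As a minimal sketch of the mapping in (2) to (4), the following Python fragment starts from a small precision matrix and recovers the \( {\text{h}}_{\text{ij}} \) and \( {\text{a}}_{\text{i}} \). The matrix entries, the value of \( \updelta \) and the function name are illustrative assumptions, not values from the paper.

```python
# Hypothetical 3x3 precision matrix Q (symmetric, diagonally dominant); illustrative only.
Q = [
    [2.0, -0.5, 0.0],
    [-0.5, 3.0, -1.0],
    [0.0, -1.0, 2.5],
]
delta = 1.0  # assumed variance parameter in q_ii = a_i / delta
n = len(Q)

# a_i = delta * q_ii, and h_ij = -q_ij / q_ii for i != j, with h_ii = 0.
a = [delta * Q[i][i] for i in range(n)]
h = [[0.0 if i == j else -Q[i][j] / Q[i][i] for j in range(n)] for i in range(n)]


def conditional_moments(i, s):
    """Mean and variance of s_i given the remaining effects, as in (2)."""
    mean = sum(h[i][j] * s[j] for j in range(n) if j != i)
    variance = delta / a[i]
    return mean, variance


# Symmetry constraint (4): h_ij * a_i equals h_ji * a_j for every pair of areas.
symmetric = all(
    abs(h[i][j] * a[i] - h[j][i] * a[j]) < 1e-12
    for i in range(n)
    for j in range(n)
)
mean_1, var_1 = conditional_moments(1, [0.4, 0.0, -0.2])
```

Because the constraint holds whenever Q is symmetric, the check mainly guards against mistakes in constructing the \( {\text{h}}_{\text{ij}} \) and \( {\text{a}}_{\text{i}} \) by hand.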

3 Conditional autoregressive spatial priors

Various schemes for defining the \( {\text{h}}_{\text{ij}} \) and \( {\text{a}}_{\text{i}} \) can be used. A measure of spatial dependence \( 0 \le \upomega \le 1 \) is included by setting

$$ {\text{h}}_{\text{ij}} = \upomega \frac{{{\text{w}}_{\text{ij}} }}{{\sum\limits_{{{\text{k}} \ne {\text{i}}}} {{\text{w}}_{\text{ik}} } }},\quad {\text{a}}_{\text{i}}\,=\,\sum\limits_{{{\text{k}} \ne {\text{i}}}} {{\text{w}}_{\text{ik}} } $$

where \( \text{w}_{ij} \) represent spatial interactions between areas i and j. If the interactions are specified as symmetric with \( {\text{w}}_{\text{ij}}\,=\,{\text{w}}_{\text{ji}} \), and also with \( {\text{w}}_{\text{ii}}\,=\,{0} \), the symmetry constraint (4) is ensured, with \( {\text{h}}_{{\text{ij}}} {\text{a}}_{{\text{i}}} = \omega {\text{w}}_{{\text{ij}}} = {\text{h}}_{{\text{ji}}} {\text{a}}_{{\text{j}}} \).

A common approach sets \( {\text{w}}_{\text{ij}}\,=\,{1} \) for adjacent areas and \( {\text{w}}_{\text{ij}} \,=\,{0} \) otherwise, with

$$ {\text{a}}_{\text{i}}\,=\,\sum\limits_{{{\text{k}} \ne {\text{i}}}} {{\text{w}}_{\text{ik}} }\,=\,{\text{d}}_{\text{i}} $$

then equal to the number, \( {\text{d}}_{\text{i}} \), of areas adjacent to area i. Equivalently, \( {\text{d}}_{\text{i}} \) is the number of areas in the locality \( {\text{N}}_{\text{i}} \) of area i (the areas surrounding area i, and excluding area i itself). This provides the conditionally autoregressive \( CAR(\upomega ) \) prior, with

$$ {\text{s}}_{\text{i}} | {\text{s}}_{[{\text{i}}]} \sim {\text{N}}\left( {\upomega {\bar{\text{A}}}_{\text{i}} ,\,\tfrac{\updelta }{{{\text{d}}_{\text{i}} }}} \right) $$
(5)

where \( {\bar{\text{A}}}_{\text{i}} \) is the average of the \( \text{s}_{j} \) in locality \( \text{N}_{i} \), i.e.

$$ {\bar{\text{A}}}_{\text{i}} = \frac{{\sum\limits_{{{\text{j}} \in {\text{N}}_{\text{i}} }} {{\text{s}}_{\text{j}} } }}{{{\text{d}}_{\text{i}} }}. $$

Lower values of \( \upomega \) imply lesser degrees of spatial dependence between the \( {\text{s}}_{\text{i}} \), though the limiting case when \( \upomega = 0 \) has the disadvantage that the variance is not constant but depends on the number of neighbours \( {\text{d}}_{\text{i}} \). The \( {\text{CAR(1)}} \) prior (Besag et al. 1991) specifies relative risks entirely determined by spatial dependence, with

$$ {\text{s}}_{\text{i}} | {\text{s}}_{{ [ {\text{i]}}}} \sim {\text{N}}\left( {\sum\limits_{{{\text{j}} \in {\text{N}}_{\text{i}} }} {{\text{s}}_{\text{j}} / {\text{d}}_{\text{i}} ,\,\tfrac{\updelta }{{{\text{d}}_{\text{i}} }}} } \right) $$
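The conditional moments in (5) can be illustrated with a short Python sketch. The four-area adjacency structure, the effect values and the function name are hypothetical and serve only to show how \( \upomega \) scales the local average while the variance continues to depend on \( {\text{d}}_{\text{i}} \).

```python
# Hypothetical adjacency for four areas arranged on a line: 0-1-2-3.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def car_conditional(i, s, omega, delta):
    """Conditional mean and variance of s_i under the CAR(omega) prior (5)."""
    d_i = len(neighbours[i])
    local_average = sum(s[j] for j in neighbours[i]) / d_i
    return omega * local_average, delta / d_i


s = [0.2, -0.1, 0.5, 0.3]  # illustrative spatial effects
mean_proper, var_proper = car_conditional(1, s, omega=0.9, delta=1.0)
mean_car1, var_car1 = car_conditional(1, s, omega=1.0, delta=1.0)
mean_zero, var_zero = car_conditional(1, s, omega=0.0, delta=1.0)
```

With \( \upomega = 0 \) the conditional mean is zero, but the variance for area 1 is still \( \updelta /2 \) because it has two neighbours, which is the non-constant variance noted above.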

In any set of area disease rates, some spatial correlation is typically detected, and this motivates spatial priors which imply borrowing of strength from nearby areas. However, there may also be particular local variations in illness risks unrelated to those in surrounding areas, namely unstructured variation without spatial dependence. In principle, the \( CAR(\upomega ) \) prior (also called the proper CAR prior) can represent various levels of spatial dependence through the \( \upomega \) parameter, but this parameter does not calibrate well with marginal measures of spatial correlation, such as Moran’s I (Banerjee et al. 2004; Rodrigues and Assunção 2012). Values of \( \upomega \) exceeding 0.99 are needed to achieve modest values of Moran’s I.

In practice, to represent a mix between spatial dependence and simple unstructured variation, called clustering and heterogeneity respectively by Clayton et al. (1993), a common strategy is the so-called convolution prior (Ugarte et al. 2005; Waller and Carlin 2010). This represents the log of the unknown area relative risk as including the sum of a pure spatial effect following a \( {\text{CAR(1)}} \) prior and an iid (or unstructured) random effect. Thus denote observed disease counts as \( {\text{y}}_{\text{i}} \), expected counts as \( \text{E}_{i} \) (expected disease counts in the demographic sense) and known area risk variables (predictors) as \( {\text{X}}_{\text{i}} \). Then one might specify

$$ {\text{y}}_{{\text{i}}} \sim {\text{Po}}(\uprho _{{\text{i}}} {\text{E}}_{{\text{i}}} ), $$
$$ {\text{log(}}\uprho _{{\text{i}}} ){\text{ = X}}_{{\text{i}}} \upbeta {\text{ + s}}_{{\text{i}}} {\text{ + h}}_{{\text{i}}} $$
(6.1)
$$ \text{s}_{\text{i}} |\text{s}_{[\text{i}]} \sim \text{N}\left( {\sum\limits_{{{\text{j}} \in {\text{N}}_{\text{i}} }} {{\text{s}}_{\text{j}} / {\text{d}}_{\text{i}} ,\,\tfrac{\updelta }{{{\text{d}}_{\text{i}} }}} } \right) $$
(6.2)
$$ {\text{h}}_{{\text{i}}} \sim {\text{N}}({\text{0}},\upphi ) $$
(6.3)

where \( \uprho_{\text{i}} \) denotes an area specific relative risk, and \( \upphi \) is a variance term for iid unstructured effects \( {\text{h}}_{\text{i}} \). A drawback with this scheme is that identifiability may be impeded by the presence of two sets of random effects representing one underlying aspect of the data, namely variation in area illness risks.
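One way to see the identifiability issue, offered as a sketch rather than a formal result, is that the likelihood in (6.1) depends on the two random effects only through their sum. For any constants \( {\text{c}}_{\text{i}} \) (an illustrative device, not part of the model),

$$ \log (\uprho_{\text{i}} ) = {\text{X}}_{\text{i}} \upbeta + ({\text{s}}_{\text{i}} + {\text{c}}_{\text{i}} ) + ({\text{h}}_{\text{i}} - {\text{c}}_{\text{i}} ) $$

gives the same \( \uprho_{\text{i}} \), so the split between \( {\text{s}}_{\text{i}} \) and \( {\text{h}}_{\text{i}} \) is informed only by the priors (6.2) and (6.3).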

4 The Leroux et al. spatial prior

A scheme for area effects, incorporating both clustering and heterogeneity, involves scale adjustments

$$ {\text{a}}_{\text{i}}\,=\,{(1} - \uplambda ) \,{ + }\,\uplambda \sum\limits_{{{\text{j}} \ne {\text{i}}}} {{\text{w}}_{\text{ij}} },$$

with the parameter \( 0 \le \uplambda \le 1 \) providing a measure of spatial dependence (Leroux et al. 1999). This scheme, which may be represented as the LLB prior by virtue of its authors, has the benefit that only one set of random effects is involved in representing the pattern of area illness risks. This provides improved identifiability as compared to the convolution prior (Lee 2011). The case \( \uplambda = 0 \) corresponds to a lack of spatial interdependence (and i.i.d or unstructured errors \( {\text{s}}_{\text{i}} \)), with the advantage that the conditional variance is then simply \( \updelta \), independent of \( \sum\limits_{{{\text{j}} \ne {\text{i}}}} {{\text{w}}_{\text{ij}} } \). By contrast, \( \uplambda = 1 \) leads to a \( CAR(1) \) model, with purely spatial interdependence. In typical datasets \( \uplambda \) will be intermediate between these extreme values.

The symmetry condition \( {\text{h}}_{{{\text{ij}}}} {\text{a}}_{{\text{i}}} = {\text{h}}_{{{\text{ji}}}} {\text{a}}_{{\text{j}}} \) is maintained by setting

$$ {\text{h}}_{\text{ij}} = \frac{{\uplambda {\text{w}}_{\text{ij}} }}{{( 1- \uplambda ) + \uplambda \sum\limits_{{{\text{k}} \ne {\text{i}}}} {{\text{w}}_{\text{ik}} } }} $$

since \( {\text{h}}_{\text{ij}} {\text{a}}_{\text{i}} = \uplambda {\text{w}}_{\text{ij}} = \uplambda {\text{w}}_{\text{ji}} = {\text{h}}_{\text{ji}} {\text{a}}_{\text{j}} . \) So the conditional prior is

$$ {\text{s}}_{\text{i}} | {\text{s}}_{{ [ {\text{i]}}}} \sim {\text{N}}\left( {\frac{\uplambda }{{1 - \uplambda + \uplambda \sum\limits_{{{\text{j}} \ne {\text{i}}}} {{\text{w}}_{\text{ij}} } }}\sum\limits_{{{\text{j}} \ne {\text{i}}}} {{\text{w}}_{\text{ij}} {\text{s}}_{\text{j}} ,} \frac{\updelta }{{1 - \uplambda + \uplambda \sum\limits_{{{\text{j}} \ne {\text{i}}}} {{\text{w}}_{\text{ij}} } }}} \right) $$
(7)

with \( \updelta \) a scale parameter. When \( \uplambda = 0 \) one obtains normal iid effects \( {\text{s}}_{\text{i}} \sim {\text{N}}(0,\,\updelta ). \) If the \( {\text{w}}_{\text{ij}} \) are defined by contiguity one obtains (Leroux et al. 1999, p 181)

$$ {\text{s}}_{\text{i}} | {\text{s}}_{{ [ {\text{i]}}}} \sim {\text{N}}\left( {\tfrac{\uplambda }{{1 - \uplambda + \uplambda {\text{d}}_{\text{i}} }}\sum\limits_{{{\text{j}} \in {\text{N}}_{\text{i}} }} {{\text{s}}_{\text{j}} ,\,} \frac{\updelta }{{1 - \uplambda + \uplambda {\text{d}}_{\text{i}} }}} \right) $$
(8)
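The limiting cases of (8) can be checked numerically. The following Python sketch is an illustration under assumed inputs (the neighbouring effects, \( \updelta = 0.4 \) and the function name are hypothetical): it evaluates the conditional mean and variance for \( \uplambda = 0 \), \( \uplambda = 1 \) and an intermediate value.

```python
def llb_conditional(neighbour_effects, lam, delta):
    """Conditional mean and variance of s_i under the LLB prior (8),
    given the effects s_j in the d_i areas adjacent to area i."""
    d_i = len(neighbour_effects)
    denom = 1.0 - lam + lam * d_i
    return lam * sum(neighbour_effects) / denom, delta / denom


effects = [0.2, 0.5, -0.1]  # hypothetical effects in three adjacent areas

mean_iid, var_iid = llb_conditional(effects, lam=0.0, delta=0.4)  # unstructured case
mean_car, var_car = llb_conditional(effects, lam=1.0, delta=0.4)  # CAR(1) case
mean_mid, var_mid = llb_conditional(effects, lam=0.7, delta=0.4)  # intermediate case
```

At \( \uplambda = 0 \) the mean is zero and the variance equals \( \updelta \) regardless of the number of neighbours; at \( \uplambda = 1 \) the mean is the neighbourhood average and the variance is \( \updelta /{\text{d}}_{\text{i}} \).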

5 Adaptiveness to non-normality and spatial discontinuities

Proposals have been made to modify spatial priors to achieve greater robustness, for example in the presence of heteroscedasticity or of heavier tails (excess kurtosis) than under the normal. Thus Yan (2007), Brewer and Nolan (2007), and Reich and Hodges (2008) propose modified CAR priors to accommodate heteroscedasticity. Other forms of modified spatial prior are considered by Nathoo and Ghosh (2013) and Lawson and Clark (2002). These schemes are all modifications of the CAR prior, or of the convolution prior, as considered in Sect. 3. Modifications of the pure spatial \( CAR(1) \) prior, without allowance for spatially unstructured variation, may be appropriate for particular applications, such as dental decay as in Reich and Hodges (2008), but for area illness data an allowance for heterogeneity is generally needed. Modifications of the proper \( CAR(\upomega ) \) prior are left with the problem that its \( \upomega \) parameter does not calibrate well with marginal measures of spatial correlation. Studies such as Yan (2007) and Lawson and Clark (2002) modify the convolution prior, with potential identifiability problems due to multiple sets of random effects. Thus Yan (2007) allows for heteroscedasticity in spatial effects via a double implementation of the \( CAR(1) \) prior, namely

$$ {\text{y}}_{\text{i}} \sim {\text{Po}}(\uprho_{\text{i}} {\text{E}}_{\text{i}} ), $$
$$ \log (\uprho_{\text{i}} ) = {\text{X}}_{\text{i}} \upbeta + {\text{s}}_{\text{i}} + {\text{h}}_{\text{i}}, $$
$$ {\text{s}}_{\text{i}} | {\text{s}}_{{ [ {\text{i]}}}} \sim {\text{N}}\left( {\sum\limits_{{{\text{j}} \in {\text{N}}_{\text{i}} }} {{\text{s}}_{\text{j}} } /{\text{d}}_{\text{i}} ,\,\tfrac{{\updelta_{\text{s}} }}{{{\text{d}}_{\text{i}} }}} \right), $$
$$ {\text{h}}_{\text{i}} \sim {\text{N}}(0,\,\upvarphi_{\text{i}} ), $$
$$ \log (\upvarphi_{\text{i}} ) = \mu_{\text{h}} {\text{ + r}}_{\text{i}}, $$
$$ {\text{r}}_{\text{i}} | {\text{r}}_{{ [ {\text{i]}}}} \sim N\left( {\sum\limits_{{{\text{j}} \in {\text{N}}_{\text{i}} }} {{\text{r}}_{\text{j}} /{\text{d}}_{\text{i}} ,\,\tfrac{{\updelta_{\text{r}} }}{{{\text{d}}_{\text{i}} }}} } \right). $$

Here we modify the constant scale assumption of the LLB prior in (7) and (8) using a scale mixture, with the benefit of providing an indicator of potential outlier status for each area. To implement a scale mixture, define \( \upkappa_{\text{i}} \sim {\text{Ga(0}} . 5\nu , 0. 5\nu ) \) where \( \nu \) is a hyperparameter. The proposed model reduces to the scale mixture version of the Student t when \( \uplambda = 0 \) (Boris Choy and Chan 2008). The \( \upkappa_{\text{i}} \) have average 1 with small values of \( \upkappa_{\text{i}} \) (under 1) acting as indicators of outlier status (West 1984). Under this scale mixture modification, the symmetry condition (4) is maintained by setting

$$ {\text{a}}_{\text{i}} = \upkappa_{\text{i}} \left[ {(1 - \uplambda ) + \uplambda \sum\limits_{{{\text{j}} \ne {\text{i}}}} {{\text{w}}_{\text{ij}} } } \right], $$
$$ {\text{h}}_{\text{ij}} = \tfrac{{\uplambda {\text{w}}_{\text{ij}} \upkappa_{\text{j}} }}{{\left[ {1 - \uplambda + \uplambda \sum\limits_{{{\text{k}} \ne {\text{i}}}} {{\text{w}}_{\text{ik}} } } \right]}}, $$

since \( {\text{h}}_{\text{ij}} {\text{a}}_{\text{i}} = \uplambda {\text{w}}_{\text{ij}} \upkappa_{\text{j}} \upkappa_{\text{i}} = \uplambda {\text{w}}_{\text{ji}} \upkappa_{\text{i}} \upkappa_{\text{j}} = {\text{h}}_{\text{ji}} {\text{a}}_{\text{j}} \). Then the model for incidence counts becomes

$$ \text{y}_{i} \sim \text{Po}(\upmu_{\text{i}} ), $$
$$ \upmu_{i} = \uprho_{\text{i}} \text{E}_{\text{i}} , $$
$$ \log (\uprho_{i} ) = \text{X}_{\text{i}} \upbeta + \text{s}_{\text{i}} ,$$

where the conditional prior when the \( \text{w}_{\text{ij}} \) are binary indicators of adjacency is

$$ \text{s}_{\text{i}} |\text{s}_{[\text{i}]} \sim \text{N}\left( {\tfrac{\uplambda }{{1 - \uplambda + \uplambda \text{d}_{\text{i}} }}\sum\limits_{{\text{j} \in \text{N}_{\text{i}} }} {\upkappa_{\text{j}} \text{s}_{\text{j}} ,\,\tfrac{\updelta }{{\upkappa_{\text{i}} [1 - \uplambda + \uplambda \text{d}_{\text{i}} ]}}} } \right). $$
(9)

This prior reduces to an unstructured i.i.d scale mixture Student-t density

$$ {\text{s}}_{\text{i}} \sim {\text{N}}(0,\,\updelta /\upkappa_{\text{i}} ), $$

when \( \uplambda = 0 \).
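The Student t form can be checked directly. As a sketch, assuming the shape and rate parameterization of the gamma density, integrating \( \upkappa_{\text{i}} \) out of the \( \uplambda = 0 \) prior gives

$$ {\text{p}}({\text{s}}_{\text{i}} ) = \int_{0}^{\infty } {{\text{N}}({\text{s}}_{\text{i}} |0,\,\updelta /\upkappa_{\text{i}} )\,{\text{Ga}}(\upkappa_{\text{i}} |0.5\nu ,\,0.5\nu )\,{\text{d}}\upkappa_{\text{i}} } \propto \int_{0}^{\infty } {\upkappa_{\text{i}}^{(\nu + 1)/2 - 1} \exp \left\{ { - \upkappa_{\text{i}} \left( {\tfrac{\nu }{2} + \tfrac{{{\text{s}}_{\text{i}}^{2} }}{2\updelta }} \right)} \right\}{\text{d}}\upkappa_{\text{i}} } \propto \left( {1 + \tfrac{{{\text{s}}_{\text{i}}^{2} }}{\nu \updelta }} \right)^{ - (\nu + 1)/2} , $$

which is the kernel of a Student t density with \( \nu \) degrees of freedom, zero location and squared scale \( \updelta \).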

From (9) it can be seen that small \( \upkappa_{j} \) values indicate areas discrepant in risk from their neighbours (i.e. they indicate outliers in spatial terms), and reduce the amount of spatial borrowing of strength. Equivalently stated, a clustering of small \( \upkappa_{j} \) values can be taken as an indicator of spatial volatility, namely discrepant illness risks in a set of adjacent areas. In regression applications, small \( \upkappa_{j} \) values will also indicate where the regression predictions in the neighbourhood of area \( \text{i} \), and their implied neighbourhood relative risk \( \sum\limits_{{{\text{j}} \in {\text{N}}_{{\text{i}}} }} {\upmu _{{\text{j}}} } /\sum\limits_{{{\text{j}} \in {\text{N}}_{{\text{i}}} }} {{\text{E}}_{{\text{j}}} } \), are discrepant from the modelled relative risk in area \( \text{i} \) itself, \( \upmu_{\text{i}} /\text{E}_{\text{i}} \).

Let \( \uptau = 1/\updelta , \) and let \( I(i \sim j) = I(j \sim i) \) denote that areas \( i \) and \( j \) are neighbours under binary adjacency. Then the precision matrix in the multivariate normal (1) has diagonal terms

$$ \text{Q}_{\text{ii}} = \uptau \text{a}_{\text{i}} = \uptau \upkappa_{\text{i}} \left[ {(1 - \uplambda ) + \uplambda \sum\limits_{\text{j} \ne \text{i}} {\text{w}_{\text{ij}} } } \right], $$

and off diagonal terms

$$ \text{Q}_{\text{ij}} = - \uptau \text{a}_{\text{i}} \text{h}_{\text{ij}} = - \uptau \uplambda \upkappa_{\text{i}} \upkappa_{\text{j}} \text{I}(\text{i} \sim \text{j}). $$
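The diagonal and off-diagonal terms above can be assembled as in the following Python sketch. The adjacency structure, the \( \upkappa_{\text{i}} \) values and the function name are hypothetical. The sketch only builds the entries and checks symmetry; it does not examine whether the resulting matrix is positive definite.

```python
# Hypothetical inputs: four areas on a line, with illustrative kappa values.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
kappa = [1.1, 0.3, 0.9, 1.0]
lam, tau = 0.7, 3.0


def build_precision(neighbours, kappa, lam, tau):
    """Precision matrix entries for the scale mixture LLB prior under binary adjacency."""
    n = len(kappa)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d_i = len(neighbours[i])
        # Diagonal: tau * kappa_i * [(1 - lambda) + lambda * d_i]
        Q[i][i] = tau * kappa[i] * ((1.0 - lam) + lam * d_i)
        for j in neighbours[i]:
            # Off-diagonal for adjacent areas: -tau * lambda * kappa_i * kappa_j
            Q[i][j] = -tau * lam * (kappa[i] * kappa[j])
    return Q


Q = build_precision(neighbours, kappa, lam, tau)
n = len(kappa)
is_symmetric = all(abs(Q[i][j] - Q[j][i]) < 1e-12 for i in range(n) for j in range(n))
```

Area 1 has a small \( \upkappa_{\text{i}} \), so both its diagonal entry and its links to areas 0 and 2 are down-weighted relative to the constant scale case.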

A scale mixture approach to spatial dependence can be set within a broader literature on heavy tailed priors (e.g. Student t, double exponential) that can be represented as two level hierarchical models (Yi and Xu 2008). One application of such priors is to predictor selection in high dimensional regression, with a likelihood penalty function that is a normal scale mixture (e.g. Polson et al. 2014). Besag et al. (1991) propose a double exponential prior for spatial effects as a robust alternative to the normal conditional autoregressive, with an application provided by Manda (2013).

Identification of random effects in spatial disease models is often problematic (e.g. MacNab 2014; Nathoo and Ghosh 2013), especially for models including multiple random effects, or when disease counts are relatively small. In the case of the model just discussed, identification of outliers (e.g. in terms of significantly low \( \upkappa_{i} \)), as well as identification of elevated risks \( \text{s}_{\text{i}} , \) will be improved for larger disease counts and/or longer observation periods. Identification of hyperparameters may also be problematic, especially with small samples. For example, in Student t binary regression with data augmentation, Gelman et al. (2004, p 447) recommend a robust analysis with \( \nu \) not estimated but preset at 4.

6 Simulation example

A simulation example of the heteroscedastic LLB prior involves TB incidence with a spatial framework provided by the \( \text{n} = 49 \) mainland states (including the District of Columbia). Expected TB incidence counts \( \text{E}_{\text{i}} \) are obtained by applying actual US-wide age specific rates for TB in 2013 to state population estimates for 2013, taken from the US National Cancer Institute SEER site (http://seer.cancer.gov/popdata/). TB incidence rates are from the CDC National Tuberculosis Surveillance System, with just over 9500 incident cases in 2013, and an all ages rate of 3 per 100,000. The highest rates (over 6 per 100,000) are for the 75-84 and 85+ age groups.
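The expected counts follow the usual indirect standardization calculation, which can be sketched in Python as below. The age bands, rates and populations are invented for illustration and are not the CDC or SEER figures; the function name is likewise hypothetical.

```python
# Hypothetical age-specific rates per 100,000 (NOT the actual CDC figures).
rates_per_100k = {"0-14": 0.8, "15-44": 2.9, "45-64": 3.5, "65+": 5.0}

# Hypothetical state populations by age group (NOT actual SEER estimates).
populations = {
    "State A": {"0-14": 900_000, "15-44": 2_000_000, "45-64": 1_300_000, "65+": 700_000},
    "State B": {"0-14": 100_000, "15-44": 250_000, "45-64": 150_000, "65+": 90_000},
}


def expected_count(pop_by_age, rates):
    """Expected cases: sum over age groups of population times the reference rate."""
    return sum(pop_by_age[group] * rates[group] / 100_000 for group in rates)


expected = {state: expected_count(pop, rates_per_100k) for state, pop in populations.items()}
```

A relative risk of 1 then corresponds to a state whose incidence matches what the reference rates would produce given its age structure.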

We simulate TB incidence counts using these expected counts as offsets. The LLB hyperparameters (guide values) are set as \( \uplambda = 0.7, \) \( \uptau = 3 \), with \( \nu \) taking the values 3, 10 and 25. Although the Student t is defined for degrees of freedom of 2 or less, it then has infinite variance, and Gelman et al. (2004) mention that “t’s with one or two degrees of freedom have infinite variance and are not usually realistic in the far tails”. One hundred sets of random effects are generated from the multivariate normal \( {\text{s}}_{{ 1\;\; :\;\;{\text{n}}}} \sim {\text{N(0,}}\,{\text{Q}}^{ - 1} ) \). Simulated TB incidence counts are then obtained via a Poisson simulation \( {\text{y}}_{\text{i}} \sim {\text{Po(E}}_{\text{i}} \uprho_{\text{i}} ) \), with \( \log (\uprho_{\text{i}} ) = \upbeta_{0} + \text{s}_{\text{i}} , \) where \( \upbeta_{0} = - 0.1 \), and \( \uprho_{i} \) is the simulated disease relative risk in state \( \text{i} \) (relative to that expected on the basis of US wide incidence levels). The R code used is set out in “Appendix”. Note that each of the 100 simulations involves a separate sample of \( \upkappa_{\text{i}} \sim Ga(0.5\nu ,\,0.5\nu ) \).
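The role of \( \nu \) in the gamma mixing density can be illustrated with a short Python sketch using only the standard library. This is not the R code from the Appendix: the seed, the sample size and the cut-off of 0.5 used to summarize low \( \upkappa_{\text{i}} \) values are arbitrary assumptions.

```python
import random


def sample_kappa(n, nu, rng):
    """Draw n values from Ga(0.5*nu, 0.5*nu) in the shape and rate form."""
    shape = 0.5 * nu
    # random.gammavariate takes (shape, scale), so a rate of 0.5*nu is a scale of 1/(0.5*nu).
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n)]


rng = random.Random(2013)  # arbitrary seed for reproducibility
summary = {}
for nu in (3, 10, 25):
    kappa = sample_kappa(20000, nu, rng)
    mean_kappa = sum(kappa) / len(kappa)
    share_low = sum(1 for k in kappa if k < 0.5) / len(kappa)
    summary[nu] = (mean_kappa, share_low)
```

The sample means stay close to 1 for each \( \nu \), while the share of draws below 0.5 falls as \( \nu \) increases, so smaller \( \nu \) corresponds to more areas with markedly inflated conditional variance.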

Analyses to estimate the parameters from the 100 sets of simulated data \( \{ y,\,E\} \) (with \( \text{E} \) as in the simulations) are carried out using the WinBUGS package (Lunn et al. 2009). An exponential prior with mean 10 is adopted for \( \nu \) (Fernandez and Steel 1998; Geweke 1993), a gamma prior with shape 1 and rate 0.01 is assumed for the inverse variance parameter \( \uptau \), a normal prior with mean zero and precision 0.001 is assumed for the fixed effect \( \upbeta_{0} \), and a uniform \( U(0,\,1) \) prior is assumed on \( \uplambda \). Estimates are based on the last 5000 iterations from two chain runs of 10,000 iterations, with convergence assessed using Brooks–Gelman–Rubin diagnostics (Brooks and Gelman 1998).

The focus is on the posterior means for the main parameters of the LLB prior and risk regression over the 100 samples, namely \( \nu ,\,\uplambda ,\,\upbeta_{0} \), and the variance of the spatial effects (which depends on both \( \uptau \) and the sampled \( \upkappa_{i} ) \). The posterior densities for \( \nu \) tend to be positively skewed, so Table 1 also includes results for the posterior summary of log(\( \nu \)). Because each simulation involves a distinct set of \( \upkappa_{i} \), the actual variance of the \( \text{s}_{\text{i}} \) will vary between simulations. This variance \( \text{V}_{\text{t}} \) of spatial effects for simulation \( t \) is recorded in the vector var.s[] in the code in “Appendix”. Table 1 sets out the percentiles (20th, 50th, 80th) of the 100 posterior means for \( \nu ,\,\log (\nu ),\,\uplambda , \) and \( \upbeta_{0} \), and also the proportion of simulated datasets where the 95 % credible interval for a parameter includes the guide value. Thus for the setting \( \nu = 10 \), 50 out of the 100 samples have posterior means for \( \nu \) below 10.4, and 50 samples have posterior means above 10.4.
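The two summaries reported in Table 1 can be computed as in the following Python sketch. The posterior means and credible intervals shown are invented for illustration (five datasets rather than 100), and the function name is hypothetical.

```python
import statistics


def coverage(intervals, guide_value):
    """Proportion of credible intervals (lower, upper) that contain the guide value."""
    hits = sum(1 for lower, upper in intervals if lower <= guide_value <= upper)
    return hits / len(intervals)


# Hypothetical posterior means and 95% intervals for nu from five simulated datasets.
posterior_means = [8.2, 9.5, 10.4, 11.0, 12.7]
intervals = [(3.1, 19.0), (4.0, 22.5), (10.6, 30.2), (5.2, 25.1), (6.0, 28.4)]

median_of_means = statistics.median(posterior_means)
coverage_nu = coverage(intervals, guide_value=10)
```

Here the third interval excludes the guide value of 10, so the coverage is four out of five.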

Table 1 Recovered parameter estimates from 100 simulated datasets

The expected \( \text{E}_{\text{i}} \) are relatively large, so the Poisson simulations may be subject to some excess dispersion, which to some extent attenuates the spatial structure present in the simulated data. Nevertheless, the recovered parameters effectively reproduce those used in generating the data. This feature is also apparent in a correlation between the actual and estimated \( \text{V}_{\text{t}} \) over the 100 samples of 0.97. Figure 1 plots the two series of \( \text{V}_{\text{t}} \) for the \( \nu = 10 \) option, including 95 % credible intervals for the estimated \( \text{V}_{\text{t}} \). Of substantive relevance in interpreting the parameters of the LLB model, there is a 0.72 correlation between the 100 posterior means for \( \uplambda , \) and the corresponding posterior means for Moran’s I, which are estimated from the \( \text{s}_{\text{i}} \) in each dataset. To further illustrate variation over the simulations, Fig. 2 shows, for each simulated dataset, the posterior mean (and 95 % interval) of log(\( \nu ) \) under the \( \nu = 3 \) option.
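For reference, Moran's I for a set of spatial effects under binary adjacency can be computed as in the Python sketch below. It uses the standard textbook form of the statistic; how the statistic was evaluated in the analysis itself is not stated, so the function, the toy adjacency structure and the values are assumptions for illustration.

```python
def morans_i(values, neighbours):
    """Moran's I with binary adjacency weights (w_ij = 1 if i and j are adjacent)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Cross-products over ordered neighbour pairs, so each adjacency is counted twice.
    cross = sum(dev[i] * dev[j] for i in neighbours for j in neighbours[i])
    total_weight = sum(len(neighbours[i]) for i in neighbours)
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical four areas on a line: 0-1-2-3.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
i_trend = morans_i([1.0, 2.0, 3.0, 4.0], neighbours)          # smooth gradient
i_alternating = morans_i([1.0, -1.0, 1.0, -1.0], neighbours)  # neighbours always differ
```

A smooth gradient gives a positive value and an alternating pattern gives a negative one, which is the sense in which the statistic measures marginal spatial correlation in the \( \text{s}_{\text{i}} \).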

Fig. 1 (a) Simulated and estimated spatial effect variances: US mainland states (first 50 samples). (b) Simulated and estimated spatial effect variances: US mainland states (second 50 samples)

Fig. 2 Posterior intervals, re-estimated log(\( \nu \)), 100 simulated datasets with \( \nu = 3 \)

One also wishes to reproduce the patterns of outlier status (areas with significantly low \( \upkappa_{i} \)). This involves, for the setting \( \nu = 10 \) (and other hyperparameters as above), simulating 100 sets of \( \text{y} \) based on a single set of \( \upkappa_{\text{i}} \) values (the “actual” \( \upkappa_{\text{i}} ) \), sampled from a gamma density, \( \upkappa_{\text{i}} \sim Ga(5,\,5). \) The expected incidence counts are multiplied by 10 to increase the amount of information provided by the data. Re-estimation of \( \upkappa_{\text{i}} \) from the simulated datasets shows a shrinkage effect, with posterior mean re-estimated \( \upkappa_{\text{i}} \) closer to 1 than the actual \( \upkappa_{\text{i}} \) (see Fig. 3). However, the re-estimation does identify as outliers the states with unusually low actual \( \upkappa_{\text{i}} . \) For the five states with the lowest actual \( \upkappa_{\text{i}} , \) four have 95 % credible intervals on the re-estimated \( \upkappa_{\text{i}} \) that are entirely below 1, and no other states have re-estimated \( \upkappa_{\text{i}} \) with credible intervals entirely below 1.

Fig. 3 Pre-simulated and posterior mean re-estimated \( \upkappa_{\text{i}} \), simulated data (100 datasets) with preset \( \upkappa_{\text{i}} \)

7 Application: TB incidence for England local authorities

An application involves TB incidence data \( \text{y} \) for 326 English local authorities between 2011 and 2013. Two analyses are undertaken, one without predictors and one with two predictors: an index of multiple socio-economic deprivation (\( \text{X}_{1} \)) and population density (\( \text{X}_{2} \)). The impact of poverty on TB incidence is well documented (Lopez de Fede et al. 2008) and population density is associated with infectious disease risk as “the likelihood that a susceptible person will be exposed to an infectious tuberculosis patient increases with population density” (Rieder 1999). The two predictors are centred and divided by 100. Thus with predictors \( \text{X}_{\text{i}} = (1,\,\text{X}_{1\text{i}} ,\,\text{X}_{2\text{i}} ) \), under the scale mixture model we have

$$ \text{y}_{\text{i}} \sim \text{Po}(\upmu_{\text{i}} ), $$
$$ \upmu_{\text{i}} = \uprho_{\text{i}} \text{E}_{\text{i}} , $$
$$ \log (\uprho_{\text{i}} ) = \text{X}_{\text{i}} \upbeta + \text{s}_{\text{i}} , $$
$$ {\text{s}}_{\text{i}} | {\text{s}}_{{ [ {\text{i]}}}} \sim {\text{N}}\left( {\tfrac{\uplambda }{{[1 - \uplambda + \uplambda {\text{d}}_{\text{i}} ]}}\sum\limits_{{{\text{j}} \in {\text{N}}_{\text{i}} }} {\upkappa_{\text{j}} {\text{s}}_{\text{j}} ,\,\tfrac{\updelta }{{\upkappa_{\text{i}} [1 - \uplambda + \uplambda {\text{d}}_{\text{i}} ]}}} } \right), $$
$$ \upkappa_{\text{i}} \sim Ga(0.5\nu ,\,0.5\nu ). $$

For the original Leroux et al. (1999) scheme, the conditional prior for \( \text{s}_{\text{i}} \) is as in (8).

Prior settings are as in Sect. 6, and inferences are from the last 5000 iterations from two chain runs of 10,000 iterations, with convergence assessed using Brooks–Gelman–Rubin diagnostics. Table 2 contains parameter summaries and a comparison of fit measures between the original LLB model (Sect. 4) and the heteroscedastic LLB model (Sect. 5). Fit is assessed using the posterior predictive loss (PPL) criterion (Gelfand and Ghosh 1998). Consider replicate observations \( \text{y}_{\text{rep}} \) sampled from the posterior predictive density \( \text{p}(\text{y}_{\text{rep}} |\text{y}) \). The PPL involves defining \( \text{t(z)} = \text{z}\log \text{z} - \text{z} \), and \( \xi_{\text{i}} = \text{t(y}_{\text{i},\,\text{rep}} ). \) Letting \( \eta_{i} \) and \( \phi_{i} \) denote posterior means for \( \text{y}_{\text{i},\,\text{rep}} \) and \( \xi_{\text{i}} , \) the PPL is

$$ 2\sum\limits_{\text{i}} {\left\{ {\phi_{\text{i}} - {\text{t}}(\eta_{\text{i}} )} \right\}} + 2({\text{k}} + 1)\sum\limits_{\text{i}} {\left\{ {\tfrac{{{\text{t}}(\eta_{\text{i}} ) + {\text{k}}\,{\text{t}}({\text{y}}_{\text{i}} )}}{{{\text{k}} + 1}} - {\text{t}}\left( {\tfrac{{\eta_{\text{i}} + {\text{k}}\,{\text{y}}_{\text{i}} }}{{{\text{k}} + 1}}} \right)} \right\}} , $$

where the first term is a complexity penalty, and different \( \text{k} \) values put different stress on fit and parsimony. In Table 2, two values of \( \text{k} \) are used, \( \text{k} = 0.5 \) and \( \text{k} = 5 \), with the latter putting more stress on goodness of fit.
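A minimal Python sketch of the PPL calculation from posterior replicate draws is given below. The function names, the toy data and the convention \( \text{t}(0) = 0 \) for zero counts (the limiting value of \( \text{z}\log \text{z} - \text{z} \)) are assumptions made for illustration.

```python
import math


def t(z):
    """t(z) = z log z - z, with t(0) taken as 0, its limiting value (an assumption)."""
    return z * math.log(z) - z if z > 0 else 0.0


def ppl(y, y_rep_draws, k):
    """Posterior predictive loss from observed counts y and a list of replicate vectors."""
    m = len(y_rep_draws)
    penalty = 0.0
    fit = 0.0
    for i, y_i in enumerate(y):
        reps = [draw[i] for draw in y_rep_draws]
        eta = sum(reps) / m                  # posterior mean of y_rep for area i
        phi = sum(t(r) for r in reps) / m    # posterior mean of t(y_rep) for area i
        penalty += phi - t(eta)
        fit += (t(eta) + k * t(y_i)) / (k + 1) - t((eta + k * y_i) / (k + 1))
    return 2 * penalty + 2 * (k + 1) * fit


y_obs = [3, 5]
ppl_exact = ppl(y_obs, [[3, 5], [3, 5]], k=0.5)   # replicates reproduce the data
ppl_spread = ppl(y_obs, [[2, 6], [4, 4]], k=0.5)  # same means, more predictive spread
```

Since \( \text{t} \) is convex, both components are non-negative; in the second call the predictive means still match the data, so the whole loss comes from the penalty term.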

Table 2 Model assessment and parameter summaries, models without and including predictors

Also presented are predictive checks based on replicate observations. Posterior predictive probabilities \( \Pr (\text{y}_{\text{i},\,\text{rep}} > \text{y}_{\text{i}} |\text{y}) \) in extreme tails (e.g. values under 0.1 or over 0.9) indicate poorly fitted cases. The mixed predictive scheme (Marshall and Spiegelhalter 2003), providing checks that are close to leave-one-out cross validation (Green et al. 2009), was also applied. This involves sampling new random effects \( \text{s}_{\text{i},\,\text{rep}} \), and then sampling replicate data \( \text{y}_{\text{i},\,\text{rep},\,\text{mixed}} \) conditional on these new effects.
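The tail probability check can be sketched in Python as follows. The replicate counts are invented, the function names are hypothetical, and the 0.1 and 0.9 cut-offs are simply the example thresholds mentioned above. The same calculation would apply to mixed replicates once those have been sampled.

```python
def exceedance_probability(y_obs, replicates):
    """Monte Carlo estimate of Pr(y_rep > y_obs | y) from replicate draws for one area."""
    return sum(1 for r in replicates if r > y_obs) / len(replicates)


def poorly_fitted(p, lower=0.1, upper=0.9):
    """Flag a case whose predictive probability lies in either extreme tail."""
    return p < lower or p > upper


replicates = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]  # hypothetical draws for one area
p_typical = exceedance_probability(12, replicates)  # observed count near the centre
p_under = exceedance_probability(25, replicates)    # observed count far above replicates
```

An observed count well above all replicates gives a probability near zero, indicating underprediction, while a probability near one would indicate overprediction.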

Table 2 shows that fit is generally improved under the heteroscedastic option, and predictive checks are also improved. The estimates for \( \uplambda \) suggest that spatial dependence is not overly pronounced, and hence illustrate the broader principle that a spatial prior should represent unstructured as well as structured variation: estimates of \( \uplambda \) are all under 0.8. Figure 4 demonstrates disjunction between high risk and adjacent low risk areas. Table 2 also shows positive effects for both predictors but less precise effects under the scale mixture approach, in line with a general principle that neglecting heteroscedasticity may lead to mis-stated regression coefficient standard errors.

Fig. 4 Modelled relative risks of TB incidence, scale mixture model

Table 3 contains a more detailed assessment of predictive discrepancies between the two approaches for the regression without predictors. As mentioned above, the \( \upkappa_{\text{i}} \) effects will act to identify spatial outliers, with illness levels discrepant from their neighbours, and so Table 3 contains the 20 areas with the lowest posterior mean \( \upkappa_{\text{i}} \) under the scale mixture approach. One may assess spatial outlier status to some extent from the observed data. The first two columns of Table 3 contain maximum likelihood (ML) relative risks in each area \( \text{R}_{\text{i}} = \text{y}_{\text{i}} /\text{E}_{\text{i}} \), and relative risks in the neighbourhoods \( \text{N}_{\text{i}} \) of each area, with ML estimates \( \text{L}_{\text{i}} = \sum\limits_{{\text{j} \in \text{N}_{\text{i}} }} {\text{y}_{\text{j}} } /\sum\limits_{{\text{j} \in \text{N}_{\text{i}} }} {\text{E}_{\text{j}} }. \)
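The two ML quantities can be computed directly from counts, expected counts and a neighbour list. The following is a minimal sketch with made-up numbers, not the TB data; the function name and the representation of each neighbourhood \( \text{N}_{\text{i}} \) as a list of area indices are assumptions for illustration.

```python
def ml_relative_risks(y, E, neighbours):
    """ML relative risks for each area and for its neighbourhood.

    y          : observed counts, one per area
    E          : expected counts, one per area
    neighbours : neighbours[i] lists the indices j in N_i (area i excluded)
    Returns (R, L) with R_i = y_i / E_i and
    L_i = sum_{j in N_i} y_j / sum_{j in N_i} E_j.
    """
    R = [y_i / E_i for y_i, E_i in zip(y, E)]
    L = [
        sum(y[j] for j in N_i) / sum(E[j] for j in N_i)
        for N_i in neighbours
    ]
    return R, L


# Hypothetical three-area example: area 0 is high risk with a low risk hinterland.
y = [30, 5, 4]
E = [10.0, 10.0, 8.0]
neighbours = [[1, 2], [0, 2], [0, 1]]
R, L = ml_relative_risks(y, E, neighbours)
```

In this toy example \( \text{R}_{0} = 3 \) while \( \text{L}_{0} = 0.5 \), the kind of discrepancy between an area and its neighbourhood that suggests spatial outlier status.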

Table 3 Areas ranked by outlier status, no predictors

Table 3 shows two types of outlier. One consists of major urban centres with high risk themselves, but a low risk hinterland (e.g. areas 1, 2, 3 and 8 in the Table). For example, Fig. 5 shows estimated relative risk patterns around area 2 (Leicester). These areas are underpredicted under the constant scale model, with mixed predictive \( \Pr (\text{y}_{\text{i},\,\text{rep},\,\text{mixed}} > \text{y}_{\text{i}} |\text{y}) \) p-values under 0.025. Under the scale mixture model they have higher means \( \upmu_{\text{i}} \), closer to the observed \( \text{y}_{\text{i}} \), as there is less local borrowing of strength. The other type of outlier (e.g. areas 5 and 6 in the Table) consists of low risk areas with much higher risk neighbourhoods. These are overpredicted under the constant scale model, with \( \Pr (\text{y}_{{\text{i}},{\text{rep}}} > \text{y}_{\text{i}} |\text{y}) = 0.91 \), and \( \Pr (\text{y}_{\text{i},\,\text{rep},\,\text{mixed}} > \text{y}_{\text{i}} |\text{y}) = 1 \) for area 6. Under the scale mixture model, modelled means are reduced closer to the observed \( \text{y}_{\text{i}} \). Of the 20 areas, 19 have mixed predictive p-values under 0.05 or over 0.95 under a constant scale model, whereas under the scale mixture, this is reduced to 12 out of 20.

Fig. 5 Modelled relative risks of TB incidence around Leicester, scale mixture model

Table 4 contains the 10 areas with the lowest posterior mean \( \upkappa_{\text{i}} \) under the scale mixture approach when the two covariates are included. These areas illustrate cases where the modelled relative risk in area \( \text{i} \) itself, \( \upmu_{\text{i}} /\text{E}_{\text{i}} \), is discrepant from the implied relative risk \( \sum\limits_{{\text{j} \in \text{N}_{\text{i}} }} {\upmu_{\text{j}} } /\sum\limits_{{\text{j} \in \text{N}_{\text{i}} }} {\text{E}_{\text{j}} } \) in the locality of area \( \text{i} \). These discrepancies may be related to covariate patterns. Under a scale mixture approach, local borrowing of strength is lessened, and Table 4 shows that the predicted TB counts \( \upmu_{\text{i}} \) are closer to the actual counts than under the constant scale LLB.

Table 4 Areas ranked by outlier status, regression with predictors

8 Conclusion

Different forms of spatial correlation analysis or modelling have been proposed in disease applications, ecological epidemiology, environmental science and other settings. Both Bayesian and frequentist estimation have been used. Common themes include identifying elevated risk areas or clusters of areas, and finding predictors of risk, while recognizing the explicitly spatial structure of the observations. For example, in a review of regression findings from spatial species abundance data, Dorfmann (2007) shows that ignoring spatial dependence (e.g. in regression residuals) leads to possible bias in parameter estimates and optimistic standard errors. However, while it is important to incorporate spatial dependence in models for area data, the assumptions of such techniques should be assessed, and subject to modification when the data so indicate. In particular, spatial discontinuities suggest a modification to the principle of uniform local smoothing.

Bayesian analyses of spatially arranged data often employ a conditionally autoregressive prior, which can express the spatial clustering commonly present in the underlying risks, but such priors typically assume a normal density for risks and uniform conditional association. However, a more sensitive parameterisation with utility in detecting outliers and locally irregular risk patterns may be obtained by allowing for non-normality. Commonly applied conditionally autoregressive priors, such as the proper CAR prior and the convolution prior, also have potential deficits when the observations contain a mixture of spatial dependence and unstructured heterogeneity. The present paper has proposed a scale mixture version of the Leroux et al. (1999) spatial prior, combining the benefit of adaptability when risks are discrepant in adjacent areas with a less problematic approach to representing a mixture of clustering and heterogeneity.

The analyses here show improved fit to infectious disease data, which may often show pronounced risk variability between areas. In England, high risk areas are often major urban centres, whereas the neighbouring suburban or rural hinterlands of such centres may be low risk. In such situations some modification of the uniform local borrowing of strength principle may be beneficial.