Spatial heterogeneity in Bayesian disease mapping

Disease mapping applications generally assume homogeneous regression effects and use random intercepts to account for residual spatial dependence. However, there may be local variation in the association between disease and area risk factors. We consider implications for model fit, estimated regression coefficients, and substantive inferences of allowing spatial variability in impacts of area risk factors. An application to suicide in 6791 English small areas shows that average regression coefficients and substantive inferences (e.g. about relative risk) may be considerably affected by allowing spatially varying predictor effects, while fit is improved.


Introduction
Many area disease studies consider official mortality statistics for administrative areas (e.g. US counties, English wards). Assume a region subdivided into n small areas. Let Y i denote observed deaths in such areas, and E i denote expected deaths obtained by applying national rates to small area populations. For relatively rare diseases, maximum likelihood estimates of relative risk, namely standard mortality ratios Y i =E i are unstable (Haining 2001), with extreme ratios associated with areas with the smallest populations. By contrast, disease mapping methods, usually using Bayesian inference, seek to borrow strength across areas to produce stable risk estimates.
A common analytical framework for rare diseases over small areas adopts the Besag et al. (1991) model, whereby disease counts Y i ð i ¼ 1; . . .; nÞ are taken as Poisson with means l i ¼ E i k i , where k i denote relative disease risks. These are centred around a global relative risk of 1 when P n i¼1 Y i ¼ P n i¼1 E i : Let X i denote known predictors of disease risk, such as area deprivation. Presentations of disease mapping regression (e.g. Schrödle and Held 2011;Mollié 1996;Clayton et al. 1993) typically assume a homogenous effect of such predictors, and use varying intercepts to represent and account for spatial dependence in the outcome, and hence to borrow strength in estimation. Such models incorporate spatially dependent effects, as, for example, under conditional autoregressive schemes (e.g. Besag et al. 1991;Leroux et al. 1999).
An alternative to spatial homogeneity (also known as spatial stationarity) for predictor effects is to allow spatially varying impacts. A classical technique allowing varying coefficients is geographically weighted regression (e.g. Mou et al. 2017). Analogous models using random effects Bayesian approaches have been proposed (e.g. Assunçao 2003), but their full implications for inferences from disease mapping studies are not widely studied-though see Feng et al. (2016). However spatial dependence in residuals may be due to spatial nonstationarity (Fotheringham 2009), with implication that if allowing spatially varying regression impacts removes spatial residual dependence, then additional spatially structured random intercepts may be unnecessary.
In this case study, we consider suicide counts in 6791 middle level super output areas (MSOAs) in England. Suicides are for the 5 year period, 2011-2015, and available as gender-specific totals within each MSOA; there are 23,460 suicides overall, 17,897 male and 5563 female. We consider the impact on suicide variations of three area level predictors: socioeconomic deprivation, social fragmentation and rurality.
Analysis uses Bayesian inference via integrated nested Laplace approximation (INLA), a computationally efficient alternative to Markov Chain Monte Carlo, and implemented in the R package R-INLA (Bivand et al. 2015). We compare models with homogeneous regressor effects and varying intercepts (''global models'') with models allowing spatially varying predictor effects (''local models''). This comparison shows that fit improves under local models, while residual spatial dependence is removed. Central predictor effects are different between the models, and there is significant variability in one or more predictors under local models. Implications for relative risk estimates, and for local variations in risk within subregions, are discussed.
Subsequent sections discuss specifications of the two classes of model, the definition and epidemiological role of the predictors, and the case study implementation and findings.

Model specification
A commonly adopted formulation for disease mapping involves the convolution model (Besag et al. 1991), which leads to spatially varying intercepts. This involves, at a minimum, a spatially configured effect s i , and an unstructured or iid term u i , normally distributed with mean 0 and variance r 2 u . Let L i denote the locality surrounding area i, meaning the set of d i small areas adjacent to it. Let N(m, V) denote a normal density with mean m and variance V. Then the spatial effects follow a conditional autoregressive (or CAR) scheme, where S i is the average of the spatial effects in locality L i . For identifiability purposes, spatial effects are centred to have mean 0. Then with b 0 denoting an intercept, relative risks can be modelled as a log-link regression with varying intercepts: Assume there are ecological (area-level) predictors X i relevant to the disease. Under a ''global'' or stationary model, predictors have a spatially homogeneous impact, namely with random intercepts intended to eliminate residual spatial dependence. However, spatial dependence and spatial heterogeneity (i.e. spatial non-stationarity) are often interrelated (Anselin 2010). One possible form of spatial heterogeneity is in predictor impacts. Consider a simple linear regression y i $ Nðl i ; r 2 Þ for an area outcome, and with one predictor (Fotheringham 2009). Suppose a global model l i ¼ b 0 þ X i b 1 ; is assumed, but that the true model is where b 1i are varying slopes. Assume the global regression coefficient is estimated as b Ã 1 . Then for areas where b 1i [ b Ã 1 , y i is typically underestimated under a global model, and the residual is positive. For locations where b 1i \b Ã 1 , the residual is typically negative, since y i is overestimated. If the b 1i show spatial dependence, then residuals from the global model will also show spatial dependence.
More generally if non-stationarity in regression effects is a major source of spatial dependence in residuals, such dependence may be eliminated by a local model allowing non-stationary impacts of X i . A model allowing for local variation in regression effects is also potentially more adaptive to local variations in risk and their association with local risk factor patterns: links between disease and risk factors may be stronger in some sub-regions.

Area socioeconomic influences on suicide
The above discussion assumes that area level predictors of disease risk are available. Many studies report that area poverty and deprivation increase suicide risks (e.g. Gunnell et al. 1995). Area deprivation is to some extent simply an aggregate measure of individual suicide risk factors such as unemployment and low income, so acting as a compositional factor (Collins et al. 2017). However, it may also partly reflect contextual risks, or place effects per se. This is supported by studies of mental illness reporting significant associations with area-level socio-economic status, beyond individual-level factors (e.g. Silver et al. 2002;Matheson et al. 2006). Here we use the UK government's Index of Multiple Deprivation (or IMD) as a measure of area SES.
A number of studies have considered impacts on suicide outcomes of indices of social fragmentation, meaning relatively low levels of community integration linked to high numbers of non-family households, and high residential turnover. Thus Congdon (1996) proposed a social fragmentation index based on residential turnover, one person households, renting from private sector landlords (excluding social renting), and non-married adults.
Social fragmentation is to some degree a compositional measure of individual suicide risk factors such as living alone (Holt-Lunstad et al. 2015), recent residential relocation, and being unmarried. However, it may also partly reflect contextual risks, such as negative effects of high neighborhood transience on population mental health, after control for individual risk factors (Matheson et al. 2006). Here a fragmentation score is obtained from principal component analysis of the four variables of Congdon (1996), but updated to the 2011 Census. Social fragmentation so defined tends to be higher in central city areas, but also in particular types of town (coastal resorts, university towns) with relatively high population turnover.
Findings on urbanicity or rurality in relation to suicide are inconsistent, though many confounding factors could contribute to these findings. Thus Gartner et al. (2008) report that ''mortality rates from suicide for males in England were 10 per cent lower in rural areas before adjustment for deprivation, but 11 per cent higher [...] after adjustment''; see also Saunderson et al. (1998). Various features of rural economic life and healthcare may affect suicide, such as sole entrepreneurship, easier access to suicide methods, and lesser access to mental health services.
In the analysis below a rurality score is based on the 2011 rural/urban classification (RUC2011) of UK small areas (Bibby and Brindley 2013). Specifically a ridit score (e.g. Ernstsen et al. 2012) is obtained from MSOA frequencies in eight ordered urban-rural categories (from Urban Major Conurbation at one extreme to Rural Village and Dispersed in a Sparse Setting at the other). Ridit scores are assigned to ordered categories following a procedure developed by Bross (1958).

Case study implementation
Suicide deaths Y i are available by gender, with expected deaths E i obtained by multiplying MSOA populations by England wide age specific suicide death rates, with P n i¼1 Y i ¼ P n i¼1 E i : Let X i ¼ ðX 1i ; X 2i ; X 3i Þ denote the predictors; respectively deprivation, social fragmentation and rurality. For each of three analyses (overall, male, and females) we compare four models.
Model 1 is a global model assuming the same predictor effects across all MSOAs, together with varying intercepts. Thus for latent relative risks k i , with spatial and iid effects, s i and u i respectively, defined as above. Among non-stationary models, the simplest assumes varying slopes, without varying intercepts. This is denoted model 2, or the local model, with where b 1i ; b 2i ; and b 3i each follow the autoregressive scheme (1). The data are relatively sparse (the average overall suicide count per MSOA is 3.5) so may not support complex models with additional random effects. Accordingly, we compare model 2 with two more complex models: model 2 0 , with varying slopes and spatial intercepts, and model 2 00 , containing varying predictor effects, and both spatial and iid intercepts. Thus model 2 00 is In practice, R-INLA estimates varying slopes as the sum of a fixed effect and a spatial random effect, for where w 1i is a zero mean CAR scheme. The X i are scaled to the interval [0, 1], so that the relative importance of predictors as risk factors can be directly assessed from the regression coefficients. It is important to assess how far each model eliminates residual spatial dependence. With l i denoting posterior mean of l i , Poisson residuals are defined as and one may assess significant residual dependence using an index such as Moran's I. Specifically the moran.mc procedure in R uses a Monte Carlo permutation test for Moran's I statistic, with 1000 permutations being used. Significant correlation will show in extreme p values, namely values close to zero when positive residual correlation remains, and p values close to 1 for significant negative residual correlation. Moran I calculations use a binary adjacency spatial interaction matrix for the 6791 areas. Model fit is assessed by using the deviance information criterion (DIC) (Spiegelhalter et al. 2002), and the Watanabe-Akaike information criterion (WAIC) (Watanabe 2010). Both measures incorporate a complexity penalty as well as simply measuring fit, and so are analogous to classical fit measures such as the Akaike information criterion (AIC). Smaller values of the DIC and WAIC suggest a model which is both better fitting and more parsimonious in terms of parameters. Table 1 summarises model fit across the outcomes, and levels of residual spatial dependence. The largest gains in model fit (in terms of reduced WAIC and DIC) are obtained in going from the global model 1 to the simplest local model, the varying slopes model 2. Any additional improvements in fit are slight: for overall suicides, the reduction in WAIC in going from model 2 to model 2 0 is only 3, and for males the corresponding reduction is only 4. There is no gain in fit in moving from model 2 0 to model 2 00 . For females, the DIC and WAIC are lowest for model 2, though changes in fit measures between models are small; for females, fit deteriorates slightly in moving from model 2 to more complex variants.

Case study results
Additionally under the global models for overall and male suicides, there is still evidence of some residual spatial dependence, in fact negative correlation, so that the p value is close to 1. For overall sucides, the more complex models 2 0 and 2 00 also show residual spatial dependence. So on grounds of both fit and eliminating residual dependence, the simplest local model, model 2, is the best choice for overall and female suicides. For conceptual simplicity and comparability, Tables 2, 3 and 4 compare outcome specific regression coefficients under the varying intercepts model 1 and varying slopes model 2.
It can be seen that deprivation is the leading risk factor for overall suicide, and for male suicides (which account for 76% of overall suicides). For males, the relative suicide risk under model 2 is 2.58, when comparing the most and least deprived communities and holding other influences constant. By contrast, for  While there is significant variability in SFI impacts on two suicide outcomes, this is overwhelmingly a positive risk factor on suicide in the sense that higher SFI is associated with higher suicide risk. For local SFI impacts on overall suicides, in 4750 (of 6791) areas there is a 90% chance or more of a positive effect, but

Comparison of predictions
Detecting markedly elevated or reduced risk is a primary goal in disease mapping, and often a criterion for model effectiveness.
As one approach to comparing predictions and detecting extreme risk, we focus on overall suicides, and compare relative risk estimates between global and local models for those MSOAs where predictions contrast most. The ten MSOAs where the local model 2 predicts distinctly higher relative risks are listed first (see Table 5). We include SMRs as a risk indicator, but because of their drawbacks as relative risk estimates, supplement them by probability estimates that relative risks exceed 1. These are based on Poisson random simulations, using only the observed Y i and E i (see ''Appendix 1''). These estimates take account of varying MSOA populations, and higher E i in some areas, whereas an SMR of, say, 2, does not distinguish between the scenarios ðY ¼ 10; E ¼ 5Þ and ðY ¼ 4; E ¼ 2Þ.
It can be seen that risk tends to be underpredicted under the global model 1 for the first ten areas in Table 5. In these areas, high suicide risks are associated with high scores on deprivation or fragmentation, and with above average IMD and SFI coefficients (compared to the coefficient averages in Table 2). Such high coefficients are acting adaptively to explain locally high suicide risk under model 2.
By contrast, the last ten rows of Table 5 are for ten MSOAs where the global model 1 produces distinctly higher relative risk estimates as compared to model 2. These tend to be MSOAs where actual deaths Y i are less than expected counts E i , and there is a low databased probability that relative risk exceeds 1. However, for all these MSOAs, the global model produces relative risk estimates in excess of 1.
A more generic impression of how predictions compare between models is based on forming groups of MSOAs with distinct risk levels, as assessed from probability estimates of excess relative risk (''Appendix 1'') and the population size of MSOAs (as reflected in expected suicides). The excess risk probabilities more clearly identify elevated than depressed risk. Considering overall suicides, there are 148 MSOAs with excess risk probabilities over 0.99, and 449 with probabilities over 0.95. By contrast, there are only 36 areas with probabilities under 0.01, while 199 are under 0.025, and 420 under 0.05. Therefore we define area categories according to excess risk probabilities over 0.99, over 0.95, under 0.05 and under 0.025, and expected suicides (over 5, 3-5, and under 3). Thus see the upper section of Table 6, with the last column showing the number of MSOAs in the category, and also including intermediate risk areas. Table 6 shows the average relative risks under models 1, 2, 2 0 , and 2 00 (columns 2-5) as well as the standard mortality ratio in that category (total deaths divided by total expected).
Thus for areas with elevated risk, the varying slopes model 2 identifies such risk better than the varying intercepts model 1. This advantage is most pronounced for larger areas (E[5) where the simulation probabilities exceed 0.99. Model 2 also has an advantage over model 1 for low risk areas with larger populations (E [ 5). Additionally, throughout all comparisons, there is no advantage in predicting risk for models 2 0 and 2 00 against the less complex model 2. For intermediate risk areas, the models make similar predictions.
These themes continue in the lower part of Table 6 which compares predictions between the nine English regions. Thus different models tend to be broadly similar in their predictive success within some regions as against others. For example, all models are relatively successful in predicting high risk in the North West, South West and North East regions. By contrast, the high risk in 20 London MSOAs is not well predicted by any model, suggesting that an expanded model might be needed to capture distinct regional effects. However, the advantage of model 2 over model 1 in predicting higher risk still pertains for most regions, most markedly for the East of England.
To illustrate how varying intercepts and varying slopes model compare for data following a known model form, we also conduct a simulation for one English region, the North West. Details are set out in Online Resource O1, but replicate the above discussed  features: better fit and better prediction for higher risk areas under the varying slopes model 2.

Suicide variation within local authorities
The 6791 MSOAs are nested within larger administrative units, namely 326 local authorities. To illustrate locally varying SFI impacts within a subregion, consider Tendring local authority in NE Essex, which includes coastal resort areas. As mentioned above, coastal resort towns may have high residential turnover, and high levels of private renting, leading to high fragmentation scores. They can also be relatively deprived. High suicide risks in such towns are exemplified by authority-wide SMRs (for overall suicides) of 1.57 (Tendring), 1.51 (Blackpool), 1.53 (Hastings), 1.48 (Scarborough), 1.43 (Great Yarmouth), 1.37 (Brighton and Hove) and 1.32 (Eastbourne). These authorities are all within the 25 local authorities with the highest suicide SMRs.
Within Tendring, there are wide contrasts in suicide risks, with observed counts Y i considerably exceeding expected counts E i in some MSOAs, with the reverse true in other areas (see Table 7). In six areas there is an over 90% chance that relative risk exceeds 1, based simply on the data.
As to risk-predictor associations, there is a positive correlation (0.90) between fragmentation and suicide SMRs within Tendring, and a positive correlation (0.86) between deprivation and suicide also. So both deprivation and fragmentation coefficients are above average, as the two risk factors have a clear role in explaining suicide contrasts within the subregion. Figure 3 shows the varying SFI coefficients in this local authority. The Figure highlights three adjacent coastal MSOAs (Tendring 012, 014 and 016) with SFI coefficients in the highest category and also high suicide risks.
Another example is a relatively deprived, postindustrial, local authority in northern England, namely Middlesbrough, with an authority-wide suicide SMR  Table 8). These coefficients reflect associations between suicide and risk factors: high risk MSOAs tend to be highly deprived, while low risk MSOAs have low deprivation levels.
The same applies to social fragmentation, where coefficients are above average, and high (low) risk tends to be associated with high (low) fragmentation. Areas with elevated risk, and high levels of both deprivation and fragmentation, are exemplified by Middlesbrough 001 and 003. Areas with low risk, and low levels of both deprivation and fragmentation, are exemplified by Middlesbrough 012 and 017.

Conclusions
The primary intention of the preceding analysis has been to evaluate models for a relatively rare mortality outcome, allowing for spatially varying predictor effects. This approach is compared to a varying intercepts model within the broader framework of Bayesian disease mapping. Disease mapping applications tend to assume homogeneous effects of area risk factors on disease, and use random intercepts to account for residual spatial dependence, as in suicide studies (Qi et al. 2014;Yoon et al. 2015). Area suicide studies may also omit area risk factors altogether, and use a varying intercepts model (Cheung et al. 2012). However, as demonstrated in the current application, additional substantive perspectives may be gained through exploring local variability in risk factor effects, and in some circumstances, there may be little gain in fit through using varying intercepts in combination with varying predictor effects. The latter applies for suicide across England small areas, where models with varying slopes only provide comparable fit to more complex models. However, this finding may be specific to this particularly outcome, and the analysis does not establish a generic tendency for varying slopes to dispense with the need for random intercepts.
Varying regression effects are here implemented using a Bayesian random effects approach, and analysis of locally varying regression is facilitated by software such as R-INLA. This avoids the computational burden involved in Monte Carlo Markov Chain analysis.
Geographically weighted regression (GWR) has the same focus on spatial heterogeneity, but does not use a random effects approach, but instead a series of separate weighted regressions. GWR has been widely used, with health applications included. Since readers of the journal may well be more familiar with this  approach, a supplementary analysis using GWR (applied to overall suicides) is reported on in Online Resource O2. This shows some findings common with the borrowing strength Bayesian approach, but not a strong correlation between area regression coefficients. Such contrasts are likely given the rather different methods used in the two approaches (Waller et al. 2007), and may be more apparent for a rare disease outcome. Borrowing strength operates via the assumed prior distribution of random effect across all areas, and this is especially important for rarer outcomes. The borrowing strength approach has, however, primarily been used in modelling intercept variation. Studies of spatial heterogeneity in disease mapping, and in particular the potential for spatially varying coefficients on risk factors, are far fewer, and the present paper is intended to demonstrate how this form of analysis may be approached and its potential benefits.

Compliance with ethical standards
Conflicts of interest The author declares no conflict of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1
To indicate disease risk using simply the observed data, one may simulate death counts e Y it (for simulations t ¼ 1; . . .; T) based on the expected deaths E i , and compare these with actual deaths Y i : If for most simulations, one has Y i ! e Y it then this indicates high relative risk in area i. The relative risk interpretation holds when P n i¼1 Y i ¼ P n i¼1 E i . Let IðAÞ ¼ 1 or 0 according as condition A holds. Following Marshall and Spiegelhalter (2007), and allowing for equality of simulated and actual counts, we find probability estimates that relative risks exceed 1, based simply on the data, as R i =T where The relevant R code for T ¼ 1000 is where pr.exc are the probability estimates.