MSPOCK: Alleviating Spatial Confounding in Multivariate Disease Mapping Models

Azevedo, Douglas R. M.; Prates, Marcos O.; Bandyopadhyay, Dipankar

doi:10.1007/s13253-021-00451-5


Published: 13 April 2021

Volume 26, pages 464–491, (2021)

Journal of Agricultural, Biological and Environmental Statistics

Douglas R. M. Azevedo¹,
Marcos O. Prates ORCID: orcid.org/0000-0001-8077-4898¹ &
Dipankar Bandyopadhyay²


Abstract

Exploring spatial patterns in the context of disease mapping is a decisive approach for bringing evidence of geographical tendencies in assessing disease status and progression. In most cases, multiple count responses (corresponding to disease incidences of multiple types, such as cancer in men and women) are recorded at each spatial location, and these may exhibit similar spatial patterns in addition to disease-specific patterns. They are typically modeled using multivariate shared component models, where the spatial (random) effects may be shared between the disease types to model their association. However, this framework is not immune to spatial confounding, where the latent correlation between the spatial random effects and the fixed effects often leads to misleading interpretation. A recent approach to attenuate spatial confounding is the “SPatial Orthogonal Centroid ‘K’orrection”, or SPOCK, which displaces the geographical centroids, ensuring orthogonality of the spatial random effects and the fixed effects. In this paper, we introduce MSPOCK, or Multiple SPOCK, to tackle spatial confounding in the multiple-count scenario. The methodology is evaluated on synthetic data and illustrated via an application to new cases of respiratory system cancer in men and women in the US state of California in 2016. Our studies show that the MSPOCK correction leads to a reduction of the posterior variance estimates of model parameters, while maintaining the interpretation of the model.
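The centroid-displacement idea can be illustrated with a small sketch. It assumes that the displacement is an orthogonal projection of each centroid coordinate onto the complement of the column space of the fixed-effects design matrix (intercept included), which is one way to obtain the orthogonality described above; the helpers `orthonormal_basis` and `displace_centroids` are hypothetical names, and the later step of rebuilding the neighborhood graph from the displaced centroids is not shown.

```python
def orthonormal_basis(columns):
    """Modified Gram-Schmidt: orthonormal basis for the span of the given columns."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > 1e-12:  # skip columns that are linearly dependent on earlier ones
            basis.append([vi / norm for vi in v])
    return basis


def displace_centroids(X_columns, centroids):
    """Project each centroid coordinate onto the orthogonal complement of span(X).

    X_columns: covariate columns (each a list of length n), including the intercept.
    centroids: list of (x, y) pairs, one per area.
    Returns the displaced centroids as a list of (x, y) pairs.
    """
    basis = orthonormal_basis(X_columns)
    displaced = []
    for coord in zip(*centroids):  # first all x-coordinates, then all y-coordinates
        c = list(coord)
        for q in basis:  # remove the component of c lying in span(X)
            proj = sum(qi * ci for qi, ci in zip(q, c))
            c = [ci - proj * qi for ci, qi in zip(c, q)]
        displaced.append(c)
    return list(zip(*displaced))


# Hypothetical example with five areas, an intercept and one covariate.
intercept = [1.0] * 5
covariate = [0.2, 1.5, -0.3, 0.8, 2.0]
centroids = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (0.5, 2.0), (1.5, 1.5)]
new_centroids = displace_centroids([intercept, covariate], centroids)
```

Because the intercept is one of the columns, the displaced coordinates are also centered at zero.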


References

Banerjee S, Carlin BP, Gelfand AE (2014) Hierarchical modeling and analysis for spatial data, 2nd ed. CRC Press
Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc: Ser B (Stat Methodol) 36:192–225
Besag J, York J, Mollié A (1991) Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 43:1–20
Boloker G, Wang C, Zhang J (2018) Updated statistics of lung and bronchus cancer in United States (2018). J Thorac Dis 10:1158
Breslow NE, Clayton DG (1993) Approximate inference in generalized linear mixed models. J Am Stat Assoc 88:9–25
Chew LP (1989) Constrained Delaunay triangulations. Algorithmica 4:97–108
CHRR (2019) University of Wisconsin Population Health Institute. County health rankings and roadmaps 2019. http://www.countyhealthrankings.org. Accessed 16 Jan 2020
Clayton DG, Bernardinelli L, Montomoli C (1993) Spatial correlation in ecological analysis. Int J Epidemiol 22:1193–1202
Dabney AR, Wakefield JC (2005) Issues in the mapping of two diseases. Stat Methods Med Res 14:83–112
Datta A, Banerjee S, Hodges JS, Gao L et al (2019) Spatial disease mapping using directed acyclic graph auto-regressive (DAGAR) models. Bayes Anal 14:1221–1244
de Valpine P, Paciorek C, Turek D, Michaud N, Anderson-Bergman C, Obermeyer F, Wehrhahn Cortes C, Rodríguez A, Temple Lang D, Paganin S (2020) NIMBLE User Manual. R package manual version
Fu JB, Kau TY, Severson RK, Kalemkerian GP (2005) Lung cancer in women: analysis of the national surveillance, epidemiology, and end results database. Chest 127:768–777
Gelfand AE, Vounatsou P (2003) Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics 4:11–15
Gelman A, Hwang J, Vehtari A (2014) Understanding predictive information criteria for Bayesian models. Stat Comput 24:997–1016
Gelman A, Rubin DB et al (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457–472
Gómez-Rubio V, Palmí-Perales F (2019) Multivariate posterior inference for spatial models with the integrated nested Laplace approximation. J Roy Stat Soc: Ser C (Appl Stat) 68:199–215
Gómez-Rubio V, Rue H (2018) Markov chain Monte Carlo with the integrated nested Laplace approximation. Stat Comput 28:1033–1051
Guan Y, Haran M (2018) A computationally efficient projection-based approach for spatial generalized linear mixed models. J Comput Gr Stat 27:701–714
Guan Y, Haran M (2019) Fast expectation-maximization algorithms for spatial generalized linear mixed models. arXiv:1909.05440
Hanks EM, Schliep EM, Hooten MB, Hoeting JA (2015) Restricted spatial regression in practice: geostatistical models, confounding, and robustness under model misspecification. Environmetrics 26:243–254
Hatami R (2018) A practical method to control spatiotemporal confounding in environmental impact studies. MethodsX 5:710–716
Hefley TJ, Hooten MB, Hanks EM, Russell RE, Walsh DP (2017) The Bayesian group lasso for confounded spatial data. J Agric Biol Environ Stat 22:42–59
Hodges JS, Reich BJ (2010) Adding spatially-correlated errors can mess up the fixed effect you love. Am Stat 64:325–334
Hughes J, Haran M (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J R Stat Soc: Ser B (Stat Methodol) 75:139–159
Jiao J, Han Y (2020) Bias correction with jackknife, bootstrap, and Taylor series. IEEE Trans Inf Theory 66:4392–4418
Kim H, Sun D, Tsutakawa RK (2001) A bivariate Bayes method for improving the estimates of mortality rates with a twofold conditional autoregressive model. J Am Stat Assoc 96:1506–1521
Knorr-Held L, Best NG (2001) A shared component model for detecting joint and selective clustering of two diseases. J R Stat Soc: Ser A (Stat Soc) 164:73–85
Knorr-Held L, Natário I, Fenton SE, Rue H, Becker N (2005) Towards joint disease mapping. Stat Methods Med Res 14:61–82
Knorr-Held L, Raßer G (2000) Bayesian detection of clusters and discontinuities in disease maps. Biometrics 56:13–21
Lawson AB (2019) Bayesian disease mapping: hierarchical modeling in spatial epidemiology, 3rd ed. Chapman and Hall/CRC
Leroux BG, Lei X, Breslow N (1999) Estimation of disease rates in small areas: a new mixed model for spatial dependence. In: Statistical models in epidemiology, the environment, and clinical trials. Springer, pp 179–191
Lindgren F, Rue H et al (2015) Bayesian spatial modelling with R-INLA. J Stat Softw 63:1–25
Moran PA (1950) A test for the serial independence of residuals. Biometrika 37:178–181
Park J, Haran M (2020) Reduced-dimensional Monte Carlo maximum likelihood for latent Gaussian random field models. J Comput Gr Stat 1–15
Prates MO, Assunção RM, Rodrigues EC et al (2019) Alleviating spatial confounding for areal data problems by displacing the geographical centroids. Bayes Anal 14:623–647
Reich BJ, Hodges JS, Zadnik V (2006) Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62:1197–1206
Rodrigues EC, Assunção R (2012) Bayesian spatial models with a mixture neighborhood structure. J Multivar Anal 109:88–102
Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. Chapman and Hall/CRC
Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc: Ser B (Stat Methodol) 71:319–392
SEER (2019) National Cancer Institute. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence - SEER 9 Regs Research Data, Nov 2018 Sub (1975–2016). Released April 2019, based on the November 2018 submission
Siegel RL, Miller KD, Jemal A (2019) Cancer statistics, 2019. CA Cancer J Clin 69:7–34
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc: Ser B (Stat Methodol) 64:583–639
Thaden H, Kneib T (2018) Structural equation models for dealing with spatial confounding. Am Stat 72:239–252
Vargas FR (2013) Bayesian estimates of the lethality rate of acute myocardial infarction. PhD thesis, Universidade Federal de Minas Gerais (UFMG)
Watanabe S (2010) Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res 11:3571–3594
WHO (2004) Gender in lung cancer and smoking research. https://apps.who.int/iris/bitstream/handle/10665/43086/9241592524.pdf


Acknowledgements

The authors thank the anonymous Associate Editor and two reviewers, whose constructive comments led to an improved presentation. Prates acknowledges partial funding support from CNPq Grants 436948/2018-4 and 307547/2018-4, and FAPEMIG grant PPM-00532-16. Bandyopadhyay acknowledges partial support from Grants R01DE024984 and P30CA016059 from the United States National Institutes of Health.

Author information

Authors and Affiliations

Department of Statistics, Universidade Federal de Minas Gerais, Av. Pres. Antônio Carlos, 6627, Belo Horizonte, 31270-901, Brazil
Douglas R. M. Azevedo & Marcos O. Prates
Department of Biostatistics, Virginia Commonwealth University, Richmond, USA
Dipankar Bandyopadhyay


Corresponding author

Correspondence to Marcos O. Prates.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 A: Integrated Nested Laplace Approximation—INLA

Integrated nested Laplace approximation (INLA, Rue et al. 2009) is a deterministic alternative to Markov chain Monte Carlo that allows the user to fit a wide variety of Bayesian models. A model can be fitted with INLA if, for a response variable ${\varvec{Y}}$, its mean ${\varvec{\mu }}$ can be related to an additive linear predictor through a link function $g(\cdot)$:

$$\begin{aligned} g(\mu _i) = \eta _i = \beta _0 + \sum _{j = 1}^{n_{\xi }}\xi ^{(j)}(z_{ji}) + \sum _{k = 1}^{n_{\beta }}\beta _kX_{ki} + \epsilon _i, \end{aligned} \tag{A-1}$$

where $\xi ^{(j)}(\cdot)$ are unknown functions of the covariates $z_{ji}$, $\beta _0$ is an intercept, $\beta _k$ are the coefficients of the fixed-effect covariates $X_{ki}$, and $\epsilon _i$ are unstructured terms. INLA assigns Gaussian priors to the latent vector ${\varvec{u}} = \{\beta _0, {\varvec{\xi }}, {\varvec{\beta }}, {\varvec{\epsilon }}\}$, so that ${\varvec{u}}$ is a Gaussian Markov random field (GMRF, Rue and Held 2005). Any model whose latent structure can be written as a GMRF can be fitted with INLA; this includes most common models of the GLMM family.
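To make the additive structure of (A-1) concrete, the sketch below evaluates the linear predictor for three observations and maps it to the mean with a log link, as is typical for Poisson counts in disease mapping. The function name `linear_predictor`, the choice of link, and all numbers are hypothetical; in INLA the $\xi^{(j)}$ terms are latent Gaussian components to be estimated, not fixed known functions.

```python
import math


def linear_predictor(beta0, xi_funcs, z, beta, X, eps):
    """Evaluate eta_i = beta0 + sum_j xi_j(z_ji) + sum_k beta_k X_ki + eps_i for every i."""
    eta = []
    for i in range(len(eps)):
        smooth = sum(f(z[j][i]) for j, f in enumerate(xi_funcs))  # sum over the n_xi terms
        fixed = sum(b * X[k][i] for k, b in enumerate(beta))      # sum over the n_beta terms
        eta.append(beta0 + smooth + fixed + eps[i])
    return eta


# Hypothetical example: a quadratic effect, an area-indexed (spatial) effect, one covariate.
area_effect = [0.2, -0.1]
xi_funcs = [lambda t: 0.1 * t ** 2, lambda a: area_effect[a]]
z = [[1.0, 2.0, 3.0],  # continuous covariate used by xi_1
     [0, 1, 1]]        # area index used by xi_2
X = [[1.0, 0.0, -1.0]]
eta = linear_predictor(0.5, xi_funcs, z, [0.3], X, [0.0, 0.1, -0.1])
mu = [math.exp(e) for e in eta]  # inverse of the log link g(mu) = log(mu)
```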

The latent vector ${\varvec{u}}$ may depend on hyperparameters ${\varvec{\theta }}$, for example variances and correlation parameters, and in general $\text {dim}({\varvec{u}}) \gg \text {dim}({\varvec{\theta }}) = n_{\theta }$. A prior distribution must therefore be specified for $\{{\varvec{u}}, {\varvec{\theta }}\}$. INLA assigns priors $\pi ({\varvec{u}}, {\varvec{\theta }}) = \pi ({\varvec{u}}|{\varvec{\theta }})\pi ({\varvec{\theta }})$, where $\pi ({\varvec{u}}|{\varvec{\theta }})$ is a GMRF and $\pi ({\varvec{\theta }})$ may be decomposed as $\prod _{j = 1}^{n_{\theta }}\pi (\theta _j)$. The marginal posterior distributions of the parameters are given by:

$$\begin{aligned} \pi (u_j|{\varvec{y}}) = \int \pi (u_j, {\varvec{\theta }}|{\varvec{y}})d{\varvec{\theta }} = \int \pi (u_j|{\varvec{\theta }}, {\varvec{y}})\pi ({\varvec{\theta }}|{\varvec{y}})d{\varvec{\theta }},\\ \pi (\theta _k|{\varvec{y}}) = \int \pi ({\varvec{\theta }}|{\varvec{y}}) d{\varvec{\theta _{-k}}}. \end{aligned}$$

Since these integrals generally have no analytical solution, numerical approximations are needed to obtain ${\tilde{\pi }}(u_j|{\varvec{y}})$ and ${\tilde{\pi }}(\theta _k|{\varvec{y}})$, where ${\tilde{\pi }}(\cdot)$ denotes an approximation of $\pi (\cdot)$.

1.1.1 Marginal Distribution for $\theta _k$

We can rewrite $\displaystyle \pi ({\varvec{\theta }}|{\varvec{y}}) = \frac{\pi ({\varvec{u}}, {\varvec{\theta }}|{\varvec{y}})}{\pi ({\varvec{u}}| {\varvec{\theta }}, {\varvec{y}})}$. To approximate this quantity, Rue et al. (2009) suggest a Gaussian approximation for the denominator as:

$$\begin{aligned} {\tilde{\pi }}({\varvec{\theta }}|{\varvec{y}}) \propto \frac{\pi ({\varvec{u}}, {\varvec{\theta }}, {\varvec{y}})}{\pi _G({\varvec{u}}| {\varvec{\theta }}, {\varvec{y}})}\Bigg |_{u = u^{*}({\varvec{\theta }})}, \end{aligned}$$

where $\pi _G(\cdot)$ is a Gaussian approximation of a density and $u^{*}({\varvec{\theta }})$ is the mode of $\pi ({\varvec{u}}| {\varvec{\theta }}, {\varvec{y}})$ for a given ${\varvec{\theta }}$. The marginal ${\tilde{\pi }}(\theta _k|{\varvec{y}})$ is then obtained by numerically integrating ${\tilde{\pi }}({\varvec{\theta }}|{\varvec{y}})$ over the remaining components ${\varvec{\theta }}_{-k}$, using a grid of $H$ points ${\varvec{\theta }}^*_{-k,h}$ with weights $\Delta _{kh}$:

$$\begin{aligned} {\tilde{\pi }}(\theta _k|{\varvec{y}}) \approx \sum _{h=1}^H {\tilde{\pi }}(\theta _k, {\varvec{\theta }}^*_{-k,h}|{\varvec{y}})\Delta _{kh}. \end{aligned}$$
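The Laplace step for ${\tilde{\pi }}({\varvec{\theta }}|{\varvec{y}})$ can be illustrated on a deliberately small model. The sketch assumes a hypothetical scalar latent effect $u \sim N(0, 1/\theta)$ shared by all areas, counts $y_i \sim \text{Poisson}(E_i e^{u})$ with expected counts $E_i$, and a Gamma$(a, b)$ prior on the precision $\theta$. For each $\theta$ on a grid it finds $u^*(\theta)$ by Newton's method, evaluates the ratio of the joint density to the Gaussian approximation at the mode, and normalizes on the grid. Because $\theta$ is scalar here, the grid sum only supplies the normalizing constant. The data, priors, grid, and function names are illustrative assumptions, not the setup used in the paper.

```python
import math


def log_poisson_lik(y, E, u):
    """sum_i log Poisson(y_i | E_i * exp(u))."""
    return sum(yi * (math.log(ei) + u) - ei * math.exp(u) - math.lgamma(yi + 1)
               for yi, ei in zip(y, E))


def laplace_log_post_theta(theta, y, E, a=1.0, b=0.1, iters=50):
    """Unnormalized log pi~(theta | y) = log pi(u*, theta, y) - log pi_G(u* | theta, y)."""
    Y, Etot = sum(y), sum(E)
    u = 0.0
    for _ in range(iters):  # Newton's method for the mode u*(theta) of pi(u | theta, y)
        grad = Y - Etot * math.exp(u) - theta * u
        hess = -Etot * math.exp(u) - theta
        u -= grad / hess
    prec = Etot * math.exp(u) + theta  # precision of the Gaussian approximation pi_G
    log_prior_u = 0.5 * math.log(theta / (2 * math.pi)) - 0.5 * theta * u * u
    log_prior_theta = (a * math.log(b) - math.lgamma(a)
                       + (a - 1) * math.log(theta) - b * theta)
    log_gauss_at_mode = 0.5 * math.log(prec / (2 * math.pi))  # pi_G evaluated at its mode
    return log_poisson_lik(y, E, u) + log_prior_u + log_prior_theta - log_gauss_at_mode


def normalize_on_grid(log_dens, delta):
    """Turn unnormalized log densities on an equally spaced grid into a density."""
    m = max(log_dens)
    w = [math.exp(l - m) for l in log_dens]
    total = sum(w) * delta
    return [wi / total for wi in w]


# Hypothetical data for four areas and a truncated grid theta in [0.05, 20].
y = [3, 5, 2, 8]
E = [2.5, 4.0, 3.0, 6.0]
delta = 0.05
thetas = [delta * h for h in range(1, 401)]
dens = normalize_on_grid([laplace_log_post_theta(t, y, E) for t in thetas], delta)
```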

1.1.2 Marginal Distribution for $u_j$

To approximate $\pi (u_j|{\varvec{\theta }}, {\varvec{y}})$, Rue et al. (2009) propose three strategies: (1) a Gaussian approximation, (2) a Laplace approximation, and (3) a simplified Laplace approximation. The Gaussian approximation is the cheapest to obtain but can be inaccurate. The Laplace approximation is more accurate but computationally expensive. The simplified Laplace approximation provides satisfactory accuracy at a lower computational cost. Taking one of them as ${\tilde{\pi }}(u_j|{\varvec{\theta }}, {\varvec{y}})$, the posterior marginal is computed over the grid points $\theta ^*_h$ with weights $\Delta _h$ as:

$$\begin{aligned} {\tilde{\pi }}(u_j|{\varvec{y}}) \approx \sum _{h=1}^H {\tilde{\pi }}(u_j | \theta ^*_h, {\varvec{y}}) {\tilde{\pi }}(\theta ^*_h|{\varvec{y}}) \Delta _h. \end{aligned}$$
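The mixing step above can be sketched with the Gaussian strategy for ${\tilde{\pi }}(u_j|\theta ^*_h, {\varvec{y}})$: each grid point contributes a normal density weighted by ${\tilde{\pi }}(\theta ^*_h|{\varvec{y}})\Delta _h$. The conditional means, standard deviations, and weights below are hypothetical numbers, chosen so that the weights sum to one; in practice they would come from the approximation of ${\tilde{\pi }}({\varvec{\theta }}|{\varvec{y}})$ described earlier. The function names are also hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def marginal_u(u, cond_means, cond_sds, theta_dens, deltas):
    """pi~(u_j | y) ~= sum_h pi_G(u_j | theta*_h, y) * pi~(theta*_h | y) * Delta_h."""
    return sum(normal_pdf(u, m, s) * p * d
               for m, s, p, d in zip(cond_means, cond_sds, theta_dens, deltas))


# Hypothetical conditional Gaussians at H = 3 grid points; sum(theta_dens * deltas) = 1.
cond_means = [0.10, 0.15, 0.20]
cond_sds = [0.30, 0.25, 0.20]
theta_dens = [0.5, 1.0, 0.5]
deltas = [0.5, 0.5, 0.5]

# Evaluate the mixture on a fine grid of u values.
step = 0.001
grid = [-3.0 + step * i for i in range(6001)]
vals = [marginal_u(u, cond_means, cond_sds, theta_dens, deltas) for u in grid]
```

The resulting marginal is a mixture of Gaussians, so it can be skewed even though every conditional component is symmetric.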

1.2 B: Additional Simulation Results

Table 6 presents the simulation results for scenarios SM2 (cubic and linear) and SM3 (cubic), comparing the SCM without the MSPOCK adjustment to the SCM with the adjustment.

Table 6 Simulation results comparing SCM (shared component model, without confounding adjustment), and MSPOCK (shared component model, with confounding adjustment) for scenario SM2 (linear and cubic) and SM3 (cubic)


1.3 C: Widely Applicable Information Criterion

In any application, it is common practice to consider several competing models. These models may differ in the number of parameters and/or in the likelihood, so their complexity can differ. An important aspect to evaluate is the parsimony principle, which consists of finding a trade-off between model fit and model complexity. In practice, we search for the best fit; however, the best fit does not necessarily come from the most complex model, since complex models may have undesirable properties such as overfitting, high computational cost, and identifiability issues.

Under the Bayesian paradigm, the deviance information criterion (DIC, Spiegelhalter et al. 2002) remains a widely popular metric. However, Gelman et al. (2014) studied and compared different model selection criteria and concluded that the widely applicable information criterion (WAIC, Watanabe 2010) is a promising alternative for this task. To calculate the WAIC, one must compute the following log pointwise posterior predictive density ($lppd$):

$$\begin{aligned} lppd = \log \left( \prod _{i=1}^n \pi _{post}(y_i) \right) = \sum _{i=1}^n \log \left( \int \pi (y_i|{\varvec{u}},{\varvec{\theta }}) \pi _{post}({\varvec{u}},{\varvec{\theta }})\, d{\varvec{u}}\, d{\varvec{\theta }} \right) , \end{aligned}$$

where $\pi _{post}(\cdot )$ denotes the posterior distribution of the corresponding quantity. Next, to adjust for possible overfitting, $lppd$ is penalized by the effective number of parameters $p_{\text {WAIC}} = \sum _{i=1}^n V(\log \pi (y_i|{\varvec{u}}, {\varvec{\theta }}))$, where $V(\cdot )$ denotes the posterior variance, here of the pointwise log-likelihood. Finally, the WAIC is given by:

$$\begin{aligned} \text {WAIC} = -2 (lppd - p_{\text {WAIC}}). \end{aligned}$$

The model with the smallest WAIC value is considered the model of best fit to a dataset.
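In practice, the integral in $lppd$ and the variance in $p_{\text {WAIC}}$ are usually approximated with $S$ posterior draws $({\varvec{u}}^{(s)}, {\varvec{\theta }}^{(s)})$, for instance from MCMC or sampled from an INLA approximation. A minimal sketch, assuming the pointwise log-likelihoods $\log \pi (y_i|{\varvec{u}}^{(s)}, {\varvec{\theta }}^{(s)})$ are already available as an $S \times n$ nested list (the function name `waic` is hypothetical):

```python
import math


def waic(loglik):
    """loglik[s][i] = log pi(y_i | u^(s), theta^(s)) for posterior draw s and observation i.

    Returns (WAIC, lppd, p_WAIC). Requires at least two draws (S >= 2).
    """
    S, n = len(loglik), len(loglik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(n):
        col = [loglik[s][i] for s in range(S)]
        m = max(col)
        # log of the Monte Carlo average of pi(y_i | .), computed stably via log-sum-exp
        lppd += m + math.log(sum(math.exp(c - m) for c in col) / S)
        mean = sum(col) / S
        p_waic += sum((c - mean) ** 2 for c in col) / (S - 1)  # posterior sample variance
    return -2 * (lppd - p_waic), lppd, p_waic
```

The log-sum-exp form avoids underflow when the log-likelihoods are very negative; using the sample variance with divisor $S - 1$ is one common choice for $V(\cdot)$.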


About this article

Cite this article

Azevedo, D.R.M., Prates, M.O. & Bandyopadhyay, D. MSPOCK: Alleviating Spatial Confounding in Multivariate Disease Mapping Models. JABES 26, 464–491 (2021). https://doi.org/10.1007/s13253-021-00451-5


Received: 27 April 2020
Accepted: 20 March 2021
Published: 13 April 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s13253-021-00451-5
