A smoothed ANOVA model for multivariate ecological regression

Marí-Dell’Olmo, Marc; Martinez-Beneito, Miguel A.; Gotsens, Mercè; Palència, Laia

doi:10.1007/s00477-013-0782-2

Original Paper
Published: 18 August 2013

Volume 28, pages 695–706, (2014)
Stochastic Environmental Research and Risk Assessment

Marc Marí-Dell’Olmo^1,2,3,
Miguel A. Martinez-Beneito^1,4,
Mercè Gotsens^1,2,3 &
…
Laia Palència^1,2,3

Abstract

Smoothed analysis of variance (SANOVA) has recently been proposed for carrying out disease mapping. The main advantage of this approach is its conceptual simplicity and ease of interpretation. Moreover, it allows us to fix the combination of diseases of particular interest in advance and to make specific inferences about them. In this paper we propose a reformulation of SANOVA in the context of ecological regression studies. This proposal considers the introduction in a non-parametric way of one (or several) covariate(s) into the model, explaining some pre-specified combinations of the outcome variables. In addition, random effects are also incorporated in order to model geographical variation in the combinations of outcome variables not explained by the covariate. Lastly, the model permits the decomposition of the variance in the set of outcome variables into different orthogonal components, quantifying the contribution of every one of them. The proposed model is applied to the geographical analysis of mortality due to malignant stomach neoplasm among women resident in the city of Barcelona (Spain). The available outcome variables are deaths grouped into two time periods, and a socioeconomic deprivation index is included as a covariate. The model has been implemented through INLA, a novel inference tool for Bayesian statistics.

References

Banerjee S, Wall MM, Carlin BP (2003) Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics (Oxford, England) 4:123–142
Barceló MA, Saez M, Cano-Serral G, Martinez-Beneito MA, Martinez JM, Borrell C, Ocaña-Riola R, Montoya I, Calvo M, López-Abente G, Rodriguez-Sanz M, Toro S, Alcalá JT, Saurina C, Sánchez-Villegas P, Figueiras A (2008) Métodos para la suavización de indicadores de mortalidad: aplicación al análisis de desigualdades en mortalidad en ciudades del Estado español (Proyecto MEDEA). Gaceta sanitaria/S.E.S.P.A.S 22:596–608
Besag J, York J, Mollié A (1991) Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 43:1–59
Borrell C, Marí-Dell’Olmo M, Serral G, Martinez-Beneito MA, Gotsens M, Other MEDEA members. (2010) Inequalities in mortality in small areas of eleven Spanish cities (the multicenter MEDEA project). Health & Place 16:703–711
Botella-Rocamora P, López-Quílez A, Martinez-Beneito MA (2013) Spatial moving average risk smoothing. Stat Med 32:2595–2612
Clayton DG, Bernardinelli L, Montomoli C (1993) Spatial correlation in ecological analysis. Int J Epidemiol 22:1193–1202
Dominguez-Berjon MF, Borrell C, Cano-Serral G, Esnaola S, Nolasco A, Pasarin MI, Ramis R, Saurina C, Escolar-Pujolar A (2008) Constructing a deprivation index based on census data in large Spanish cities (the MEDEA project). Gaceta sanitaria/S.E.S.P.A.S 22:179–187
Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1:515–533
Hodges JS, Reich BJ (2010) Adding spatially-correlated errors can mess up the fixed effect you love. Am Stat 64:325–334
Hodges JS, Cui Y, Sargent DJ, Carlin BP (2007) Smoothing balanced single-error-term analysis of variance. Technometrics 49:12–25
Hughes J, Haran M (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J R Stat Soc 75:139–159
Knorr-Held L, Best NG (2001) A shared component model for detecting joint and selective clustering of two diseases. J R Stat Soc 164:73–85
Liu X, Wall MM, Hodges JS (2005) Generalized spatial structural equation models. Biostatistics (Oxford, England) 6:539–557
Mardia KV (1988) Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. J Multivar Anal 24:265–284
Marí-Dell’Olmo M, Martinez-Beneito MA, Borrell C, Zurriaga O, Nolasco A, Dominguez-Berjon MF (2011) Bayesian factor analysis to calculate a deprivation index and its uncertainty. Epidemiology (Cambridge, Mass) 22:356–364
Muñoz F, Pennino MG, Conesa D, López-Quílez A, Bellido J (2013) Estimation and prediction of the spatial occurrence of fish species using Bayesian latent Gaussian models. Stoch Environ Res Risk Assess 27:1171–1180
R Development Core Team (2012). R: a language and environment for statistical computing
Reich BJ, Hodges JS, Zadnik V (2006) Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62:1197–1206
Rue H, Held L (2005) Gaussian Markov random fields: theory and applications (Chapman & Hall/CRC monographs on statistics & applied probability): Chapman and Hall/CRC, London
Rue H, Martino S (2009) INLA: functions which allow to perform a full Bayesian analysis of structured additive models using integrated nested Laplace approximation. R package version 0.0
Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc 71:319–392
Schrödle B, Held L (2011a) A primer on disease mapping and ecological regression using INLA. Comput Stat 26:241–258
Schrödle B, Held L (2011b) Spatio-temporal disease mapping using INLA. Environmetrics 22(6):725–734
Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81:82–86
Tzala E, Best N (2008) Bayesian latent variable modelling of multivariate spatio-temporal variation in cancer mortality. Stat Methods Med Res 17:97–118
Ugarte M, Ibáñez B, Militino A (2005) Detection of spatial variation in risk when using CAR models for smoothing relative risks. Stoch Environ Res Risk Assess 19:33–40
Wang F, Wall MM (2003) Generalized common spatial factor model. Biostatistics (Oxford, England) 4:569–582
Zhang Y, Hodges JS, Banerjee S (2009) Smoothed ANOVA with spatial effects as a competitor to MCAR in multivariate spatial smoothing. Ann Appl Stat 3:1805

Acknowledgments

We wish to thank Carme Borrell and one reviewer for their valuable suggestions on the article. We also wish to thank Prof. Håvard Rue for his support with the R-INLA implementation of the model proposed in this study.

Funding

This article has been partially funded by: “Plan Nacional de I+D+I 2008–2011” and the “ISCIII –Subdirección General de Evaluación y Fomento de la Investigación-” (Project number PI081488); the project INEQ-CITIES, “Socioeconomic inequalities in mortality: evidence and policies of cities of Europe”, project funded by the Executive Agency for Health and Consumers (Commission of the European Union), Project No. 2008 12 13; and by CIBER Epidemiología y Salud Pública (CIBERESP), Spain.

Author information

Authors and Affiliations

CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
Marc Marí-Dell’Olmo, Miguel A. Martinez-Beneito, Mercè Gotsens & Laia Palència
Agència de Salut Pública de Barcelona, Barcelona, Spain
Marc Marí-Dell’Olmo, Mercè Gotsens & Laia Palència
Institut d’Investigació Biomèdica (IIB Sant Pau), Barcelona, Spain
Marc Marí-Dell’Olmo, Mercè Gotsens & Laia Palència
Centro Superior de Investigación en Salud Pública CSISP-FISABIO, Av. Cataluña, 21., 46020, Valencia, Spain
Miguel A. Martinez-Beneito

Corresponding author

Correspondence to Miguel A. Martinez-Beneito.

Additional information

This article has been included in the doctoral thesis of one of the authors (Marc Marí-Dell’Olmo), being carried out at Pompeu Fabra University.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 21 kb)

Appendix

In what follows we briefly explain the general ideas behind the integrated nested Laplace approximations (INLA) inference method, the approach used in this paper to approximate the marginal posterior densities of the parameters of our multivariate ecological regression model.

Consider a three-stage Bayesian hierarchical model based on an observation model $ \pi \left( \boldsymbol{y}|\boldsymbol{x} \right) = \prod_{i} \pi \left( y_{i} |x_{i} \right) $, a parameter model $ \pi \left( \boldsymbol{x}|\boldsymbol{\theta} \right) $, and a hyperprior $ \pi \left( \boldsymbol{\theta} \right) $. Here $ \boldsymbol{y} = \left( y_{1} , \ldots ,y_{n} \right) $ denotes the observed data, $ \boldsymbol{x} $ is the vector of unknown parameters, which typically follows a Gaussian Markov random field (GMRF), and $ \boldsymbol{\theta} $ is the vector of unknown hyperparameters. The marginal posterior density of $ x_{i} $ can be written as

$$ \pi \left( x_{i} |\boldsymbol{y} \right) = \int_{\boldsymbol{\theta}} \pi \left( x_{i} |\boldsymbol{\theta},\boldsymbol{y} \right)\pi \left( \boldsymbol{\theta}|\boldsymbol{y} \right)d\boldsymbol{\theta} $$

and can be approximated by the finite sum

$$ \tilde{\pi }\left( x_{i} |\boldsymbol{y} \right) = \sum_{k} \tilde{\pi }\left( x_{i} |\boldsymbol{\theta}_{k} ,\boldsymbol{y} \right)\tilde{\pi }\left( \boldsymbol{\theta}_{k} |\boldsymbol{y} \right)\Delta_{k} $$

INLA approximates these marginal posterior densities using an approximation $ \tilde{\pi }\left( x_{i} |\boldsymbol{\theta},\boldsymbol{y} \right) $ of $ \pi \left( x_{i} |\boldsymbol{\theta},\boldsymbol{y} \right) $ and an additional approximation $ \tilde{\pi }\left( \boldsymbol{\theta}|\boldsymbol{y} \right) $ of the marginal posterior density of the hyperparameters $ \pi \left( \boldsymbol{\theta}|\boldsymbol{y} \right) $. The weights $ \Delta_{k} $ attached to the evaluation points $ \boldsymbol{\theta}_{k} $ are chosen appropriately.
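The finite sum above can be illustrated on a toy model in which every ingredient is available in closed form. The sketch below is not the model of this paper and does not use R-INLA: it assumes, purely for illustration, a single latent parameter $ x $ with $ y_{j} |x \sim N\left( x,1 \right) $, $ x|\theta \sim N\left( 0,1/\theta \right) $ and a Gamma(1, 1) hyperprior on $ \theta $. The function names, the grid limits and the equal weights $ \Delta_{k} $ are likewise assumptions.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def log_post_theta(theta, y, a=1.0, b=1.0):
    """log pi(theta | y) up to an additive constant (toy model, closed form)."""
    n, s = len(y), np.sum(y)
    log_prior = (a - 1.0) * np.log(theta) - b * theta      # Gamma(a, b) with rate b
    # log pi(y | theta) after integrating x out analytically, constants dropped
    log_lik = 0.5 * np.log(theta / (n + theta)) + 0.5 * s ** 2 / (n + theta)
    return log_prior + log_lik

def marginal_x(x_grid, theta_grid, y):
    """Finite sum: sum_k pi(x | theta_k, y) * pi(theta_k | y) * Delta_k."""
    n, s = len(y), np.sum(y)
    # Equal weights Delta_k on an equally spaced grid (an assumption of this sketch)
    delta = np.full(theta_grid.shape, theta_grid[1] - theta_grid[0])
    lp = log_post_theta(theta_grid, y)
    w = np.exp(lp - lp.max()) * delta
    w /= w.sum()                    # so that sum_k pi(theta_k | y) * Delta_k = 1
    cond_mean = s / (n + theta_grid)        # E[x | theta_k, y]
    cond_var = 1.0 / (n + theta_grid)       # Var[x | theta_k, y]
    dens = normal_pdf(x_grid[None, :], cond_mean[:, None], cond_var[:, None])
    return w @ dens                 # mixture of the conditional densities

# Hypothetical data, used only to exercise the functions
y = 1.0 + 0.5 * np.sin(np.arange(20))
theta_grid = np.linspace(0.01, 20.0, 400)
x_grid = np.linspace(-1.0, 3.0, 801)
dens = marginal_x(x_grid, theta_grid, y)
print("integral of the approximated marginal:", np.sum(dens) * (x_grid[1] - x_grid[0]))
```

In this toy case the conditional density of $ x $ is exactly Gaussian, so the only approximation is the replacement of the integral over $ \theta $ by a weighted sum; in INLA both factors inside the sum are themselves approximations.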

To approximate $ \pi \left( \boldsymbol{\theta}|\boldsymbol{y} \right) $ we consider:

$$ \pi \left( \boldsymbol{x},\boldsymbol{\theta},\boldsymbol{y} \right) = \pi \left( \boldsymbol{x}|\boldsymbol{\theta},\boldsymbol{y} \right) \times \pi \left( \boldsymbol{\theta}|\boldsymbol{y} \right) \times \pi \left( \boldsymbol{y} \right), $$

therefore, for any $ \boldsymbol{x} $,

$$ \pi \left( \boldsymbol{\theta}|\boldsymbol{y} \right) \propto \frac{\pi \left( \boldsymbol{x},\boldsymbol{\theta},\boldsymbol{y} \right)}{\pi \left( \boldsymbol{x}|\boldsymbol{\theta},\boldsymbol{y} \right)} $$

INLA approximates $ \pi \left( \boldsymbol{\theta}|\boldsymbol{y} \right) $ using a Laplace approximation (Tierney and Kadane 1986):

$$ \tilde{\pi }\left( \boldsymbol{\theta}|\boldsymbol{y} \right) \propto \left. \frac{\tilde{\pi }\left( \boldsymbol{x},\boldsymbol{\theta},\boldsymbol{y} \right)}{\tilde{\pi }_{G} \left( \boldsymbol{x}|\boldsymbol{\theta},\boldsymbol{y} \right)} \right|_{\boldsymbol{x} = \boldsymbol{x}^{*} \left( \boldsymbol{\theta} \right)} $$

where the denominator $ \tilde{\pi }_{G} \left( \boldsymbol{x}|\boldsymbol{\theta},\boldsymbol{y} \right) $ is the Gaussian approximation of $ \pi \left( \boldsymbol{x}|\boldsymbol{\theta},\boldsymbol{y} \right) $ and $ \boldsymbol{x}^{*} \left( \boldsymbol{\theta} \right) $ is the mode of the full conditional $ \pi \left( \boldsymbol{x}|\boldsymbol{\theta},\boldsymbol{y} \right) $ (Rue and Held 2005; Rue et al. 2009). Posterior marginals for the hyperparameters, $ \tilde{\pi }\left( \boldsymbol{\theta}|\boldsymbol{y} \right) $, can be obtained using one of two strategies. The first, more accurate but also more time-consuming, consists of defining a grid of points covering the region where most of the mass of $ \tilde{\pi }\left( \boldsymbol{\theta}|\boldsymbol{y} \right) $ is located (GRID strategy). The second, named central composite design (CCD strategy), consists of laying out a small number of points in an $ m $-dimensional space in order to estimate the curvature of $ \tilde{\pi }\left( \boldsymbol{\theta}|\boldsymbol{y} \right) $ (Rue et al. 2009). In this paper the CCD strategy has been used, as it needs much less computational time and the differences between the CCD and GRID strategies are expected to be minor. Moreover, the CCD strategy is recommended for problems in which the hyperparameter vector $ \boldsymbol{\theta} $ has high dimension.
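As a rough illustration of the Laplace approximation of $ \pi \left( \boldsymbol{\theta}|\boldsymbol{y} \right) $ evaluated on a grid, the sketch below uses a small hypothetical model rather than the one in this paper: Poisson counts $ y_{i} \sim \text{Poisson}\left( e^{x_{i}} \right) $, a latent Gaussian field with precision matrix $ \theta S $ (where $ S $ is the identity plus a first-order random-walk structure matrix), and an exponential hyperprior on the scalar $ \theta $. The data, the prior, the grid and all function names are assumptions, and the code is plain NumPy rather than R-INLA. The joint density is known exactly here, so only the denominator is approximated.

```python
import numpy as np

def newton_mode(theta, y, S, n_iter=100, tol=1e-10):
    """Mode x*(theta) of pi(x | theta, y) and precision of its Gaussian approximation."""
    Q = theta * S                              # prior precision of the latent field
    x = np.log(y + 0.5)                        # starting value for Newton's method
    for _ in range(n_iter):
        grad = y - np.exp(x) - Q @ x           # gradient of log pi(x | theta, y)
        H = Q + np.diag(np.exp(x))             # negative Hessian at the current x
        step = np.linalg.solve(H, grad)
        x = x + step
        if np.max(np.abs(step)) < tol:
            break
    return x, Q + np.diag(np.exp(x))

def log_post_theta(theta, y, S, rate=0.1):
    """Laplace approximation of log pi(theta | y), up to an additive constant."""
    x_star, H = newton_mode(theta, y, S)
    Q = theta * S
    # log pi(x*, theta, y): hyperprior + latent Gaussian prior + Poisson likelihood
    log_joint = (-rate * theta
                 + 0.5 * np.linalg.slogdet(Q)[1] - 0.5 * x_star @ Q @ x_star
                 + np.sum(y * x_star - np.exp(x_star)))
    # log of the Gaussian approximation evaluated at its own mode
    # (the (2 pi)^(-n/2) factor cancels against the one in the latent prior)
    log_gauss = 0.5 * np.linalg.slogdet(H)[1]
    return log_joint - log_gauss

# Hypothetical counts on a line of 10 regions
y = np.array([2, 0, 1, 3, 5, 4, 2, 1, 0, 1])
n = len(y)
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)   # path-graph adjacency
S = np.eye(n) + np.diag(A.sum(axis=1)) - A                     # proper structure matrix

# GRID-like evaluation on equally spaced points, then normalisation
theta_grid = np.linspace(0.05, 15.0, 300)
delta = theta_grid[1] - theta_grid[0]
log_post = np.array([log_post_theta(t, y, S) for t in theta_grid])
post = np.exp(log_post - log_post.max())
post /= np.sum(post) * delta
print("approximate posterior mode of theta:", theta_grid[np.argmax(post)])
```

With a scalar hyperparameter a dense grid is cheap. The CCD strategy described above becomes attractive when $ \boldsymbol{\theta} $ has several components, because the number of grid points grows quickly with the dimension; the sketch does not implement CCD.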

Secondly, in order to approximate $ \pi \left( x_{i} |\boldsymbol{\theta},\boldsymbol{y} \right) $, a Gaussian approximation $ \tilde{\pi }_{G} \left( x_{i} |\boldsymbol{\theta},\boldsymbol{y} \right) = N\left( x_{i} ;\mu_{i} \left( \boldsymbol{\theta} \right),\sigma_{i}^{2} \left( \boldsymbol{\theta} \right) \right) $ can be used. However, the approximation can be improved by using a Laplace approximation or a simplified Laplace approximation based on the skew-normal distribution (Rue et al. 2009). In this paper the Laplace approximation has been used (Tierney and Kadane 1986). It is the most time-consuming of these approximations, but also the most accurate.
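To see why a plain Gaussian approximation of $ \pi \left( x_{i} |\boldsymbol{\theta},\boldsymbol{y} \right) $ may need improving, the following sketch compares it with the exact conditional density in a one-dimensional hypothetical case: a single Poisson count $ y \sim \text{Poisson}\left( e^{x} \right) $ with $ x|\theta \sim N\left( 0,1/\theta \right) $. The values $ y = 1 $ and $ \theta = 1 $, the grid and the variable names are assumptions. The code does not implement the Laplace or simplified Laplace corrections of Rue et al. (2009); it only measures the discrepancy that those corrections aim to reduce.

```python
import numpy as np

theta, y_obs = 1.0, 1                      # hypothetical hyperparameter and count
x = np.linspace(-8.0, 4.0, 4801)
dx = x[1] - x[0]

# Exact conditional density, normalised numerically on the grid
log_f = y_obs * x - np.exp(x) - 0.5 * theta * x ** 2
exact = np.exp(log_f - log_f.max())
exact /= np.sum(exact) * dx

# Gaussian approximation: match the mode and the curvature at the mode
mode = 0.0
for _ in range(50):                        # Newton iterations on the log density
    grad = y_obs - np.exp(mode) - theta * mode
    curv = np.exp(mode) + theta            # negative second derivative
    mode += grad / curv
sigma2 = 1.0 / (np.exp(mode) + theta)
gauss = np.exp(-0.5 * (x - mode) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

exact_mean = np.sum(x * exact) * dx
tv_distance = 0.5 * np.sum(np.abs(exact - gauss)) * dx
print("mode of the Gaussian approximation:", mode)
print("mean of the exact conditional:", exact_mean)
print("total variation distance:", tv_distance)
```

The exact conditional is skewed to the left because the Poisson term penalises large values of $ x $ more heavily than small ones, so its mean lies below the mode on which the symmetric Gaussian approximation is centred.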

About this article

Cite this article

Marí-Dell’Olmo, M., Martinez-Beneito, M.A., Gotsens, M. et al. A smoothed ANOVA model for multivariate ecological regression. Stoch Environ Res Risk Assess 28, 695–706 (2014). https://doi.org/10.1007/s00477-013-0782-2

Issue Date: March 2014
DOI: https://doi.org/10.1007/s00477-013-0782-2
