A smoothed ANOVA model for multivariate ecological regression

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Smoothed analysis of variance (SANOVA) has recently been proposed for carrying out disease mapping. The main advantages of this approach are its conceptual simplicity and ease of interpretation. Moreover, it allows us to fix in advance the combination of diseases of particular interest and to make specific inferences about them. In this paper we propose a reformulation of SANOVA in the context of ecological regression studies. The proposal introduces one or several covariates into the model in a non-parametric way in order to explain some pre-specified combinations of the outcome variables. In addition, random effects are incorporated to model the geographical variation in those combinations that is not explained by the covariate. Lastly, the model permits the decomposition of the variance of the set of outcome variables into orthogonal components, quantifying the contribution of each of them. The proposed model is applied to the geographical analysis of mortality due to malignant stomach neoplasm among women resident in the city of Barcelona (Spain). The available outcome variables are deaths grouped into two time periods, and a socioeconomic deprivation index is included as a covariate. The model has been implemented through INLA, a novel inference tool for Bayesian statistics.

References

  • Banerjee S, Wall MM, Carlin BP (2003) Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics (Oxford, England) 4:123–142

  • Barceló MA, Saez M, Cano-Serral G, Martinez-Beneito MA, Martinez JM, Borrell C, Ocaña-Riola R, Montoya I, Calvo M, López-Abente G, Rodriguez-Sanz M, Toro S, Alcalá JT, Saurina C, Sánchez-Villegas P, Figueiras A (2008) Métodos para la suavización de indicadores de mortalidad: aplicación al análisis de desigualdades en mortalidad en ciudades del Estado español (Proyecto MEDEA). Gaceta sanitaria/S.E.S.P.A.S 22:596–608

  • Besag J, York J, Mollié A (1991) Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 43:1–59

  • Borrell C, Marí-Dell’Olmo M, Serral G, Martinez-Beneito MA, Gotsens M, Other MEDEA members. (2010) Inequalities in mortality in small areas of eleven Spanish cities (the multicenter MEDEA project). Health & Place 16:703–711

  • Botella-Rocamora P, López-Quílez A, Martinez-Beneito MA (2013) Spatial moving average risk smoothing. Stat Med 32:2595–2612

  • Clayton DG, Bernardinelli L, Montomoli C (1993) Spatial correlation in ecological analysis. Int J Epidemiol 22:1193–1202

  • Dominguez-Berjon MF, Borrell C, Cano-Serral G, Esnaola S, Nolasco A, Pasarin MI, Ramis R, Saurina C, Escolar-Pujolar A (2008) Constructing a deprivation index based on census data in large Spanish cities (the MEDEA project). Gaceta sanitaria/S.E.S.P.A.S 22:179–187

  • Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1:515–533

  • Hodges JS, Reich BJ (2010) Adding spatially-correlated errors can mess up the fixed effect you love. Am Stat 64:325–334

  • Hodges JS, Cui Y, Sargent DJ, Carlin BP (2007) Smoothing balanced single-error-term analysis of variance. Technometrics 49:12–25

  • Hughes J, Haran M (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J R Stat Soc 75:139–159

  • Knorr-Held L, Best NG (2001) A shared component model for detecting joint and selective clustering of two diseases. J R Stat Soc 164:73–85

  • Liu X, Wall MM, Hodges JS (2005) Generalized spatial structural equation models. Biostatistics (Oxford, England) 6:539–557

  • Mardia KV (1988) Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. J Multivar Anal 24:265–284

  • Marí-Dell’Olmo M, Martinez-Beneito MA, Borrell C, Zurriaga O, Nolasco A, Dominguez-Berjon MF (2011) Bayesian factor analysis to calculate a deprivation index and its uncertainty. Epidemiology (Cambridge, Mass) 22:356–364

  • Muñoz F, Pennino MG, Conesa D, López-Quílez A, Bellido J (2013) Estimation and prediction of the spatial occurrence of fish species using Bayesian latent Gaussian models. Stoch Environ Res Risk Assess 27:1171–1180

  • R Development Core Team (2012). R: a language and environment for statistical computing

  • Reich BJ, Hodges JS, Zadnik V (2006) Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62:1197–1206

  • Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. Chapman & Hall/CRC monographs on statistics & applied probability. Chapman and Hall/CRC, London

  • Rue H, Martino S (2009) INLA: functions which allow to perform a full Bayesian analysis of structured additive models using integrated nested laplace approximaxion. R package version 0.0

  • Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc 71:319–392

  • Schrödle B, Held L (2011a) A primer on disease mapping and ecological regression using INLA. Comput Stat 26:241–258

  • Schrödle B, Held L (2011b) Spatio-temporal disease mapping using INLA. Environmetrics 22(6):725–734

  • Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81:82–86

  • Tzala E, Best N (2008) Bayesian latent variable modelling of multivariate spatio-temporal variation in cancer mortality. Stat Methods Med Res 17:97–118

  • Ugarte M, Ibáñez B, Militino A (2005) Detection of spatial variation in risk when using CAR models for smoothing relative risks. Stoch Environ Res Risk Assess 19:33–40

  • Wang F, Wall MM (2003) Generalized common spatial factor model. Biostatistics (Oxford, England) 4:569–582

  • Zhang Y, Hodges JS, Banerjee S (2009) Smoothed ANOVA with spatial effects as a competitor to MCAR in multivariate spatial smoothing. Ann Appl Stat 3:1805

Acknowledgments

We wish to thank Carme Borrell and one reviewer for their valuable suggestions on the article. We also wish to thank Prof. Håvard Rue for his support with the R-INLA implementation of the model proposed in this study.

Funding

This article has been partially funded by: “Plan Nacional de I+D+I 2008–2011” and the “ISCIII –Subdirección General de Evaluación y Fomento de la Investigación-” (Project number PI081488); the project INEQ-CITIES, “Socioeconomic inequalities in mortality: evidence and policies of cities of Europe”, project funded by the Executive Agency for Health and Consumers (Commission of the European Union), Project No. 2008 12 13; and by CIBER Epidemiología y Salud Pública (CIBERESP), Spain.

Author information

Corresponding author

Correspondence to Miguel A. Martinez-Beneito.

Additional information

This article has been included in the doctoral thesis of one of the authors (Marc Marí-Dell’Olmo), being carried out at Pompeu Fabra University.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 21 kb)

Appendix

In what follows we briefly explain the general ideas behind integrated nested Laplace approximations (INLA), the inference method used in this paper to approximate the marginal posterior densities of the parameters in our multivariate ecological regression model.

Consider a three-stage Bayesian hierarchical model based on an observation model \( \pi \left( {\varvec{y}|\varvec{x}} \right) = \prod_{i} \pi \left( {y_{i} |x_{i} } \right) \), a parameter model \( \pi \left( {\varvec{x}|\varvec{\theta}} \right) \), and a hyperprior \( \pi \left(\varvec{\theta}\right) \). Here \( \varvec{y} = \left( {y_{1} , \ldots ,y_{n} } \right) \) denotes the observed data, \( \varvec{x} \) are unknown parameters, which typically follow a Gaussian Markov random field (GMRF), and \( \varvec{\theta} \) are unknown hyperparameters. The marginal posterior density of \( x_{i} \) can be written as

$$ \pi \left( {x_{i} |\varvec{y}} \right) = \int_{\varvec{\theta}} \pi \left( {x_{i} |\varvec{\theta},\varvec{y}} \right)\pi \left( {\varvec{\theta}|\varvec{y}} \right)d\varvec{\theta} $$

and can be approximated by the finite sum

$$ \tilde{\pi }\left( {x_{i} |\varvec{y}} \right) = \sum_{k} \tilde{\pi }\left( {x_{i} |\varvec{\theta}_{k} ,\varvec{y}} \right)\tilde{\pi }\left( {\varvec{\theta}_{k} |\varvec{y}} \right)\Delta_{k} $$

INLA approximates these marginal posterior densities using an approximation \( \tilde{\pi }\left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \) of \( \pi \left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \) and an additional approximation \( \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \) of the marginal posterior density of the hyperparameters \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \). The sum runs over a set of integration points \( \varvec{\theta}_{k} \), and the \( \Delta_{k} \) are the corresponding integration weights, chosen appropriately.
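The finite sum can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data, the conjugate Gaussian model (chosen so that \( \pi \left( {x|\theta ,\varvec{y}} \right) \) is available in closed form), the exponential hyperprior, the regular grid of integration points and the helper names. It is not how R-INLA computes the sum internally.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Hypothetical toy model (not the paper's model):
#   y_j | x ~ N(x, 1),  x | theta ~ N(0, 1/theta),  theta ~ Exponential(1)
y = [1.2, 0.7, 1.9, 1.1]
n = len(y)
ybar = sum(y) / n

def cond_x(x, theta):
    """pi(x | theta, y): exact here because the toy model is conjugate."""
    prec = n + theta
    return normal_pdf(x, n * ybar / prec, 1.0 / prec)

def unnorm_theta(theta):
    """pi(theta | y) up to a constant: hyperprior times marginal density of ybar."""
    return math.exp(-theta) * normal_pdf(ybar, 0.0, 1.0 / n + 1.0 / theta)

# Integration points theta_k on a regular grid, all with the same weight Delta_k
delta = 0.05
thetas = [(k + 0.5) * delta for k in range(400)]
norm = sum(unnorm_theta(t) for t in thetas) * delta
post_theta = [unnorm_theta(t) / norm for t in thetas]

def marginal_x(x):
    """Finite-sum approximation of pi(x | y)."""
    return sum(cond_x(x, t) * p * delta for t, p in zip(thetas, post_theta))

# Sanity check: total mass and posterior mean of the approximated marginal
step = 0.01
xs = [-4.0 + step * i for i in range(1001)]
total_mass = sum(marginal_x(x) for x in xs) * step
post_mean = sum(x * marginal_x(x) for x in xs) * step
print(round(total_mass, 4), round(post_mean, 3))
```

In this sketch the resulting marginal is a weighted mixture of Gaussian densities, one per integration point, which is why it can be skewed or heavy-tailed even though each component is Gaussian.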

To approximate \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \) we consider:

$$ \pi \left( {\varvec{x},\varvec{\theta},\varvec{y}} \right) = \pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right) \times \pi \left( {\varvec{\theta}|\varvec{y}} \right) \times \pi \left( \varvec{y} \right), $$

therefore, for any \( \varvec{x} \),

$$ \pi \left( {\varvec{\theta}|\varvec{y}} \right) \propto \frac{{\pi \left( {\varvec{x},\varvec{\theta},\varvec{y}} \right)}}{{\pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right)}} $$
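That the ratio does not depend on the value of \( \varvec{x} \) at which it is evaluated can be checked numerically. The sketch below uses a hypothetical conjugate Gaussian toy model (an assumption for illustration, not the model of this paper) in which the full conditional \( \pi \left( {x|\theta ,\varvec{y}} \right) \) is known exactly; the function names are likewise illustrative.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log-density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

# Hypothetical toy model:
#   y_j | x ~ N(x, 1),  x | theta ~ N(0, 1/theta),  theta ~ Exponential(1)
y = [1.2, 0.7, 1.9, 1.1]
n = len(y)

def log_joint(x, theta):
    """log pi(x, theta, y) = log-likelihood + log-prior of x + log-hyperprior."""
    log_lik = sum(log_normal_pdf(yj, x, 1.0) for yj in y)
    log_prior_x = log_normal_pdf(x, 0.0, 1.0 / theta)
    log_prior_theta = -theta  # Exponential(1) log-density
    return log_lik + log_prior_x + log_prior_theta

def log_full_conditional(x, theta):
    """log pi(x | theta, y): Gaussian with precision n + theta."""
    prec = n + theta
    return log_normal_pdf(x, sum(y) / prec, 1.0 / prec)

def log_ratio(x, theta):
    """log of pi(x, theta, y) / pi(x | theta, y)."""
    return log_joint(x, theta) - log_full_conditional(x, theta)

theta = 2.0
ratios = [log_ratio(x, theta) for x in (-1.0, 0.3, 2.5)]
spread = max(ratios) - min(ratios)
print(spread < 1e-9)
```

When the full conditional is not available exactly, as with non-Gaussian observation models, replacing it by an approximation makes the ratio depend on the evaluation point, which motivates evaluating it at the mode.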

INLA approximates \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \) using a Laplace approximation (Tierney and Kadane 1986):

$$ \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \propto \left. \frac{{\pi \left( {\varvec{x},\varvec{\theta},\varvec{y}} \right)}}{{\tilde{\pi }_{G} \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right)}} \right|_{{\varvec{x} = \varvec{x}^{*} \left(\varvec{\theta}\right)}} $$

where the denominator \( \tilde{\pi }_{G} \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right) \) is the Gaussian approximation of \( \pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right) \) and \( \varvec{x}^{ * } \left(\varvec{\theta}\right) \) is the mode of the full conditional \( \pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right) \) (Rue and Held 2005; Rue et al. 2009). Posterior marginals for the hyperparameters can be obtained from \( \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \) using two strategies. The first, more accurate but also more time-consuming, consists of defining a grid of points covering the region where most of the mass of \( \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \) is located (GRID strategy). The second, named central composite design (CCD strategy), consists of laying out a small number of points in the m-dimensional space of the hyperparameters, where m is the dimension of θ, in order to estimate the curvature of \( \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \) (Rue et al. 2009). In this paper the CCD strategy has been used, as it needs much less computational time and the differences between the CCD and GRID strategies are expected to be minor. Moreover, the CCD strategy is recommended for problems in which the hyperparameter vector θ is high-dimensional.
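The Laplace approximation of \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \) and its evaluation on a grid can be sketched in a few lines of Python. The sketch rests on several simplifying assumptions that are not taken from the paper: the counts and expected counts are invented, the observation model is Poisson, the components of x are independent a priori (a GMRF with diagonal precision matrix), there is a single hyperparameter θ with a Gamma hyperprior, and the grid limits are simply assumed to be wide enough. Only the GRID strategy is shown, since a CCD layout is only meaningful with several hyperparameters. This is not the R-INLA implementation.

```python
import math

# Hypothetical toy data: observed counts y_i and expected counts E_i
y = [3, 0, 5, 2, 7, 1]
E = [2.0, 1.5, 3.0, 2.5, 4.0, 1.0]

def mode_x(yi, Ei, theta, iters=50):
    """Newton iterations for the mode of pi(x_i | theta, y_i), where
    y_i | x_i ~ Poisson(E_i * exp(x_i)) and x_i | theta ~ N(0, 1/theta)."""
    x = math.log((yi + 0.5) / Ei)
    for _ in range(iters):
        grad = yi - Ei * math.exp(x) - theta * x
        curv = Ei * math.exp(x) + theta  # minus the second derivative
        x += grad / curv
    return x

def log_post_theta(theta, shape=1.0, rate=0.2):
    """Laplace approximation of log pi(theta | y), up to an additive constant."""
    total = (shape - 1.0) * math.log(theta) - rate * theta  # Gamma hyperprior
    for yi, Ei in zip(y, E):
        xs = mode_x(yi, Ei, theta)
        log_lik = yi * (math.log(Ei) + xs) - Ei * math.exp(xs) - math.lgamma(yi + 1)
        log_prior = 0.5 * math.log(theta / (2.0 * math.pi)) - 0.5 * theta * xs ** 2
        q = Ei * math.exp(xs) + theta  # precision of the Gaussian approximation
        log_gauss_at_mode = 0.5 * math.log(q / (2.0 * math.pi))
        total += log_lik + log_prior - log_gauss_at_mode
    return total

# GRID strategy: evaluate on a regular grid and normalise numerically
delta = 0.05
thetas = [delta * (k + 1) for k in range(1000)]
log_vals = [log_post_theta(t) for t in thetas]
shift = max(log_vals)  # subtract the maximum for numerical stability
unnorm = [math.exp(v - shift) for v in log_vals]
norm = sum(unnorm) * delta
density = [u / norm for u in unnorm]

theta_mode = thetas[density.index(max(density))]
print(theta_mode)
```

Because the prior precision matrix is diagonal here, the mode and the Gaussian approximation are computed one component at a time; with a genuine spatial GMRF the same steps would involve sparse matrix factorisations instead.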

Secondly, in order to approximate \( \pi \left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \), a Gaussian approximation \( \tilde{\pi }_{G} \left( {x_{i} |\varvec{\theta},\varvec{y}} \right) = N\left( {x_{i} ;\mu_{i} \left(\varvec{\theta}\right),\sigma_{i}^{2} \left(\varvec{\theta}\right)} \right) \) could be used. However, the approximation can be improved by using a Laplace approximation or a simplified Laplace approximation based on the skew-normal distribution (Rue et al. 2009). In this paper the full Laplace approximation (Tierney and Kadane 1986) has been used; of the three options it is the most time-consuming but also the most accurate.
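For reference, the Laplace approximation of \( \pi \left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \) is commonly written in the following form, as given in Rue et al. (2009). The subscript \( -i \) and the label \( GG \) are notation introduced here for illustration and are not used elsewhere in this appendix.

$$ \tilde{\pi }_{LA} \left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \propto \left. \frac{{\pi \left( {\varvec{x},\varvec{\theta},\varvec{y}} \right)}}{{\tilde{\pi }_{GG} \left( {\varvec{x}_{ - i} |x_{i} ,\varvec{\theta},\varvec{y}} \right)}} \right|_{{\varvec{x}_{ - i} = \varvec{x}_{ - i}^{*} \left( {x_{i} ,\varvec{\theta}} \right)}} $$

Here \( \varvec{x}_{ - i} \) denotes \( \varvec{x} \) without its i-th component, \( \tilde{\pi }_{GG} \) is a Gaussian approximation of \( \pi \left( {\varvec{x}_{ - i} |x_{i} ,\varvec{\theta},\varvec{y}} \right) \), and \( \varvec{x}_{ - i}^{*} \left( {x_{i} ,\varvec{\theta}} \right) \) is its mode. The expression has the same structure as the approximation of \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \), with \( x_{i} \) playing the role of \( \varvec{\theta} \), and it has to be re-evaluated for each value of \( x_{i} \), which is consistent with it being the most time-consuming option.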

About this article

Cite this article

Marí-Dell’Olmo, M., Martinez-Beneito, M.A., Gotsens, M. et al. A smoothed ANOVA model for multivariate ecological regression. Stoch Environ Res Risk Assess 28, 695–706 (2014). https://doi.org/10.1007/s00477-013-0782-2
