Abstract
Smoothed analysis of variance (SANOVA) has recently been proposed for carrying out disease mapping. The main advantage of this approach is its conceptual simplicity and ease of interpretation. Moreover, it allows us to fix the combination of diseases of particular interest in advance and to make specific inferences about them. In this paper we propose a reformulation of SANOVA in the context of ecological regression studies. This proposal considers the introduction in a non-parametric way of one (or several) covariate(s) into the model, explaining some pre-specified combinations of the outcome variables. In addition, random effects are also incorporated in order to model geographical variation in the combinations of outcome variables not explained by the covariate. Lastly, the model permits the decomposition of the variance in the set of outcome variables into different orthogonal components, quantifying the contribution of every one of them. The proposed model is applied to the geographical analysis of mortality due to malignant stomach neoplasm among women resident in the city of Barcelona (Spain). The available outcome variables are deaths grouped into two time periods, and a socioeconomic deprivation index is included as a covariate. The model has been implemented through INLA, a novel inference tool for Bayesian statistics.
References
Banerjee S, Wall MM, Carlin BP (2003) Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics (Oxford, England) 4:123–142
Barceló MA, Saez M, Cano-Serral G, Martinez-Beneito MA, Martinez JM, Borrell C, Ocaña-Riola R, Montoya I, Calvo M, López-Abente G, Rodriguez-Sanz M, Toro S, Alcalá JT, Saurina C, Sánchez-Villegas P, Figueiras A (2008) Métodos para la suavización de indicadores de mortalidad: aplicación al análisis de desigualdades en mortalidad en ciudades del Estado español (Proyecto MEDEA). Gaceta sanitaria/S.E.S.P.A.S 22:596–608
Besag J, York J, Mollié A (1991) Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 43:1–59
Borrell C, Marí-Dell’Olmo M, Serral G, Martinez-Beneito MA, Gotsens M, Other MEDEA members (2010) Inequalities in mortality in small areas of eleven Spanish cities (the multicenter MEDEA project). Health & Place 16:703–711
Botella-Rocamora P, López-Quílez A, Martinez-Beneito MA (2013) Spatial moving average risk smoothing. Stat Med 32:2595–2612
Clayton DG, Bernardinelli L, Montomoli C (1993) Spatial correlation in ecological analysis. Int J Epidemiol 22:1193–1202
Dominguez-Berjon MF, Borrell C, Cano-Serral G, Esnaola S, Nolasco A, Pasarin MI, Ramis R, Saurina C, Escolar-Pujolar A (2008) Constructing a deprivation index based on census data in large Spanish cities (the MEDEA project). Gaceta sanitaria/S.E.S.P.A.S 22:179–187
Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1:515–533
Hodges JS, Reich BJ (2010) Adding spatially-correlated errors can mess up the fixed effect you love. Am Stat 64:325–334
Hodges JS, Cui Y, Sargent DJ, Carlin BP (2007) Smoothing balanced single-error-term analysis of variance. Technometrics 49:12–25
Hughes J, Haran M (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J R Stat Soc 75:139–159
Knorr-Held L, Best NG (2001) A shared component model for detecting joint and selective clustering of two diseases. J R Stat Soc 164:73–85
Liu X, Wall MM, Hodges JS (2005) Generalized spatial structural equation models. Biostatistics (Oxford, England) 6:539–557
Mardia KV (1988) Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. J Multivar Anal 24:265–284
Marí-Dell’Olmo M, Martinez-Beneito MA, Borrell C, Zurriaga O, Nolasco A, Dominguez-Berjon MF (2011) Bayesian factor analysis to calculate a deprivation index and its uncertainty. Epidemiology (Cambridge, Mass) 22:356–364
Muñoz F, Pennino MG, Conesa D, López-Quílez A, Bellido J (2013) Estimation and prediction of the spatial occurrence of fish species using Bayesian latent Gaussian models. Stoch Environ Res Risk Assess 27:1171–1180
R Development Core Team (2012). R: a language and environment for statistical computing
Reich BJ, Hodges JS, Zadnik V (2006) Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62:1197–1206
Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. Chapman & Hall/CRC monographs on statistics & applied probability. Chapman and Hall/CRC, London
Rue H, Martino S (2009) INLA: functions which allow to perform a full Bayesian analysis of structured additive models using integrated nested Laplace approximation. R package version 0.0
Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc 71:319–392
Schrödle B, Held L (2011a) A primer on disease mapping and ecological regression using INLA. Comput Stat 26:241–258
Schrödle B, Held L (2011b) Spatio-temporal disease mapping using INLA. Environmetrics 22(6):725–734
Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81:82–86
Tzala E, Best N (2008) Bayesian latent variable modelling of multivariate spatio-temporal variation in cancer mortality. Stat Methods Med Res 17:97–118
Ugarte M, Ibáñez B, Militino A (2005) Detection of spatial variation in risk when using CAR models for smoothing relative risks. Stoch Environ Res Risk Assess 19:33–40
Wang F, Wall MM (2003) Generalized common spatial factor model. Biostatistics (Oxford, England) 4:569–582
Zhang Y, Hodges JS, Banerjee S (2009) Smoothed ANOVA with spatial effects as a competitor to MCAR in multivariate spatial smoothing. Ann Appl Stat 3:1805
Acknowledgments
We wish to thank Carme Borrell and one reviewer for their valuable suggestions on the article. We also wish to thank Prof. Håvard Rue for his support with the R-INLA implementation of the model proposed in this study.
Funding
This article has been partially funded by: “Plan Nacional de I+D+I 2008–2011” and the “ISCIII –Subdirección General de Evaluación y Fomento de la Investigación-” (Project number PI081488); the project INEQ-CITIES, “Socioeconomic inequalities in mortality: evidence and policies of cities of Europe”, project funded by the Executive Agency for Health and Consumers (Commission of the European Union), Project No. 2008 12 13; and by CIBER Epidemiología y Salud Pública (CIBERESP), Spain.
Additional information
This article has been included in the doctoral thesis of one of the authors (Marc Marí-Dell’Olmo), being carried out at Pompeu Fabra University.
Appendix
In what follows we briefly explain the general ideas behind the integrated nested Laplace approximations (INLA) inference method, the approach used in this paper to approximate the marginal posterior densities of the parameters in our multivariate ecological regression model.
Consider a three-stage Bayesian hierarchical model based on an observation model \( \pi \left( {\varvec{y}|\varvec{x}} \right) = \prod_{i} \pi \left( {y_{i} |x_{i} } \right) \), a parameter model \( \pi \left( {\varvec{x}|\varvec{\theta}} \right) \), and a hyperprior \( \pi \left(\varvec{\theta}\right) \). Here \( \varvec{y} = \left( {y_{1} , \ldots ,y_{n} } \right) \) denotes the observed data, \( \varvec{x} \) are unknown parameters, which typically follow a Gaussian Markov random field (GMRF), and \( \varvec{\theta} \) are unknown hyperparameters. The marginal posterior density of \( x_{i} \) can be written as
\[ \pi \left( {x_{i} |\varvec{y}} \right) = \int \pi \left( {x_{i} |\varvec{\theta},\varvec{y}} \right)\pi \left( {\varvec{\theta}|\varvec{y}} \right)d\varvec{\theta} \]
and can be approximated by the finite sum
\[ \tilde{\pi }\left( {x_{i} |\varvec{y}} \right) = \sum\limits_{k} \tilde{\pi }\left( {x_{i} |\varvec{\theta}_{k} ,\varvec{y}} \right)\tilde{\pi }\left( {\varvec{\theta}_{k} |\varvec{y}} \right)\Delta_{k} . \]
INLA approximates these marginal posterior densities using an approximation \( \tilde{\pi }\left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \) of \( \pi \left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \) and an additional approximation \( \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \) of the marginal posterior density of the hyperparameters \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \). The integration points \( \varvec{\theta}_{k} \) and their weights \( \Delta_{k} \) are chosen appropriately.
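The finite-sum approximation of the marginal posterior can be illustrated with a minimal Python sketch. The toy model below (a single latent parameter with Gaussian observations and an exponential hyperprior) is an assumption made only for illustration and is not the model of this paper; the names `cond_x`, `unnorm_post_theta` and `marginal_x` are hypothetical. In this toy case both conditional densities are available in closed form, so the only approximation is the sum over an equally spaced grid of hyperparameter values.

```python
import math

# Toy model, assumed purely for illustration (not the model of this paper):
#   y_j | x ~ N(x, 1),   x | theta ~ N(0, 1/theta),   theta ~ Exponential(1)
y = [1.2, 0.7, 1.9, 1.4]
n, s = len(y), sum(y)

def normal_pdf(z, mean, prec):
    """Density of N(mean, 1/prec) evaluated at z."""
    return math.sqrt(prec / (2 * math.pi)) * math.exp(-0.5 * prec * (z - mean) ** 2)

def cond_x(x, theta):
    """pi(x | theta, y): Gaussian with precision n + theta and mean s / (n + theta)."""
    prec = n + theta
    return normal_pdf(x, s / prec, prec)

def unnorm_post_theta(theta):
    """pi(theta | y) up to a constant, as pi(x, theta, y) / pi(x | theta, y) at x = 0."""
    log_joint = (
        -theta                                                   # log hyperprior
        + math.log(normal_pdf(0.0, 0.0, theta))                  # log pi(x = 0 | theta)
        + sum(math.log(normal_pdf(yj, 0.0, 1.0)) for yj in y)    # log pi(y | x = 0)
    )
    return math.exp(log_joint) / cond_x(0.0, theta)

# Equally spaced integration points theta_k with common weight delta.
delta = 0.05
thetas = [(k + 0.5) * delta for k in range(300)]
dens = [unnorm_post_theta(t) for t in thetas]
norm = sum(d * delta for d in dens)
weights = [d * delta / norm for d in dens]   # pi(theta_k | y) * Delta_k

def marginal_x(x):
    """Finite-sum approximation of pi(x | y)."""
    return sum(cond_x(x, t) * w for t, w in zip(thetas, weights))

# Posterior mean of x obtained from the same weighted sum.
post_mean = sum(w * s / (n + t) for t, w in zip(thetas, weights))
print(round(post_mean, 3))
```

In INLA the two factors inside the sum are themselves approximations, and the integration points are placed where the mass of the hyperparameter posterior lies rather than on a fixed regular grid as assumed here.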
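To approximate \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \) we consider the identity
\[ \pi \left( {\varvec{x},\varvec{\theta}|\varvec{y}} \right) = \pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right)\pi \left( {\varvec{\theta}|\varvec{y}} \right), \]
and therefore, for any \( \varvec{x} \),
\[ \pi \left( {\varvec{\theta}|\varvec{y}} \right) = \frac{{\pi \left( {\varvec{x},\varvec{\theta}|\varvec{y}} \right)}}{{\pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right)}} \propto \frac{{\pi \left( {\varvec{x},\varvec{\theta},\varvec{y}} \right)}}{{\pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right)}}. \]
INLA approximates \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \) using a Laplace approximation (Tierney and Kadane 1986):
\[ \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \propto \left. {\frac{{\pi \left( {\varvec{x},\varvec{\theta},\varvec{y}} \right)}}{{\tilde{\pi }_{G} \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right)}}} \right|_{{\varvec{x} = \varvec{x}^{ * } \left(\varvec{\theta}\right)}} , \]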
where the denominator \( \tilde{\pi }_{G} \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right) \) is the Gaussian approximation of \( \pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right) \) and \( \varvec{x}^{ * } \left(\varvec{\theta}\right) \) is the mode of the full conditional \( \pi \left( {\varvec{x}|\varvec{\theta},\varvec{y}} \right) \) (Rue and Held 2005; Rue et al. 2009). Posterior marginals for the hyperparameters can be obtained from \( \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \) using two strategies. The first, more accurate but also more time consuming, consists of defining a grid of points covering the region where most of the mass of \( \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \) is located (GRID strategy). The second, named central composite design (CCD strategy), consists of laying out a small number of points in an m-dimensional space, where m is the dimension of \( \varvec{\theta} \), in order to estimate the curvature of \( \tilde{\pi }\left( {\varvec{\theta}|\varvec{y}} \right) \) (Rue et al. 2009). In this paper the CCD strategy has been used, as it requires much less computational time and the differences between the CCD and GRID strategies are expected to be minor. Moreover, the CCD strategy is recommended for problems in which the hyperparameter vector \( \varvec{\theta} \) has high dimension.
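As a hedged illustration of the Laplace approximation of the hyperparameter posterior, the following Python sketch evaluates the ratio of the joint density to a Gaussian approximation at the conditional mode, and compares it with brute-force numerical integration over the latent parameter. The toy model (Poisson counts sharing a single latent log-rate, with an exponential hyperprior) and all function names are assumptions made for illustration; they are not the model or code used in this paper.

```python
import math

# Toy model, assumed purely for illustration (not the model of this paper):
#   y_j | x ~ Poisson(exp(x)),   x | theta ~ N(0, 1/theta),   theta ~ Exponential(1)
y = [2, 4, 3, 5, 1]
n, s = len(y), sum(y)

def log_joint(x, theta):
    """log pi(x, theta, y) = log pi(theta) + log pi(x | theta) + log pi(y | x)."""
    log_prior_theta = -theta
    log_prior_x = 0.5 * math.log(theta / (2 * math.pi)) - 0.5 * theta * x * x
    log_lik = s * x - n * math.exp(x) - sum(math.lgamma(yj + 1) for yj in y)
    return log_prior_theta + log_prior_x + log_lik

def conditional_mode(theta, iters=50):
    """Newton iterations for x*(theta), the mode of pi(x | theta, y)."""
    x = math.log(s / n)  # start at the maximum-likelihood value of x
    for _ in range(iters):
        grad = s - n * math.exp(x) - theta * x
        hess = -n * math.exp(x) - theta
        x -= grad / hess
    return x

def log_laplace(theta):
    """Laplace approximation of log pi(theta | y), up to an additive constant."""
    x_star = conditional_mode(theta)
    prec = n * math.exp(x_star) + theta            # negative Hessian at the mode
    log_gauss_at_mode = 0.5 * math.log(prec / (2 * math.pi))
    return log_joint(x_star, theta) - log_gauss_at_mode

def log_exact(theta, lo=-6.0, hi=6.0, steps=12000):
    """Reference value: midpoint-rule integration of pi(x, theta, y) over x."""
    h = (hi - lo) / steps
    total = sum(math.exp(log_joint(lo + (i + 0.5) * h, theta)) for i in range(steps))
    return math.log(total * h)

for theta in (0.5, 1.0, 2.0, 5.0):
    print(theta, log_laplace(theta) - log_exact(theta))
```

With a single latent parameter the reference integral is cheap to compute; for a high-dimensional latent field it is not, which is the situation in which the Laplace approximation becomes useful.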
Secondly, to obtain the approximation \( \tilde{\pi }\left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \), a Gaussian approximation \( \tilde{\pi }_{G} \left( {x_{i} |\varvec{\theta},\varvec{y}} \right) = N\left( {x_{i} ;\mu_{i} \left(\varvec{\theta}\right),\sigma_{i}^{2} \left(\varvec{\theta}\right)} \right) \) can be used. However, the approximation can be improved using a Laplace approximation or a simplified Laplace approximation based on the skew-normal distribution (Rue et al. 2009). In this paper the Laplace approximation has been used (Tierney and Kadane 1986); it is the most time consuming of the three options but also the most accurate.
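For reference, the Laplace approximation of the conditional marginal takes the following form in Rue et al. (2009). The notation \( \varvec{x}_{ - i} \) (all components of \( \varvec{x} \) except \( x_{i} \)) and \( \tilde{\pi }_{GG} \) (a Gaussian approximation of the conditional density of \( \varvec{x}_{ - i} \)) is taken from that reference and is not defined elsewhere in this appendix:

\[ \tilde{\pi }_{LA} \left( {x_{i} |\varvec{\theta},\varvec{y}} \right) \propto \left. {\frac{{\pi \left( {\varvec{x},\varvec{\theta},\varvec{y}} \right)}}{{\tilde{\pi }_{GG} \left( {\varvec{x}_{ - i} |x_{i} ,\varvec{\theta},\varvec{y}} \right)}}} \right|_{{\varvec{x}_{ - i} = \varvec{x}_{ - i}^{ * } \left( {x_{i} ,\varvec{\theta}} \right)}} , \]

where \( \varvec{x}_{ - i}^{ * } \left( {x_{i} ,\varvec{\theta}} \right) \) is the mode of \( \pi \left( {\varvec{x}_{ - i} |x_{i} ,\varvec{\theta},\varvec{y}} \right) \). This has the same structure as the approximation of \( \pi \left( {\varvec{\theta}|\varvec{y}} \right) \), with the Gaussian approximation recomputed for each value of \( x_{i} \), which is why this option is the most expensive.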
Cite this article
Marí-Dell’Olmo, M., Martinez-Beneito, M.A., Gotsens, M. et al. A smoothed ANOVA model for multivariate ecological regression. Stoch Environ Res Risk Assess 28, 695–706 (2014). https://doi.org/10.1007/s00477-013-0782-2
DOI: https://doi.org/10.1007/s00477-013-0782-2