Abstract
Multivariate areal data are common in many disciplines. When fitting spatial regressions for such data, one needs to account for dependence (both among and within areal units) to ensure reliable inference for the regression coefficients. Traditional multivariate conditional autoregressive (MCAR) models offer a popular and flexible approach to modeling such data, but the MCAR models suffer from two major shortcomings: (1) bias and variance inflation due to spatial confounding, and (2) high-dimensional spatial random effects that make fully Bayesian inference for such models computationally challenging. We propose the multivariate sparse areal mixed model (MSAMM) as an alternative to the MCAR models. Since the MSAMM extends the univariate SAMM, the MSAMM alleviates spatial confounding and speeds computation by greatly reducing the dimension of the spatial random effects. We specialize the MSAMM to handle zero-inflated count data, and apply our zero-inflated model to simulated data and to a large Census dataset for the state of Iowa.
Appendix: Supplementary Materials
1.1 I Multivariate Spatial Effect Reparameterization
For the multivariate sparse areal mixed model (MSAMM), when the design matrices are the same across multivariate outcomes, i.e., \({\mathbf {X}}_1={\mathbf {X}}_2=\cdots ={\mathbf {X}}_J\), the first and second stages can be written as
where \(\boldsymbol {\Delta }=\left (\boldsymbol {\delta }_{s1}^{\prime },\dots ,\boldsymbol {\delta }_{sJ}^{\prime }\right )'\), each \(\boldsymbol {\delta }_{sj}\) is \(q\times 1\), \(\boldsymbol {\Sigma }\) is the \(J\times J\) covariance matrix, and \({\mathbf {Q}}_s\) is the \(q\times q\) spatial precision matrix.
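Computation can be eased considerably as follows. Let \({\mathbf {R}}_s\) be the upper Cholesky triangle of \({\mathbf {Q}}_s\), so that \({\mathbf {R}}_s^{\prime }{\mathbf {R}}_s={\mathbf {Q}}_s\), and let \({\mathbf {W}}_s={\mathbf {R}}_s^{-1}\), so that \({\mathbf {W}}_s{\mathbf {W}}_s^{\prime }={\mathbf {Q}}_s^{-1}\). Let \(\boldsymbol {\Psi }=\left (\boldsymbol {\psi }_{s1}^{\prime },\dots ,\boldsymbol {\psi }_{sJ}^{\prime }\right )'\), where each \(\boldsymbol {\psi }_{sj}\) is \(q\times 1\), and suppose \(\boldsymbol {\Psi }\mid \boldsymbol {\Sigma }\sim \mathcal {N}\left (\mathbf {0},\,\boldsymbol {\Sigma }\otimes {\mathbf {I}}_q\right )\). Then \(\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\boldsymbol {\Psi }\) and \(\boldsymbol {\Delta }\) have the same distribution conditional on \(\boldsymbol {\Sigma }\). This is easy to see since \(\mathbb {E}\left \{\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\boldsymbol {\Psi }\right \}=\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\mathbb {E}\left (\boldsymbol {\Psi }\right )=\mathbf {0}\) and \(\mbox{Var}\left \{\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\boldsymbol {\Psi }\right \}=\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\left (\boldsymbol {\Sigma }\otimes {\mathbf {I}}_q\right )\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )'=\boldsymbol {\Sigma }\otimes {\mathbf {W}}_s{\mathbf {W}}_s^{\prime }=\boldsymbol {\Sigma }\otimes {\mathbf {Q}}_s^{-1}\).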
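The distributional equivalence of \(\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\boldsymbol {\Psi }\) and \(\boldsymbol {\Delta }\) can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' implementation: the dimensions `q` and `J` and the randomly generated positive definite matrices `Q_s` and `Sigma` are hypothetical stand-ins chosen only for illustration. It compares the covariance of the transformed effects with \(\boldsymbol {\Sigma }\otimes {\mathbf {Q}}_s^{-1}\) and shows an equivalent matrix form of the transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
q, J = 4, 3  # hypothetical sizes: q basis vectors, J outcomes

# Hypothetical symmetric positive definite stand-ins for Q_s and Sigma.
A = rng.standard_normal((q, q))
Q_s = A @ A.T + q * np.eye(q)
B = rng.standard_normal((J, J))
Sigma = B @ B.T + J * np.eye(J)

# np.linalg.cholesky returns the lower factor L with L L' = Q_s,
# so its transpose is the upper Cholesky triangle R_s with R_s' R_s = Q_s.
R_s = np.linalg.cholesky(Q_s).T
W_s = np.linalg.inv(R_s)

# Covariance of (I_J kron W_s) Psi when Psi ~ N(0, Sigma kron I_q).
T = np.kron(np.eye(J), W_s)
cov_transformed = T @ np.kron(Sigma, np.eye(q)) @ T.T

# Target covariance Sigma kron Q_s^{-1}.
cov_target = np.kron(Sigma, np.linalg.inv(Q_s))
max_abs_diff = np.max(np.abs(cov_transformed - cov_target))

# Matrix form: store Psi as a q x J matrix whose rows are N(0, Sigma) draws.
# Stacking its columns gives a vector with covariance Sigma kron I_q, and
# W_s @ Psi_mat stacks to (I_J kron W_s) vec(Psi_mat).
L_Sigma = np.linalg.cholesky(Sigma)
Psi_mat = rng.standard_normal((q, J)) @ L_Sigma.T
Delta_mat = W_s @ Psi_mat
vec_via_kron = T @ Psi_mat.flatten(order="F")
```

In the matrix form, only the \(q\times q\) matrix \({\mathbf {W}}_s\) is applied to the columns of the effects, so the \(qJ\times qJ\) Kronecker product never has to be formed explicitly.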
Hence, the model’s first and second stages can now be written as
Now suppose that the design matrices differ across outcomes, i.e., \({\mathbf {X}}_1\neq {\mathbf {X}}_2\neq \cdots \neq {\mathbf {X}}_J\). Then we have
where \(\boldsymbol {\Delta }=\left (\boldsymbol {\delta }_{s1}^{\prime },\dots ,\boldsymbol {\delta }_{sJ}^{\prime }\right )'\), \(\mathbf {R}=\mbox{bdiag}\left ({\mathbf {R}}_{s1},\dots , {\mathbf {R}}_{sJ}\right )\), and \({\mathbf {R}}_{sj}\) is the upper Cholesky triangle of \({\mathbf {Q}}_{sj}\), so that \({\mathbf {R}}_{sj}^{\prime }{\mathbf {R}}_{sj}={\mathbf {Q}}_{sj}\). For ease of exposition, let \(J=2\) (the argument extends easily to the case \(J>2\)). The prior distribution of the spatial effects can be written
where \({\mathbf {W}}_{sj}={\mathbf {R}}_{sj}^{-1}\) (\(j=1,2\)), and we have used the fact that \(\left ({\mathbf {R}}_{sj}^{-1}\right )'=\left ({\mathbf {R}}_{sj}^{\prime }\right )^{-1}\). Now, suppose we have
Using basic properties of the multivariate normal distribution, we have that
Then, since
we can apply a reparameterization similar to the one used when the design matrices are the same across outcomes. Thus we can specify the first and second stages of the model as
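The argument for \(J=2\) can be sketched as follows. This is a reconstruction under a stated assumption, not the authors' displayed equations: it assumes the prior precision of \(\boldsymbol {\Delta }\) has the form \({\mathbf {R}}'\left (\boldsymbol {\Sigma }^{-1}\otimes {\mathbf {I}}_q\right ){\mathbf {R}}\), which the definitions of \(\mathbf {R}\) and \({\mathbf {W}}_{sj}\) suggest. The symbols \(\mathbf {W}\) and \(\sigma _{jk}\) (the entries of \(\boldsymbol {\Sigma }\)) are introduced here for illustration.

```latex
% Assumed prior (J = 2), with R = bdiag(R_{s1}, R_{s2}) and W = R^{-1} = bdiag(W_{s1}, W_{s2}):
\boldsymbol{\Delta} \mid \boldsymbol{\Sigma} \sim
  \mathcal{N}\!\left(\mathbf{0},\,
  \left[\mathbf{R}'\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{I}_q\right)\mathbf{R}\right]^{-1}\right)

% Inverting the precision, using (R^{-1})' = (R')^{-1}:
\left[\mathbf{R}'\left(\boldsymbol{\Sigma}^{-1}\otimes\mathbf{I}_q\right)\mathbf{R}\right]^{-1}
  = \mathbf{W}\left(\boldsymbol{\Sigma}\otimes\mathbf{I}_q\right)\mathbf{W}'
  = \begin{pmatrix}
      \sigma_{11}\mathbf{W}_{s1}\mathbf{W}_{s1}' & \sigma_{12}\mathbf{W}_{s1}\mathbf{W}_{s2}' \\
      \sigma_{12}\mathbf{W}_{s2}\mathbf{W}_{s1}' & \sigma_{22}\mathbf{W}_{s2}\mathbf{W}_{s2}'
    \end{pmatrix}

% If Psi | Sigma ~ N(0, Sigma \otimes I_q), a linear map of a normal vector is normal:
\mathbf{W}\boldsymbol{\Psi} \mid \boldsymbol{\Sigma} \sim
  \mathcal{N}\!\left(\mathbf{0},\,
  \mathbf{W}\left(\boldsymbol{\Sigma}\otimes\mathbf{I}_q\right)\mathbf{W}'\right)
```

Under this assumption, \(\mathbf {W}\boldsymbol {\Psi }\) and \(\boldsymbol {\Delta }\) share the same mean and covariance, so \(\boldsymbol {\Delta }\) can be replaced by \(\mathbf {W}\boldsymbol {\Psi }\) exactly as in the case of a common design matrix, with \({\mathbf {I}}_J\otimes {\mathbf {W}}_s\) replaced by the block-diagonal matrix \(\mathbf {W}\).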
1.2 I Extended Simulation Results
Table 3 provides complete results for our simulation study.
© 2019 Springer Nature Switzerland AG
Musgrove, D., Young, D.S., Hughes, J., Eberly, L.E. (2019). A Sparse Areal Mixed Model for Multivariate Outcomes, with an Application to Zero-Inflated Census Data. In: Diawara, N. (eds) Modern Statistical Methods for Spatial and Multivariate Data. STEAM-H: Science, Technology, Engineering, Agriculture, Mathematics & Health. Springer, Cham. https://doi.org/10.1007/978-3-030-11431-2_3
DOI: https://doi.org/10.1007/978-3-030-11431-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11430-5
Online ISBN: 978-3-030-11431-2
eBook Packages: Mathematics and Statistics (R0)