A Sparse Areal Mixed Model for Multivariate Outcomes, with an Application to Zero-Inflated Census Data

Modern Statistical Methods for Spatial and Multivariate Data

Abstract

Multivariate areal data are common in many disciplines. When fitting spatial regressions for such data, one needs to account for dependence (both among and within areal units) to ensure reliable inference for the regression coefficients. Traditional multivariate conditional autoregressive (MCAR) models offer a popular and flexible approach to modeling such data, but the MCAR models suffer from two major shortcomings: (1) bias and variance inflation due to spatial confounding, and (2) high-dimensional spatial random effects that make fully Bayesian inference for such models computationally challenging. We propose the multivariate sparse areal mixed model (MSAMM) as an alternative to the MCAR models. Since the MSAMM extends the univariate SAMM, the MSAMM alleviates spatial confounding and speeds computation by greatly reducing the dimension of the spatial random effects. We specialize the MSAMM to handle zero-inflated count data, and apply our zero-inflated model to simulated data and to a large Census dataset for the state of Iowa.

Author information

Corresponding author

Correspondence to Derek S. Young.

Appendix: Supplementary Materials

1.1 I Multivariate Spatial Effect Reparameterization

For the multivariate sparse areal mixed model (MSAMM), when the design matrices are the same across multivariate outcomes, i.e., \({\mathbf {X}}_1={\mathbf {X}}_2=\cdots ={\mathbf {X}}_J=\mathbf {X}\), the first and second stages can be written as

$$\displaystyle \begin{aligned} g_j\left\{\mathbb{E}\left(\boldsymbol{y}_j\mid\boldsymbol{\beta}_j,\,\boldsymbol{\delta}_{sj}\right)\right\} & = \mathbf{X}\boldsymbol{\beta}_j+\mathbf{M}\boldsymbol{\delta}_{sj}\;\;\;\;\;\;\;\;\;( j=1,\dots,J)\\ p\left(\boldsymbol{\Delta}\mid\boldsymbol{\Sigma}\right) & = \mathcal{N}\left(\mathbf{0},\,\boldsymbol{\Sigma}\otimes{\mathbf{Q}}_s^{-1}\right), \end{aligned} $$

where \(\boldsymbol {\Delta }=\left (\boldsymbol {\delta }_{s1}^{\prime },\dots ,\boldsymbol {\delta }_{sJ}^{\prime }\right )'\), each \(\boldsymbol {\delta }_{sj}\) is \(q\times 1\), \(\boldsymbol {\Sigma }\) is the \(J\times J\) covariance matrix, and \({\mathbf {Q}}_s\) is the \(q\times q\) spatial precision matrix.

Computation can be eased considerably as follows. Let \({\mathbf {R}}_s\) be the upper Cholesky triangle of \({\mathbf {Q}}_s\), so that \({\mathbf {R}}_s^{\prime }{\mathbf {R}}_s={\mathbf {Q}}_s\), and let \({\mathbf {W}}_s={\mathbf {R}}_s^{-1}\), so that \({\mathbf {W}}_s{\mathbf {W}}_s^{\prime }={\mathbf {Q}}_s^{-1}\). Then, for \(\boldsymbol {\Psi }=\left (\boldsymbol {\psi }_{s1}^{\prime },\dots ,\boldsymbol {\psi }_{sJ}^{\prime }\right )'\), where each \(\boldsymbol {\psi }_{sj}\) is \(q\times 1\) and \(\boldsymbol {\Psi }\mid \boldsymbol {\Sigma }\sim \mathcal {N}\left (\mathbf {0},\,\boldsymbol {\Sigma }\otimes {\mathbf {I}}_q\right )\), we have that \(\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\boldsymbol {\Psi }\) and \(\boldsymbol {\Delta }\) have the same distribution conditional on \(\boldsymbol {\Sigma }\). This is easy to see: a linear transformation of a Gaussian vector is Gaussian, so it suffices to match the first two moments, and we have \(\mathbb {E}\left \{\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\boldsymbol {\Psi }\right \}=\left ({\mathbf {I}}_J\otimes {\mathbf {W}}_s\right )\mathbb {E}\left (\boldsymbol {\Psi }\right )=\mathbf {0}\) and

$$\displaystyle \begin{aligned} \text{cov}\left\{\left({\mathbf{I}}_J\otimes{\mathbf{W}}_s\right)\boldsymbol{\Psi}\right\} & = \left({\mathbf{I}}_J\otimes{\mathbf{W}}_s\right)\left(\boldsymbol{\Sigma}\otimes{\mathbf{I}}_q\right)\left({\mathbf{I}}_J\otimes{\mathbf{W}}_s\right)'\\ & = \boldsymbol{\Sigma}\otimes{\mathbf{Q}}_s^{-1}. \end{aligned} $$
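The second equality rests on two standard Kronecker product facts, the mixed-product property \(\left (\mathbf {A}\otimes \mathbf {B}\right )\left (\mathbf {C}\otimes \mathbf {D}\right )=\mathbf {A}\mathbf {C}\otimes \mathbf {B}\mathbf {D}\) and the transpose rule \(\left (\mathbf {A}\otimes \mathbf {B}\right )'=\mathbf {A}'\otimes \mathbf {B}'\). As a sketch of the intermediate steps (the generic symbols \(\mathbf {A},\mathbf {B},\mathbf {C},\mathbf {D}\) are used here only for illustration):

$$\displaystyle \begin{aligned} \left({\mathbf{I}}_J\otimes{\mathbf{W}}_s\right)\left(\boldsymbol{\Sigma}\otimes{\mathbf{I}}_q\right)\left({\mathbf{I}}_J\otimes{\mathbf{W}}_s\right)' & = \left({\mathbf{I}}_J\boldsymbol{\Sigma}\otimes{\mathbf{W}}_s{\mathbf{I}}_q\right)\left({\mathbf{I}}_J\otimes{\mathbf{W}}_s^{\prime}\right)\\ & = \boldsymbol{\Sigma}{\mathbf{I}}_J\otimes{\mathbf{W}}_s{\mathbf{W}}_s^{\prime}\\ & = \boldsymbol{\Sigma}\otimes{\mathbf{Q}}_s^{-1}. \end{aligned} $$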

Hence, the model’s first and second stages can now be written as

$$\displaystyle \begin{aligned} g_j\left\{\mathbb{E}\left(\boldsymbol{y}_j\mid\boldsymbol{\beta}_j,\,\boldsymbol{\psi}_{sj}\right)\right\} & = \mathbf{X}\boldsymbol{\beta}_j+\mathbf{M}{\mathbf{W}}_s\boldsymbol{\psi}_{sj}\;\;\;\;\;\;\;( j=1,\dots, J)\\ p\left(\boldsymbol{\Psi}\mid\boldsymbol{\Sigma}\right) & = \mathcal{N}\left(\mathbf{0},\,\boldsymbol{\Sigma}\otimes{\mathbf{I}}_q\right). \end{aligned} $$
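The reparameterization for a common design matrix can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' implementation: the dimensions `q` and `J`, the randomly generated `Q_s` and `Sigma`, and all variable names are assumptions made for illustration. It verifies the covariance identity and shows that applying \({\mathbf {W}}_s\) to the \(q\times J\) matrix whose columns are the \(\boldsymbol {\psi }_{sj}\) is the same operation as applying \({\mathbf {I}}_J\otimes {\mathbf {W}}_s\) to the stacked vector \(\boldsymbol {\Psi }\).

```python
import numpy as np

rng = np.random.default_rng(0)
q, J = 4, 3  # hypothetical sizes: q spatial basis vectors, J outcomes

# Hypothetical symmetric positive definite Q_s (q x q) and Sigma (J x J).
A = rng.standard_normal((q, q))
Q_s = A @ A.T + q * np.eye(q)
B = rng.standard_normal((J, J))
Sigma = B @ B.T + J * np.eye(J)

# np.linalg.cholesky returns the lower factor L with Q_s = L L',
# so the upper Cholesky triangle is R_s = L' and R_s' R_s = Q_s.
R_s = np.linalg.cholesky(Q_s).T
W_s = np.linalg.inv(R_s)  # a triangular solve would be preferable in practice

# cov{(I_J (x) W_s) Psi} should equal Sigma (x) Q_s^{-1}.
T = np.kron(np.eye(J), W_s)
cov_transformed = T @ np.kron(Sigma, np.eye(q)) @ T.T
cov_target = np.kron(Sigma, np.linalg.inv(Q_s))
identity_holds = np.allclose(cov_transformed, cov_target)

# Draw Psi as a q x J matrix whose rows are iid N(0, Sigma), so that the
# column-stacked vector has covariance Sigma (x) I_q.
L_Sigma = np.linalg.cholesky(Sigma)
Psi_mat = rng.standard_normal((q, J)) @ L_Sigma.T

# Column j of W_s @ Psi_mat is W_s psi_sj; stacking the columns gives
# (I_J (x) W_s) Psi without ever forming the Kronecker product.
Delta_mat = W_s @ Psi_mat
stacking_agrees = np.allclose(
    Delta_mat.flatten(order="F"), T @ Psi_mat.flatten(order="F")
)

print(identity_holds, stacking_agrees)
```

Working with the \(q\times J\) matrix form avoids building the \(qJ\times qJ\) Kronecker matrices, which is one way the reparameterized prior can be cheaper to handle than the original one.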

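Now suppose that the design matrices differ across outcomes, i.e., \({\mathbf {X}}_1\neq {\mathbf {X}}_2\neq \cdots \neq {\mathbf {X}}_J\). Then we have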

$$\displaystyle \begin{aligned} g_j\left\{ \mathbb{E}\left(\boldsymbol{y}_j\mid\boldsymbol{\beta}_j,\,\boldsymbol{\delta}_{sj}\right)\right\} & = {\mathbf{X}}_j\boldsymbol{\beta}_j+{\mathbf{M}}_j\boldsymbol{\delta}_{sj}\\ p\left(\boldsymbol{\Delta}\mid\boldsymbol{\Sigma}\right) & = \mathcal{N}\left[\mathbf{0},\,\left\{\mathbf{R}'\left(\boldsymbol{\Sigma}^{-1}\otimes {\mathbf{I}}_q\right)\mathbf{R}\right\}^{-1}\right], \end{aligned} $$

where \(\boldsymbol {\Delta }=\left (\boldsymbol {\delta }_{s1}^{\prime },\dots ,\boldsymbol {\delta }_{sJ}^{\prime }\right )'\), \(\mathbf {R}=\mbox{bdiag}\left ({\mathbf {R}}_{s1},\dots , {\mathbf {R}}_{sJ}\right )\), and \({\mathbf {R}}_{sj}^{\prime }{\mathbf {R}}_{sj}={\mathbf {Q}}_{sj}\), where \({\mathbf {R}}_{sj}\) is the upper Cholesky triangle of \({\mathbf {Q}}_{sj}\). For ease of exposition, let \(J=2\) (the following easily extends to the case when \(J>2\)). The prior distribution of the spatial effects can be written

$$\displaystyle \begin{aligned} \begin{pmatrix} \boldsymbol{\delta}_{s1}\\ \boldsymbol{\delta}_{s2} \end{pmatrix}\mid\boldsymbol{\Sigma} & \;\;\sim\;\; \mathcal{N}\left[\begin{pmatrix} \mathbf{0}\\ \mathbf{0} \end{pmatrix},\,\left\{\begin{pmatrix} {\mathbf{R}}_{s1} & \mathbf{0}\\ \mathbf{0} & {\mathbf{R}}_{s2} \end{pmatrix}'\left(\boldsymbol{\Sigma}^{-1}\otimes {\mathbf{I}}_q\right)\begin{pmatrix} {\mathbf{R}}_{s1} & \mathbf{0}\\ \mathbf{0} & {\mathbf{R}}_{s2} \end{pmatrix}\right\} ^{-1}\right]\\ {} & \;\;=\;\; \mathcal{N}\left[\begin{pmatrix} \mathbf{0}\\ \mathbf{0} \end{pmatrix},\,\begin{pmatrix} {\mathbf{W}}_{s1} & \mathbf{0}\\ \mathbf{0} & {\mathbf{W}}_{s2} \end{pmatrix}\left(\boldsymbol{\Sigma}\otimes {\mathbf{I}}_q\right)\begin{pmatrix} {\mathbf{W}}_{s1} & \mathbf{0}\\ \mathbf{0} & {\mathbf{W}}_{s2} \end{pmatrix}'\right], \end{aligned} $$

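where \({\mathbf {W}}_{sj}={\mathbf {R}}_{sj}^{-1}\) \((j=1,2)\), and we have used the facts that \(\left (\boldsymbol {\Sigma }^{-1}\otimes {\mathbf {I}}_q\right )^{-1}=\boldsymbol {\Sigma }\otimes {\mathbf {I}}_q\) and \(\left ({\mathbf {R}}_{sj}^{-1}\right )'=\left ({\mathbf {R}}_{sj}^{\prime }\right )^{-1}\). Now, suppose we have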

$$\displaystyle \begin{aligned} \begin{pmatrix} \boldsymbol{\psi}_{s1}\\ \boldsymbol{\psi}_{s2} \end{pmatrix}\mid\boldsymbol{\Sigma}\;\;\sim\;\;\mathcal{N}\left\{ \begin{pmatrix} \mathbf{0}\\ \mathbf{0} \end{pmatrix},\,\boldsymbol{\Sigma}\otimes {\mathbf{I}}_q\right\}. \end{aligned}$$

Using basic properties of the multivariate normal distribution, we have that

$$\displaystyle \begin{aligned} \begin{pmatrix} {\mathbf{W}}_{s1} & \mathbf{0}\\ \mathbf{0} & {\mathbf{W}}_{s2} \end{pmatrix}\begin{pmatrix} \boldsymbol{\psi}_{s1}\\ \boldsymbol{\psi}_{s2} \end{pmatrix}\mid\boldsymbol{\Sigma}\;\;\sim\;\;\mathcal{N}\left\{ \begin{pmatrix} \mathbf{0}\\ \mathbf{0} \end{pmatrix},\,\begin{pmatrix} {\mathbf{W}}_{s1} & \mathbf{0}\\ \mathbf{0} & {\mathbf{W}}_{s2} \end{pmatrix}\left(\boldsymbol{\Sigma}\otimes {\mathbf{I}}_q\right)\begin{pmatrix} {\mathbf{W}}_{s1} & \mathbf{0}\\ \mathbf{0} & {\mathbf{W}}_{s2} \end{pmatrix}'\right\} . \end{aligned}$$

Then, since

$$\displaystyle \begin{aligned} \begin{pmatrix} {\mathbf{W}}_{s1} & \mathbf{0}\\ \mathbf{0} & {\mathbf{W}}_{s2} \end{pmatrix}\begin{pmatrix} \boldsymbol{\psi}_{s1}\\ \boldsymbol{\psi}_{s2} \end{pmatrix}=\begin{pmatrix} {\mathbf{W}}_{s1}\boldsymbol{\psi}_{s1}\\ {\mathbf{W}}_{s2}\boldsymbol{\psi}_{s2} \end{pmatrix}, \end{aligned}$$

we can apply a reparameterization similar to the case where design matrices are equivalent across the outcomes. Thus we can specify the first and second stages of the model as

$$\displaystyle \begin{aligned} g_j\left\{ \mathbb{E}\left(\boldsymbol{y}_j\mid\boldsymbol{\beta}_j,\,\boldsymbol{\psi}_{sj}\right)\right\} & = {\mathbf{X}}_j\boldsymbol{\beta}_j+{\mathbf{M}}_j{\mathbf{W}}_{sj}\boldsymbol{\psi}_{sj}\\ p\left(\boldsymbol{\Psi}\mid\boldsymbol{\Sigma}\right) & = \mathcal{N}\left(\mathbf{0},\,\boldsymbol{\Sigma}\otimes {\mathbf{I}}_q\right). \end{aligned} $$
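The case of outcome-specific design matrices can be checked the same way for \(J=2\). The sketch below, again in Python with NumPy and under assumed inputs (the helper `random_spd`, the size `q`, and the randomly generated `Q_s1`, `Q_s2`, and `Sigma` are hypothetical), confirms that inverting the prior precision \(\mathbf {R}'\left (\boldsymbol {\Sigma }^{-1}\otimes {\mathbf {I}}_q\right )\mathbf {R}\) gives the same covariance as transforming \(\boldsymbol {\Psi }\) by the block-diagonal matrix of the \({\mathbf {W}}_{sj}\).

```python
import numpy as np


def random_spd(rng, n):
    """Return a hypothetical n x n symmetric positive definite matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)


rng = np.random.default_rng(1)
q = 3  # hypothetical number of spatial basis vectors per outcome

Q_s1, Q_s2 = random_spd(rng, q), random_spd(rng, q)
Sigma = random_spd(rng, 2)  # 2 x 2 covariance across the J = 2 outcomes

# Upper Cholesky triangles: R_sj' R_sj = Q_sj (NumPy returns the lower factor).
R_s1 = np.linalg.cholesky(Q_s1).T
R_s2 = np.linalg.cholesky(Q_s2).T
W_s1, W_s2 = np.linalg.inv(R_s1), np.linalg.inv(R_s2)

Z = np.zeros((q, q))
R = np.block([[R_s1, Z], [Z, R_s2]])
W = np.block([[W_s1, Z], [Z, W_s2]])

# Covariance of Delta from the original prior: {R' (Sigma^{-1} (x) I_q) R}^{-1}.
prior_precision = R.T @ np.kron(np.linalg.inv(Sigma), np.eye(q)) @ R
cov_delta = np.linalg.inv(prior_precision)

# Covariance of bdiag(W_s1, W_s2) Psi with Psi ~ N(0, Sigma (x) I_q).
cov_via_psi = W @ np.kron(Sigma, np.eye(q)) @ W.T
covariances_match = np.allclose(cov_delta, cov_via_psi)

# Block structure: cov(delta_s1) = Sigma_11 Q_s1^{-1} and
# cov(delta_s1, delta_s2) = Sigma_12 W_s1 W_s2'.
diag_block_ok = np.allclose(cov_via_psi[:q, :q], Sigma[0, 0] * np.linalg.inv(Q_s1))
cross_block_ok = np.allclose(cov_via_psi[:q, q:], Sigma[0, 1] * W_s1 @ W_s2.T)

print(covariances_match, diag_block_ok, cross_block_ok)
```

The block checks make the structure explicit: each outcome keeps its own spatial covariance \({\mathbf {Q}}_{sj}^{-1}\), scaled by the corresponding diagonal entry of \(\boldsymbol {\Sigma }\), while cross-outcome dependence enters only through the off-diagonal entries of \(\boldsymbol {\Sigma }\).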

1.2 I Extended Simulation Results

Table 3 provides complete results for our simulation study.

Table 3 Extended results for our simulation study

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this chapter

Musgrove, D., Young, D.S., Hughes, J., Eberly, L.E. (2019). A Sparse Areal Mixed Model for Multivariate Outcomes, with an Application to Zero-Inflated Census Data. In: Diawara, N. (eds) Modern Statistical Methods for Spatial and Multivariate Data. STEAM-H: Science, Technology, Engineering, Agriculture, Mathematics & Health. Springer, Cham. https://doi.org/10.1007/978-3-030-11431-2_3
