A Sparse Areal Mixed Model for Multivariate Outcomes, with an Application to Zero-Inflated Census Data

  • Donald Musgrove
  • Derek S. Young (corresponding author)
  • John Hughes
  • Lynn E. Eberly
Part of the STEAM-H: Science, Technology, Engineering, Agriculture, Mathematics & Health book series (STEAM)


Multivariate areal data are common in many disciplines. When fitting spatial regressions for such data, one needs to account for dependence (both among and within areal units) to ensure reliable inference for the regression coefficients. Traditional multivariate conditional autoregressive (MCAR) models offer a popular and flexible approach to modeling such data, but they suffer from two major shortcomings: (1) bias and variance inflation due to spatial confounding, and (2) high-dimensional spatial random effects that make fully Bayesian inference computationally challenging. We propose the multivariate sparse areal mixed model (MSAMM) as an alternative to the MCAR models. Because the MSAMM extends the univariate sparse areal mixed model (SAMM), it alleviates spatial confounding and speeds computation by greatly reducing the dimension of the spatial random effects. We specialize the MSAMM to handle zero-inflated count data, and apply our zero-inflated model to simulated data and to a large Census dataset for the state of Iowa.
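The abstract does not spell out how the dimension of the spatial random effects is reduced. As a rough illustration only, the sketch below assumes the Moran-operator basis used by the univariate SAMM: with design matrix X and adjacency matrix A, the random effects are restricted to the span of the q leading eigenvectors of P⊥AP⊥, where P⊥ projects onto the orthogonal complement of the column space of X. The function names (`moran_basis`, `grid_adjacency`), the 6 × 6 lattice, the simulated covariate, and the choice q = 5 are all hypothetical and are not taken from the chapter.

```python
import numpy as np


def moran_basis(adjacency, X, q):
    """Return the q leading eigenvectors (and eigenvalues) of the Moran operator.

    Assumed construction: the Moran operator is P_perp @ A @ P_perp, where
    P_perp projects onto the orthogonal complement of the columns of X.
    """
    n = adjacency.shape[0]
    # Projection onto the orthogonal complement of the column space of X.
    P_perp = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    moran_operator = P_perp @ adjacency @ P_perp
    # Symmetrize to remove round-off asymmetry before the symmetric eigensolver.
    moran_operator = (moran_operator + moran_operator.T) / 2.0
    eigenvalues, eigenvectors = np.linalg.eigh(moran_operator)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:q]  # indices of the q largest
    return eigenvectors[:, order], eigenvalues[order]


def grid_adjacency(rows, cols):
    """Rook-neighbor adjacency matrix for a rows x cols lattice (illustrative)."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:  # neighbor to the right
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < rows:  # neighbor below
                A[i, i + cols] = A[i + cols, i] = 1.0
    return A


# Hypothetical example: 36 areal units, an intercept plus one simulated covariate.
rng = np.random.default_rng(0)
A = grid_adjacency(6, 6)
X = np.column_stack([np.ones(36), rng.normal(size=36)])
M, lam = moran_basis(A, X, q=5)
print(M.shape)
```

Under this assumed construction the spatial effect for one outcome is M @ delta with delta of length q rather than n, and the columns of M are orthogonal to X, which is the sense in which such a basis avoids confounding with the fixed effects.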


Keywords: Bayesian hierarchical model; Dimension reduction; Hurdle model; Markov chain Monte Carlo; Multivariate spatial data; Zero-inflated data



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Donald Musgrove (1)
  • Derek S. Young (2), corresponding author
  • John Hughes (3)
  • Lynn E. Eberly (4)

  1. Medtronic, Minneapolis, USA
  2. Department of Statistics, University of Kentucky, Lexington, USA
  3. Department of Biostatistics and Informatics, University of Colorado, Denver, USA
  4. Division of Biostatistics, University of Minnesota, Minneapolis, USA
