Abstract
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and the spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
Supplementary materials accompanying this paper appear online.
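The abstract compares samplers by the effective sample size (ESS) of their MCMC draws. As a rough illustration of what that quantity measures, the sketch below estimates ESS from the autocorrelation of a single chain and applies it to two simulated autoregressive chains. This is not the authors' code (their implementation is in R in the supplementary appendices); the function names, the truncation rule (stop at the first non-positive autocorrelation), and the AR(1) stand-in chains are all assumptions made for illustration.

```python
import random


def effective_sample_size(chain):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    Assumption: the autocorrelation sum is truncated at the first
    non-positive lag, a simple and common heuristic.
    """
    n = len(chain)
    mean = sum(chain) / n
    centered = [v - mean for v in chain]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        # A constant chain has no defined autocorrelation; return n as a guard.
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_chain(phi, n, seed):
    """Simulate an AR(1) series as a hypothetical stand-in for MCMC output."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


# Two chains of equal length; the second is more strongly autocorrelated.
weak = ar1_chain(0.5, 5000, seed=1)
strong = ar1_chain(0.9, 5000, seed=1)
ess_weak = effective_sample_size(weak)
ess_strong = effective_sample_size(strong)
print(f"ESS (phi = 0.5): {ess_weak:.0f}")
print(f"ESS (phi = 0.9): {ess_strong:.0f}")
```

Both chains contain 5000 draws, but the more strongly autocorrelated chain carries less independent information and so has the smaller ESS. A sampler that doubles ESS at the same chain length yields, loosely speaking, twice as many effectively independent draws.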
Acknowledgements
We would like to acknowledge Dennis Heisey for his early contributions to the development of this research endeavor. We thank Jun Zhu and two anonymous reviewers for valuable insight and discussions about this work. We thank the staff of the Wisconsin Department of Natural Resources for their collaboration in obtaining deer tissue samples and the Wisconsin hunters who provided them. In particular, we thank Erin Larson for maintaining the CWD sample database. Funding for this project was provided by the USGS National Wildlife Health Center via Grant G14AC00366. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Electronic supplementary material
Below is the link to the electronic supplementary material.
13253_2016_274_MOESM1_ESM.pdf
Supplementary Materials: Full-conditional distributions and MCMC algorithms are presented in Appendix A. R code to implement MCMC algorithms for the spatial, group lasso, and non-spatial probit models is given in Appendix B. Data, R code, and details of implementation are provided for the species distribution example (Appendix C), the disease risk factor analysis example (Appendix D), and the simulated data example (Appendix E). (PDF, 4453 KB)
Cite this article
Hefley, T.J., Hooten, M.B., Hanks, E.M. et al. The Bayesian Group Lasso for Confounded Spatial Data. JABES 22, 42–59 (2017). https://doi.org/10.1007/s13253-016-0274-1
DOI: https://doi.org/10.1007/s13253-016-0274-1