The Bayesian Group Lasso for Confounded Spatial Data

Journal of Agricultural, Biological and Environmental Statistics

Abstract

Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effect can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can yield higher and less variable predictive accuracy on out-of-sample data when compared to the standard SGLMM.
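
To make the idea of monitoring multicollinearity concrete, the following is a minimal sketch, not the authors' implementation. It assumes the spatial random effect is represented by a matrix of basis functions `Z` and uses the variance inflation factor (VIF) as the diagnostic; the abstract does not say which quantity the paper monitors. The function name, the Gaussian kernel basis, and the simulated covariates are all hypothetical.

```python
import numpy as np


def variance_inflation_factors(X, Z):
    """VIF of each covariate in X, given the other covariates and the
    spatial basis matrix Z (illustrative diagnostic, assumed here)."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress covariate j on an intercept, the other covariates, and Z.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1), Z])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((y - y.mean()) ** 2)
        # VIF = 1 / (1 - R^2) = ss_tot / ss_res; values near 1 indicate
        # little collinearity, large values indicate confounding.
        vifs[j] = ss_tot / ss_res
    return vifs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    coords = rng.uniform(size=(n, 2))

    # Hypothetical spatial basis: Gaussian kernels on a 3 x 3 grid of knots.
    knots = np.array([[a, b] for a in (0.2, 0.5, 0.8) for b in (0.2, 0.5, 0.8)])
    dist2 = ((coords[:, None, :] - knots[None, :, :]) ** 2).sum(axis=2)
    Z = np.exp(-dist2 / (2 * 0.3 ** 2))

    # One spatially smooth covariate and one spatially unstructured covariate.
    x_smooth = coords[:, 0] + 0.1 * rng.normal(size=n)
    x_noise = rng.normal(size=n)
    X = np.column_stack([x_smooth, x_noise])

    print(variance_inflation_factors(X, Z))
```

If the basis matrix depends on parameters that are sampled during MCMC, the same function could be evaluated at each draw so that the diagnostic is summarized like any other derived quantity; whether the paper does this in the same way is an assumption.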

Supplementary materials accompanying this paper appear online.

Acknowledgements

We would like to acknowledge Dennis Heisey for his early contributions to the development of this research endeavor. We thank Jun Zhu and two anonymous reviewers for valuable insight and discussions about this work. We thank the staff of the Wisconsin Department of Natural Resources for their collaboration in obtaining deer tissue samples and the Wisconsin hunters who provided them. In particular, we thank Erin Larson for maintaining the CWD sample database. Funding for this project was provided by the USGS National Wildlife Health Center via Grant G14AC00366. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Author information

Corresponding author

Correspondence to Trevor J. Hefley.

Electronic supplementary material

Below is the link to the electronic supplementary material.

13253_2016_274_MOESM1_ESM.pdf

Supplementary Materials: Full-conditional distributions and MCMC algorithms are presented in Appendix A. R code to implement MCMC algorithms for the spatial, group lasso, and non-spatial probit models is given in Appendix B. Data, R code, and implementation details are provided for the species distribution example (Appendix C), the disease risk factor analysis example (Appendix D), and the simulated data example (Appendix E). (PDF, 4453 KB)

About this article

Cite this article

Hefley, T.J., Hooten, M.B., Hanks, E.M. et al. The Bayesian Group Lasso for Confounded Spatial Data. JABES 22, 42–59 (2017). https://doi.org/10.1007/s13253-016-0274-1

  • DOI: https://doi.org/10.1007/s13253-016-0274-1
