Spatial Regression Modeling for Compositional Data With Many Zeros

Journal of Agricultural, Biological, and Environmental Statistics

Abstract

Compositional data analysis considers vectors of nonnegative-valued variables subject to a unit-sum constraint. Our interest lies in spatial compositional data, in particular, land use/land cover (LULC) data in the northeastern United States. Here, the observations are vectors providing the proportions of LULC types observed in each 3 km × 3 km grid cell, yielding on the order of 10^4 cells. On the same grid cells, we have an additional compositional dataset supplying forest fragmentation proportions. Potentially useful and available covariates include elevation range, road length, population, median household income, and housing levels.
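
As a small illustration of the data structure described above, the following sketch converts per-cell category counts into compositions and measures how often exact zeros occur. The counts and the function name `cell_compositions` are hypothetical and are not taken from the LULC dataset.

```python
import numpy as np


def cell_compositions(counts):
    """Convert per-cell category counts into compositions (each row sums to 1)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every cell needs at least one classified pixel")
    return counts / totals


# Hypothetical pixel counts: 3 grid cells (rows) by 4 land-cover types (columns).
counts = np.array([
    [50, 30, 20, 0],
    [0, 0, 100, 0],
    [25, 25, 25, 25],
])

props = cell_compositions(counts)

# Share of entries that are exactly zero. Log-ratio transforms are undefined
# for these entries, which is why zeros need explicit treatment.
zero_share = np.mean(props == 0.0)
```

Each row of `props` satisfies the unit-sum constraint, and `zero_share` gives a quick sense of how much of the data sits on the boundary of the simplex.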

We propose a spatial regression model that is also able to capture flexible dependence among the components of the observation vectors at each location as well as spatial dependence across the locations of the simplex-restricted measurements. A key issue is the high incidence of observed zero proportions for the LULC dataset, requiring incorporation of local point masses at 0. We build a hierarchical model prescribing a power scaling first stage and using latent variables at the second stage with spatial structure for these variables supplied through a multivariate CAR specification. Analyses for the LULC and forest fragmentation data illustrate the interpretation of the regression coefficients and the benefit of incorporating spatial smoothing.
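
The hierarchical structure outlined above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the functional form, so the sketch assumes a first-stage map in which each latent value is truncated at zero, raised to a power gamma, and normalized against a baseline component. It also assumes a proper CAR precision of the form tau (D - rho W) on a small rook-adjacency grid. For brevity the latent components are drawn as independent spatial fields, whereas a multivariate CAR would also link them across components. All function names, grid dimensions, and parameter values are hypothetical.

```python
import numpy as np


def grid_adjacency(nrow, ncol):
    """Rook-neighbor adjacency matrix W for an nrow x ncol lattice."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:          # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrow:          # neighbor below
                W[i, i + ncol] = W[i + ncol, i] = 1.0
    return W


def proper_car_precision(W, rho=0.9, tau=1.0):
    """Assumed proper CAR precision tau * (D - rho * W), with |rho| < 1."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)


def latent_to_composition(z, gamma=1.0):
    """Assumed power-scaling map from D-1 latent reals to a D-part composition.

    Non-positive latent values yield exact zeros (a point mass at 0);
    the last component acts as a strictly positive baseline.
    """
    z = np.asarray(z, dtype=float)
    scaled = np.maximum(z, 0.0) ** gamma
    return np.append(scaled, 1.0) / (1.0 + scaled.sum())


rng = np.random.default_rng(0)
W = grid_adjacency(4, 5)
Q = proper_car_precision(W, rho=0.9, tau=1.0)

# Draw spatial fields with covariance Q^{-1}: if Q = L L^T, then
# z = L^{-T} eps has covariance (L L^T)^{-1}.
L = np.linalg.cholesky(Q)
eps = rng.standard_normal((W.shape[0], 3))   # 3 latent components per site
Z = np.linalg.solve(L.T, eps)

# First stage: map each site's latent vector to a 4-part composition.
Y = np.vstack([latent_to_composition(z, gamma=0.5) for z in Z])
```

Because negative latent values map to exact zeros, spatially smooth latent fields typically produce contiguous patches of zero proportions, which is the behavior the point masses at 0 are meant to capture.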

Author information

Corresponding author

Correspondence to Alan E. Gelfand.

About this article

Cite this article

Leininger, T.J., Gelfand, A.E., Allen, J.M. et al. Spatial Regression Modeling for Compositional Data With Many Zeros. JABES 18, 314–334 (2013). https://doi.org/10.1007/s13253-013-0145-y

  • DOI: https://doi.org/10.1007/s13253-013-0145-y
