Abstract
Compositional data analysis considers vectors of nonnegative-valued variables subject to a unit-sum constraint. Our interest lies in spatial compositional data, in particular, land use/land cover (LULC) data in the northeastern United States. Here, the observations are vectors providing the proportions of LULC types observed in each 3 km × 3 km grid cell, yielding on the order of 10^4 cells. On the same grid cells, we have an additional compositional dataset supplying forest fragmentation proportions. Potentially useful and available covariates include elevation range, road length, population, median household income, and housing levels.
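As a small illustration of the data structure described above, the following sketch turns per-cell class counts into unit-sum composition vectors and reports how often exact zeros occur. The counts, the four-class layout, and the helper name `cell_compositions` are hypothetical and are not taken from the LULC dataset.

```python
import numpy as np

def cell_compositions(counts):
    """Convert per-cell class counts (n_cells x n_classes) into proportions.

    Each returned row is nonnegative and sums to 1 (the unit-sum constraint).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every cell needs at least one classified pixel")
    return counts / totals

# Hypothetical pixel counts for three grid cells and four LULC classes.
counts = np.array([[50, 30, 20, 0],
                   [0, 0, 100, 0],
                   [10, 10, 10, 10]])
props = cell_compositions(counts)

# Share of (cell, class) entries that are exactly zero.
zero_share = float(np.mean(props == 0))
```

Classes that are absent from a cell produce exact zeros in its composition, which is the feature that motivates the point masses at 0 in the model.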
We propose a spatial regression model that is also able to capture flexible dependence among the components of the observation vectors at each location as well as spatial dependence across the locations of the simplex-restricted measurements. A key issue is the high incidence of observed zero proportions for the LULC dataset, requiring incorporation of local point masses at 0. We build a hierarchical model prescribing a power scaling first stage and using latent variables at the second stage with spatial structure for these variables supplied through a multivariate CAR specification. Analyses for the LULC and forest fragmentation data illustrate the interpretation of the regression coefficients and the benefit of incorporating spatial smoothing.
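The abstract does not state the first-stage formula, so the sketch below is only one plausible reading of "power scaling" with latent variables: negative latent values are truncated to zero, the positive parts are raised to a power `gamma`, and the result is normalized against a baseline component. The function name `latent_to_composition`, the baseline convention, and the example values are assumptions made for illustration; the spatial (multivariate CAR) structure on the latent variables is omitted.

```python
import numpy as np

def latent_to_composition(z, gamma=1.0):
    """Map latent real values to a composition that can contain exact zeros.

    Assumed construction (illustrative, not confirmed by the abstract):
      y_k = max(0, z_k)^gamma / (1 + sum_j max(0, z_j)^gamma),  k = 1..D-1
      y_D = 1 / (1 + sum_j max(0, z_j)^gamma)                   (baseline)
    A component is exactly zero whenever its latent value is <= 0.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    z = np.asarray(z, dtype=float)
    pos = np.maximum(z, 0.0) ** gamma            # truncate, then power-scale
    denom = 1.0 + pos.sum(axis=-1, keepdims=True)
    return np.concatenate([pos / denom, 1.0 / denom], axis=-1)

# Hypothetical latent vector for one grid cell (three latent values,
# giving four components including the baseline).
y = latent_to_composition([-0.5, 1.0, 2.0], gamma=1.0)
```

Under this assumed mapping, the probability that a latent value falls at or below zero supplies the point mass at 0 for that component, while the baseline component is always strictly positive.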
Cite this article
Leininger, T.J., Gelfand, A.E., Allen, J.M. et al. Spatial Regression Modeling for Compositional Data With Many Zeros. JABES 18, 314–334 (2013). https://doi.org/10.1007/s13253-013-0145-y
DOI: https://doi.org/10.1007/s13253-013-0145-y