Spatial Regression Modeling for Compositional Data With Many Zeros

  • Thomas J. Leininger
  • Alan E. Gelfand
  • Jenica M. Allen
  • John A. Silander Jr.


Compositional data analysis considers vectors of nonnegative-valued variables subject to a unit-sum constraint. Our interest lies in spatial compositional data, in particular, land use/land cover (LULC) data in the northeastern United States. Here, the observations are vectors providing the proportions of LULC types observed in each 3 km × 3 km grid cell, yielding on the order of 10^4 cells. On the same grid cells, we have an additional compositional dataset supplying forest fragmentation proportions. Potentially useful and available covariates include elevation range, road length, population, median household income, and housing levels.

We propose a spatial regression model that is also able to capture flexible dependence among the components of the observation vectors at each location as well as spatial dependence across the locations of the simplex-restricted measurements. A key issue is the high incidence of observed zero proportions for the LULC dataset, requiring incorporation of local point masses at 0. We build a hierarchical model prescribing a power scaling first stage and using latent variables at the second stage with spatial structure for these variables supplied through a multivariate CAR specification. Analyses for the LULC and forest fragmentation data illustrate the interpretation of the regression coefficients and the benefit of incorporating spatial smoothing.
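As a rough illustration of how latent variables can yield compositions with exact zeros, the following Python sketch truncates negative latent values to 0, applies a power scaling, and normalizes against a baseline component. The specific functional form, the function name `latent_to_composition`, and the parameter `gamma` are assumptions made for illustration; the abstract does not give the model's equations, and the sketch omits the regression and spatial (CAR) structure entirely.

```python
def latent_to_composition(z, gamma=1.0):
    """Map D-1 latent real values to a D-part composition (illustrative only).

    Assumed form: component k is max(0, z_k)**gamma divided by
    1 + sum_j max(0, z_j)**gamma, with a final baseline component
    1 / (1 + sum_j max(0, z_j)**gamma).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Negative latent values are truncated to 0, giving exact zero proportions.
    scaled = [max(0.0, v) ** gamma for v in z]
    total = 1.0 + sum(scaled)
    parts = [s / total for s in scaled]
    parts.append(1.0 / total)  # baseline component, always strictly positive
    return parts


# One hypothetical grid cell with three latent values (four LULC types).
example = latent_to_composition([-0.4, 1.0, 3.0], gamma=0.5)
```

In this sketch the first component of `example` is exactly zero because its latent value is negative, while the four components still sum to one, which is the sense in which latent variables can place a point mass at 0 without leaving the simplex.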

Key Words

Areal data; Conditionally autoregressive model; Continuous ranked probability score; Hierarchical modeling; Markov chain Monte Carlo





Copyright information

© International Biometric Society 2013

Authors and Affiliations

  • Thomas J. Leininger (1)
  • Alan E. Gelfand (1)
  • Jenica M. Allen (2)
  • John A. Silander Jr. (2)
  1. Department of Statistical Science, Duke University, Durham, USA
  2. Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, USA
